Caching

From Jonathan Gardner's Tech Wiki

Introduction

People start developing a web app or service, and then discover that it is too slow. So they decide to add caching. And then things get really complicated.

Caching is Just Data

Caching isn't a new concept, nor is it one tied specifically to computer science. It is something we do from the moment we are born: what we keep in our heads is a cache of sorts.

Caching is nice because it saves time. It is much quicker to recall that 6 × 9 = 54 than to look it up in a table or work it out on a calculator.

Caching suffers from the same problems that any other data suffers from:

  • What happens when you cache the wrong value?
  • What happens when you cache the right value, but after a while it is no longer right?
  • How do you keep things consistent with caches all over the place?
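
The second question can be made concrete with a small sketch. Everything here is hypothetical: source plays the role of the authoritative data, cache is a copy of it, and the invalidate flag exists only to contrast a write that forgets about the cache with one that clears it.

```python
source = {"price": 100}  # the authoritative data (hypothetical)
cache = {}               # a copy kept nearby for speed


def read_price():
    """Serve from the cache, filling it from the source on a miss."""
    if "price" not in cache:
        cache["price"] = source["price"]
    return cache["price"]


def write_price(value, invalidate):
    """Update the source; optionally drop the cached copy as well."""
    source["price"] = value
    if invalidate:
        cache.pop("price", None)


before = read_price()               # cache is filled with the current value
write_price(120, invalidate=False)  # the source changes, the cache does not
stale = read_price()                # still the old value
write_price(130, invalidate=True)   # this time the cached copy is dropped
fresh = read_price()                # re-read from the source
```

With a single cache, invalidating on every write is enough. With caches all over the place, every writer has to know about every copy, which is where the consistency question in the last bullet comes from.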

Proposal

  1. Start with PostgreSQL.
  2. Implement Materialized Views.
  3. Get multi-master replication working, along with master-slave replication.
  4. Have PostgreSQL intelligently figure out what to put in a materialized view, and have queries redirect there appropriately.
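
To make step 2 less abstract, here is a minimal sketch of what a materialized view amounts to: a query result stored as a table and rebuilt on demand. It uses SQLite through Python's standard sqlite3 module only because that needs no server. SQLite has no materialized views of its own, so a plain summary table stands in for one, and the orders and order_totals tables are invented for the example. PostgreSQL releases since 9.3 offer CREATE MATERIALIZED VIEW with an explicit refresh, but nothing below depends on that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10), ("ann", 5), ("bob", 7)],
)


def refresh_totals(conn):
    """Rebuild the stored result from scratch (a full refresh, not an incremental one)."""
    conn.execute("DROP TABLE IF EXISTS order_totals")
    conn.execute(
        "CREATE TABLE order_totals AS "
        "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
    )


def total_for(conn, customer):
    """Read the precomputed total instead of aggregating orders again."""
    row = conn.execute(
        "SELECT total FROM order_totals WHERE customer = ?", (customer,)
    ).fetchone()
    return row[0] if row else None


refresh_totals(conn)
t1 = total_for(conn, "ann")   # total as of the first refresh
conn.execute("INSERT INTO orders VALUES ('ann', 20)")
t2 = total_for(conn, "ann")   # unchanged: the stored result is now stale
refresh_totals(conn)
t3 = total_for(conn, "ann")   # includes the new order
```

The stored table behaves exactly like a cache, staleness included. Step 4 of the proposal is about having the database decide when to build and refresh such tables, rather than leaving it to the application.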

None of the above is easy. Instead, what people typically do is:

  1. Start with MySQL.
  2. Add in Memcached.
  3. Try to scale. Fail miserably, or succeed but at a huge cost.

Why Not?

We have algorithms that write code better than humans can, and we call them compilers.

Why can't we have algorithms that store data better than humans can? Why can't this technology be ubiquitous? In today's age of cloud computing, it makes no sense that we cannot have structured data stored in the cloud, replicated, cached, and exposed through materialized views, all at the same time and at a tiny fraction of the cost.