The Ultimate Caching Definition: Invalidation, Optimization, and Layers
If you’re a software engineer, you’ve probably heard of caching before. It makes things faster, right? But what exactly is it and what role should it play in your life?
Let’s start with a definition:
A cache is a non-authoritative representation of data maintained for performance.
That’s a mouthful! In this post, we’ll cover the basics as well as all the complex implications of this seemingly simple concept. Let’s unpack this definition a fragment at a time:
“Non-authoritative”
Let’s say your app has some users (good for you, by the way) and those users have usernames that are displayed when they log in.
When Fred signs up for your app and chooses the username `discgolfking93`, how should we keep track of that? Your natural inclination is likely to store it somewhere like a Postgres database:
```sql
UPDATE users SET username = 'discgolfking93' WHERE id = 2743;
```
By saving Fred’s username this way, we’ve made this database the authority for this value. And, in practice, there is only a single authority for any value. Consider: we don’t write Fred’s username to both Postgres and MySQL, because if we did, one write could succeed while the other failed. How would we resolve the resulting disagreement?
Meanwhile, a cache is a second, non-authoritative location for a value, one we maintain almost always to increase performance and/or scalability.
Revisiting our app and “discgolfking93,” let’s assume our Postgres database is running on ancient server hardware, and so `SELECT` queries are really slow. It’s taking too long to render Fred’s username in the app! In this case, we might cache a second copy of Fred’s username in Redis, which has super speedy read performance, and our app speeds up.
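To make that concrete, here’s a minimal read-through sketch in TypeScript. The `Map` and the stubbed query are hypothetical stand-ins for a real Redis client and a real Postgres call, but the shape of the pattern is the point: check the cache first, and only fall back to the authority on a miss.

```typescript
// A minimal read-through cache. The Map stands in for Redis, and the
// stubbed query stands in for a Postgres round trip.
const cache = new Map<string, string>();

async function queryUsernameFromPostgres(id: number): Promise<string> {
  // Stub; imagine: SELECT username FROM users WHERE id = $1
  return "discgolfking93";
}

async function getUsername(id: number): Promise<string> {
  const key = `username:${id}`;
  const cached = cache.get(key);
  if (cached !== undefined) return cached; // fast path: no database hit

  const username = await queryUsernameFromPostgres(id); // slow, authoritative
  cache.set(key, username); // keep a non-authoritative copy for next time
  return username;
}
```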
“Performance”
Why is Redis so magically fast? Unlike our database, it doesn’t store its values on disk. It keeps all its values in RAM. And everyone knows RAM is far faster than disks!
You probably know, however, that if our server crashes, all the values in RAM are lost. But this is acceptable because those values are non-authoritative. Since the cached values are derived from the database, when our server restarts, we can just reload all the usernames from the database back into Redis.
This is one common way caches make things faster: utilize faster classes of hardware. However, caches often provide performance benefits in other ways. Consider your web browser’s cache—it’s stored on disk, too. There’s no guarantee that your laptop’s disk is any faster than the one in the server running any given website! So why does your browser use it?
In this case, the server’s hardware speed isn’t the issue. What matters is the physical distance to the server and the bandwidth available between you and it. It takes some time to connect to that server and to transmit the site’s content to your browser—perhaps 500 milliseconds or more. But your laptop’s SSD can read the content from the browser’s on-disk cache in just a few milliseconds.
“Representation”
While “representation” might sound a little vague, it alludes to another performance benefit caches sometimes give us: reducing CPU time.
Let’s pretend that our app’s database storage is infinitely fast, and the server is right in our garage. Surely, caches are needless now? Not necessarily, because caches can also store a transformation derived from a combination of many authoritative values, thereby improving performance by eliminating redundant computation.
For example, imagine our app’s profile page summarizes our users’ recent activity: all their recent posts, how many days they have a “streak” of engaging with our app, their most upvoted comments, and so on. For a feature like this, we often need to fetch lots of data from our database and perform combinations, like `JOIN`ing posts to comments using SQL. And we might have expensive calculations, like finding the maximum value out of a large series of numbers.
Since this profile page always depends on the same combined and calculated data, instead of directly caching individual database records, we can simply cache the aggregated/combined values as a representation of the underlying authoritative records. Then, loading discgolfking93’s profile page over and over again will be speedier and cheaper without the need for faster hardware or faster networks.
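Here’s a minimal TypeScript sketch of that idea (all names are hypothetical): we cache one precomputed `ProfileSummary` object per user rather than the raw records behind it.

```typescript
interface ProfileSummary {
  recentPosts: string[];
  streakDays: number;
  mostUpvotedComment: string | null;
}

const summaryCache = new Map<number, ProfileSummary>();

// Stub standing in for the expensive JOINs and aggregations.
async function computeProfileSummary(userId: number): Promise<ProfileSummary> {
  return { recentPosts: [], streakDays: 0, mostUpvotedComment: null };
}

// We cache the combined representation, not the individual records, so
// repeat visits to the profile page skip the JOINs and calculations.
async function getProfileSummary(userId: number): Promise<ProfileSummary> {
  const cached = summaryCache.get(userId);
  if (cached) return cached;

  const summary = await computeProfileSummary(userId); // the expensive part
  summaryCache.set(userId, summary);
  return summary;
}
```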
“Maintained”
Unfortunately, caches are not just set-it-and-forget-it. Because they’re derived from authoritative data, cached values need to be updated when that authoritative data changes. Otherwise, the cached values the app uses will eventually be incorrect. This need is called “cache invalidation”: updating or discarding cached values once they no longer match the authoritative data.
To examine cache invalidation fully, we’ll first need to reconsider a familiar concept: time. In computing, what does time mean, anyway, man? Very little, it turns out.
Computers very seldom guarantee that two or more values appear to change at exactly the same time. The formal name for this guarantee is atomicity. Commonly, we encounter atomicity only in ACID database transactions, with database systems carefully engineered to provide it.
But we’re talking about caching here, and there is basically never any atomicity between a cache and the authoritative store behind it! So the cached value is updated… later.
“Later” can mean anything, but shorter is generally better. Suppose that discgolfking93 receives a new direct message in our app, and we’re utilizing a cache to speed up the rendering of his most recent messages. Ideally, he’d like to see this new message in the app as soon as possible so he can respond to his friend in a timely manner.
So, stating this more formally, caches are only useful in practice when cached values reintegrate changing authoritative values over time. And ideally, we minimize this time.
How do we make that happen? Here are two of the most common strategies:
1. The cache polls for new values
The simplest approach is for the cache to invalidate the value periodically and speculatively check the authoritative store to see if there’s something new. This approach is generally called “polling” in computing.
For example, HTTP uses the `Cache-Control` header to tell browsers how long to wait before checking whether an updated resource lives behind a URL. Effectively, this server-provided cache policy establishes the polling interval, and your browser will use its cached value for that amount of time. Then, the next request for the resource triggers the browser to “phone home” to the server and get an updated response. It’s possible no change happened and the resource is the same, but the browser checks just in case.
A huge benefit of periodic polling is that it’s really simple! However, it isn’t particularly effective at getting new derived values into caches in a timely fashion, and stale values may be used for a long time.
It’s also resource-inefficient. To minimize staleness, it’s tempting to choose a short polling interval, but this can result in many needless requests to the authoritative store while the cached value is still perfectly valid.
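A minimal sketch of poll-style caching in TypeScript (all names hypothetical): every entry expires after a fixed TTL, much like a `Cache-Control: max-age` policy, and expiry triggers a speculative re-fetch whether or not anything changed.

```typescript
interface Entry<T> {
  value: T;
  fetchedAt: number;
}

// Poll-style invalidation: every entry carries a timestamp, and once it is
// older than ttlMs we go back to the authoritative source, whether or not
// anything actually changed. That speculative re-fetch is the cost of polling.
class PollingCache<T> {
  private entries = new Map<string, Entry<T>>();

  constructor(
    private ttlMs: number,
    private fetchFromAuthority: (key: string) => Promise<T>
  ) {}

  async get(key: string): Promise<T> {
    const entry = this.entries.get(key);
    if (entry && Date.now() - entry.fetchedAt < this.ttlMs) {
      return entry.value; // still inside the polling interval: trust the copy
    }
    const value = await this.fetchFromAuthority(key); // "phone home"
    this.entries.set(key, { value, fetchedAt: Date.now() });
    return value;
  }
}
```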
2. The authoritative store pushes new values
In a push invalidation strategy, the authoritative store, like the database in our app—which obviously knows when its values have changed—tells the cache when the change happens!
This design ensures the cache serves updated values as soon as possible. However, it requires a more complex protocol between the cache and the authoritative store and a persistent communication channel so the update can be propagated.
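Here’s a sketch of the shape this takes, using Node’s built-in `EventEmitter` as a stand-in for that persistent channel (in production it might be Redis pub/sub, a WebSocket, or a database change stream; all other names are hypothetical):

```typescript
import { EventEmitter } from "node:events";

// Stand-in for the persistent channel between store and cache; in a real
// system this might be Redis pub/sub, a WebSocket, or a database change stream.
const changeFeed = new EventEmitter();

const usernameCache = new Map<number, string>();

// The authoritative store announces each change the moment it commits...
function updateUsername(id: number, username: string) {
  // ...write to the real database here, then notify subscribers:
  changeFeed.emit("username-changed", { id, username });
}

// ...and the cache listens and applies the update immediately, no polling needed.
changeFeed.on("username-changed", ({ id, username }) => {
  usernameCache.set(id, username); // or .delete(id) to force a re-fetch
});
```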
A well-known example of push-based invalidation is found in React, when we combine `useState` and `useMemo`. When `useMemo` creates derived values from `useState`-managed data, the React framework ensures that any change to the latter automatically triggers a recalculation of the former.
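A small sketch of that pairing (the component and its state are hypothetical):

```tsx
import { useMemo, useState } from "react";

function LikeCounter() {
  // `likes` is the source data; `totalLikes` is a cached derivation of it.
  const [likes, setLikes] = useState<number[]>([]);

  // React recomputes this only when `likes` changes: the dependency list
  // is the invalidation rule, and the change "pushes" the recalculation.
  const totalLikes = useMemo(
    () => likes.reduce((sum, n) => sum + n, 0),
    [likes]
  );

  return (
    <button onClick={() => setLikes([...likes, 1])}>{totalLikes} likes</button>
  );
}
```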
Isn’t cache invalidation one of the “hard things”?
You may have heard the famous quip:
There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors
While this article won’t help you very much with topic #2 or topic #NaN, we can definitely explore what’s challenging about cache invalidation.
Thundering herds
Clearly, a cache exists to provide a performance benefit, and this benefit often enables a higher scale than possible without the caching. But what actually happens when a cached value is invalidated?
Well, one possibility is that a huge number of requests for that cached value come in, far more than the authoritative store can handle. (Otherwise, did you really need a cache?)
So, this thundering herd scenario occurs when a cache invalidation suddenly triggers a ton of requests to pass through to the backing authoritative store. Even if each request intends to update the cache with the new response from this store, you may find the backing store so overloaded by this sudden surge in traffic that it struggles to respond at all!
Strategies to handle this include having only one requestor act as the agent that fetches the new value on behalf of all other concurrent and future requests. Then, you can either block the concurrent requests or feed them the stale value until the authoritative store returns the new one. (You may have heard of the latter approach: it’s sometimes called stale-while-revalidate.)
Even so, the request to the backing store could fail or, worse, stall indefinitely. So you need to ensure you handle failures and timeouts appropriately and retry until the cache can be updated.
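Here’s a minimal TypeScript sketch of both ideas together (all names hypothetical; freshness tracking is omitted, so assume `get` is called for keys that have just been invalidated). Note the `finally` cleanup, which keeps a failed or stalled re-fetch from wedging the cache forever:

```typescript
const values = new Map<string, string>();
const inFlight = new Map<string, Promise<string>>();

// Single-flight: however many callers arrive after an invalidation, only
// one request per key actually reaches the authoritative store.
function refresh(
  key: string,
  fetchFromAuthority: (k: string) => Promise<string>
): Promise<string> {
  let pending = inFlight.get(key);
  if (!pending) {
    pending = fetchFromAuthority(key)
      .then((fresh) => {
        values.set(key, fresh);
        return fresh;
      })
      .finally(() => inFlight.delete(key)); // clean up even on failure/timeout
    inFlight.set(key, pending);
  }
  return pending;
}

// Stale-while-revalidate: join the single refresh, but serve the old value
// immediately if we have one. Only a cold cache has to wait.
async function get(
  key: string,
  fetchFromAuthority: (k: string) => Promise<string>
): Promise<string> {
  const stale = values.get(key);
  const pending = refresh(key, fetchFromAuthority);
  return stale !== undefined ? stale : pending;
}
```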
Caching hierarchies
The truth is that there is never only one layer of caching beneath you in computing. Instead, a whole stack of caches is involved in every interaction: browser caches, CDNs, database buffer pools, operating system page caches, storage hardware caches, memory caches, and CPU instruction caches.
The challenging part about such a deep stack is reasoning about the cumulative behavior of all the layers below you. How far away are you truly from the authoritative value? What are the updating semantics of all of the caches between you and it? Do you really have the control you need to ensure freshness? What are the consequences of update delays for your app?
Cache consistency
Finally, a particularly insidious problem is cache consistency.
In our messaging app, let’s imagine messages can be “liked” by those who receive them. Then, in the upper corner of the app, next to users' avatars, we display a miniature scoreboard showing how many total “likes” they’ve accumulated.
Gamification! Engagement! Our app has exploded in popularity, and we once again rely on Redis to cache more fetches from the database. In this case, recent messages, their liked status, and the total number of likes are cached.
Then a weird bug appears: discgolfking93 sends a new, flattering message to their friend birdiesonly97, which is quickly and deservedly “liked.” But while discgolfking93 is pleased to see a heart appear next to their sent message, they’re puzzled that their total like count didn’t increase. WTF?
What happened here is that the cached value for recent-messages-with-hearts updated before the total-hearts count—despite both values updating atomically in the authoritative ACID database. To the app, which used the values in Redis, these values didn’t change at the same time. Therefore, to the user, the app is broken.
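A sketch of how that plays out (the database helpers are hypothetical stubs):

```typescript
const cache = new Map<string, unknown>();

// Hypothetical stubs standing in for the real database calls.
async function commitLikeTransaction(messageId: number, authorId: number) {
  // Imagine: BEGIN; UPDATE messages ...; UPDATE users ...; COMMIT;
}
async function fetchTotalLikes(authorId: number): Promise<number> {
  return 0; // stub
}

async function onLike(messageId: number, authorId: number) {
  // In the database, both facts change in one ACID transaction.
  await commitLikeTransaction(messageId, authorId);

  // But the cache updates are two separate writes. Any reader who runs
  // between these two lines sees the heart on the message while the
  // author's total like count is still stale.
  cache.set(`message:${messageId}:liked`, true);
  cache.set(`user:${authorId}:totalLikes`, await fetchTotalLikes(authorId));
}
```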
This scenario is very common. The synchronization protocol between a cache and its underlying authoritative store is seldom sophisticated enough to capture which cached values should change atomically. And while this is a nasty problem for simple values, it’s an even bigger headache when you’re caching aggregated or transformed values!
More insidious still is when a user or an app takes action in response to these inconsistent values. They may misunderstand the meaning of the inconsistency or seek to reconcile it. Then, you’ve gone from bad to worse and created a kind of butterfly effect that can lead to all manner of confusing contradictions in your databases and caches over time.
What’s a dev to do?
Okay! At this point, we have a pretty solid grasp of caching:
A cache is a non-authoritative representation of data maintained for performance.
We can now whip that definition out at parties and impress our friends. But what about some practical advice for approaching caching on our projects?
- Don’t think about caching until you’ve proven you have no other choice. Until you know you have a performance problem you can fix no other way, don’t worry about caching at all. Caching is one of the most difficult optimizations to reason about and test confidently, and less code means fewer bugs!
- If you know you need caching, raise your expectations of the platform/authoritative layer. Because the authoritative layer knows when dependent data changes, it should be expected to do everything possible to provide a protocol encapsulating the semantics of freshness and atomicity to any caches above it. Ideally, it uses that protocol to cache automatically wherever possible so you can go back to option 1. If your platform doesn’t provide this, consider switching to one that does. And if your platform only allows poll-based strategies, it hasn’t actually solved your problem: periodic invalidation and stale-while-revalidate are flawed strategies, mostly broken by design. They can only be used reliably when applications have very loose expectations of consistency and correctness.
- If you have to own invalidation yourself, don’t think “cache invalidation;” think “data dependencies.” The real key to getting a handle on cache invalidation is creating robust frameworks with rules that express “when record A changes, data B, C, and D are affected.” Then, you need to think through how to maintain this expiration logic over time as your app grows, your team grows, and people forget the relationships between cached data and authoritative data. If you can do all this, then executing an invalidation simply means traversing the dependency graph and invalidating affected cached values as needed (see the sketch below).
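Here’s a minimal sketch of such a dependency framework (all names hypothetical):

```typescript
// "When record A changes, data B, C, and D are affected": a tiny registry
// mapping authoritative keys to the derived cache keys that depend on them.
const dependents = new Map<string, Set<string>>();
const cache = new Map<string, unknown>();

function dependsOn(derivedKey: string, ...sourceKeys: string[]) {
  for (const source of sourceKeys) {
    if (!dependents.has(source)) dependents.set(source, new Set());
    dependents.get(source)!.add(derivedKey);
  }
}

// On any authoritative write, walk the graph and evict everything downstream
// (transitively, since derived values can feed other derived values).
function invalidate(sourceKey: string, seen = new Set<string>()) {
  for (const derived of dependents.get(sourceKey) ?? []) {
    if (seen.has(derived)) continue; // guard against cycles
    seen.add(derived);
    cache.delete(derived);
    invalidate(derived, seen);
  }
}

// Usage: the profile summary depends on both posts and comments.
dependsOn("profile:2743", "posts:2743", "comments:2743");
invalidate("posts:2743"); // evicts "profile:2743"
```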
Convex is the sync platform with everything you need to build your full-stack project. Cloud functions, a database, file storage, scheduling, search, and realtime updates fit together seamlessly.