Ever since we released LMDB, our advice to software designers has been "don't do application level caching. Let the kernel handle it for you." This advice is based on over a dozen years' experience wrestling with cache management in previous releases of OpenLDAP, BerkeleyDB, and other related systems. But one lesson we learned in those years isn't always covered in depth, and it concerns a persistent problem in cache management.
The problem has to do with full DB scans, or any query whose working set is larger than the cache. These can be pretty common in an LDAP search if administrative limits haven't been set to prevent them. With a straightforward Least Recently Used (LRU) cache replacement strategy, new records are placed into the cache, and when the cache is full, the least recently used records are purged to make room for them. But if a single query touches more records than can fit into the cache, the cache is rendered useless.
For example, assume a cache size of 3 records. If a query comes in that touches 3 records or fewer, the cache works well. E.g., with a query for 3 records A, B, and C, the cache contents would just be [A] [B] [C]. If a subsequent query asks for any of A, B, or C, it can immediately be served from the cache.
But if the query touches more than 3 records, the cache is neutralized and becomes useless. E.g., with a query for 4 records A, B, C, and D, the cache contents would first fill in with [A] [B] [C] as before. But when the record for D is processed, it will evict the oldest entry [A] to insert [D]. The cache then contains [B] [C] [D]. Any subsequent query for A, B, C, and D will perform as if there were no cache at all. The request for the first record A will not hit the cache, and A will enter the cache, making its contents [C] [D] [A]. Then the next record B will be processed, bumping the cache again to [D] [A] [B]. Then the next record C will be processed similarly, making the cache [A] [B] [C], and finally the record D will be processed, making the cache [B] [C] [D] again. For this query, none of the 4 requested records could be served from the cache. The cache effectiveness is zero, and all of the memory consumed by the cache is essentially wasted.
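To make the failure mode concrete, here is a minimal sketch in C (purely illustrative, not OpenLDAP code) of a 3-slot LRU cache processing that same query for A, B, C, and D twice. The cache is just an array of keys ordered oldest to newest, and each record is assumed to be identified by a single character.

    /* Minimal LRU sketch: a 3-slot cache of single-character record keys.
     * Entries are kept ordered oldest-to-newest; a hit moves the key to the
     * newest slot, a miss on a full cache evicts the oldest slot. */
    #include <stdio.h>
    #include <string.h>

    #define CACHE_SIZE 3

    static char cache[CACHE_SIZE];
    static int used = 0;

    /* Return 1 on a cache hit, 0 on a miss; update the LRU order either way. */
    static int lru_access(char key)
    {
        int i, j;

        for (i = 0; i < used; i++) {
            if (cache[i] == key) {          /* hit: move key to newest position */
                for (j = i; j < used - 1; j++)
                    cache[j] = cache[j + 1];
                cache[used - 1] = key;
                return 1;
            }
        }
        if (used == CACHE_SIZE) {           /* miss on a full cache: evict oldest */
            memmove(cache, cache + 1, CACHE_SIZE - 1);
            used--;
        }
        cache[used++] = key;                /* insert as the newest entry */
        return 0;
    }

    int main(void)
    {
        const char *query = "ABCD";
        int pass, i, hits = 0;

        for (pass = 0; pass < 2; pass++)
            for (i = 0; query[i]; i++)
                hits += lru_access(query[i]);

        /* Prints "hits: 0 of 8": every access in both passes is a miss. */
        printf("hits: %d of 8\n", hits);
        return 0;
    }

Running it reproduces the walkthrough: the repeated 4-record query never gets a single hit from the 3-slot cache.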
There are many complex cache management algorithms out there that attempt to address this scenario, and somehow preserve the cache's effectiveness in the face of these pathological queries. We explored a number of these in OpenLDAP back in 2007, but found they were all too heavyweight to be practical. The solution we came up with was actually quite simple: when we see that the current query will process more records than fit in the cache, we simply stop putting records into the cache.
So for the 3-record cache example, and a query that touches 4 records, the processing handles the first 3 records as usual, filling the cache with [A] [B] [C] as before. Then when we process the 4th record D, we send it to the client but then free it immediately, instead of pushing it into the cache. So on a subsequent query, we can still serve records A, B, and C directly from the cache at basically zero cost. For those three records, the cache effectiveness is 100%.
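Here is the same sketch with that change applied (again illustrative only, not the actual slapd code, and assuming the simplest possible detection: a per-query count of records touched so far). Once the count exceeds the cache size, further misses are served without being inserted, so they can no longer evict anything.

    /* Scan-resistant variant of the LRU sketch: within a single query, records
     * beyond the cache capacity bypass the cache entirely. */
    #include <stdio.h>
    #include <string.h>

    #define CACHE_SIZE 3

    static char cache[CACHE_SIZE];
    static int used = 0;

    /* Access one record for a query that has touched `touched` records so far
     * (including this one). Returns 1 on a cache hit. */
    static int cache_access(char key, int touched)
    {
        int i, j;

        for (i = 0; i < used; i++) {
            if (cache[i] == key) {          /* hit: move key to newest position */
                for (j = i; j < used - 1; j++)
                    cache[j] = cache[j + 1];
                cache[used - 1] = key;
                return 1;
            }
        }
        if (touched > CACHE_SIZE)           /* query is larger than the cache: */
            return 0;                       /* serve the record, don't cache it */
        if (used == CACHE_SIZE) {           /* normal LRU eviction of the oldest */
            memmove(cache, cache + 1, CACHE_SIZE - 1);
            used--;
        }
        cache[used++] = key;
        return 0;
    }

    int main(void)
    {
        const char *query = "ABCD";
        int pass, i, hits;

        for (pass = 0; pass < 2; pass++) {
            hits = 0;
            for (i = 0; query[i]; i++)
                hits += cache_access(query[i], i + 1);
            printf("pass %d: %d of 4 hits\n", pass + 1, hits);
        }
        /* Prints "pass 1: 0 of 4 hits" then "pass 2: 3 of 4 hits":
         * A, B, and C survive in the cache for the second pass. */
        return 0;
    }

The only change from the previous sketch is the `touched > CACHE_SIZE` check; with it, the cache keeps its first 3 entries through the oversized query and serves them on the repeat.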
Note that this problem remains the same no matter how large a cache you configure. With plain LRU, a query that touched 1,000,001 or more records would invalidate an entire 1,000,000-entry cache.
With LMDB, OpenLDAP no longer maintains any application-level entry cache, so its performance depends entirely on the cache management of the underlying kernel, and the lesson described above is, for us, no longer useful information. But if you're still unfortunate enough to be managing your own caches, it's worth knowing. It is one example of how a higher-level application can know more about the workload than the low-level system, and can thus manage the cache better itself.