Tuesday, April 24, 2012

Memcache vs Database for simple selects

Will using memcache for caching database query results of simple single record selects improve performance?

Let's lay down the facts before we make any judgement:
  • Memcache is a fast, out of process, in-memory cache that can be used to store and retrieve values based on keys.
  • Database also has its own implementation of query and result caches.
  • Both memcache and the database calls involve IPC and data serialization.
  • Database additionally has to provide data consistency when updates happen to the cached records.
Both look to be very similar, except that database may have an overhead of checks for data consistency. Using memcache will be beneficial only if that overhead is substantial. Note that we are considering only "simple selects". Joins, updates, index searches can not be applied to this comparison.

Here is a set of test programs to measure performance of memcache vs. database. The sources should be self explanatory. Essentially, we create 150000 records in the database, each with 11 CHAR columns of 32 bytes each. Similarly, we populate a memcache instance with 150000 records of same structure. For memcache out data is a direct memcpy of the C structure holding the data. Then we fetch the records multiple times and measure time taken for both cases (database and memcache).

With a properly sized database (MySQL with sufficient query cache size to hold all records) and memcache (enough memory to hold all records), here are the results:


MEMCACHE:
real 2m14.924s
user 0m17.373s
sys 0m43.588s

DB:
real 2m7.675s
user 0m23.906s
sys 0m26.638s

This is a very crude measurement. There are quite a few factors that may be different in a real life scenario and affecting performance. E.g. when the database is located on the same machine, the DB client library may be using a faster IPC mechanism than memcache that was on TCP/IP. However, all such influences may not be substantial. A well tuned database can in-fact be better performing than memcache for simple record fetches. Joins, updates and index searches can however require substantial database processing, and may be beneficial to cache.

Memcache is meant to be used for storing processed data, or data that is difficult to fetch (e.g. remote APIs, file reads etc.). Using it for caching data that is easy to fetch, or is already cached elsewhere is just unnecessary and adds to overheads.

No comments: