A Quick Five Minute Rule Update for In-memory Databases

6/26/2014: I fixed the calculation… I had an error in my spreadsheet. Sorry. The 2012 break-even point is 217 minutes. – Rob

Following on to my blog on the Five Minute Rule and in-memory databases here I decided to quickly and informally recalculate the 4KB break-even point based on current technology (rather than use the 2007 numbers) The results are as follows:

  • A 1TB SATA Disk with 4.2ms average latency and 126MB/s max transfer rate costs $100 here
  • A 4GB DDR3 ECC memory card costs $33 here (I picked fairly expensive ECC memory… I could have gone with the $18 average price mentioned here)
  • Apply the Gray/Putzolu formula: Break Even Interval = (Pages per MB of RAM/Accesses per Second per Disk) * ($ per Disk Drive/$ per MB of RAM)

And we find that today the break-even point for a 4KB block of data is 217 minutes…

Again… this means that for any 4KB block of data… or for any database table where there are 4KB blocks that are touched… within a 3+ hour window it is more cost-effective to keep the data in-memory than to move it back and forth from disk. If the data is compressed the duration increases with the compression so that a table with 2X compression should reside in-memory if accessed on the average every 110 minutes.

The Five Minute Rule and In-memory Databases

I was recently reminded of a couple of papers written by Jim Gray and Gianfranco Putzolu  that calculated the cost of keeping data in memory vs the cost of paging it in from disk. I was happy to see that the thread was being kept alive by Goetz Graefe.

These papers used the cost of each media to determine how “hot” data needed to be to be cost-effectively stored in-memory. The 1987  five minute rule (click here to reference the original papers) was so named because at that time and based on the relative costs of CPU, Memory, and Disk; a 1KB  record that was accessed every five minutes could be effectively stored in memory and a 4KB block of data broke-even at two minutes.

In 2009, with CPU prices coming down but the number of instructions executed per second going up, and with memory and prices down, the break-even point between keeping 4KB in memory or on a SATA disk was 90 minutes.

Let’s be clear about what this means. Based solely on the cost of CPUs, RAM, and SATA drives; any data that is accessed more frequently than each 90 minutes should be kept in memory. This does not include any ROI based on the business benefits of a speedy response. It does not adjust for data compression which allows more than 4KB of user data to use 4KB of RAM. Just pure IT economics gets us to this point.

So… if you have data in a data warehouse or a mart that is touched by a query at least once every 90 minutes… it is wasteful to store it on disk. If you have an in-memory database than can compress the data 2X and use it in its compressed form, then the duration goes up to 180 minutes. You do not have to look any further than this to find the ROI for an in-memory data base (IMDB).

 

%d bloggers like this: