The current release of HANA requires that all of the data required to satisfy a query be in-memory to run the query. Let’s think about what this means:
HANA compresses tables into bitmap vectors… and then compresses the vectors on write to reduce disk I/O. Disk I/O with HANA? Yup.
Once this formatting is complete all tables and partitions are persisted to disk… and if there are updates to the tables then logs are written to maintain ACIDity and at some interval, the changed data is persisted asynchronously as blocks to disk. When HANA cold starts no data is in-memory. There are options to pre-load data at start-up… but the default is to load data as it is used.
When the first query begins execution the data required to satisfy the query is moved into memory and decompressed into vectors. Note that the vector format is still highly compressed and the execution engine operates on this compressed vector data. Also, partition elimination occurs during this data move… so only the partitions required are loaded. The remaining data is on disk until required.
Let us imagine that after several queries all of the available memory is consumed… but there is still user data out-of-memory on peripheral storage… and a new query is submitted that requires this data. At this point HANA frees enough storage to satisfy the new query and processes it. Note that, in the usual DW case (write-once/read-many), the data flushed from memory does not need to be written back… the data is already persisted… otherwise HANA will flush any unwritten changed blocks…
If a query is submitted that performs a cartesian product… or that requires all of the data in the warehouse at once… in other words where there is not enough memory to fit all of the vectors in memory even after flushing everything else out… the query fails. It is my understanding that this constraint will be fixed in a next release and data will stream into memory and be processed in-stream instead of in-whole. Note that in other databases a query that consumes all of the available memory may never complete, or will seriously affect all other running queries, or will lock the system… so the HANA approach is not all bad… but as noted there is room for improvement and the constraint is real.
This note should remove several silly arguments leveled by HANA’s competitors:
- HANA, and most in-memory databases, offer full ACID-compliance. A system failure does not result in lost data.
- HANA supports more data than will fit in-memory and it pages data in-and-out in a smart fashion based on utilization. It is not constrained to only data that fits in-memory.
- HANA is not useless when it runs out of memory. HANA has a constraint when there is more data than memory… it does not crash the system… but lets be real… if you page data to disk and run out of disk you are in trouble… and we’ve all seen our DBMS‘s hit this wall. If you have an in-memory DBMS then you need to have enough memory to support your workload… if you have a DB2 system you better not run out of temp space or log space on disk… if you have Teradata you better not run out of spool space.
I apologize… there is no public reference I know of to support the features I described. It is available to HANA customers in the HANA Blue Book. It is my understanding that a public version of the Blue Book is being developed.
3 thoughts on “HANA Memory Utilization”
I have a few differences in understanding, which probably emphasizes the need for better public information.
First, yes HANA decompresses only when required, which is not for example require for a count. Thus counts on a column are particularly fast. Exadata has to decompress blocks, which is much less efficient. I suspect BLU and HANA are architected similarly here.
Preload is more subtle than you suggest. Row store tables are always preloaded and cannot be unloaded. These usually only reside on the master host and are small so the practical implication of this is minor.
For column tables, most HANA apps define preload conditions for various tables, e.g. BW or ERP. But you talk about partition elimination and I think you misunderstand how that works. Preload conditions (see page 151 of the SP06 developer guide) are defined on columns and when you access a column, it loads all partitions of a column into memory. Partition elimination occurs agains data residing in-memory and partitions that are not in the query semantics are never scanned.
Thus in practice, the working set of columns used in HANA is required to fit into half the available memory. It is possible of course to partition prune using logical partitions of physical tables and this is exactly how BW achieves this with SPOs.
I think most of this is is moot point because IBM are specifically going after the <10TB Oracle data mart market with BLU. They go with Netezza for bigger instances. This is because BLU does not scale.
For those solutions we can generally easily fit them in a 1-2TB HANA appliance which is cost-effective.
Thanks for the clarification, John.
I was thinking only about columns, not about rows, and should have pointed that out. I actually double-checked with a couple of associates the point about partition elimination on load… So let me triple-check and then I’ll update the post…
Apologies for getting this wrong – here’s the real deal as per my tests!
I’ve just run a definitive set of tests on this and I can tell you exactly how HANA behaves. First, HANA pre-loads by partition and not by column. So, if you define partitions in HANA (and you should, for several reasons) then HANA will load only those partitions that are required e.g. by year, customer group etc. That means that historic data won’t be loaded if it’s not queried.
Second, HANA dumps partitions out of memory on a least-recently-used basis, once its reaches about 80% of RAM used. In a HANA system you should have all your hot data fitting into 50% of your main memory, so this should never be a problem.
Third, you can load and unload data manually, but only down to a column or table. This is mostly for the purposes of dumping data out of memory, for instance in a data warehouse where you want to load data into HANA, process it but then it isn’t needed. SAP BW uses this extensively, for example, for objects which are marked as “inactive”.
Comments are closed.