Once this formatting is complete, all tables and partitions are persisted to disk… if the tables are updated, log entries are written to preserve ACID guarantees, and at some interval the changed data is persisted asynchronously to disk as blocks. When HANA cold starts, no data is in memory. There are options to pre-load data at start-up… but by default data is loaded as it is used.
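To make the persistence story concrete, here is a toy model in Python. It is a minimal sketch under stated assumptions, not HANA's implementation: the class `ToyStore` and its methods are hypothetical, the redo log is a plain list standing in for a durable log, and `persist()` is called explicitly where the real system would run it asynchronously at some interval. It shows three behaviors from the paragraph above: nothing is in memory after a cold start, data is loaded on first use, and a change is logged before it touches the in-memory copy, so it survives a crash that happens before the changed block is persisted.

```python
class ToyStore:
    """Hypothetical toy model of log-then-persist durability; not HANA's code."""

    def __init__(self, disk, log):
        self.disk = disk      # persisted blocks: name -> list of values (survives a crash)
        self.log = log        # redo log entries (name, value) (survives a crash)
        self.memory = {}      # in-memory copies; empty after a cold start
        self.dirty = set()    # blocks changed in memory but not yet persisted

    def read(self, name):
        # Load lazily: a block is copied into memory only when first used.
        if name not in self.memory:
            self.memory[name] = list(self.disk[name])
        return self.memory[name]

    def append(self, name, value):
        # Write the log entry first, then change the in-memory copy.
        self.log.append((name, value))
        self.read(name).append(value)
        self.dirty.add(name)

    def persist(self):
        # Stand-in for the periodic asynchronous flush of changed blocks.
        for name in self.dirty:
            self.disk[name] = list(self.memory[name])
        self.dirty.clear()
        self.log.clear()  # these entries are now covered by persisted blocks

    def replay_log(self):
        # Recovery after a crash: reapply logged changes on top of the disk copy.
        for name, value in self.log:
            self.read(name).append(value)
            self.dirty.add(name)


disk, log = {"sales": [1, 2]}, []
store = ToyStore(disk, log)
store.append("sales", 3)          # logged and changed in memory, not yet on disk
recovered = ToyStore(disk, log)   # simulate a crash: the in-memory copy is lost
recovered.replay_log()
print(recovered.read("sales"))    # [1, 2, 3]
```

The design choice worth noting is the ordering in `append`: because the log entry is written before the in-memory change, the asynchronous flush can lag behind without risking committed data.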
When the first query begins execution, the data required to satisfy it is moved into memory and decompressed from its on-disk format into vectors. Note that the vector format is itself still highly compressed, and the execution engine operates directly on this compressed vector data. Partition elimination also occurs during this load… so only the required partitions are brought into memory. The remaining data stays on disk until it is needed.
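A toy illustration of this load path follows. Dictionary encoding is used here as one common way a columnar engine keeps vectors compressed while still operating on them; the text above does not specify HANA's encodings, so treat the format as an assumption. The functions `dictionary_encode`, `load_for_query`, and `count_equal` are hypothetical, and the data is a hypothetical country column partitioned by year. Only the partitions that survive pruning are encoded and loaded, and the count compares small integer codes rather than the original strings.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def load_for_query(disk_partitions, wanted_keys):
    """Load and encode only the partitions that survive partition elimination."""
    return {
        key: dictionary_encode(values)
        for key, values in disk_partitions.items()
        if key in wanted_keys  # pruned partitions stay on disk
    }


def count_equal(encoded, target):
    """Count matches by comparing integer codes, not the original values."""
    dictionary, codes = encoded
    if target not in dictionary:
        return 0
    code = dictionary.index(target)
    return sum(1 for c in codes if c == code)


# Hypothetical country column partitioned by year.
disk = {2011: ["DE", "US", "DE"], 2012: ["US", "US", "FR"]}
loaded = load_for_query(disk, wanted_keys={2012})  # e.g. a filter on year = 2012
print(list(loaded))                      # [2012]
print(loaded[2012])                      # (['FR', 'US'], [1, 1, 0])
print(count_equal(loaded[2012], "US"))   # 2
```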
Imagine that after several queries all of the available memory is consumed… but some user data is still out of memory on peripheral storage… and a new query is submitted that requires this data. At this point HANA frees enough memory to satisfy the new query and processes it. Note that in the usual DW case (write-once/read-many) the data flushed from memory does not need to be written back… it is already persisted. Otherwise, HANA first writes out any changed blocks that have not yet been persisted.
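The eviction behavior can be sketched with a least-recently-used cache. The policy is an assumption for illustration only: the text says HANA pages data based on utilization but does not name a policy, and `ToyColumnCache` is a hypothetical class whose capacity is measured in values rather than bytes. Clean vectors are simply dropped because their disk copy is already current; dirty ones are written back before being evicted. If a single vector cannot fit even in an empty cache, the load fails with `MemoryError`.

```python
from collections import OrderedDict


class ToyColumnCache:
    """Hypothetical LRU cache of column vectors; capacity counts values, not bytes."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk               # persisted copies: name -> list of values
        self.memory = OrderedDict()    # loaded vectors, least recently used first
        self.dirty = set()             # vectors changed since they were loaded
        self.writebacks = 0

    def used(self):
        return sum(len(v) for v in self.memory.values())

    def load(self, name):
        if name in self.memory:
            self.memory.move_to_end(name)   # mark as most recently used
            return self.memory[name]
        size = len(self.disk[name])
        if size > self.capacity:
            raise MemoryError(f"{name} needs {size} but capacity is {self.capacity}")
        while self.used() + size > self.capacity:
            victim, data = self.memory.popitem(last=False)
            if victim in self.dirty:
                # Changed but not yet persisted: write it back before dropping it.
                self.disk[victim] = data
                self.dirty.discard(victim)
                self.writebacks += 1
            # A clean vector is simply dropped; its disk copy is already current.
        self.memory[name] = list(self.disk[name])
        return self.memory[name]

    def update(self, name, index, value):
        self.load(name)[index] = value
        self.dirty.add(name)


disk = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
cache = ToyColumnCache(capacity=4, disk=disk)
cache.load("a")
cache.load("b")
cache.update("a", 0, 10)   # "a" becomes dirty and most recently used
cache.load("c")            # evicts clean "b" without a write
cache.load("b")            # evicts dirty "a", writing it back first
print(cache.writebacks, disk["a"])  # 1 [10, 2]
```

In the write-once/read-many case nothing is ever dirty, so `writebacks` stays at zero and eviction costs nothing beyond the later reload.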
If a query is submitted that performs a cartesian product… or that requires all of the data in the warehouse at once… in other words, a query whose vectors will not fit in memory even after everything else has been flushed… the query fails. It is my understanding that this constraint will be addressed in a future release, and that data will then stream into memory and be processed in-stream rather than as a whole. Note that in other databases a query that consumes all of the available memory may never complete, may seriously affect all other running queries, or may lock up the system… so the HANA approach is not all bad… but, as noted, there is room for improvement and the constraint is real.
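The difference between processing as a whole and processing in-stream can be illustrated with a simple SUM over partitioned data. This is a generic sketch of the two strategies, not a description of how HANA executes queries today or will in a future release; both functions are hypothetical, the input lists stand in for on-disk partitions, and `budget` is a hypothetical memory limit counted in values. The whole-data version must materialize every value at once and fails when that exceeds the budget, while the streaming version keeps only one chunk plus a running total.

```python
def sum_in_whole(partitions, budget):
    """Materialize every value at once, then aggregate."""
    needed = sum(len(p) for p in partitions)
    if needed > budget:
        raise MemoryError(f"working set of {needed} values exceeds budget of {budget}")
    everything = [v for p in partitions for v in p]
    return sum(everything)


def sum_streaming(partitions, budget):
    """Process at most `budget` values at a time, keeping only a running total."""
    total = 0
    for p in partitions:
        for start in range(0, len(p), budget):
            chunk = p[start:start + budget]   # stands in for reading one chunk from disk
            total += sum(chunk)
    return total


partitions = [[1, 2, 3], [4, 5, 6]]
print(sum_streaming(partitions, budget=4))   # 21
try:
    sum_in_whole(partitions, budget=4)
except MemoryError as err:
    print("in-whole failed:", err)
```

A cartesian product needs more elaborate chunking than a sum (for example, a block nested-loop join), but the idea of bounding the working set is similar.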
This note should dispel several silly arguments leveled against HANA by its competitors:
- HANA, like most in-memory databases, offers full ACID compliance. A system failure does not result in the loss of committed data.
- HANA supports more data than will fit in memory, and it pages data in and out intelligently based on usage. It is not constrained to data that fits in memory.
- HANA is not useless when it runs out of memory. HANA has a constraint when a single query needs more data than will fit in memory… but it does not crash the system. Let’s be real, though… if you page data to disk and run out of disk you are in trouble… and we’ve all seen our DBMSs hit this wall. If you have an in-memory DBMS, you need enough memory to support your workload… if you have a DB2 system, you had better not run out of temp space or log space on disk… if you have Teradata, you had better not run out of spool space.
I apologize… there is no public reference I know of to support the features I described. The material is available to HANA customers in the HANA Blue Book. It is my understanding that a public version of the Blue Book is being developed.