This is a rehash of my post for SAP here… I thought you might find it interesting, as it describes the architecture HANA uses to support OLTP and BI against a single table.
A couple of points to think about:
- If you have only one database structure you can optimize for only one kind of query; e.g. an OLTP query is fast against an OLTP structure but slow against a BI structure… or vice versa.
- If you have two structures you have to ETL the data between the two at some cost. There is a cost to keeping a replica of the data and a cost to developing, administering, and executing the ETL process. In addition, there is a lost opportunity cost hidden in the latency of the data: you cannot see the current state of the business by querying the BI data, as some data has not yet been ETL’d across.
- OLTP performance is normally paramount; so the perfect system would not compromise that performance at all, or would compromise it only a little.
Let’s look at the HANA approach to this at a high level.
HANA provides a single view of a table to an application or a user, but under the covers each table includes an OLTP-optimized part, a BI-optimized part, and a mechanism for moving data from one part to the other.
When a transaction hits the system, inserts, updates, and deletes are processed in the OLTP part with no performance penalty. The read portion of the OLTP query accesses the read-optimized internal structure, again with no performance penalty. Note that reading a single column in a column store, where that column is the key for the transaction, is roughly equivalent to reading an index structure on top of a standard disk-based DBMS, except that the column is always in memory, which means I/O is never required. This gives the HANA system an advantage over a disk-based system: disk I/O is 120+ times slower than memory access, so even an index is unlikely to beat in-memory. See here for some numbers you should know.
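To make the shape of this concrete, here is a minimal Python sketch of the idea: one logical table backed by a row-oriented delta store for writes plus a column-oriented main store for reads. The names (HybridTable, delta, main, lookup_by_key) and the data layout are purely illustrative assumptions on my part, not HANA internals.

```python
# A conceptual sketch (not HANA code) of a table with a write-optimized
# delta store and a read-optimized column store. All names are hypothetical.

class HybridTable:
    def __init__(self, columns):
        self.columns = columns
        self.delta = []                          # row-oriented: fast inserts/updates/deletes
        self.main = {c: [] for c in columns}     # column-oriented: fast scans

    def insert(self, row):
        """OLTP write path: append the row to the in-memory delta store."""
        self.delta.append(dict(row))

    def lookup_by_key(self, key_column, key_value):
        """OLTP read path: scan the in-memory key column of the main store,
        then the (small) delta store; no disk I/O is involved."""
        hits = [i for i, v in enumerate(self.main[key_column]) if v == key_value]
        rows = [{c: self.main[c][i] for c in self.columns} for i in hits]
        rows += [dict(r) for r in self.delta if r[key_column] == key_value]
        return rows


orders = HybridTable(["order_id", "customer", "amount"])
orders.insert({"order_id": 1, "customer": "ACME", "amount": 100.0})
print(orders.lookup_by_key("order_id", 1))
```

The point of the sketch is only the shape of the thing: writes append rows, reads scan an in-memory column, and neither path touches disk.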
After the transaction is committed into the internal, OLTP-optimized part, a process starts that moves the data to the BI-optimized part. This is called a delta merge, as the OLTP portion holds all of the changes, the delta, to the data set.
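Here is an equally hedged sketch of what a delta merge does conceptually: fold the accumulated delta rows into the columnar main store and clear the delta. The structures and the delta_merge function are illustrative assumptions, not HANA's merge algorithm.

```python
# A hedged sketch of a "delta merge": move accumulated delta rows into the
# column-oriented main store, then empty the delta. Structures are hypothetical.

def delta_merge(delta, main):
    """Fold the row-oriented delta store into the column-oriented main store."""
    for row in delta:
        for column, values in main.items():
            values.append(row[column])
    delta.clear()               # the delta is empty until new writes arrive
    return main

# Example: two committed transactions waiting in the delta store.
delta = [
    {"order_id": 1, "customer": "ACME", "amount": 100.0},
    {"order_id": 2, "customer": "Initech", "amount": 250.0},
]
main = {"order_id": [], "customer": [], "amount": []}

delta_merge(delta, main)
print(main["amount"])   # [100.0, 250.0]: now in the BI-optimized column store
print(delta)            # []: the delta has been merged away
```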
When a BI query starts it can limit the scan to only the partitions in the BI-optimized part or, if real-time data is required, it can scan both parts. The small portion of the scan that accesses the OLTP/delta part is sub-optimal compared to the scan of the BI part, but it is not slow at all, as the data is all in memory.
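One last illustrative sketch shows the two scan modes: scan only the merged column store for the fastest path, or also scan the small in-memory delta for a real-time answer. Again, bi_total and the data layout are assumptions for illustration only, not HANA's scan logic.

```python
# A sketch of the two BI scan modes: scan only the merged column store, or
# also include the not-yet-merged delta rows for real-time results.

def bi_total(main, delta, column, include_delta=False):
    """Aggregate a single column; optionally include not-yet-merged delta rows."""
    total = sum(main[column])                       # columnar scan of the BI part
    if include_delta:
        total += sum(row[column] for row in delta)  # small row-wise delta scan
    return total

main = {"order_id": [1, 2], "amount": [100.0, 250.0]}   # already merged
delta = [{"order_id": 3, "amount": 75.0}]               # committed, not merged yet

print(bi_total(main, delta, "amount"))                      # 350.0 (BI part only)
print(bi_total(main, delta, "amount", include_delta=True))  # 425.0 (real-time view)
```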
We can tease the performance apart as follows:
- There is an OLTP insert/update/delete “write” portion… and HANA executes this like any OLTP database, as fast as an OLTP RDBMS, with a commit after a write to the log;
- There is an OLTP select “read” portion… and HANA performs this in the in-memory column store faster than many OLTP databases… and scans the delta structure as fast as any OLTP database;
- There is a delta merge from the OLTP write-optimized part to the BI read-optimized column store that is hundreds to tens of thousands of times faster than any ETL tool;
- There is a BI select portion that scans the in-memory column store hundreds to thousands of times faster than a disk-based BI database; and
- If the BI query requires access to real-time data then an in-memory scan of the delta store is also required… there is no analogue to this in a system with separate OLTP and BI tables.
Nice.