Numbers Every Database Professional Should Know

Latency Estimates in a Processor

In this post I will setup the next post by reminding you of these numbers every programmer should know. The picture shows the latency to access data across the three levels of cache in a modern processor, and across the memory bus to DRAM, and then across to an SSD or spinning disk drive. 

These numbers are the key to in-memory database performance. L1 cache is expensive and small. All data and all instructions are fed to the core through the L1 cache and typically the instruction and the data it will operate on are each fetched independently. If the data (or instruction… but I will just talk about the data from here on out) is not found in the small L1 cache the L2 cache is searched. Likewise, if the data is not in the L2 cache the L3 cache is searched and then DRAM. As you can see there is a 200X performance difference between a fetch from L1 cache and a fetch from DRAM. If the data must be fetched from SSD or disk, there is a 1000X penalty. 

Databases work to mitigate the 1000X penalty by pre-fetching data into DRAM. Modern processors mitigate the penalty between cache and DRAM by trying to anticipate the data required and pre-fetching it into L3 cache. This is a gamble and the likelihood of winning the pre-fetch bet varies with your DBMS and with the workload. More simultaneous queries against many tables puts pressure on the memory and lessens the effectiveness of data pre-fetch. 

Databases also mitigate the 1000X penalty by compressing data so that each fetch of a block from disk captures more data. For tabular (not columnar) data each block must be decompressed before it can be used. Decompression steals CPU cycles from query processing but mitigates some of the cost of I/O to disk. Columnar databases fetch compressed data and, if the databases use modern vector processing techniques, they operate directly on the compressed vector data without decompression. This is especially powerful as these vectorized columnar databases can push compressed data into cache and use that memory more effectively. 

Vectorized data can use the supercomputing instruction set that includes SIMD instructions. SIMD stands for Single Instruction; Multiple Data and these instructions fetch a single instruction from L1 cache and reuse it on the vectorized data over and over with pausing to fetch new instructions. Note that in a modern processor waiting for instructions or data to be fetched from cache or DRAM appears as CPU busy time. When the processor is stalled waiting for data to be fetched from disk the query/job gives up the CPU and dispatches a new query/job. 

Finally, for the longer-running queries associated with analytics and business intelligence (where long-running means a few seconds or more… “long” in CPU instruction time), it is highly likely that the L1 and L2 cache in each core will be flushed and new data will be fetched. It is even likely that all the data in L3 cache will be flushed. In this case most of the goodness associated with compression and vectorization is flushed with it. The vectors must be fetched again from L3 and DRAM. Further, note that when you run in a virtual machine or in the cloud you may think that you are the only user but other virtual machines other containers, or other serverless processes may be flushing you out of memory all the time. 

This is important. Until we swapped out a query for a new one it looked like in-memory vector processing might be 200X-1000X faster than a tabular query; but in an environment where tens or hundreds of queries are running concurrently the pressure on memory keeps flushing the cache and the advantage of vectorized query processing is reduced. Reduced, not eliminated. We would still expect the advantages of compressed data in cache and SIMD supercomputing instructions to provide much more than a 10X speedup. 

As mentioned, this post is a detailed review of material I covered years ago when HANA was introduced. In the next post I will add some new thinking. 

Here is more information on processor architecture and cache usage.

How DBMS Vendors Admit to an Architectural Limitation: Part 2 – Teradata Intelligent Memory

This is the second post (see Part 1 here) on how vendors adjust their architecture without admitting that the previous architecture was flawed. This time we’ll consider Teradata and in-memory….

When SAP HANA appeared Teradata went on the warpath with a series of posts and statements that were pointed but oddly miscued (see the references below). According to the posts in-memory was unnecessary and SAP was on a misguided journey.

Then Teradata announced Intelligent Memory and in-memory was cool. This is pretty close to an admission that SAP was right and Teradata was wrong. The numbers which drove Teradata here are compelling… 100K-200K ns to access an SSD device or 100 ns to access DRAM… a 1000X reduction… and the latency to disk is 100X worse than SSD.

Intelligent Memory was announced shortly after the release of Teradata’s columnar table type. Column-orientation is important because you need a powerful approach to compression to effectively use an expensive memory resource… and columnar provides this. But Teradata, like Greenplum, extended a row-based engine to support columns in order to get to market quick… they hoped to get 80% of the effectiveness of in-memory with only 20% of the engineering effort. The other 20% comes when you develop a new engine that fully exploits the advantages of a columnar architecture. These advanced exploits allow HANA, DB2 BLU, and Oracle 12c to execute directly on columnar data thereby avoiding decompression, fully utilizing the processor caches, and allowing sets to be operated on by super-computing vector-processing instructions. In fact, Teradata really applied the 50/20 rule… they gained 50%, maybe only 40%, of the benefits with their columnar and Intelligent Memory features… but it was easy to deploy what is in-effect an in-memory cache over their existing relational engine.

Please don’t jump to the wrong conclusion here… Intelligent Memory is a strong product. If you were to put hot data in memory, cool data in Teradata-on-SSD-or-Disk, and cold data in Hadoop and manage them as one EDW you could deploy a very cost-effective platform (see here).

Still, Teradata with Intelligent Memory is not likely to compete effectively against HANA, BLU, or 12c for raw performance… so there will be some marketing foam attached and an appeal for Teradata shops to avoid database apostasy and stick with them. You can see some of the foam in the articles below.

A quick aside here… generally a DBMS should win or lose based on price/performance. The ANSI standard makes a products features nearly, not completely but nearly, irrelevant. If you cannot win on price/performance then you blow foam. When any vendor starts talking about things like TCO you should grab your wallets… it is an appeal to foaminess to hide a weakness. I’m not calling out Teradata here… this is general warning that applies to every software vendor.

Intelligent Memory is a smart move. While it may not win in a head-to-head POC… it will be close-ish… close enough to keep the congregation in their pews. As readers know, I am not a big fan of technical religiosity… being a “Teradata-shop” is lazy… engineers we should pick the best solution and learn it. The tiered approach mentioned three paragraphs up is a good solution and non-Teradata shops should be considering it… but Teradata shops should be open to new technology as well. Still, we should pick new technology with a sensitivity to the cost of a migration… and in many cases Intelligent Memory will save business for Teradata by getting just close enough to make migration a bad trade-off. This is why it was so smart.

Back to the theme of these posts… Teradata back-tracked on the value of in-memory… and in the process admitted-without-admitting a shortcoming in their architecture. So it goes…

Next we will consider whether you should be building data warehouses on z/OS using DB2 or the DB2 Analytics Accelerator aka Netezza.

References

Database Fog Blog

Other

Key Values and Key-Value Stores and In-memory Databases

Back to more geeky topics… although my Mom loved the videos…

When very high performance is required to return key performance metrics derived from large volumes of data to a very large number of clients… in other words when volume and velocity are factors and the results are to be delivered to thousands of users (I suppose that I could conjure up a clever V here… but the cleverness would come from the silliness of the semantic stretch… so I’ll leave it to you all to have some pun); the conventional approach has been to pre-compute the results into a set of values that can be fetched by key. In other words, we build a pre-aggregated and pre-joined result table that provides answers to a single query template. Conventionally building this query-specific result table has been the only way to solve these big problems.

Further, we conventionally store these key-value results in a relational DBMS and fetch a row at a time providing pretty darn good performance. Sometimes pretty darn good is not good enough. So there are new options. Key-value data stores may well offer a solution that provides performance and scale at a price well below what has been conventional to date. This is well and good.

But I would like to challenge the conventional thinking a little. The process of joining and aggregating volumes of data into results is a BI process. For the last twenty years BI practitioners have been building pre-aggregated tables and data marts to solve these same problems… maybe not at scale… and this practice has proven to be very expensive, the opposite of agile, and unsustainable. The people costs to develop and support multiple pre-computed replicas is exorbitant. The lack of flexibility that comes from imposing a longish development project over what is essentially an aggregate BI query is constraining our enterprise and our customers.

A better approach is to use the new high performance database products: HANA, BLU, Oracle 12c, or maybe Spark; to aggregate on-demand. Included in this approach is a requirement to use the new high performance database computing platforms available to house the databases.

Consider this… an in-memory DBMS can aggregate 12M rows/sec/core. It can scan 3MB/msec/core. Companies like SGI and HP are ganging processors together so that you can buy a single node that contains 32, 64, or 128 cores… and this number will go up. A 64-core server will aggregate 768M rows/sec and scan 19.2TB/sec… and you can gang a small number of nodes together and scale out.

Providing an extensible BI platform for big data is so much easier than building single-query key-value clusters… there is much less risk… and the agility and TCO make it close to a no-brainer. We just have to re-think the approach we’ve used for 20 years and let the new software and hardware do the work.

Some Database Performance Concepts

I’m working on a new idea… it may or may not pan out… but here are some concepts for your consideration… with some thoughts on their performance implications.

First a reminder… a reality check. In my experience if you POC two databases at about the same price point…and one is 30% faster than the other, 1.3X, then 50% of the time the faster DBMS will win the business. If one DBMS is 2X faster… then it will win the business 90% of the time. The 10% where the faster product loses will be because of internal politics or, for an existing application, due to the migration costs. Note that IMO it is silly to do a POC if you know up front that you will not pick the winner.

Now to the concepts… Note that these are ballpark numbers to help you think about trade-offs…

102312_2108_HopFestP1Ex1.png

The latency to start fetching data from DRAM is 100 ns… from disk it is 10M ns. If we assume that a smart RDBMS pre-fetches 80% of the data into DRAM then we can assume that an in-memory DBMS has a 200,000X performance advantage over a disk-based system.

The latency to a Flash/SSD device is 100K-200K ns. With the same 80% pre-fetch assumption an in-memory DBMS will be 20,000X faster.

Note that neither of these models include data transfer times which will favor in-memory databases even more.

If we have a hybrid system with both disk and SSD and we assume that 90% of the reads hit the SSD and that both layers in the storage hierarchy achieve 80% pre-fetch then then the in-memory system will be 38,000X faster. If fewer than 90% of the reads hit the SSD, then the latency goes up quickly.

These numbers form the basis for selecting in-memory caches like Teradata’s Intelligent Memory option as well as in-memory offerings from IBM, Microsoft, Oracle and SAP.

For typical data warehouse workloads column compression will provide around a 2.5X performance boost over row compression. This has two implications: you will get 2.5X better performance using column storage and you will get 2.5X more data into the faster levels of your storage hierarchy… more in SSD and more in-memory.

If we assume that a typical query only touches 10% of the columns in the tables addressed… then column projection provides a 9X performance boost over a row store. Exadata does not support column projection in the storage layer… and other hybrid row-or-column systems provide it only for columnar tables.

If we assume that the average latency from the processor caches is 10ns (.5ns L1, 7ns, L2, 15ns L3) and the latency to DRAM is 100ns then an in-memory system which pre-fetches data effectively into the processor caches will be 10X faster than one which goes to DRAM. If we assume that a standard RDBMS which processes uncompressed standard data types (no vector processing) gets a 20% cache hit ratio then the advantage to a cache aware RDBMS which loads full cache lines is around 8X. HANA, BLU, and the Oracle in-memory products are cache aware and get this boost with some caveats.

BLU and the Oracle in-memory option are hybrid systems that often convert data to a row form for processing (see here for some data on Oracle). If we assume that they use the full columnar in-memory vector-based structures 50% of the time then these products will see a 4X performance boost. HANA recommends that all data be stored in a columnar form so it would often see the full 8x boost.

These vector-based processes also avoid the cost of decompression… and since they process compressed vector data they can fit more information into each cache line. There is another 20%-200% (1.2X-2X) boost here but I cannot estimate it closer than that.

Finally, the vector based processes use the high performance computing instruction sets (AVX2) offered on modern CPUs… and this provides another 10X+ boost. Again, BLU and Oracle will utilize the vector form less often than HANA so they will see a boost over products like Teradata… but not see as large a boost as HANA.

There are other features at play here… some products, like HANA, shard data in-memory to get all of the cores busy on each query. I have not been able to determine if BLU or Oracle in-memory are there yet? Note that this powerful feature will allow a single query to run as fast as possible… but the benefit is mitigated when there is a workload of multiple concurrent queries (if I have 4 cores and 4 queries running concurrently, one query per core, then the 4 queries will take only a little more time than if I run the 4 queries serially with each query using all 4 cores).

It is important to note that the Oracle In-memory option does not run in the storage component of an Exadata cluster… only on the RAC layer. It is unclear how this works with the in-memory option.

The bottom line is that in-memory systems are not all alike. You can sort of add up the multipliers to get in the ballpark on how much faster Teradata will be with the Intelligent Memory option… how much faster than that a hybrid row and vector-column system like BLU or Oracle In-Memory… and how much faster a pure in-memory system might be. One thing is for sure… these in-memory options will always make the difference in a POC… in every case including them or not will blow away the 2X rule I started with… and in every case the performance benefit will outweigh the extra cost… the price/performance is very likely to be there.

I know that is skipped my usual referencing so that you can see where I pulled these numbers from… but most of this information is buried in posts here and there on my blog… and as I stated up front… these are ballpark numbers. Hopefully you can see the sense behind them… but if you think I’m off please comment back and I’ll try to adjust…

Part 6: How Hadooped is HANA?

Now for HANA plus Hadoop… to continue this thread on RDBMS-Hadoop integration (Part 1, Part 2, Part 3, Part 4, Part 5) I have suggested that we could evaluate integration architecture using three criteria:

  1. How parallel are the pipes to move data between the RDBMS and the parallel file system;
  2. Is there intelligence to push down predicates; and
  3. Is there more intelligence to push down joins and other relational operators?

As a preface I need to include a note/apology. As you will see HANA may well have the best RDBMS-Hadoop integration in the market. I try hard not to blow foam about HANA in this blog… and I hope that the objective criteria I have devised to evaluate all of the products will keep this post credible… but please look at this post harder than most and push back if you think that I overstep.

First… surprisingly, HANA’s first release has only a single pipe to the Hadoop side. This is worrisome but easily fixed. It will negatively impact performance when large tables/files have to be moved up for processing.

But HANA includes Hadoop as a full partner in a federated data architecture using the Smart Data Access (SDA) engine inside the HANA address space. As a result, HANA not only pushes predicates but it uses cost-based optimization to determine what to push down and what to pull up. HANA interrogates the Hadoop system to gather statistics and uses the HANA optimizer to develop smart execution plans with awareness of both the speed of in-memory and the limited memory resources. When data in HANA is joined with data in Hadoop SDA effectively uses semi-joins to minimize the data pulled up.

Finally, HANA can develop execution plans that executes joins in Hadoop. This includes both joins between two Hadoop tables and joins where small in-memory tables are pushed down to execute the joins in Hadoop. The current limitation is that Hadoop files must be defined as Hive tables.

Here is the HANA execution plan for TPC-H query 19. HANA has pushed down all of the steps behind the Remote Row Scan step… so in this case the entire query including a nested loop join was pushed down. In other queries HANA will push only parts of the plan to Hadoop.

TPCH Q19 Plan

So HANA possesses a very sophisticated integration with Hadoop… with capabilities that minimize the amount of data moved based on the cost of the movement. This is where all products need to go. But without parallel pipes this sophisticated capability provides only a moderate advantage (see Part 5),

Note that this is not the ultimate in integration… there is another level… but I’ll leave some ideas for extending integration even further for my final post in the series.

Next… Part 7… considering Greenplum…

HANA, BLU, Hekaton, and Oracle 12c vs. Teradata and Greenplum – November 2013

Catch Me If You Can (musical)
(Photo credit: Wikipedia)

I would like to point out a very important section in the paper on Hekaton on the Microsoft Research site here. I will quote the section in total:

2. DESIGN CONSIDERATIONS 

An analysis done early on in the project drove home the fact that a 10-100X throughput improvement cannot be achieved by optimizing existing SQL Server mechanisms. Throughput can be increased in three ways: improving scalability, improving CPI (cycles per instruction), and reducing the number of instructions executed per request. The analysis showed that, even under highly optimistic assumptions, improving scalability and CPI can produce only a 3-4X improvement. The detailed analysis is included as an appendix. 

The only real hope is to reduce the number of instructions executed but the reduction needs to be dramatic. To go 10X faster, the engine must execute 90% fewer instructions and yet still get the work done. To go 100X faster, it must execute 99% fewer instructions. This level of improvement is not feasible by optimizing existing storage and execution mechanisms. Reaching the 10-100X goal requires a much more efficient way to store and process data. 

This is important because it confirms the difference in a Level 3 and a Level 2 columnar implementation as described here. It is just not possible for a Level 2 implementation with a row-based join engine to achieve the performance of a Level 3 implementation. This will allow the Level 3 implementations: HANA, BLU, Hekaton, and Oracle 12c to distance themselves from the Level 2 products: Teradata and Greenplum; by more than 10X… and this is a very significant advantage.

Related articles

CPUs and HW for HANA, BLU, Hekaton, and Oracle 12c

CPUs from retired computers waiting for recycl...
(Photo credit: Wikipedia)

This short post is intended to provide a quick warning regarding in-memory columnar and cpu requirements… with a longer post to follow.

When a row is inserted or bulk-loaded into a DBMS, if there are no indexes, the amount of cpu required is very small. The majority of the time is spent committing a transaction is the time to write a log record to persist the data.

When the same record is reformatted into a column the amount of processing required is significantly higher. The data must be parsed into columns, the values must be compressed, dictionaries may be updated, and the breadcrumbs that let the columnar data be regenerated into rows must be laid. Further, if the columnar structure are to be optimized then the data must be ordered… with a sort or some kind of index structure. I have seen academic papers that suggest that for an insert columnar processing may be 100X more than row processing… and you can see why this could be true (I apologize for not finding the reference… I’ll dig it up… as I recall I read it in a post some time back by Daniel Abadi).

Now let’s think about this… several vendors are suggesting that you can deploy their columnar features with no changes required… no new hardware… in-place. But this does not ring true if the new columnar feature requires 100X extra CPU cycles per row… or 50X… or 10X… unless you are running your database on an empty server.

This claim is a shot at SAP who, more honestly, suggests new hardware with high-end processors for their in-memory columnar product… but methinks it is marketing, not architecture, from these other folks.

BLU Meanies: Data In-memory

Cover of "Sgt. Pepper's Lonely Hearts Clu...
Cover via Amazon

IBM is presenting a DB2 Tech Talk that compares the BLU Accelerator to HANA. There are several mistakes and some odd thinking in the pitch so let me address the issues as a way to explain some things about HANA and about BLU. This blog will consider what data needs to be in-memory.

IBM like several others, continues to repeat a talking point along the lines of: “We believe that you should not have to fit all of you active data in memory…”. Let’s think about this…

Note that in the current release HANA has a constraint that all of the data in a single column, the entire vector that represents the data in that column, must be in-memory before it can be operated on. If the table is partitioned and partition-elimination is applied then the data in the partition for the column must be loaded in-memory. This is a real constraint that will be removed in a subsequent release… but it is not a very severe constraint if you think about it.

But let’s be clear… HANA does not require all data to be in-memory… it will read data from peripheral devices in and out as required just as BLU does.

Now what does this mean? Let’s walk through some scenarios.

First, let’s imagine a customer with 10TB of user data, per the scenario IBM discusses. Let’s not get into a whose product compresses better discussion and assume that both BLU and HANA will get 4X compression… so there is 2.5TB of user data to be processed.

Now let’s imagine a system with only a very little memory available for data. In other words, let’s configure both BLU and HANA so that they are full columnar databases, but not in-memory databases. In this case BLU would operate by doing constant I/O without constraint and HANA would fail whenever it could not fit a required column in memory. Note that HANA might not fail at all… it would depend on whether there was a large single un-partitioned column that was required.

This scenario is really silly though… HANA is an in-memory database, designed to keep data in-memory from the start… so SAP would not support this imaginary configuration. The fact that you could make BLU work out of memory is not really relevant as nowhere does IBM position, or reference, BLU as a disk-based column store add-on… you would just use DB2.

Now let’s configure a system to IBM’s specification with 400GB of memory. IBM does not really say how much of this memory is available to BLU for data… but for the sake of argument let’s ignore the system requirements and assume that BLU uses one-half, 200GB, as work space to process queries so that 200GB is available to store data in-memory. As you will see it does not really matter in this argument whether I am spot on here or not. So using IBM’s recommendation there is now a 200GB cache that can be used as data is paged in and out. Anyone who has ever used a data warehouse knows that caching does not work well for BI queries as each query touches large enough volumes of the data to flush the cache… so BLU will effectively be performing I/O for most queries and is back to being an out-of-memory columnar database. Note that this flushing issue is why the in-memory capabilities from Oracle and Teradata pin certain tables into memory. In this scenario HANA will operate exactly as BLU does with the constraint that any single column that in a compressed form exceeds 200GB will not be able to be processed.

Finally let’s configure a system with 5TB of memory per SAP’s recommendation for HANA. In this case BLU and HANA both fit all of the data in-memory… with 2.5TB of compressed user data in and 2.5TB of work space… and there is no I/O. This is an in-memory DBMS.

But according to the IBM Power 770 spec (here) there is no way to get 5TB of memory on a single p770 node… so to match HANA and eliminate all I/O they would require two nodes… but BLU cannot be deployed on a cluster… so on they would have to deploy on a single node and perform I/O on 20% of the data. The latency for SSD I/O is 200Kns and for disk it is 10Mns… for DRAM it is 100ns and HANA loads full cache lines so that the average latency is under 20ns… so the penalty paid by BLU is severe and it will never keep up with HANA.

There is more bunk around recommendations for the number of cores but I can make no sense of it at all so I do not know where to begin to debunk it. SAP recommends high-end Intel servers to run HANA. In the scenario above we would recommend multiple servers… soon enough there will be Haswell servers with 6TB of DRAM and this case will run on one node.

I have stated repeatedly that anytime a vendor presents a slide comparing their product to their competitors you should immediately throw them out… it will always be twisted. Don’t trust them. And don’t trust me as I work for SAP. But hopefully you can see some logic in my case. If you need an IMDB then you need memory. If you are short of memory then the IMDB operates like a columnar RDBMS with a memory cache. If you are running a BI query workload then you need to pin data in the cache or the system will thrash. Because of this SAP recommends that you get enough memory to get all of the data in… we recommend that you operate our in-memory database product in-memory…

This really the point of the post. The Five Minute Rule informs us about what data should be in-memory (see here). An in-memory database is designed from the bottom up to manage hot data in-memory. The in-memory add-ons being offered over legacy systems are very capable and should not be ignored… and as the price of memory drops the Five Minute Rule will suggest that data in-memory will account for and ever larger percentage of your EDW. But to offer an in-memory capability and recommend that you should keep the bulk of the data on disk is silly… and to state that your product has a competitive advantage because you do not recommend that all of the data managed by your in-memory feature be kept in-memory is silliness squared.

IBM BLU and SAP HANA

Weird blue dot
Weird blue dot (Photo credit: awshots)

As I noted here, I think that the IBM BLU Accelerator is a very nice piece of work. Readers of this blog are in the software business where any feature developed by any vendor can be developed in a relatively short period of time by any other vendor… and BLU certainly moves DB2 forward in the in-memory database space led by HANA… it narrowed the gap. But let’s look at the gap that remains.

First, IBM is touting the fact that BLU requires no proprietary hardware and suggests that HANA does. I do not really understand this positioning? HANA runs on servers from a long list of vendors and each vendor spins the HANA reference architecture a little differently. I suppose that the fact that there is a HANA reference architecture could be considered limiting… and I guess that there is no reference for BLU… maybe it runs anywhere… but let’s think about that.

If you decide to run BLU and put some data in-memory then certainly you need some free memory to store it. Assuming that you are not running on a server with excess memory this means that you need to buy more. If you are running on a blade that only supports 128GB of DRAM or less, then this is problematic. If you upgrade to a 256GB server then you might get a bit of free memory for a little data. If you upgrade to a fat server that supports 512GB of DRAM or more, then you would likely be within the HANA reference architecture set. There is no magic here.

One of the gaps is related: you cannot cluster BLU so the amount of data you can support in-memory is limited to a single node per the paragraphs above. HANA supports shared-nothing clustering and will scale out to support petabytes of data in-memory.

This limit is not so terribly bad if you store some of your data in the conventional DB2 row store… or in a columnar format on-disk. This is why BLU is an accelerator, not a full-fledged in-memory DBMS. But if the limit means that you can get only a small amount of data resident in-memory it may preclude you from putting the sort of medium-to-large fact tables in BLU that would benefit most from the acceleration.

You might consider putting smaller dimension tables in BLU…. but when you join to the conventional DB2 row store the column store tables are materialized as rows and the row database engine executes the join. You can store the facts in BLU in columnar format… but they may not reside in-memory if there is limited availability… and only those joins that do not use row store will use the BLU level 3 columnar features (see here for a description of the levels of columnar maturity). So many queries will require I/O to fetch data.

When you pull this all together: limited available memory on a single node, with large fact tables projecting in and out of disk storage, and joins pushed to the row store you can imagine the severe constraint for a real-world data warehouse workload. BLU will accelerate some stuff… but the application has to be limited to the DRAM dedicated to BLU.

It is only software… IBM will surely add BLU clustering (see here)… and customers will figure out that they need to buy the same big-memory servers that make up the HANA reference architecture to realize the benefits…  For analytics, BLU features will converge over the next 2-3 years to make it ever more competitive with HANA. But in this first BLU release the use of in-memory marketing slogans and of tests that might not reflect a real-world workload are a little misleading.

Right now it seems that HANA might retain two architectural advantages:

  1. HANA real-time support for OLTP and analytics against a single table instance; and
  2. the performance of the HANA platform: where more application logic runs next to the DBMS, in the same address space, across a lightweight thread boundary.

It is only software… so even these advantages will not remain… and the changing landscape will provide fodder for bloggers for years to come.

References

  • Here is a great series of blogs on BLU that shows how joins with the row store materializes columns as rows…

HANA Memory Utilization

The current release of HANA requires that all of the data required to satisfy a query be in-memory to run the query. Let’s think about what this means:

HANA compresses tables into bitmap vectors… and then compresses the vectors on write to reduce disk I/O. Disk I/O with HANA? Yup.

Once this formatting is complete all tables and partitions are persisted to disk… and if there are updates to the tables then logs are written to maintain ACIDity and at some interval, the changed data is persisted asynchronously as blocks to disk. When HANA cold starts no data is in-memory. There are options to pre-load data at start-up… but the default is to load data as it is used.

When the first query begins execution the data required to satisfy the query is moved into memory and decompressed into vectors. Note that the vector format is still highly compressed and the execution engine operates on this compressed vector data. Also, partition elimination occurs during this data move… so only the partitions required are loaded. The remaining data is on disk until required.

Let us imagine that after several queries all of the available memory is consumed… but there is still user data out-of-memory on peripheral storage… and a new query is submitted that requires this data. At this point HANA frees enough storage to satisfy the new query and processes it. Note that, in the usual DW case (write-once/read-many), the data flushed from memory does not need to be written back…  the data is already persisted… otherwise HANA will flush any unwritten changed blocks…

If a query is submitted that performs a cartesian product… or that requires all of the data in the warehouse at once… in other words where there is not enough memory to fit all of the vectors in memory even after flushing everything else out… the query fails. It is my understanding that this constraint will be fixed in a next release and data will stream into memory and be processed in-stream instead of in-whole. Note that in other databases a query that consumes all of the available memory may never complete, or will seriously affect all other running queries, or will lock the system… so the HANA approach is not all bad… but as noted there is room for improvement and the constraint is real.

This note should remove several silly arguments leveled by HANA’s competitors:

  • HANA, and most in-memory databases, offer full ACID-compliance. A system failure does not result in lost data.
  • HANA supports more data than will fit in-memory and it pages data in-and-out in a smart fashion based on utilization. It is not constrained to only data that fits in-memory.
  • HANA is not useless when it runs out of memory. HANA has a constraint when there is more data than memory… it does not crash the system… but lets be real… if you page data to disk and run out of disk you are in trouble… and we’ve all seen our DBMS‘s hit this wall. If you have an in-memory DBMS then you need to have enough memory to support your workload… if you have a DB2 system you better not run out of temp space or log space on disk… if you have Teradata you better not run out of spool space.

I apologize… there is no public reference I know of to support the features I described. It is available to HANA customers in the HANA Blue Book. It is my understanding that a public version of the Blue Book is being developed.