A Short DW DBMS Market History: HANA, Oracle, DB2, Netezza, Teradata, & Greenplum

Here is a quick review of tens years of data warehouse database competition… and a peek ahead…

Maybe ten years ago Netezza shook up the DW DBMS market with a parallel database machine that could compete with Teradata.

About six years ago Greenplum entered the market with a commodity-based product that was competitive… and then added column store to make it a price/performance winner.

A couple of years later Oracle entered with Exadata… a product competitive enough to keep the Oracle faithful on an Oracle product… but nothing really special otherwise.

Teradata eventually added a columnar feature that matched Greenplum… and Greenplum focussed away from the data warehouse space. Netezza could not match the power of columnar and could not get there so they fell away.

At this point Teradata was more-or-less back on top… although Greenplum and the other chipped away based on price. In addition, Hadoop entered the market and ate away at Teradata’s dominance in the Big Data space. The impact of Hadoop is well documented in this blog.

Three-to-four years ago SAP introduced HANA and the whole market gasped. HANA was delivering 1000X performance using columnar formats, memory to eliminate I/O, and bare-metal techniques that effectively loaded data into the processor in full cache lines.

Unfortunately, SAP did not take advantage of their significant lead in the general database markets. They focussed on their large installed base of customers… pricing HANA in a way that generated revenue but did not allow for much growth in market share. Maybe this was smart… maybe not… I was not privy to the debate.

Now Oracle has responded with in-memory columnar capability and IBM has introduced BLU. We might argue over which implementation is best… but clearly whatever lead SAP HANA held is greatly diminished. Further, HANA pricing makes it a very tough sell outside of its implementation inside the SAP Business Suite.

Teradata has provided a memory-based cache under its columnar capabilities… but this is not at the same level of sophistication as the HANA, 12c, BLU technologies which compute directly against compressed columnar data.

Hadoop is catching up slowly and we should expect that barring some giant advance from the commercial space that they will reach parity in the next 5 years or so (the will claim parity sooner… but if we require all of the capabilities offered to be present there is just no way to produce mature software any faster than 5 years).

Interestingly there is one player who seems to be advancing the state of the art. Greenplum has rolled out a best-in-class optimizer with Orca… and now has acquired Quickstep which may provide the state-of-the-art in bare metal columnar computing. When these come together Greenplum could once again bounce to the top of the performance, and the price/performance, stack. In addition, Greenplum has skinnied down and is running on an open source business model. They are very Hadoop-friendly.

It will be interesting to see if this open-source business model provides the revenue to drive advanced development… there is not really a “community” behind Greenplum development. It will also be interesting to see if the skinny business model will allow for the deployment of an enterprise-level sales force… but it just might. If Pivotal combines this new technology with a focus on the large EDW market… they may become a bigger player.

Note that was sort of dumb-luck that I posted about how Hadoop might impact revenues of big database players like Teradata right before Teradata posted a loss… but do not over think this and jump to the conclusion that Teradata is dying. They are the leader in their large space. They have great technology and they more-or-less keep up with the competition. But skinnier companies can afford to charge less and Teradata, who grew up in the days of big enterprise software, will have to skinny down like Greenplum. It will be much harder for Teradata than it was for Greenplum… and both companies will struggle with profitability for a while. But it is these technology and market dynamics that give us all something to think about, blog about, and talk about over beers…

Hadoop and Company Financial Performance

I have posted several times about the impact of the Hadoop eco-system on a several companies (here, here, here, for example). The topic cam up in a tweet thread a few weeks back… which prompts this quick note.

Fours years ago the street price for a scalable, parallel, enterprise data warehouse platform was $US25K-$US35K per terabyte. This price point provided vendors like Teradata, Netezza, and Greenplum reasonable, lucrative, margins. Hadoop entered the scene and captured the Big Data space from these vendors by offering 20X slower performance at 1/20th the price: $US1K-$US5K per terabyte. The capture was immediate and real… customers who were selecting these products for specialized, very large, 1PB and up deployments switched to Hadoop as fast as possible.

Now, two trends continue to eat at the market share of parallel database products.

First, relational implementations on HDFS continue to improve in performance and they are now 4X-10X slower than the best parallel databases at 1/10th-1/20th the street price. This puts pressure on prices and on margins for these relational vendors and this pressure is felt.

In order to keep their installed base of customers in the fold these vendors have built ever more sophisticated integration between their relational products and Hadoop. This integration, however, allows customers to significantly reduce expense by moving large parts of their EDW to an Annex (see here)… and this trend has started. We might argue whether an EDW Annex should store the coldest 80% or the coldest 20% of the data in your EDW… but there is little doubt that some older data could satisfy SLAs by delivering 4X-10X slower performance.

In addition, these trends converge. If you can only put 20% of your old, cold data in an Annex that is 10X slower than your EDW platform then you might put 50% of your data into an Annex that is only 4X slower. As the Hadoop relational implementations continue to add columnar, in-memory, and other accelerators… ever more data could move to a Hadoop-based EDW Annex.

I’ll leave it to the gamblers who read this to guess the timing and magnitude of the impact of Hadoop on the relational database markets and on company financial performance. I cannot see how it cannot have an impact.

Well, actually I can see one way out. If the requirement for hot data that requires high performance accelerates faster than the high performance advances of Hadoop then the parallel RDBMS folks will hold their own or advance. Maybe the Internet of Things helps here…. but I doubt it.

More Database Supercomputing Technology

Last year two associates from Greenplum suggested that I read a very smart academic paper titled “Efficiently Compiling Efficient Query Plans for Modern Hardware” by Thomas Neumann. Having reiterated the idea of database supercomputing in my last blog (here)… I can now suggest this paper to you.

In short this paper suggests that the classic approach to building query plans using an iterator, an approach that assumed that I/O was the bottleneck, misses a significant opportunity for optimization at the hardware level. He suggests that an approach designed to keep data in the hardware registers as long as possible and push instructions to the data provides a significant performance boost. Further, he suggests that this approach extends the advantages of vector processing. The paper is here and a set of slides on the topic are here.

My friends subsequently started-up a little company named Vitesse Data (vitessedata.com) and implemented the technique over Postgres. Check out their site… the benchmark numbers are pretty cool and they certainly prove the paper. My guess is that this might be a next step in the database architecture race.

FYI… here is a link to more information on the LLVM compiler framework… another very cool bit of software.

One last note… some folks see optimization to the bare metal as an odd approach in a cloudy world where even the database is abstracted away from the bare metal by virtualization. But this thinking misses the point. At some point the database program executes in real hardware and these optimizations matter. What really happens is that bare metal optimization exposes more of the inefficiency of virtualization.

We are already starting to see the emergence of clouds that deploy on bare metal. I expect that we will shortly get to the point where things like databases are deployed on bare-metal cloud IaaS to squeeze every drop of performance out… while other programs are deployed in virtualized IaaS…

Key Values and Key-Value Stores and In-memory Databases

Back to more geeky topics… although my Mom loved the videos…

When very high performance is required to return key performance metrics derived from large volumes of data to a very large number of clients… in other words when volume and velocity are factors and the results are to be delivered to thousands of users (I suppose that I could conjure up a clever V here… but the cleverness would come from the silliness of the semantic stretch… so I’ll leave it to you all to have some pun); the conventional approach has been to pre-compute the results into a set of values that can be fetched by key. In other words, we build a pre-aggregated and pre-joined result table that provides answers to a single query template. Conventionally building this query-specific result table has been the only way to solve these big problems.

Further, we conventionally store these key-value results in a relational DBMS and fetch a row at a time providing pretty darn good performance. Sometimes pretty darn good is not good enough. So there are new options. Key-value data stores may well offer a solution that provides performance and scale at a price well below what has been conventional to date. This is well and good.

But I would like to challenge the conventional thinking a little. The process of joining and aggregating volumes of data into results is a BI process. For the last twenty years BI practitioners have been building pre-aggregated tables and data marts to solve these same problems… maybe not at scale… and this practice has proven to be very expensive, the opposite of agile, and unsustainable. The people costs to develop and support multiple pre-computed replicas is exorbitant. The lack of flexibility that comes from imposing a longish development project over what is essentially an aggregate BI query is constraining our enterprise and our customers.

A better approach is to use the new high performance database products: HANA, BLU, Oracle 12c, or maybe Spark; to aggregate on-demand. Included in this approach is a requirement to use the new high performance database computing platforms available to house the databases.

Consider this… an in-memory DBMS can aggregate 12M rows/sec/core. It can scan 3MB/msec/core. Companies like SGI and HP are ganging processors together so that you can buy a single node that contains 32, 64, or 128 cores… and this number will go up. A 64-core server will aggregate 768M rows/sec and scan 19.2TB/sec… and you can gang a small number of nodes together and scale out.

Providing an extensible BI platform for big data is so much easier than building single-query key-value clusters… there is much less risk… and the agility and TCO make it close to a no-brainer. We just have to re-think the approach we’ve used for 20 years and let the new software and hardware do the work.

Some Database Performance Concepts

I’m working on a new idea… it may or may not pan out… but here are some concepts for your consideration… with some thoughts on their performance implications.

First a reminder… a reality check. In my experience if you POC two databases at about the same price point…and one is 30% faster than the other, 1.3X, then 50% of the time the faster DBMS will win the business. If one DBMS is 2X faster… then it will win the business 90% of the time. The 10% where the faster product loses will be because of internal politics or, for an existing application, due to the migration costs. Note that IMO it is silly to do a POC if you know up front that you will not pick the winner.

Now to the concepts… Note that these are ballpark numbers to help you think about trade-offs…

102312_2108_HopFestP1Ex1.png

The latency to start fetching data from DRAM is 100 ns… from disk it is 10M ns. If we assume that a smart RDBMS pre-fetches 80% of the data into DRAM then we can assume that an in-memory DBMS has a 200,000X performance advantage over a disk-based system.

The latency to a Flash/SSD device is 100K-200K ns. With the same 80% pre-fetch assumption an in-memory DBMS will be 20,000X faster.

Note that neither of these models include data transfer times which will favor in-memory databases even more.

If we have a hybrid system with both disk and SSD and we assume that 90% of the reads hit the SSD and that both layers in the storage hierarchy achieve 80% pre-fetch then then the in-memory system will be 38,000X faster. If fewer than 90% of the reads hit the SSD, then the latency goes up quickly.

These numbers form the basis for selecting in-memory caches like Teradata’s Intelligent Memory option as well as in-memory offerings from IBM, Microsoft, Oracle and SAP.

For typical data warehouse workloads column compression will provide around a 2.5X performance boost over row compression. This has two implications: you will get 2.5X better performance using column storage and you will get 2.5X more data into the faster levels of your storage hierarchy… more in SSD and more in-memory.

If we assume that a typical query only touches 10% of the columns in the tables addressed… then column projection provides a 9X performance boost over a row store. Exadata does not support column projection in the storage layer… and other hybrid row-or-column systems provide it only for columnar tables.

If we assume that the average latency from the processor caches is 10ns (.5ns L1, 7ns, L2, 15ns L3) and the latency to DRAM is 100ns then an in-memory system which pre-fetches data effectively into the processor caches will be 10X faster than one which goes to DRAM. If we assume that a standard RDBMS which processes uncompressed standard data types (no vector processing) gets a 20% cache hit ratio then the advantage to a cache aware RDBMS which loads full cache lines is around 8X. HANA, BLU, and the Oracle in-memory products are cache aware and get this boost with some caveats.

BLU and the Oracle in-memory option are hybrid systems that often convert data to a row form for processing (see here for some data on Oracle). If we assume that they use the full columnar in-memory vector-based structures 50% of the time then these products will see a 4X performance boost. HANA recommends that all data be stored in a columnar form so it would often see the full 8x boost.

These vector-based processes also avoid the cost of decompression… and since they process compressed vector data they can fit more information into each cache line. There is another 20%-200% (1.2X-2X) boost here but I cannot estimate it closer than that.

Finally, the vector based processes use the high performance computing instruction sets (AVX2) offered on modern CPUs… and this provides another 10X+ boost. Again, BLU and Oracle will utilize the vector form less often than HANA so they will see a boost over products like Teradata… but not see as large a boost as HANA.

There are other features at play here… some products, like HANA, shard data in-memory to get all of the cores busy on each query. I have not been able to determine if BLU or Oracle in-memory are there yet? Note that this powerful feature will allow a single query to run as fast as possible… but the benefit is mitigated when there is a workload of multiple concurrent queries (if I have 4 cores and 4 queries running concurrently, one query per core, then the 4 queries will take only a little more time than if I run the 4 queries serially with each query using all 4 cores).

It is important to note that the Oracle In-memory option does not run in the storage component of an Exadata cluster… only on the RAC layer. It is unclear how this works with the in-memory option.

The bottom line is that in-memory systems are not all alike. You can sort of add up the multipliers to get in the ballpark on how much faster Teradata will be with the Intelligent Memory option… how much faster than that a hybrid row and vector-column system like BLU or Oracle In-Memory… and how much faster a pure in-memory system might be. One thing is for sure… these in-memory options will always make the difference in a POC… in every case including them or not will blow away the 2X rule I started with… and in every case the performance benefit will outweigh the extra cost… the price/performance is very likely to be there.

I know that is skipped my usual referencing so that you can see where I pulled these numbers from… but most of this information is buried in posts here and there on my blog… and as I stated up front… these are ballpark numbers. Hopefully you can see the sense behind them… but if you think I’m off please comment back and I’ll try to adjust…

A Sidebar on the IoT: Using new Things smartly

This is a sidebar to some thinking on an architecture for the Internet of Things here… – Rob

I was recently prompted to think about a Big Data problem that is in the US papers… the issues around processing US Veterans through the Veteran’s Administration (VA) bureaucracy. I imagine that there are really two problems… I will outline them… but the point will be to try to get you to think about how to use the IoT creatively to mitigate seemingly intractable problems.

One requires some obvious, if not easy-to-implement, information technology… there has to be some fool-proof way to ensure that VA staff cannot game the system and hide problems. This could be solved with an audit system that looks for anomalies in processing patterns in much the same way that other fraud detection software operates.

But fraud is not the problem… it was meant to mask the problem. The fact is that there are more Veterans requiring medical assistance than there are funds and doctors to provide that assistance. All the fraud detection would have accomplished is to show how many Veterans are going without care.

So… the IT folks at the VA are tasked with the impossible task of servicing too many Veterans with too few doctors… But military doctors are used to this issue and the word “triage” comes from the vocabulary of war… not the vocabulary of medicine.

This leads me towards the point… there are new things, in the IoT sense, coming out that can make triage of Veterans who are not hospitalized possible. Rumors are that the new iWatch contains several sensors that can monitor the pulse, temperature, and maybe more, of wearers. What the VA needs to do is immediately put an iWatch on every Veteran who applies for medical help. They then need to monitor the vitals of all of the veterans and schedule in those with the weakest signs. They need to notify individuals to come in immediately when vitals turn bad… Further, they need to track these vitals against medical records and outcomes to make the triage ever more efficient over time.

By using internet Things to triage the VA could use their limited medical resource much more efficiently… and mitigate some or all of the problem.

The IoT affords us all opportunities to change the game for our employers. There will be opportunities to disrupt some existing markets. But to take full advantage will require us to be creative and smart. I, of one, am anxious to start… Now back to considering an architecture for things.

 

The Greenplum ORCA Optimizer

In January Greenplum rolled out a new query optimizer. This is very cool and very advanced stuff.

Query optimization is a search problem… in a perfect world you would search through the space of all possible plans for any query and choose the least expensive plan. But the time required to iterate through all possible plans would take more time than most queries… so optimizers use rules to cut down the space searched. The rules have been built up over the years and are designed to prune the space quickly to keep performance high for simple queries. But these rules can break down when complex queries are introduced… so Greenplum made the significant investment to build a new optimizer from scratch.

Florian Waas, the leader of this program for Greenplum (now off on another venture) explained it to me this way. If the large rectangle in Figure 1 represents the total search space for a query, a modern query optimizer only searches the area in the small gray square… it looks for the best plan in that small space.

DBFog Query Search Space Fig1You may be surprised to learn that the optimizers used by every major DBMS product are single-threaded… they use only one core of a multi-core processor to search the space and produce a plan. There is no way to effectively search more with a faster single processor (even though you could search more the amount of time you spend as a percentage of the query execution time would stay the same… because the query execution would speed up as well)… so if the optimizer is to search more of the space it will have to use multiple cores and search the space in parallel… and this is exactly what Greenplum has accomplished.

The benchmark results for this are impressive (see here)… several queries in the TPC-DS suite run hundreds of times faster.

ORCA is available to early support customers now and the results map to the benchmark… some queries see an extreme performance boost, while others run significantly slower. This is to be expected from any first release optimizer.

But Greenplum have built another advanced technology into ORCA to reduce the time it will take to mature the software. ORCA includes AMPERe, an optimizer debugging facility that captures the state necessary to recreate problems and fix them. Together these capabilities: parallel search and specialized debugging have advanced the state of the art significantly.

What does it mean to you? It will take some time to shake out ORCA… and HAWQ is still very slow when compared to other analytic databases… and very very slow when compared to the in-memory databases available… and in-memory products like Spark are coming to the Hadoop eco-system. But at the price point HAWQ is a bargain. If you need an inexpensive batch engine that crunches numbers offline then in the next year, as ORCA matures, it may be worth a look.

As a side note… this topic introduces one of the issues related to in-memory databases… when even a very complex query completes with a sub-optimal plan in under a second how much time can you spend searching the plan space? I suspect that applying the parallel optimization principles developed by the Greenplum team will yield similar or even better improvements for in-memory… and these techniques will be a requirement very soon in that space.

References

 

Logical Data Warehouses and the Basics of Database Federation

This post will consider the implications of a full database federation as would be required by a Logical Data Warehouse. I’ll build on the concepts introduced in the posts on RDBMS-Hadoop integration (Part1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8).

Figure 1 summarizes those earlier concepts from simple to advanced.

2 Tier Federation Maturity
Figure 1. 2 Tier Federation Maturity

But the full federation required to implement a logical data warehouse requires a significant step up from this. Simple federation will be a disaster and Basic federation will not be much better. Here is why.

Let’s add a database and use Figure 2 to consider the possibilities when we submit a query that joins Table A.One to A.Two to B.One to C.One. Note that in this picture we have included a Governor to execute the federated queries that is independent of any of the DBMSs… this is the usual case for federation.

In the simple case where the Governor executes the entire plan all of the data must come to the Governor. This is clearly unacceptable. Consider the worse case where a SELECT is issued against only one table… still all of the data must bubble up.

In the Basic case the problem is partially mitigated… less data moves after the predicates are resolved but the overhead will still kill query performance. A Governor with basic capabilities provides the minimal features to make this work. It is useful where slow federation is better than data replication… but that is about all.

Figure 2. N-Tier Federation
Figure 2. N-Tier Federation

However, the advanced case becomes seriously more complicated. The optimizer now has to decide if table B.One should move to C to join the data or should it move to A… or should it move data to the Governor.

The problem is further complicated by any resource shortage on any node or any functional capability differences. If the cost of data movement would suggest moving B data to C… but there is no CPU resource available on C then maybe a different decision should be made? If C.One is a big table but C is a column-store and the cost of the SELECT is small because a minimum of columns are required and the cardinality of those columns is small so the data might be fetched from the dictionary then we might make a different decision. If B is a fast in-memory database but there is no memory available then the cost changes. Finally, if there are twenty databases in your logical DW then the problem increases exponentially.

The point here is clear… data federation over n-tiers is a hard problem. There will be severe performance issues when the optimizer picks wrong. This is why the independent governor model is so attractive… Many of the variables around CPU resources and database capabilities are removed… and while the performance will be poor it will be predictably poor. You should consider the implications carefully… it is just not clear that a high-performance logical data warehouse is feasible simply laid over an existing architecture. And if you build on a model with a Governor you must be sure that the Governor, and the Provincial databases can handle the load. I suspect that the Governor will have to run on a cluster and use a shared-nothing architecture to handle a true enterprise-sized logical EDW.

HANA has a twist on this that is interesting. The Governor lives inside one of the database nodes… so for data in HANA there is no data movement cost unless the optimizer decides to send the data to another node. Further, HANA is very fast… and the performance will mitigate some of the slowness inherent in federation. Finally, HANA is a shared-nothing DBMS… so it is not a problem to move lots of data to HANA in support of big tables and/or thousands of concurrent queries.

I’ll try to use this thinking: simple, basic, advanced federation over some governed federator on a an in-memory or fast shared-nothing architecture to evaluate the products on the market that provide federation. This may prove interesting as the Logical Data Warehouse concept catches on and as products like Teradata’s QueryGrid come to market.

Part 7 – How Hadooped is Greenplum, the Pivotal GPDB?

Now for Greenplum & Hadoop… to continue this thread on RDBMS-Hadoop integration (Part 1Part 2, Part 3, Part 4Part 5, Part 6) I have suggested that we could evaluate integration architecture using three criteria:

  1. How parallel are the pipes to move data between the RDBMS and the parallel file system;
  2. Is there intelligence to push down predicates; and
  3. Is there more intelligence to push down joins and other relational operators?

The Greenplum interface is architecturally similar to the Teradata interface described in Part 4. Hadoop files are defined to the DBMS as external tables and there are capable parallel pipes to effectively move data from the HDFS side to GPDB. In addition Greenplum uses their Scatter-Gather method to load data into the GPDB effectively.

There is no ability to push down predicates. When a query executes all of the relevant data is sucked through the parallel pipes into the database segments for processing. This is very inefficient and there is not even the crude capability to push down processing provided by Teradata.

Finally, there is no ability to push down joins or aggregation.

Greenplum’s offering is not very advanced. To perform with Greenplum analytics data must move between the two storage layers with no intelligence to mitigate the cost.

On to the last post in the series Part 8 on SQL Server and Polybase.

How Good Is Teradata’s Intelligent Memory?

English: On December 17, 2009 30 feet chunk of...
A 30 feet chunk of the cliff below the apartment building fell to Pacific Ocean. (Photo credit: Wikipedia)

Jason asked a great question in the comment section here… he asked… does Teradata’s Intelligent Memory erode HANA’s value proposition?  Let me answer here in a more general way that is applicable to the general database space…

Every time a vendor puts more silicon between the CPU and the disk they will improve their performance (and increase their price). Does this erode HANA’s value proposition? Sure. Every advance by any vendor erodes every other vendor’s position.

To win business a new database product has to be faster than the competition. In my experience you have to be at least 30% faster to unseat the incumbent. If you are 50% faster you will win a lot of business. If you are 2x, 100%, faster you win nearly every time.

Therefore the questions are:

  • Did the Teradata announcement eliminate a set of competitors from reaching these thresholds when Teradata is the incumbent? Yup. It is very smart.
  • Does Intelligent Memory allow Teradata to reach these thresholds when they compete against another incumbent. Yup.
  • Did it eliminate HANA from reaching these thresholds when competing with Teradata? I do not think so… in fact I’m pretty sure it is not the case… HANA should still be way over the 2x threshold… but the reasons why will require a deeper dive… stay tuned.

In the picture attached a 30 foot chunk eroded… but Exadata still stands. Will it be condemned?

Note: Here is a commercial post on the SAP HANA blog site that describes at a high level why I think HANA retains a distinct architectural advantage.