A Short DW DBMS Market History: HANA, Oracle, DB2, Netezza, Teradata, & Greenplum

Here is a quick review of tens years of data warehouse database competition… and a peek ahead…

Maybe ten years ago Netezza shook up the DW DBMS market with a parallel database machine that could compete with Teradata.

About six years ago Greenplum entered the market with a commodity-based product that was competitive… and then added column store to make it a price/performance winner.

A couple of years later Oracle entered with Exadata… a product competitive enough to keep the Oracle faithful on an Oracle product… but nothing really special otherwise.

Teradata eventually added a columnar feature that matched Greenplum… and Greenplum focussed away from the data warehouse space. Netezza could not match the power of columnar and could not get there so they fell away.

At this point Teradata was more-or-less back on top… although Greenplum and the other chipped away based on price. In addition, Hadoop entered the market and ate away at Teradata’s dominance in the Big Data space. The impact of Hadoop is well documented in this blog.

Three-to-four years ago SAP introduced HANA and the whole market gasped. HANA was delivering 1000X performance using columnar formats, memory to eliminate I/O, and bare-metal techniques that effectively loaded data into the processor in full cache lines.

Unfortunately, SAP did not take advantage of their significant lead in the general database markets. They focussed on their large installed base of customers… pricing HANA in a way that generated revenue but did not allow for much growth in market share. Maybe this was smart… maybe not… I was not privy to the debate.

Now Oracle has responded with in-memory columnar capability and IBM has introduced BLU. We might argue over which implementation is best… but clearly whatever lead SAP HANA held is greatly diminished. Further, HANA pricing makes it a very tough sell outside of its implementation inside the SAP Business Suite.

Teradata has provided a memory-based cache under its columnar capabilities… but this is not at the same level of sophistication as the HANA, 12c, BLU technologies which compute directly against compressed columnar data.

Hadoop is catching up slowly and we should expect that barring some giant advance from the commercial space that they will reach parity in the next 5 years or so (the will claim parity sooner… but if we require all of the capabilities offered to be present there is just no way to produce mature software any faster than 5 years).

Interestingly there is one player who seems to be advancing the state of the art. Greenplum has rolled out a best-in-class optimizer with Orca… and now has acquired Quickstep which may provide the state-of-the-art in bare metal columnar computing. When these come together Greenplum could once again bounce to the top of the performance, and the price/performance, stack. In addition, Greenplum has skinnied down and is running on an open source business model. They are very Hadoop-friendly.

It will be interesting to see if this open-source business model provides the revenue to drive advanced development… there is not really a “community” behind Greenplum development. It will also be interesting to see if the skinny business model will allow for the deployment of an enterprise-level sales force… but it just might. If Pivotal combines this new technology with a focus on the large EDW market… they may become a bigger player.

Note that was sort of dumb-luck that I posted about how Hadoop might impact revenues of big database players like Teradata right before Teradata posted a loss… but do not over think this and jump to the conclusion that Teradata is dying. They are the leader in their large space. They have great technology and they more-or-less keep up with the competition. But skinnier companies can afford to charge less and Teradata, who grew up in the days of big enterprise software, will have to skinny down like Greenplum. It will be much harder for Teradata than it was for Greenplum… and both companies will struggle with profitability for a while. But it is these technology and market dynamics that give us all something to think about, blog about, and talk about over beers…

How DBMS Vendors Admit to an Architectural Limitation: Part 2 – Teradata Intelligent Memory

This is the second post (see Part 1 here) on how vendors adjust their architecture without admitting that the previous architecture was flawed. This time we’ll consider Teradata and in-memory….

When SAP HANA appeared Teradata went on the warpath with a series of posts and statements that were pointed but oddly miscued (see the references below). According to the posts in-memory was unnecessary and SAP was on a misguided journey.

Then Teradata announced Intelligent Memory and in-memory was cool. This is pretty close to an admission that SAP was right and Teradata was wrong. The numbers which drove Teradata here are compelling… 100K-200K ns to access an SSD device or 100 ns to access DRAM… a 1000X reduction… and the latency to disk is 100X worse than SSD.

Intelligent Memory was announced shortly after the release of Teradata’s columnar table type. Column-orientation is important because you need a powerful approach to compression to effectively use an expensive memory resource… and columnar provides this. But Teradata, like Greenplum, extended a row-based engine to support columns in order to get to market quick… they hoped to get 80% of the effectiveness of in-memory with only 20% of the engineering effort. The other 20% comes when you develop a new engine that fully exploits the advantages of a columnar architecture. These advanced exploits allow HANA, DB2 BLU, and Oracle 12c to execute directly on columnar data thereby avoiding decompression, fully utilizing the processor caches, and allowing sets to be operated on by super-computing vector-processing instructions. In fact, Teradata really applied the 50/20 rule… they gained 50%, maybe only 40%, of the benefits with their columnar and Intelligent Memory features… but it was easy to deploy what is in-effect an in-memory cache over their existing relational engine.

Please don’t jump to the wrong conclusion here… Intelligent Memory is a strong product. If you were to put hot data in memory, cool data in Teradata-on-SSD-or-Disk, and cold data in Hadoop and manage them as one EDW you could deploy a very cost-effective platform (see here).

Still, Teradata with Intelligent Memory is not likely to compete effectively against HANA, BLU, or 12c for raw performance… so there will be some marketing foam attached and an appeal for Teradata shops to avoid database apostasy and stick with them. You can see some of the foam in the articles below.

A quick aside here… generally a DBMS should win or lose based on price/performance. The ANSI standard makes a products features nearly, not completely but nearly, irrelevant. If you cannot win on price/performance then you blow foam. When any vendor starts talking about things like TCO you should grab your wallets… it is an appeal to foaminess to hide a weakness. I’m not calling out Teradata here… this is general warning that applies to every software vendor.

Intelligent Memory is a smart move. While it may not win in a head-to-head POC… it will be close-ish… close enough to keep the congregation in their pews. As readers know, I am not a big fan of technical religiosity… being a “Teradata-shop” is lazy… engineers we should pick the best solution and learn it. The tiered approach mentioned three paragraphs up is a good solution and non-Teradata shops should be considering it… but Teradata shops should be open to new technology as well. Still, we should pick new technology with a sensitivity to the cost of a migration… and in many cases Intelligent Memory will save business for Teradata by getting just close enough to make migration a bad trade-off. This is why it was so smart.

Back to the theme of these posts… Teradata back-tracked on the value of in-memory… and in the process admitted-without-admitting a shortcoming in their architecture. So it goes…

Next we will consider whether you should be building data warehouses on z/OS using DB2 or the DB2 Analytics Accelerator aka Netezza.

References

Database Fog Blog

Other