Part 2: Evaluating Exadata… Does it stack up with RDBMS-Hadoop systems?

In my blog yesterday (Part 1) I suggested that we could evaluate RDBMS-Hadoop integration architecture using three criteria:

  1. How parallel are the pipes to move data between the RDBMS and the parallel file system;
  2. Is there intelligence to push down predicates; and
  3. Is there more intelligence to push down joins and other relational operators?

But Exadata is a split RDBMS with a parallel file system backing it… how does it measure up by these criteria?

There are effective parallel pipes between the Oracle RAC RDBMS and the Exadata Storage Subsystem… so Exadata passes the first test. Further, Exadata is smart about pushing both scan and projection down to the Storage layer.

Unfortunately there is a fairly severe imbalance between the number of nodes on the RAC side and the number of nodes on the Storage side, and this creates a bottleneck. We cannot give Exadata full marks here… but as far as parallel pipes go it stacks up pretty well.

The ability to push down predicates goes a long way towards solving this, since predicate push-down reduces the amount of data that has to move across the bottleneck. But every data warehouse has queries that return lots of rows from the early execution steps… and because Exadata cannot join data in the Storage Subsystem it pulls data up sparingly and pushes down semi-joins whenever possible… it just cannot do so in every case (note: in Exadata POCs Oracle will try to ensure that no queries are included that pull lots of data up to the RAC layer… and competitors will try to include queries that expose this weakness…).

So… Oracle includes some intelligence to push data down where it can in order to reduce data movement. But there is no way to move data from the RAC layer to the Storage Subsystem and execute the query there… the Storage Subsystem can only scan and project… so again we cannot give Exadata full marks… but it is pretty smart, as you will see when we start looking at alternative implementations.
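
To make the idea concrete, here is a minimal Python sketch of predicate and projection push-down. Everything in it (the table, the region values, the row counts) is invented for illustration; it is not Oracle code, just the shape of the idea: filter and project where the data lives so less crosses the pipe.

```python
# Hypothetical sketch of predicate/projection push-down; data and names are invented.
# The "storage layer" holds the raw rows; the "RDBMS layer" runs the rest of the query.
STORAGE = [
    {"order_id": i, "region": "EMEA" if i % 4 else "APJ", "amount": i * 1.5}
    for i in range(100_000)
]

def no_push_down():
    """Every row crosses the pipe; filtering and projection happen in the RDBMS layer."""
    moved = list(STORAGE)                                    # 100,000 full rows move up
    return [r["amount"] for r in moved if r["region"] == "APJ"]

def push_down_scan_and_project():
    """The storage layer scans, filters, and projects, so only qualifying values move up."""
    return [r["amount"] for r in STORAGE if r["region"] == "APJ"]   # ~25,000 values move up

assert no_push_down() == push_down_scan_and_project()        # same answer, far less data moved
```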

Finally, Exadata cannot effectively split a single query plan across both layers… so no marks at all here.

So Exadata is pretty good… but it has weak spots that will be severe for an important set of DW queries in any implementation.

Now, on to the Hadoop implementations… to Part 3

Pivotal GPDB and the 2013 Forrester Wave EDW Report


Forrester regularly provides fodder for bloggers when they report on the EDW space (see Curt Monash’s review of their last report here). They have a 2013 report out now that is quite mysterious (see here).

They report that Pivotal is up there with the leading EDW vendors and positioned to move further up.

Here is the mystery. If you go to the Pivotal site and search on “data warehouse” you get ten hits:

  • Eight talk about analytic data warehouses, not enterprise data warehouses;
  • One talks about using Hive as a data warehouse; and
  • One talks about data and sandboxing.

There are no hits on the term “enterprise data warehouse” and one hit on the term “EDW” which refers to why you should move data off of the EDW to an analytic platform.

As I’ve pointed out… Pivotal does not market into the EDW space. They are not developing product for that space.  EDW is not part of their product strategy.

The fact that their product is a capable platform for an EDW is worth noting… and readers of this blog should consider GPDB, aka Greenplum, for EDW projects. But you should be fully aware of the risk that Pivotal is not really backing this use case.

For an analyst to suggest that Pivotal has an industry-leading strategy in a space that they are not pursuing at all is very odd.

Who is How Columnar? Exadata, Teradata, and HANA – Part 2: Column Processing

In my last post here I suggested that there were three levels of maturity around column orientation and described the first level, PAX, which provides columnar compression. This apparently is the level Exadata operates at with its Hybrid Columnar Compression.

In this post we will consider the next two levels of maturity: early materialized column processing and late materialized column processing which provide more I/O avoidance and some processing advantages.

In the previous post I suggested a five-column table and depicted each of those columns oriented on disk in separate file structures. This orientation provides the second level of maturity: columnar projection.

Imagine a query that selects only four of the five columns in the table, leaving out the EmpFirst column. In this case the physical structure that stores EmpFirst does not have to be accessed; 20% less data is read, reducing the I/O overhead by the same amount. Somewhere in the process the magic has to be invoked that returns the columns to a row orientation… but just maybe that overhead costs less than the savings from the reduced I/O?

Better still, imagine a fact table with 100 columns and a query that accesses only 10 of them. This is a very common use case. The result is roughly a 10X reduction in the amount of data that has to be read and a matching reduction in the cost of weaving columns into rows. This is columnar projection, and its impact far outweighs the small advantage offered by PAX (PAX may provide a 10%-50% compression advantage over full columnar tables). This is the advantage that lets most of the columnar databases beat Exadata in a fair fight.
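
Here is a small illustrative sketch of columnar projection, using Python lists to stand in for the separate on-disk column files; the column names other than EmpFirst are assumed for the example:

```python
# Illustrative only: each column lives in its own structure, standing in for a file on disk.
columns = {
    "EmpID":    list(range(1000)),
    "EmpFirst": ["First%d" % i for i in range(1000)],
    "EmpLast":  ["Last%d" % i for i in range(1000)],
    "DeptID":   [i % 3 for i in range(1000)],
    "Salary":   [50_000 + i for i in range(1000)],
}

def project(needed):
    """Touch only the column structures the query asks for; the rest are never read."""
    touched = [columns[name] for name in needed]
    return list(zip(*touched))          # re-weave the projected columns into rows

rows = project(["EmpID", "EmpLast", "DeptID", "Salary"])   # EmpFirst is never accessed
print(rows[0])                          # (0, 'Last0', 0, 50000) -- 4 of 5 columns read
```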

But Teradata and Greenplum stop here. After data is projected and selected it is decompressed into rows and processed using their conventional row-based database engines. The gains from the further levels of maturity are significant.

The true column stores read compressed columnar data into memory and then operate on the columnar data directly. This provides distinct advantages (a small sketch follows the list below):

  • Since data remains compressed DRAM is used more efficiently
  • Aggregations against a single column access data in contiguous memory improving cache utilization
  • Since data remains compressed processor caches are used more efficiently
  • Since data is stored in bit maps it can be processed as vectors using the super-computing instruction sets available in many CPUs
  • Aggregations can be executed using multiplication instead of table scans
  • Distinct query optimizations are available when columnar dictionaries are available
  • Column structures behave as built-in indexes, eliminating the need for separate index structures

These advantages can provide 10X-50X performance improvements over the previous level of maturity.
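
As promised above, here is a small sketch of the first few advantages: filtering and aggregating directly against dictionary-encoded codes, with NumPy's vectorized operations standing in for the SIMD instructions a real engine would use. The data, names, and sizes are invented; this is not any vendor's engine.

```python
# Illustrative only: operate on dictionary-encoded codes without decompressing into rows.
import numpy as np

dept_dict  = {0: "Sales", 1: "Engineering", 2: "Finance"}              # tiny dictionary
dept_codes = np.random.randint(0, 3, size=1_000_000, dtype=np.uint8)   # 1 byte per value in DRAM
salary     = np.random.randint(40_000, 160_000, size=1_000_000)

# The predicate DeptID = 'Engineering' becomes a comparison against code 1,
# evaluated as a vector operation over the compressed column.
mask = dept_codes == 1
print("Engineering headcount:", int(mask.sum()))
print("Engineering payroll:  ", int(salary[mask].sum()))
```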

Summary

  • Column Compression provides approximately a 4X performance advantage over row compression (10X instead of 2.5X). This is Column Maturity Level 1.
  • Columnar Projection includes the advantages of Column Compression and provides a further 5X-10X performance advantage (if your queries touch 1/5-1/10 of the columns). This is Column Maturity Level 2.
  • Columnar Processing provides a 10X+ performance improvement over just compression and projection. This is Column Maturity Level 3.

Of course your mileage will vary… If your workload tends to touch more than 80% of the columns in your big fact tables then columnar projection will not be useful… and Exadata may win. If your queries do not do much aggregation then columnar processing will be less useful… and a product at Level 2 may win. And of course, this blog has not addressed the complexities of joins and loading and workload management… so please do not consider this as a blanket promotion for Level 3 column stores… but now that you understand the architecture I hope you will be better able to call BS on the marketing…
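
As a back-of-envelope illustration of how the three levels combine, here is a tiny sketch using the indicative ratios from these posts (2.5X row compression, 10X columnar compression, a query touching one-tenth of the columns); the 1 TB raw size is an arbitrary assumption:

```python
# Back-of-envelope I/O estimate using the indicative ratios from this post; 1 TB is assumed.
raw_tb = 1.0

row_compressed = raw_tb / 2.5          # row store with compression
pax_compressed = raw_tb / 10           # Level 1: columnar (PAX) compression
projected      = pax_compressed * 0.1  # Level 2: query touches 1/10 of the columns
# Level 3 reads roughly the same amount as Level 2; its additional 10X+ gain comes
# from processing the compressed columns directly (vectors, cache efficiency, etc.).

for label, tb in [("row compression", row_compressed),
                  ("columnar compression (Level 1)", pax_compressed),
                  ("+ columnar projection (Level 2)", projected)]:
    print(f"{label:>32}: {tb:.3f} TB read")
```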

Included is a table that outlines the maturity level of several products:

| Product | Columnar Maturity Level | Notes |
| --- | --- | --- |
| Teradata | 2 | Columnar tables, row engine |
| Exadata | 1 | PAX only |
| HANA | 3 | Full columnar support |
| Greenplum | 2 | Columnar tables, row engine |
| DB2 | 3 | BLU hybrid |
| SQL Server | 2 | I think… researching… |
| Vertica | 3 | Full columnar support |
| Paraccel | 3 | Full columnar support |
| Netezza | n/a | No columnar support |
| Hadapt | 2 | I think… researching… |

Who is How Columnar? Exadata, Teradata, and HANA – Part 1: Column Compression

Figure 1: Basic Table

There are three forms of column-orientation currently deployed by database systems. Each level builds upon the one before it. The simplest form uses column-orientation to provide better data compression. The next level of maturity stores columnar data in separate structures to support columnar projection. The most mature implementations support a columnar database engine that performs relational algebra on column-oriented data. Let me explain…

Imagine a simple table with 1M rows… with the schema and the first several rows depicted in Figure 1. Conceptually, a row-orientation deploys data on disk and in-memory as depicted in Figure 2 and a column-orientation deploys data on disk and in-memory as depicted in Figure 3. The actual deployment may be significantly different, as we will see.

Note that I am going to throw out some indicative numbers around compression. I will suggest that applying compression to rows will provide from 1.5X to 3.5X compression with an average of 2.5X… and that applying compression to columns provides from 3X to 50X compression with the average around 10X. These are supportable numbers, but the compression you see for any specific data set will vary.

Figure 2: A row-oriented block

There are two powerful compression techniques that, individually or combined, provide most of the benefits: dictionary-encoding and run-length encoding. For the purposes of this blog I will do both an injustice by explaining them only briefly and conceptually… just enough that you get the idea.

Figure 3: Five column-oriented blocks

Run-length encoding provides further compression by encoding a run of similar values as the value plus the number of times it repeats, so the bit stream 0000000000000000 (sixteen 0s) could be represented as the value 0 plus a count of 16.
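
A run-length encoder is only a few lines; here is a minimal sketch of the concept:

```python
# A minimal run-length encoder, just to illustrate the concept.
def rle_encode(values):
    runs, prev, count = [], None, 0
    for v in values:
        if v == prev:
            count += 1
        else:
            if prev is not None:
                runs.append((prev, count))
            prev, count = v, 1
    if prev is not None:
        runs.append((prev, count))
    return runs

print(rle_encode("0000000000000000"))   # [('0', 16)] -- sixteen 0s collapse to one pair
```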

You can now also start to see why a column-orientation compresses better than a row-orientation. In the row block above there is little opportunity to encode whole rows in a dictionary… the cardinality of rows in a table is too high (note that this may not be true for a dimension table, which is, in effect, a dictionary). There is some opportunity to encode the bit runs in a row… as noted, you can expect to get 2X-2.5X from row compression for a fact table. Column-orientation allows dictionary encoding to be applied effectively to low-cardinality columns… and this accounts for the advantage there.

Figure 4: Column dictionary

Dictionary-encoding reduces data to a compressed form by building a map that translates each distinct value in a column into a tightly compressed code. For example, if there are indeed only three values possible in the DeptID field above then we might build a dictionary for that column as depicted in Figure 4. You can see that by encoding and storing the data in the minimal number of bits required, significant storage reduction is possible… and the lower the cardinality of a column, the smaller the resulting bit representation.
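
Here is a conceptual sketch of dictionary encoding for a low-cardinality column like DeptID; the department names are invented for the example:

```python
# Conceptual dictionary encoding for a low-cardinality column (illustrative values).
import math

dept_column = ["Sales", "Engineering", "Sales", "Finance", "Sales", "Engineering"]

# Build the dictionary: each distinct value maps to a small integer code.
dictionary = {value: code for code, value in enumerate(sorted(set(dept_column)))}
encoded    = [dictionary[v] for v in dept_column]

bits_per_value = max(1, math.ceil(math.log2(len(dictionary))))
print(dictionary)       # {'Engineering': 0, 'Finance': 1, 'Sales': 2}
print(encoded)          # [2, 0, 2, 1, 2, 0]
print(bits_per_value, "bits per value instead of several bytes of text")
```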

Note that there is no free lunch here. There is a cost to be paid in CPU cycles to compress data and to decompress data… but for a read-optimized data warehouse database compression is cool. Exactly how cool depends on the level of maturity and we will get to that as we go.

It is crucial to remember that column store databases are relational. They ingest rows and emit rows and perform relational algebra in-between. So there has to be some magic that turns tuples into columns and restores them from columns. The integrity of a row has to persist. Again I am going to defer on the details and point you at the references below… but imagine that for each row a bit map is built that, for each column, points to the entry in the column dictionary with the proper value.
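
To make that concrete, here is a toy sketch of restoring a row from dictionary-encoded columns; encoding every column, including Salary, against a dictionary is done here purely for illustration:

```python
# Toy sketch: re-weave row n from per-column dictionary codes (illustrative only).
dept_dict   = {0: "Engineering", 1: "Finance", 2: "Sales"}
salary_dict = {0: 50_000, 1: 75_000, 2: 100_000}

dept_codes   = [2, 0, 2, 1]      # one small code per row, per column
salary_codes = [1, 2, 0, 1]

def restore_row(n):
    """Look up each column's code for row n in that column's dictionary."""
    return {"DeptID": dept_dict[dept_codes[n]],
            "Salary": salary_dict[salary_codes[n]]}

print(restore_row(1))            # {'DeptID': 'Engineering', 'Salary': 100000}
```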

There is no free lunch to column store… no free lunch anywhere, it seems. Building this bit map on INSERT is very expensive, and modifying it on UPDATE is fairly expensive. This is why column-orientation is not suitable for OLTP workloads without some extra effort. But the cost is amortized by significant performance gains for READs.

One last concept: since peripheral I/O reads blocks, imagine two approaches to column compression: one that applies the concepts above to an entire table, breaking each table into separate column-oriented files that may be read separately; and one that applies the concepts individually to each large block in a table file. Imagine, in the first case, that the rows in Figure 2 are just the first few rows of our 1M-row table and the whole table is re-oriented into table-wide column files. Imagine, in the second case, that Figure 2 represents all of the rows in a single block, and only those rows are re-oriented into columns within that block.

This second, block-oriented, approach is called PAX, and it is more-or-less the approach used by Exadata. In the PAX approach each block contains its own mini column store and its own dictionary built from the values in that block. Because the cardinality of columns within a block will often be lower than the cardinality across an entire table there are some distinct advantages to PAX compression: compression will be more than a little higher than for full-table columnar compression.
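
The effect of per-block dictionaries can be sketched in a few lines; the city data below is invented, and the point is only that cardinality within a block is often lower than across the whole table, so per-block codes can be narrower. This illustrates the general PAX idea, not Oracle's actual format:

```python
# Illustrative contrast: one table-wide dictionary vs. PAX-style per-block dictionaries.
import math

def bits_needed(distinct_count):
    return max(1, math.ceil(math.log2(distinct_count)))

city_column = ["Berlin", "Paris"] * 500 + ["Tokyo", "Osaka"] * 500     # 2,000 values
block_size  = 1000
blocks      = [city_column[i:i + block_size] for i in range(0, len(city_column), block_size)]

table_bits = bits_needed(len(set(city_column)))            # one dictionary for the whole column
block_bits = [bits_needed(len(set(b))) for b in blocks]    # one dictionary per block

print("table-wide dictionary: ", table_bits, "bits per value")     # 2 bits (4 distinct cities)
print("per-block dictionaries:", block_bits, "bits per value")     # [1, 1] (2 distinct per block)
```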

When Exadata reads a block from disk it decompresses the data back into rows and performs row-oriented processing to complete the query. This is very cool for Exadata… a great feature. As noted, column compression may be 4X better than row compression on the average. This reduces the storage requirements and reduces the overhead of I/O by 4X… and this is a very significant improvement. But Exadata stops here. It is not a columnar-oriented DBMS and it misses the significant advantages that come from the next two levels of column-orientation… I’ll take these up in the next post.

To be clear, all of the databases that use these more mature techniques (Teradata, HANA, Greenplum, Vertica, Paraccel, DB2, and SQL Server) gain from columnar compression, even if the PAX approach provides some small advantage as a compression technique.

It is also worth noting that Teradata does not gain as much as others in this regard. This is not because of poor design; rather, to their credit, Teradata implemented a Teradata-specific dictionary-based compression scheme long ago. Columnar compression let others catch up to what Teradata has offered for years.

And before you ask… Netezza offers no columnar orientation… preferring to compress deeply and use an FPGA co-processor to decompress… and to reduce I/O using zone maps rather than the mid-level columnar projection techniques described in the next blog here.

The Fog is Getting Thicker…


I renamed this so that Teradata folks would not get here so often… it’s not really about Intelligent Memory… just prompted by it. The post on Intelligent Memory is here. – Rob

Two quick comments on Teradata’s recent announcement of Intelligent Memory.

First… very very cool. More on this to come.

Next… life is going to become very hard for my readers and for bloggers in this space. The notion of an in-memory database is becoming rightfully blurred… as is the notion of column store.

Oracle blurs the concepts with phrases like "database in-memory" and "hybrid columnar compression" for a product that is neither an in-memory database nor a column store.

Teradata blurs the concept with a strong offering that uses DRAM as a block-IO device (like the old RAM-disks we used to configure on our PCs).

Teradata and Greenplum blur the idea of a column store by adding columnar tables over their row store database engines.

I’m not a fan of the double-speak… but the ability of companies to apply the 80/20 rule to stretch their architectures and glue on new advanced technologies is a good thing for consumers.

But it becomes very hard to distinguish the products now.

In future blogs I’ll try to point out differences… but we’ll have to go a little deeper into the Database Fog.

How Good Is Teradata’s Intelligent Memory?

A 30-foot chunk of the cliff below an apartment building fell into the Pacific Ocean on December 17, 2009. (Photo credit: Wikipedia)

Jason asked a great question in the comment section here… he asked: does Teradata’s Intelligent Memory erode HANA’s value proposition? Let me answer here in a more general way that applies across the database space…

Every time a vendor puts more silicon between the CPU and the disk they will improve their performance (and increase their price). Does this erode HANA’s value proposition? Sure. Every advance by any vendor erodes every other vendor’s position.

To win business a new database product has to be faster than the competition. In my experience you have to be at least 30% faster to unseat the incumbent. If you are 50% faster you will win a lot of business. If you are 2X (100%) faster you win nearly every time.

Therefore the questions are:

  • Did the Teradata announcement eliminate a set of competitors from reaching these thresholds when Teradata is the incumbent? Yup. It is very smart.
  • Does Intelligent Memory allow Teradata to reach these thresholds when they compete against another incumbent? Yup.
  • Did it eliminate HANA from reaching these thresholds when competing with Teradata? I do not think so… in fact I’m pretty sure it is not the case… HANA should still be way over the 2x threshold… but the reasons why will require a deeper dive… stay tuned.

In the picture attached a 30-foot chunk eroded… but HANA still stands. Will it be condemned?

Note: Here is a commercial post on the SAP HANA blog site that describes at a high level why I think HANA retains a distinct architectural advantage.

Memory Trends and HANA

If the Gartner estimates here are correct… then DRAM prices will fall 50% per year over the next several years… and then in 2015 non-volatile RAM will become generally available.

It has been suggested that memory prices will fall slower than data warehouses will grow (see here). That does not seem to be the case… and the combination of cheaper memory and then non-volatile memory will make in-memory databases like SAP HANA ever more compelling. In fact, as I predicted… and to their credit, Teradata is adding more memory (see here).

Hadoop and the EDW


Cloudera and Teradata have jointly published a nice paper here that presents an interesting perspective on how Hadoop and an EDW play together. Simply put, Hadoop becomes the staging area for "raw data streams" while the EDW stores data from "operational systems". Hadoop then analyzes the raw data and shares the results with the EDW. Two early examples provided in the paper suggest:

  • Click stream data is analyzed to identify customer preferences that are then shared with the EDW. Note that the amount of data sent from Hadoop to the EDW would be fairly small in this case.
  • Detailed data is stored on Hadoop to build analytic models. The models are then transferred to the EDW to score sales activity data. Note that in this scenario the activity detail has to live in Hadoop to perform the modeling… but it is unclear why it has to live in the EDW as well. I presume that scoring takes place on the EDW instead of in Hadoop for performance reasons… but maybe the data, the modeling, and the scoring should just take place in Hadoop?

The paper then positions Hadoop as an active archive. I like this idea very much. Hadoop can store archived data that is only accessed once a month or once a quarter or less often… and that data can be processed directly by Hadoop programs or shared with the EDW using facilities such as Teradata’s SQL-H, or Greenplum’s External Hadoop tables (not by HAWQ, though… see here), or by other federation engines connected to HANA, SQL Server, Oracle, etc.

But think about the implications on how much data has to stay in your EDW if you archive everything older than 90, or even 180, days to Hadoop. The EDW shrinks significantly and the TCO advantage to your Enterprise will be significant. This is very cool.

There is one item in the paper I disagree with, though… and another statement that I think has a very short shelf-life.

The paper suggests that indexes, materialized views, aggregate join indexes, and other tweaks are what differentiates an EDW. I believe that reliance on these structures makes for a fragile EDW where only some queries can run fast. I like Teradata better when it just robustly scans fast and none of these redundant-data tuning artifacts are required (more here and here). Teradata was the original scan-fast DBMS… it is more than capable.

The paper also suggests that an EDW maintains value by including a sophisticated cost-based optimizer that uses data demographic statistics to identify an optimal query execution plan. I agree that Hadoop lacks this now… but there are several projects like Cloudera Impala that will eliminate this gap in the near term.

I believe that if you read between the lines you will see more evidence to support my belief (here) that Hadoop will squeeze the EDW vendors hard… and that the size of a squeezed EDW will then fit in an in-memory database.

Wondering About Netezza… and A Teradata Prediction Comes True…


If you missed the tweet… 2+ years ago I predicted here that Teradata would go away from ByNet… and lo and behold they did (see here).

In the same post I predicted that Netezza would go away from FPGAs. This has not come to pass. But I wonder if it might… or if there is a bigger change possible?

With the recent announcements of DB2 BLU and column store I suspect that DB2 will outperform Netezza when the query mix does not fall directly in Netezza’s sweet spot.

I also have a suspicion that the Netezza architecture, with its execution engine split across two different processors, is just hard to engineer. I cannot think of another reason features come so slowly there. Why, for example, is there no columnar support? Greenplum built it on the same Postgres base with less than a handful of engineers in a year. Teradata now offers columnar tables as well.

These concerns… combined with some previous notes on Netezza add up as follows:

  1. FPGAs no longer provide a performance advantage (per my link above)
  2. FPGAs limit the ability of the DBMS to use more cores (see here)
  3. FPGAs limit the ability of the DBMS to manage workload (see here… and especially the comments)
  4. FPGAs and a 2-phase split execution environment limit the ability to extend and enhance the code base (a new conjecture)
  5. Zone Maps and CBTs provide a limited ability to solve for a wide range of queries… they are just an index (see here)
  6. DB2 Column Store provides a performance boost equal to or greater than zone maps and CBTs (a new conjecture)
  7. DB2 BLU provides a performance boost well in excess of what Netezza can provide (see here)

The Netezza architecture with FPGAs provided a distinct advantage in 2000 when CPU was the scarce commodity. But multi-core systems and the advance of Moore’s Law soon made processing abundant… and the advantage of FPGA co-processing diminished. Without a distinct advantage the split execution architecture became a disadvantage… and the complexity of that design kept Netezza from developing, on top of the Postgres base, the advances that others developed with relative ease.

Architecture counts… and DB2 is a strong product. If, as I suspect, DB2 is now a more capable product than Netezza… I wonder what path IBM may take?

MPP, IMDB and Moore’s Law

In the post here I listed the units of parallelism (UoP) applied by various products on a single node. Those findings are summarized in the table below.

| Product | Version/HW | Cores per Node | UoP per Node | Notes |
| --- | --- | --- | --- | --- |
| Teradata | EDW 6700H | 16 | 32 | Uses hyper-threads. |
| Greenplum | DCA UAP Edition | 16 | 8 | Recommends 1 segment for each 2 cores. Maybe some multi-threading per query so it could be greater than 8 on average… and could be 16 with hyper-threads… but not more than 32 for sure. |
| Exadata | X3 | 12 | 12-24 | Maybe only 12… cannot find if they use hyper-threads. |
| Netezza | Striper | 16 | 16 | May use hyper-threads but limited by 16 FPGAs. |
| HANA | Any Xeon E7-4800 | 40 | 80 | Uses hyper-threads. |

A UoP is defined as the maximum number of  instructions that can execute in parallel on a single node for a single query. Note that in the comments there was a lively debate where some readers wanted to count threads or processes or slices that were “active” but in a wait state. Since any program can start threads that wait I do not count these as UoP (later we might devise a new measure named units of waiting that would gauge the inefficiency in any given design by measuring the amount of waiting around required to keep the CPUs fed… maybe the measure would be valuable in measuring the inefficiency of the queue at your doctor’s office or at any government agency).

On some CPUs, vendors such as Intel allow two threads to execute instructions in parallel on a single core. This is called hyper-threading and, where implemented, it allows for two UoP on a single core. Rather than constantly qualify the statements, for the rest of this blog when I refer to cores I mean hyper-threads.

The lively comments in the blog included some discussion of the sort of techniques used by vendors to try and keep the cores in the CPU on each node fed. It is these techniques that lead to more active I/O streams than cores and more threads than cores.

For several years now Intel and the other CPU manufacturers have been building ever more cores into their products. This has allowed them to continue the trend known as Moore’s Law. Multi-core is now a fact of life and even phones, tablets, and personal computers have multi-core chips.

But if you look at the table you can see that the database products above, even the newly announced products from Teradata and Netezza, are using CPUs with relatively few cores. The high-end Intel configurations provide 40 cores per node, and the databases, with the exception of HANA, use Intel products with at most 16 cores per node. Further, Ivy Bridge-based servers will hit the market this year with up to 120 cores per node. These vendors know this… yet they have chosen to deliver appliances with the previous generation of CPUs. You might ask why?

I believe that there is an architectural reason for this (also a marketing reason covered here).

It is very hard to keep 80 cores fed with data when you have to perform block I/O. It will be nearly impossible to keep the 240 cores coming with Ivy Bridge fed. One solution is to deploy more nodes in a shared-nothing configuration with fewer cores per node… but this will be expensive requiring more power, floorspace, administration, etc. This is the solution taken by most of the vendors above. Another solution is to solve the problem without I/O with an in-memory database (IMDB) architecture. This is the solution taken by SAP with HANA.
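
To see why feeding the cores is the issue, here is a rough, purely illustrative bit of arithmetic; every number in it is an assumption chosen to show the shape of the problem, not a measured figure for any product:

```python
# Rough illustration only -- all of these numbers are assumptions, not measurements.
cores_per_node        = 240    # the hyper-threaded Ivy Bridge node discussed above
scan_rate_per_core_gb = 1.0    # assume each core can chew through ~1 GB/s of scan work
block_io_bandwidth_gb = 25.0   # assume ~25 GB/s of aggregate block I/O per node
dram_bandwidth_gb     = 200.0  # assume ~200 GB/s of aggregate memory bandwidth per node

demand_gb = cores_per_node * scan_rate_per_core_gb
print(f"cores could consume ~{demand_gb:.0f} GB/s of data")
print(f"block I/O supplies ~{block_io_bandwidth_gb:.0f} GB/s -> most cores sit idle")
print(f"DRAM supplies ~{dram_bandwidth_gb:.0f} GB/s -> far closer to keeping the cores fed")
```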

Intel, IBM, and the rest will continue to build out using the multi-core approach for the foreseeable future. IMDB products will be able to fully utilize these processors. Other products will struggle to take full advantage as we can see already… they will adapt and adjust and do what they can… but ultimately IMDB will win, I think… because there is just no other way to keep up as Moore’s Law continues to drive technology… no other way to feed the CPU engines with data fast enough.

If I am right then you will see more IMDB offerings from more vendors, including from the major vendors in the near future (note that this does not include the announcements of “database in memory” from Oracle which is not by any measure an in-memory database).

This is the underlying reason why Donald Feinberg (and Timo Elliott) are right on here. Every organization will be running in-memory… and soon.
