Datawarehouse Appliances – Database Fog Blog

Here is an attempt to build a Price/Performance model for several data warehouse databases.

Added on February 21, 2013: This attempt is very rough… very crude… and a little too ambitious. Please do not take it too literally. In the real world Greenplum and Teradata will match or exceed the price/performance of Exadata… and the fact that the model does not show this exposes the limitations of the approach… but hopefully it will get you thinking… – Rob

For price I used some $$/Terabyte numbers scattered around the internet. They are not perfect but they are close enough to make the model interesting. I used:

Database	$$/TB
HANA	$200,000
Exadata X3	$66,000
Teradata	$66,000
Greenplum	$30,000

Of these numbers the one that may be the furthest off is the HANA number. This is odd since I work for SAP… but I just could not find a good number so I picked a big number to see how the model came out. Please, for any of these numbers provide a comment and I’ll adjust.

For each product I used the high performance product rather than the product with large capacity disks…

I used latency as a stand-in for performance. This is not perfect either… but it is not too bad. I’ll try again some other time and add data transfer time to the model. Note that I did not try to account for advantages and disadvantages that come from the software… so the latency associated with I/O to spool/work files is not counted… use of indexes and/or column store is not counted… compression is not counted. I’ll account for some of this when I add in transfer times.

I did try to account for cache hits when there is SSD cache in the configuration… but I did not give HANA credit for the work done to get most data from the processor caches instead of from DRAM.

For network latency I just assumed one round trip for each product…

For latencies I used the picture below:

The exception is that for products that use PCIe to access SSDs I cut the latency by 1/3 based on some input from a vendor. I could not find details on the latency for Teradata’s Bynet so I assumed that it is comparable with Infiniband and the newest 10GigE switches.

Here is what I came up with:

Database	Total Latency(ns)	Price/Performance	Delta
HANA	90	1,800	–
HANA (2 nodes)	1190	23,800	13x
Exadata X3	2,054,523	13,559,854	7533x
Teradata	4,121,190	27,199,854	15111x
Greenplum	10,001,190	30,003,570	16669x

I suppose that if a model seems to reflect reality then it is useful?

HANA has the lowest latency because it is in-memory. When there are two nodes a penalty is paid for crossing the network… this makes sense.

Exadata does well because the X3 product has SSD cache and I assumed an 80% hit ratio.

Teradata does a little worse because I assumed a lower hit ratio (they have less SSD per TB of data).

Greenplum does worse as they do all I/O against disks.

Note the penalty paid whenever you have to go to disk.

Let me say again… this model ignores lots of software features that would affect performance… but it is pretty interesting as a start…

I found myself wondering where did the rule-of-thumb for Exalytics that suggests that TimesTen can use 800GB of a 1TB memory space… and requires 400GB of that space for work tables leaving room for 400GB of user data… come from (it is quoted everywhere… here is an example… see question #13).

Sure enough, this rule has been around for a while in the TimesTen literature… in fact it predates Exalytics (see here).

Why is this important? The workspace per query for a TPC-A transaction is very small and the amount of time the memory is held by a TPC-A transaction is very short. But the workspace required by a TPC-H query is at least 10X the space required by a TPC-A query and the duration of a TPC-H query is at least 10X the duration of a TPC-A query. The result is at least 100X more pressure on memory utilization.

So… I suspect that the 600GB of user data I calculated here may be off by more than a little. Maybe Exalytics can support 300GB of user data or 100GB of user data or maybe 60GB?

Note that this is not bad… all of this pressure on memory is still moved to Exalytics from the Exadata RAC subsystem… where memory is dear.

As a side note… it is always important to remember that the pressure on memory is the amount of memory utilized times the duration of the utilization. This is why the data flow architecture used in modern databases like Greenplum are effective. Greenplum uses more memory per transaction but it holds the memory for less time by never (almost) writing it to disk. This is different from older database architectures like Teradata and Oracle which use disk to store intermediate results… lowering the overall amount of memory required but increasing the duration of the query. More on this here…

Sorry… this is a little geeky…

The news and blogs on Exalytics tend to say that Exalytics is an in-memory implementation with 1TB of memory. They then mention, often in the same breath, that the TimesTen product which is the foundation for Exalytics now supports Hybrid Columnar Compression which might compress your data 5X or more. This leaves the reader to conclude that an Exalytics Server can support 5TB of user data. This is not the case.

If you read the documentation (here is a summary…) a 1TB Exalytics server can allocate 800GB to TimesTen of which half may be allocated to store user data. The remainder is work space… so 400GB uncompressed is available for user data. You might now conclude that with 5X compression there is 2TB of compressed user data supported. But I am not so sure…

In Exadata Hybrid Columnar Compression is a feature of the Storage Servers. It is unknown to the RAC layer. The compression allows the Storage Servers to retrieve 5X the data with each read significantly improving the I/O performance of the subsystem. But the data has to be decompressed when it is shipped to the RAC layer.

I expect that the same architecture is implemented in TimesTen… The data is stored in-memory compressed… but decompressed when it moves to the work storage. What does this mean?

If, in a TimesTen implementation without Hybrid Columnar Compression, 400GB of work space in memory is required to support a “normal” query workload against 400GB of user data then we can extrapolate the benefits of 5X compressed data as follows:

x of user data compressed uses x/5 of memory plus x of work space in memory… all of which must fit into 800GB

This resolves to x = 667GB… a nice boost for sure… with some CPU penalty for decompressing.

So do not jump to the conclusion that Hybrid Columnar Compression in TimesTen of 5X allows you to put 5TB of user data on a 1TB Exalytics box… or even that it allows you to load 2TB into the 400GB user memory… the real number may be under 1TB.

I’ve been trying to sort through the noise around Exalytics and see if there are any conclusions to be drawn from the architecture. But this post is more about the noise. The vast majority of the articles I’ve read posted by industry analysts suggest that Exalytics is Oracle‘s answer to SAP‘s HANA. See:

But I do not see it?

Exalytics is a smart cache that holds a redundant copy of aggregated data in memory to offload aggregate queries from your data warehouse or mart. The system is a shared-memory implementation that does not scale out as the size of the aggregates increase. It does scale up by daisy-chaining Exalytics boxes to store more aggregates. It is a read-only system that requires another DBMS as the source of the aggregated data. Exalytics provides a performance boost for Oracle including for Exadata (remember, Exadata performs aggregation in the RAC layer… when RAC is swamped Exalytics can offload some processing).

HANA is a fully functional in-memory shared-nothing columnar DBMS. It does not store a copy of the data.. it stores the data. It can be updated. HANA replaces Oracle… it does not speed it up.

I’ll post more on Exalytics… and on HANA… but there is no Exalytics vs. HANA competition ahead. There will be no Exalytics vs. HANA POCs. They are completely different technologies solving different problems with the only similarity being that they both leverage the decreasing costs of RAM to eliminate the expense of I/O to disk or SSD devices. Don’t let the common phrase “in-memory” confuse you.

At this time of the year bloggers everywhere look back and reflect. Some use the timing to highlight significant achievements… and it is in the spirit that I would like to announce my choice for the best marketing in the data warehouse vendor space for 2011.

Marketing is a difficult task. Marketeers need to walk a line between reality and bull-pucky. They need to appeal to real and apparent needs yet differentiate. Often they need to generate spin to fuzz a good story by a competitors marketing or to de-emphasize some short-coming in their own product line.

Below is a picture taken on the floor of a prospect where we engaged in a competitive proof-of-concept. The customer requested that vendors ship a single rack configuration… and so we did.

But the marketing coup is that the vendor on the right, Teradata, told the customer that this is a single rack configuration and that they are in compliance. The customer has asked us if this is reasonable?

This creative marketing spin wins the 2011 award going away… against very tough competition.

I expect this marketing approach to start a trend in the space. Soon we will see warehouse appliance vendors claiming that 1TB = 50TB due to compression… or was that already done this year?

Sorry to be cynical… but I hope that the picture and story provide you with a giggle… and that the giggle helps you to start a happy holiday season.

– Rob Klopp

Hardware systems, servers and network fabric, provide the foundation upon which all shared-nothing database management systems rest. Hardware systems are a major contributor to the overall price/performance and total cost of ownership for a data warehouse platform. This blog considers the hardware strategies of EMC/Greenplum: applying the idea of using common, off-the-shelf (COTS) components to build a competitive foundation; and Teradata: developing a proprietary hardware system by tightly integrating components.

Strategies

The Teradata hardware strategy is simple to describe. They expend R&D dollars to couple low-level technologies into a tightly integrated system. Their servers are custom-designed within a set of guidelines that allows both the LINUX and Microsoft Windows operating systems to execute there. Their network fabric is highly proprietary, using cycles within the fabric to offload sort/merge data processing from the server CPU.

In other words, Teradata believes that the time and effort required to engineer an integrated proprietary offering will improve the performance of their offering enough to offset the cost.

EMC and Greenplum have taken a different approach. They have elected a strategy that leverages off-the-shelf servers offered by hardware vendors like Dell or HP, and network switches from vendors like Brocade, Arista, and Cisco. They have elected to expend few dollars on hardware design and development and to leverage the R&D investments made by these other vendors. In other words, Greenplum believes that the advantages in price and performance provided by using off-the-shelf hardware provides a sustainable advantage.

Price

The lower costs associated with Greenplum’s strategy clearly provide an advantage. Greenplum does not have to expend to design and manufacture custom hardware. The manufacturing costs may not be significant, but the staff costs required by the Teradata strategy must affect the price. Clearly the Greenplum strategy provides an advantage on the price side.

Performance

The Teradata strategy has to be about performance… so lets speculate:

How much of a performance increase might their integration provide on the server-side?
How much of a performance increase might their integration provide on the network side?

In the days before there was a microprocessor based enterprise server market, Teradata could gain substantially here. Microprocessors were built for personal computing and not designed for the high-availability and high-performance requirements in an enterprise. Teradata had much to gain from building rather than buying server.

But today, there is little to gain from a highly customized design. The requirement to run standard LINUX and Windows operating systems limit their ability to innovate and the resulting servers have to be very similar to those built for off-the-shelf enterprise servers. There is little or no performance advantage here.

On the network side, there once was a distinct advantage to Teradata’s ByNet. It was both faster than available off-the shelf switches and it offloaded cycles from the under-powered CPU. Today, however, there are plenty of cheap, fast switches… so the speed advantage has disappeared. Worse still, the introduction of multi-core CPUs have eliminated the advantage of the in-the-switch sort/merge that makes ByNet unique. CPU is inexpensive these days.

The bottom line: it is unclear if the Teradata hardware strategy affords them a performance advantage.

Cost of Ownership

An argument could be made that supporting COTS hardware is inherently less expensive than supporting a Teradata cluster. But there is a more substantial savings that is clear.

Every 2-3 years, as newer Teradata technology obsoletes your currently installed cluster the value of the current hardware goes to zero and the cost of ownership goes up significantly. The costs of this are especially high when you are required to add several nodes to accommodate growth as Teradata refreshes their technology. You may have to buy servers that are already obsolete.

With Greenplum, your current cluster is built from general-purpose servers that are re-purposed with ease. In fact, since the nodes in a Greenplum cluster are usually high-end servers, customers often cycle new technology into their data warehouse and cycle the old servers out into their server farm. The result is a higher performance warehouse and full use of all of the server technology.

A Final Word

The words “proprietary hardware” are sometimes thrown around as an insult. But Teradata’s proprietary approach is based on the belief that a tightly integrated configuration adds benefit to offset the costs. Greenplum believes that today the enterprise server and the network switch vendors have matured their products to the point where off-the-shelf technology can match or exceed the performance of custom hardware… at a significantly reduced cost. You may have an opinion or you may wait to see how the benchmarks, the proof-of-concepts, and the market decide… but its interesting to understand the differing approaches.

After years of tuning data warehouses, queries, data loads, and BI applications, I give up. In the long run it is not really possible anyway… and better still… no longer necessary. A better approach is to build your database and your hardware infrastructure to scan fast and smart. So here’s a blog on why it’s impossible to tune a warehouse… and on why it’s no longer necessary.

My argument against tuning is easy to grasp. By definition a data warehouse serves many constituencies: Marketing and Finance and Customer Support and Distribution; and these business units will each access the data from their unique perspective following a unique path through the warehouse. A designer cannot lay out the data effectively to support each access path… cannot index every column, cannot map more than one zone, cannot replicate the data again and again with aggregates and materialized views, cannot cache the entire warehouse. Even if you get it right changing business requirements will fracture your approach; or worse, the design will not support new queries and constrain your business.

Many readers will be skeptical at this point… suggesting that the software and hardware to eliminate tuning does not exist. So let’s build a model and test the state of the art.

Let us imagine and model a 25TB data warehouse with a 20TB fact table that holds 25 months of daily facts partitioned by day. The fact table is 100 columns wide and we will model two queries that reference 20 of the columns… One that touches every row and one that is date constrained and touches only 14 days of data.

Here are some hardware specs. A server with a single I/O controller can read about 1.5GB/second into the database. With two controllers can read around 2.7GB/sec. Note that these are not the theoretical limits of the hardware but real measurements taken from the current hardware on the market: Dell, HP, and SUN/Oracle.

Now let’s deploy our imaginary warehouse on a strong state of the market multi-core server with, to be conservative, a single controller. This server would scan our fact table in around 222 minutes. Partition elimination would allow the date constrained query to complete in just over 4 minutes. Note that these imaginary queries ignore the effort to join and/or aggregate data. Later I’ll have more to say on this…

If we deploy our warehouse on a shared-nothing cluster with 20 nodes the aggregate I/O bandwidth increases to 30GB/sec and the execution times for our two queries improves to 11 minutes and 12 seconds, respectively. This is the power of parallel I/O.

Now we have to factor in compression. Typical row-based compression yields approximately a 2.5x result… columnar compression varies wildly… But let’s assume 25X in our model. There is a cost to be paid to decompress the data… But since it is paid by everyone and CPU is a relatively inexpensive commodity, we’ll ignore it in our model.

For 2.5X row-based compression our big query now completes in 4.4 minutes and the smaller query completes in 4.8 seconds.

The model is a little more complicated when we throw in columnar compression so let’s consider two columnar models. For an implementation such as Exadata we get the benefit of columnar compression but not the benefit of columnar projection. 25X hybrid columnar compression will execute our two scans in 26 seconds and .5 seconds. Now we are talking! A more complete columnar implementation will only touch the columns required by our query, 20% of the data, providing another 5X improvement. This drops our scan queries to 5.2 seconds and .1 second, respectively. Smoking fast. Note that the more simple columnar compression approach will provide the same fast response when every column is touched and the more complex approach will slow down in that case… so you can make the trade off in your shop as required.

Let me remind you again… This is a full scan of 20TB with no tricks: no indices, no pre-aggregation, no materialized views, no cache and no flash, no pre-sorted zone maps. All that is required is a parallel implementation with partitioning, compression, and a columnar table type… and this implementation works. It is robust.

A note on joins… It is more difficult to model joins… and I’ll attempt a simple model in another post. But you can see that this fast scan approach has solved the costly part of the problem using parallel processing… and you can imagine that a shared-nothing massively parallel approach to joins may hold the key.

Category: Datawarehouse Appliances

Price/Performance of HANA, Exadata, Teradata, and Greenplum

More on Exalytics Capacity…

More on Exalytics: How much user data fits?

Exalytics vs. HANA: What are they thinking?

The Best Data Warehouse Spin of 2011

Greenplum and Teradata: Simliar Architecture, Different Strategies

Stop Tuning and Scan…

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this: