Who is How Columnar? Exadata, Teradata, and HANA – Part 2: Column Processing

In my last post here I suggested that there were three levels of maturity around column orientation and described the first level, PAX, which provides columnar compression. This apparently is the level Exadata operates at with its Hybrid Columnar Compression.

In this post we will consider the next two levels of maturity: early materialized column processing and late materialized column processing which provide more I/O avoidance and some processing advantages.

In the previous post I suggested a five-column table and depicted each of those columns oriented on disk in separate file structures. This orientation provides the second level of maturity: columnar projection.

Imagine a query that selects only 4 of the five columns in the table leaving out the EmpFirst column. In this case the physical structure that stores EmpFirst does not have to be accessed; 20% less data is read, reducing the I/O overhead by the same amount. Somewhere in the process the magic has to be invoked that returns the columns to a row orientation… but just maybe that overhead costs less than the saving from the reduced I/O?

Better still, imagine a fact table with 100 columns and a query that accesses only 10 of the columns. This is a very common use case. The result is a 9X reduction in the amount of data that has to be read and a 9X reduction in the cost of weaving columns into rows. This is columnar projection and the impact of this far outweighs small advantage offered by PAX (PAX may provide a .1X-.5X, 10%-50%, compression advantage over full columnar tables). This is the advantage that lets most of the columnar databases beat Exadata in a fair fight.

But Teradata and Greenplum stop here. After data is projected and selected the data is decompressed into rows and processed using their conventional row-based database engines. The gains from more maturity are significant.

The true column stores read compressed columnar data into memory and then operate of the columnar data directly. This provides distinct advantages:

  • Since data remains compressed DRAM is used more efficiently
  • Aggregations against a single column access data in contiguous memory improving cache utilization
  • Since data remains compressed processor caches are used more efficiently
  • Since data is stored in bit maps it can be processed as vectors using the super-computing instruction sets available in many CPUs
  • Aggregations can be executed using multiplication instead of table scans
  • Distinct query optimizations are available when columnar dictionaries are available
  • Column structures behave as built-in indexes, eliminating the need for separate index structures

These advantages can provide 10X-50X performance improvements over the previous level of maturity.

Summary

  • Column Compression provides approximately a 4X performance advantage over row compression (10X instead of 2.5X). This is Column Maturity Level 1.
  • Columnar Projection includes the advantages of Column Compression and provides a further 5X-10X performance advantage (if your queries touch 1/5-1/10 of the columns). This is Column Maturity Level 2.
  • Columnar Processing provides a 10X+ performance improvement over just compression and projection. This is Column Maturity Level 3.

Of course your mileage will vary… If your workload tends to touch more than 80% of the columns in your big fact tables then columnar projection will not be useful… and Exadata may win. If your queries do not do much aggregation then columnar processing will be less useful… and a product at Level 2 may win. And of course, this blog has not addressed the complexities of joins and loading and workload management… so please do not consider this as a blanket promotion for Level 3 column stores… but now that you understand the architecture I hope you will be better able to call BS on the marketing…

Included is a table that outlines the maturity level of several products:

Product

Columnar Maturity Level

Notes

Teradata

2

 Columnar tables, Row Engine
Exadata

1

 PAX only
HANA

3

 Full Columnar Support
Greenplum

2

 Columnar tables, Row Engine
DB2

3

 BLU Hybrid
SQL Server

2

 I think… researching…
Vertica

3

 Full Columnar Support
Paraccel

3

 Full Columnar Support
Netezza

n/a

 No Columnar Support
Hadapt

2

 I think… researching…

Who is How Columnar? Exadata, Teradata, and HANA – Part 1: Column Compression

Basic Table

There are three forms of columnar-orientation currently deployed by database systems today. Each builds upon the next. The simplest form uses column-orientation to provide better data compression. The next level of maturity stores columnar data in separate structures to support columnar projection. The most mature implementations support a columnar database engine that performs relational algebra on column-oriented data. Let me explain…

Imagine a simple table with 1M rows… with the schema and the first several rows depicted in Figure 1. Conceptually, a row-orientation deploys data on disk and in-memory as depicted in Figure 2 and a column-orientation deploys data on disk and in-memory as depicted in Figure 3. The actual deployment may be significantly different, as we will see.

Note that I am going to throw out some indicative numbers around compression. I will suggest that applying compression to rows will provide from 1.5X to 3.5X compression with and average of 2.5X… and that applying compression to columns provides from 3X compression to 50X compression with the average around 10X. These are supportable numbers but the compression you see for any specific data set will vary.

A row oriented block

There are two powerful compression techniques that individually or combined provide most of the benefits: dictionary-encoding and run-length encoding. For the purposes of this blog I will describe only dictionary-encoding; and I will do an injustice to that by explaining it only briefly and conceptually… just enough that you get the idea.

Five column oriented blocks

Further compression is possible by encoding runs of similar values to a value plus the number of times it repeats so that the bit stream 0000000000000000 could be represented as 01111 (0 occurs 24 times).

You can now also start to see why column-orientation compresses better that a row-orientation. In the row block above there is little opportunity to encode whole rows in a dictionary… the cardinality of rows in a table is too high (note that this may not be true for a dimension table which is, in-effect, a dictionary). There is some opportunity to encode the bit runs in a row… as noted, you can expect to get 2X-2.5X from row compression for a fact table. Column-orientation allows dictionary encoding to be applied effectively to low cardinality columns… and this accounts for the advantage there.

Col Dict

Dictionary-encoding reduces data to a compressed form by building a map that provides a translation for each cardinal value in the table to a tightly compressed form. For example, if there are indeed only three values possible in the DeptID field above then we might build a dictionary for that column as depicted in Figure 4. You can see… by encoding and storing the data in the minimal number of bits required, significant storage reduction is possible… and the lower the cardinality of a column the smaller the resulting bit representation.

Note that there is no free lunch here. There is a cost to be paid in CPU cycles to compress data and to decompress data… but for a read-optimized data warehouse database compression is cool. Exactly how cool depends on the level of maturity and we will get to that as we go.

It is crucial to remember that column store databases are relational. They ingest rows and emit rows and perform relational algebra in-between. So there has to be some magic that turns tuples into columns and restores them from columns. The integrity of a row has to persist. Again I am going to defer on the details and point you at the references below… but imagine that for each row a bit map is built that, for each column, points to the entry in the column dictionary with the proper value.

There is no free lunch to column store… no free lunch anywhere, it seems. Building this bit map on INSERT is very expensive, and modifying it on UPDATE is fairly expensive. This is why column-orientation is not suitable for OLTP workloads without some extra effort. But the cost is amortized by significant performance gains for READs.

One last concept: since peripheral I/O reads blocks imagine two approaches to column compression: one applies the concepts above to an entire table breaking each table into separate column-oriented files that may be read separately; and one which applies the concepts individually to each large block in a table file. Imagine, in the first case that Figure 2 represents a picture of the first few rows in our 1M-row table. Imagine, in the second case, that Figure 2 represents the rows in one block of data re-oriented into columns.

This second, block-oriented, approach is called PAX, and it is more-or-less the approach used by Exadata. In the PAX approach each block contains its own mini-column store and a dictionary for dictionary encoding with the values in the block. Because the cardinality for columns within a block will often be less than for an entire table there are some distinct advantages to PAX compression. Compression will be higher by more than a little than for full table columnar compression.

When Exadata reads a block from disk it decompresses the data back into rows and performs row-oriented processing to complete the query. This is very cool for Exadata… a great feature. As noted, column compression may be 4X better than row compression on the average. This reduces the storage requirements and reduces the overhead of I/O by 4X… and this is a very significant improvement. But Exadata stops here. It is not a columnar-oriented DBMS and it misses the significant advantages that come from the next two levels of column-orientation… I’ll take these up in the next post.

To be clear, all of the databases that use these more mature techniques: Teradata, HANA, Greenplum, Vertica, Paraccel, DB2, and SQL Server gain from columnar compression even if the PAX approach provides some small advantage as a compression technique.

It is also worth noting that Teradata does not gain as much as others in this regard. This is not because of poor design, rather it is due to the fact that, to their credit, Teradata implemented a Teradata-specific dictionary-based compression scheme long ago. Columnar compression let others catch up to what Teradata has offered for years.

And before you ask… Netezza offers no columnar orientation… preferring to compress deeply using an FPGA co-processor to decompress… and to reduce I/O using zone maps rather than the using the mid-level column projection techniques in the next blog here.

Memory Trends and HANA

If the Gartner estimates here are correct… then DRAM prices will fall 50% per year per year over the next several years… and then in 2015 non-volatile RAM (see the related articles below) will become generally available.

It has been suggested that memory prices will fall slower than data warehouses will grow (see here). That does not seem to be the case… and the combination of cheaper memory and then non-volatile memory will make in-memory databases like SAP HANA ever more compelling. In fact, as I predicted… and to their credit, Teradata is adding more memory (see here).

Related articles

Some Unaudited HANA Performance Numbers

Fast
Fast (Photo credit: Allie’s.Dad)

The following performance numbers are being reported publicly for HANA:

  • HANA scans data at 3MB/msec/core
    • On a high-end 80-core server this translates to 240GB/sec per node
  • HANA inserts rows at 1.5M records/sec/core
    • Or 120M records/sec per node…
  • Aggregates 12M records/sec/core
    • Or 960M records per node…

These numbers seem reasonable:

  • A 100X improvement over disk-based scan (The recent EMC DCA announcement claimed 2.4GB/sec per node for Greenplum)…
  • Sort of standard OLTP insert speeds for a big server…
  • Huge performance gains for in-memory aggregation using columnar orientation and SIMD HPC instructions…

Note that these numbers are the basis for suggesting that there is a new low-TCO approach to BI that eliminates aggregate tables, materialized views, cubes, and indexes… and eliminates the operational overhead of computing these artifacts… and still provides a sub-second response for all queries.

My .2 Cents on HANA

Out of respect for my employer let me repeat what I have said before regarding HANA… HANA provides far and away the best price/performance of any data warehouse… no question. HANA will win the performance portion of every POC just based on the facts detailed in this graphic:

Latency 3

Other products database code lives thousands to tens of thousands times further away from the data than in HANA. Based on this I honestly believe that HANA will come to dominate the market… And I’ll say more on this when I post my 2 cents on HANA…

But Teradata is the leader for good reason… and in the next posts providing my 2 cents I will find good things to say about several other  products…

This personal blog is not for marketing any particular product… it is about architecture…

To their credit my employer has given me no grief over the Teradata post… but I know that some folks are wincing over the idea that I would say something good about the competition… My apologies…

A Rebuttal…

This post has been thrown at me a couple of times now… so I’ll now take the time to go through it… and try to address the junk.

It starts by suggesting that “the Germans” have started a war… but the next sentence points out that the author tossed grenades at HANA two months before the start he suggests. It also ignores the fact that the HANA post in question was a response to incorrect public statements by a Microsoft product manager about HANA (here).

The author suggests some issue with understanding clustered indexes… Note that “There are 2 implementations of xVelocity columnstore technology: 1. Non clustered index which is read only – this is the version available in SMP (single node) SQL Server 2012. 2. Columnstore as a clustered index that is updateable – This is the version available in MPP or PDW version of SQL 2012.”. The Microsoft documentation I read did not distinguish between the two and so I mistakenly attributed features of one to the other. Hopefully this clears up the confusion.

He suggests that the concept of keeping redundant versions of the data… one for OLTP and one for BI is “untrue”… I believe that the conventional way to deal with OLTP and BI is to build separate OLTP and BI databases… data warehouses and data marts. So I stand by the original comment.

The author rightfully suggests that I did not provide a reference for my claim that there are odd limitations to the SQL that require hand-coding… here they are (see the do’s and dont’s).

He criticizes my statement that shared-nothing gave us the basis for solving “big data”. I do not understand the criticism? Nearly very large database in the world is based on a shared-nothing architecture… and the SQL Server PDW is based on the same architecture in order to allow SQL Server to scale.

He is critical of the fact that HANA is optimized for the hardware and suggests that HANA does not support Intel’s Ivy Bridge. HANA is optimized for Ivy Bridge… and HANA is designed to fully utilize the hardware… If we keep it simple and suggest that using hardware-specific instruction sets and hardware-specific techniques to keep data in cache together provide a 50X performance boost [This ignores the advantages  of in-memory and focusses only on hw-specific optimizations… where data in cache is either 15X (L3) or 20X (L2) or 200X (L1) faster than data fetched from DRAM… plus 10X or more using super-computer SIMD instructions], I would ask… would you spend 50X more for under-utilized hardware if you had a choice? SAP is pursuing a distinct strategy that deserves a more thoughtful response than the author provided.

He accuses me of lying… lying… about SQL being architected for single-core x286 processors. Sigh. I am unaware of a rewrite of the SQL Server product since the 286… and tacking on support for modern processors is not re-architecting. If SQL Server was re-architected from scratch since then I would be happy to know that I was mistaken… but until I hear about a re-write I will assume the SQL Server architecture, the architecture, is unchanged from when Sybase originally developed it and licensed it to Microsoft.

He says that HANA is cobbled together from older piece parts… and points to a Wikipedia page. But he does not use the words in the article… that HANA was synthesized from other products and , as stated in the next sentence, built on: “a new application architecture“.  So he leaves the reader to believe that there is nothing new… he is mistaken. HANA is more than a synthesis of in-memory, column-store, and shared-nothing. It includes a new execution engine built on algorithms from the search space… columns in the column store are processed as vectors rather than the rote tuple-by-tuple approach from the 1980’s. It includes powerful in-database support for procedural languages with facilities that convert loops to fully parallel set-based processes. It provides, as noted above, a unique approach to supporting OLTP and BI queries in the same instance (see here)… and more. I’m not trying to hype HANA here… time and the market will determine if these new features are important… but there is no doubt that they are new.

I did not find the Business Intelligist post to be very informative or helpful. With the exception of the Wikipedia article mentioned above there is only unsubstantiated opinion in the piece… … and a degree of rudeness that is wholly uncalled for.

HANA Support for OLTP and BI In a Single Table

Aerial view of Hana, Maui
Aerial view of Hana, Maui (Photo credit: Wikipedia)

This is a rehash of my post for SAP here… I thought you might find it interesting as it describes the architecture HANA uses to support OLTP and BI against a single table.

A couple of points to think about:

  • If you have only one database structure you can optimize for only one query; e.g. the OLTP query is fast against a OLTP structure but slow against a BI structure… or visa versa.
  • If you have two structures you have to ETL the data between the two at some cost. There is cost in keeping a replica of the data, cost in developing, administering, and executing the ETL process. In addition there is a lost opportunity cost hidden in the latency of the data. You cannot see the current state of the business by querying the BI data as some data has not yet been ETL’d across.
  • OLTP performance is normally paramount; so the perfect system would not compromise that performance or compromise it only a little.

Let’s look at the HANA approach to this at a high level.

HANA provides a single view of a table to an application or a user, but under-the-covers each table includes a OLTP optimized part, a BI optimized part, and a mechanism for moving data from one part to the other

When a transaction hits the system; inserts, updates, and deletes are processed in the OLTP part with no performance penalty. The read portion of the OLTP query accesses the read-optimized internal structure with no performance penalty. Note that reading a single column in a column store, which is the key for the transaction, is roughly equivalent to reading an index structure on top of a standard disk-based DBMS. Except the column is always in-memory which means I/O is never required. This provides the HANA system with an advantage over a disk-based system. Disk I/O is 120+ times slower than memory access so even an index is unlikely to beat in-memory. See here for some numbers you should know.

After the transaction is committed into the internal, OLTP-optimized part, a process starts that moves the data to the BI optimized part. This is called a delta merge as the OLTP portion holds all of the changes, the delta, in the data set.

When a BI query starts it can limit the scan to only partitions in the BI optimized part, or if real-time data is required it can scan both parts. The small portion of the scan that accesses the OLTP/delta portion is sub-optimal when compared to the scan of the BI part,  but not slow at all as the data is all in-memory.

We can tease the performance apart as follows:

  1. There is a OLTP insert/update/delete “write” portion… and HANA executes this like any OLTP database, as fast as an OLTP RDBMS, with a commit after a write-to-log;
  2. There is a OLTP select “read” portion… and HANA performs this in the in-memory column store faster than many OLTP databases… and scans the delta structure as fast as any OLTP database;
  3. There is a delta merge from the OLTP write-optimized part to the BI read-optimized column store that is hundreds to tens of thousands of times faster than any ETL tool; and
  4. There is a BI select portion that scans the in-memory column store hundreds to thousands of times faster than a disk-based BI database.
  5. If the BI query requires access to real-time data then an in-memory scan of the delta file is required… there is no analogy to this in a system with separate OLTP and BI tables.
The implementation uses MVCC instead of locks.

Nice.

Mobile Clients Require High Performance BI Computing

English: An image of an iPad 2.
(Photo credit: Wikipedia)

I posted a blog on the SAP site here that discussed the implications of mobile clients. I want to re-emphasize the issue as it is crucial.

While at Greenplum we routinely replaced older EDW platforms and provided stunning performance. I recall one customer in particular where we were given a query that ran in 7 hours and Greenplum executed the query in seven seconds. This was exceptional… more typical were cases where we reduced run-times from several hours to under 30 minutes… to 10 minutes… to 5 minutes. I’m sure that every major competitor: Teradata, Greenplum, Netezza, and Exadata has similar stories to tell.

But 5 minutes will not cut it if you are servicing a mobile client where sub-second response to the device is a requirement… and 10 minutes is out of the question. It does not matter if it ran in 10 hours before… 10 minute response is not acceptable to a mobile device.

Today we see sub-second response delivered to our phones by custom applications built on special high-performance platforms designed specifically to service a mobile client: iPhones, iPads, and Android devices.

But what will we do about the BI applications built on commercial platforms which have just used every trick in the book to become one of the 5 minute stories mentioned above?

I think that there are only a couple of architectural choices.

  1. We can rewrite the high-value queries as custom applications using specialized infrastructure… at great expense… and leaving the vast majority of queries un-serviced.
  2. We can apply the 80/20 rule to get the easiest queries serviced with only 20% of the effort. But according to Murphy the 20% left will be the highest value queries.
  3. We can tack on expensive, specialized, accelerators to some queries… to those that can be accelerated… but again we leave too much behind.
  4. Or we can move to a general purpose high performance computing platform that can service the existing BI workload with sub-second response.

In-memory computing will play a role… Exalytics provides option #3… HANA option #4.

SSD devices may play a role… but the performance improvements being quoted by vendors who use SSD as a block I/O device is 10X or less. A 10X improvement applied to a query that was just improved to 10 minutes yields a 1 minute query… still not the expected level of service.

IT departments will have to evaluate the price/performance, not just the price, as they consider their next platform purchases. The definition of adequate response is changing… and the old adequate, at the least cost, may not cut it. Mobile clients are here to stay. The productivity gains expected from these devices is significant. High performance BI computing is going to be a requirement.

The Teradata Myth of Query Concurrency

When I was at Greenplum… and now again at SAP… I ran into a strange logic from Teradata about query concurrency. They claimed that query concurrency was a good thing and an indicator of excellent workload management. Let’s look at a simple picture of how that works.

In Figure 1 we depict a single query on a Teradata cluster. Since each node is working in parallel the picture is representative no matter how many nodes are attached. In the picture each line represents the time it takes to read a block from disk. To make the picture simple we will show I/O taking only 1/10th of the clock time… in the real world it is slower.

Image

Given this simplification we can see that a single query can only consume 10% of the CPU… and the rest of the time the CPU is idle… waiting for work. We also represented some I/O to spool files… as Teradata writes all intermediate results to disk and then reads them in the next step. But this picture is a little unfair to Greenplum and HANA as I do not represent spool I/O completely. For each qualifying row the data is read from the table on disk, written to spool, and then read from spool in the subsequent step. But this note is about concurrency… so I simplified the picture.

Figure 2 shows the same query running on Greenplum. Note that Greenplum uses a data flow architecture that pushes tuples from step to step in the execution plan without writing them to disk. As a result the query completes very quickly after the last tuple is scanned from the table.

Image

Let me say again… this story is about CPU utilization, concurrency, and workload management… I’m not trying to say that there are not optimizations that might make Teradata outperform Greenplum… or optimizations that might make Greenplum even faster still… I just want you to see the impact on concurrency of the spool architecture versus the data flow architecture.

Note that on Greenplum the processors are 20% busy in the interval that the query runs. For complex queries with lots of steps the data flow architecture provides an even more significant advantage to Greenplum. If there are 20 steps in the execution plan then Teradata will do spool I/O, first writing then reading the intermediate results while Greenplum manages all of the results in-memory after the initial reads.

In Figure 3 we see the impact of having the data in-memory as with HANA or TimeTen. Again, I am ignoring the implications of HANA’s columnar orientation and so forth… but you can clearly see the implications by removing block I/O. Image

Now let’s look at the same pictures with 2 concurrent queries. Let’s assume no workload management… just first in, first out.

In Figure 4 we see Teradata with two concurrent queries. Teradata has both queries executing at the same time. The second query is using up the wasted space made available while the CPUs wait for Query 1’s I/O to complete. Teradata spools the intermediate results to disk; which reduces the impact on memory while they wait.  This is very wasteful as described here and here (in short, the Five Minute Rule suggests that data that will be reused right away is more economically stored in memory)… but Teradata carries a legacy from the days when memory was dear.

Image

But to be sure… Teradata has two queries running concurrently. And the CPU is now 20% busy.

Figure 5 shows the two-query picture for Greenplum. Like Teradata, they use the gaps to do work and get both queries running concurrently. Greenplum uses the CPU much more efficiently and does not write and read to spool in between every step.

Image

In Figure 6 we see HANA with two queries. Since one query consumed all of the CPU the second query waits… then blasts through. There is no concurrency… but the work is completed in a fraction of the time required by Teradata.

Image

If we continue to add queries using these simple models we would get to the point where there is no CPU available on any architecture. At this point workload management comes into play. If there is no CPU then all that can be done is to either manage queries in a queue… letting them wait for resources to start… or start them and let them wastefully thrash in and out… there is really no other architectural option.

So using this very simple depiction eventually all three systems find themselves in the same spot… no CPU to spare. But there is much more to the topic and I’ve hinted about these in previous posts.

Starting more queries than you can service is wasteful. Queries have to swap in and out of memory and/or in and out of spool (more I/O!) and/or in and out of the processor caches. It is best to control concurrency… not embrace it.

Running virtual instances of the database instead of lightweight threads adds significant communications overhead. Instances often become unbalanced as the data returned makes the shards uneven. Since queries end when the slowest instance finishes it’s work this can reduce query performance. Each time you preempt a running query you have to restore state and repopulate the processor’s cache… which slows the query by 12X-20X. … Columnar storage helps… but if the data is decompressed too soon then the help is sub-optimal… and so on… all of the tricks used by databases and described in these blogs count.

But what does not count is query concurrency. When Teradata plays this card against Greenplum or HANA they are not talking architecture… it is silliness. Query throughput is what matters. Anyone would take a system that processes 100,000 queries per hour over a system that processes 50,000 queries per hour but lets them all run concurrently.

I’ve been picking on Teradata lately as they have been marketing hard… a little too hard. Teradata is a fine system and they should be proud of their architecture and their place in the market. I am proud to have worked for them. I’ll lay off for a while.

 

Teradata’s HANA Mathematics

I recently pointed out some silliness published by Teradata to several SAP prospects. There is more nonsense that was sent and I’d like to take a moment to clear up these additional claims.

In their note to HANA prospects they used the following numbers from the paper SAP published here:

# of Query Streams

1

10 20

25

# of Queries per Hour (Throughput)

6,282

36,600 48,770

52,212

Teradata makes several claims from these numbers. First they claim that the numbers demonstrate a bottleneck that is tied to either the NUMA effect or to the SMP Knee Curve. This nonsense is the subject of a previous blog here.

For any database system as you increase the number of queries to the point where there is contention the throughput decreases. This is just common sense. If you have 10 cores and 10 threads and there is no contention then all threads run at the same speed as fast as possible. If you add an 11th thread then throughput falls off, as one thread has to wait for a core. As you add more threads the throughput falls further until the system is saturated and throughput flattens. Figure 1 is an example of the saturation curve you would expect from any system as the throughput flattens.

There are some funny twists to this, though. If you are an IMDB then each query can use 100% of a core. If you are multi-threaded IMDB then each query can use 100% of all cores.  If you are a disk-based system then you give up the CPU to another query while you wait for I/O… so throughput falls. I’ll address these twists in a separate blog… but you will see a hint at the issue here.

Teradata claims that these numbers reflect a scaling issue. This is a very strange claim. Teradata tests scaling by adding hardware, data, and queries in equal amounts to see if the query performance holds constant… or they add hardware and data to look for a correlation between the number of nodes and query performance… hoping that as the nodes increase the response time decreases.  In fact Teradata scales well… as does HANA… But the hardware is constant in the HANA benchmark so there is no view into scaling at all. Let me emphasis this… you cannot say anything about scaling from the numbers above.

Teradata claims that they can extrapolate the saturation point for the system… this represents very bad mathematics. They take the four data points in the table and create an S curve like the one in Figure 1… except they invert it to show how throughput decreases as you move towards the saturation point… Figure 2 shows the problem.

If you draw a straight line through the curve using any sort of math you miss the long tail at the end. This is an approximation of the picture Teradata drew… but even in their picture you can see a tail forming… which they ignore. It is also questionable math to extrapolate from only four observations. The bottom line is that you cannot extrapolate the saturation point from these four numbers… you just don’t know how far out the tail will run unless you measure it.

To prove this is nonsense you just have to look here. It turns out that SAP publicly published these benchmark results in two separate papers and this second one has numbers out to 60 streams. Unsurprisingly at 60 streams HANA processed 112,602 queries per hour while Teradata told their customers that it would saturate well short of that… at 49,601 queries (they predicted that HANA would thrash and the number of queries/hour would fall back… more FUD).

Teradata is sending propaganda to their prospects with scary extrapolations and pronouncements of architectural bottlenecks in HANA. The mathematics behind their numbers is weak and their incorrect use of deep architectural terms demonstrates ignorance of the concepts.  They are trying to create Fear, Uncertainty, and Doubt. Bad marketing… not architecture, methinks.