How DBMS Vendors Admit to an Architectural Limitation: Part 3 – EDW on IBM z/OS

This is the 3rd and final example of a vendor admitting, without admitting, to an architectural limitation. The first two parts on Exadata and Teradata are here and here.

Teradata started to get real traction in the EDW space with a shared-nothing architecture in the late 1980’s. At that time the only real competition was DB2 on an IBM mainframe. From those days until just a couple of years ago IBM insisted that for MVS, then z/OS, customers should stick to the mainframe for their data warehouses and marts. There was some dabbling with sharded data in DB2 for z/OS… and Teradata made some in-roads… as did Netezza… but IBM insisted that there was no reason not to stay Blue. DB2 on AIX and then LINUX appeared… and both offered a better price/performance option than DB2 ob Z/OS… but the faithful stayed faithful for the most part.

Then IBM bought Netezza, a pure shared-nothing microprocessor-based machine, and the recommendation changed. Today IBM recommends the Analytics Accelerator, based on Netezza, to mainframe users who want to deploy an EDW. This is an admission, with no admission at all, that there was all along an architectural advantage to shared-nothingness.

If you search this blog for “Netezza” you can get my perspective on that technology. But to be blunt, the Analytics Accelerator is not IBM’s best EDW platform… DB2 LUW is by far… and with BLU LUW is better still.

I have made it clear in my previous posts that I consider it lazy for an IT shop to commit to a vendor or to a product. As engineers we need to embrace change. For IBM z/OS shops this means a realistic look at non-z/OS alternatives to deploy or to re-deploy an EDW. It makes no sense to build a data warehouse or a data mart directly on z/OS. Use the Analytics Accelerator or, better still, open the competition to better products like DB2 LUW, Teradata, Vertica, etc.

References

Database Fog Blog

Other

Key Values and Key-Value Stores and In-memory Databases

Back to more geeky topics… although my Mom loved the videos…

When very high performance is required to return key performance metrics derived from large volumes of data to a very large number of clients… in other words when volume and velocity are factors and the results are to be delivered to thousands of users (I suppose that I could conjure up a clever V here… but the cleverness would come from the silliness of the semantic stretch… so I’ll leave it to you all to have some pun); the conventional approach has been to pre-compute the results into a set of values that can be fetched by key. In other words, we build a pre-aggregated and pre-joined result table that provides answers to a single query template. Conventionally building this query-specific result table has been the only way to solve these big problems.

Further, we conventionally store these key-value results in a relational DBMS and fetch a row at a time providing pretty darn good performance. Sometimes pretty darn good is not good enough. So there are new options. Key-value data stores may well offer a solution that provides performance and scale at a price well below what has been conventional to date. This is well and good.

But I would like to challenge the conventional thinking a little. The process of joining and aggregating volumes of data into results is a BI process. For the last twenty years BI practitioners have been building pre-aggregated tables and data marts to solve these same problems… maybe not at scale… and this practice has proven to be very expensive, the opposite of agile, and unsustainable. The people costs to develop and support multiple pre-computed replicas is exorbitant. The lack of flexibility that comes from imposing a longish development project over what is essentially an aggregate BI query is constraining our enterprise and our customers.

A better approach is to use the new high performance database products: HANA, BLU, Oracle 12c, or maybe Spark; to aggregate on-demand. Included in this approach is a requirement to use the new high performance database computing platforms available to house the databases.

Consider this… an in-memory DBMS can aggregate 12M rows/sec/core. It can scan 3MB/msec/core. Companies like SGI and HP are ganging processors together so that you can buy a single node that contains 32, 64, or 128 cores… and this number will go up. A 64-core server will aggregate 768M rows/sec and scan 19.2TB/sec… and you can gang a small number of nodes together and scale out.

Providing an extensible BI platform for big data is so much easier than building single-query key-value clusters… there is much less risk… and the agility and TCO make it close to a no-brainer. We just have to re-think the approach we’ve used for 20 years and let the new software and hardware do the work.

Thinking about BI: Infographics is the next phase…

Infographics by The Guardian
Infographics by The Guardian (Photo credit: tripu)

I have been thinking about BI… prompted by a friend, Frank Bien, who is the CTO of Looker (you are welcome, Frank, for the plug…) but this post is about a  trend in BI that is worth exploring… and only maybe about Looker or any other tool.

BI was originally about reporting… in its very first iterations users coded SQL directly, or used a 4GL scripting language, and the BI tool was there for formatting the output. Then more focus was put on query-building to make it easier for developers to effectively get the required data.

There was lots of talk at this point about knowledge-workers and do-it-yourself BI… but it never really worked out that way. Business users requested canned reports and went to a query guru to request special reports as required.

The following was pretty normal: after finding some interesting fact in a tabular report a business user would pull the data and build a Powerpoint slide to present the results. As interfaces improved you soon could access the data directly and create Excel or Powerpoint charts without copying the data by hand. In other words data visualization was separated from reporting.

The BI vendors caught on to this, recognizing that data presentation is important, and soon all of the BI tools offered some charting options. But the next step was equally interesting. Charting is a bit of an art… so the BI tools programmed in some directives to help you select the chart that fit your data and that would be visually pleasing. So simple charting as visualization was built-in with some simple assistance to help you with the simple art of presentation.

From here vendors went in two directions, one after the other.

First, dashboards were developed that were customizable and these applications, either semi-static or dynamic, caught your eye. Red lights and limits could be built into a heads-up display. The art of presentation was pretty crude and loudness: bright colors and lots of moving dials and eye-candy won the day. But as far as presentation goes, dashboards are just multiple simple charts arranged on a screen. QlikView led this charge.

Next, a new set of visualization products were rolled out by vendors like Tableau and Pentaho. Users saw that some very powerful pictures could be drawn showing data in a time series… and showing the series changing over time. Since the presentation was more nuanced and more “artistic” the automated assistance required more sophistication and this is where the vendors are now fighting to differentiate themselves.

But an interesting thing was happening outside BI… and this is the point of this note. In the same way that PowerPoint led reporting to charting, a new presentation technique called infographics is emerging. It is the state of the art in data visualization… and Powerpoint… and art… rolled into one. And it is very impactful. I imagine that the next wave of BI tools must embrace this more advanced presentation technique.

Here is how I think this plays out.

The advanced data visualization vendors will provide a palette that directly accesses data, and big data, to allow very custom infographics… you can see some of this at Piktochart although it is more about templates than free form development. But since this is art… tools will be developed that help the art-impaired like me to build nice displays and they will do this by analyzing the data and recommending one or more meaningful infographic displays.

So, maybe in the same way that Powerpoint data presentation anticipated charting in BI tools… Infographics anticipates the next data presentation facility for BI.

I suppose that this is not really a controversial conclusion… and I imagine that if I took the time I could find several start-ups who are way ahead of me on this… but sometimes it’s more fun to daydream it up on my own… and pretend that I’m out front…

Who is How Columnar? Exadata, Teradata, and HANA – Part 2: Column Processing

In my last post here I suggested that there were three levels of maturity around column orientation and described the first level, PAX, which provides columnar compression. This apparently is the level Exadata operates at with its Hybrid Columnar Compression.

In this post we will consider the next two levels of maturity: early materialized column processing and late materialized column processing which provide more I/O avoidance and some processing advantages.

In the previous post I suggested a five-column table and depicted each of those columns oriented on disk in separate file structures. This orientation provides the second level of maturity: columnar projection.

Imagine a query that selects only 4 of the five columns in the table leaving out the EmpFirst column. In this case the physical structure that stores EmpFirst does not have to be accessed; 20% less data is read, reducing the I/O overhead by the same amount. Somewhere in the process the magic has to be invoked that returns the columns to a row orientation… but just maybe that overhead costs less than the saving from the reduced I/O?

Better still, imagine a fact table with 100 columns and a query that accesses only 10 of the columns. This is a very common use case. The result is a 9X reduction in the amount of data that has to be read and a 9X reduction in the cost of weaving columns into rows. This is columnar projection and the impact of this far outweighs small advantage offered by PAX (PAX may provide a .1X-.5X, 10%-50%, compression advantage over full columnar tables). This is the advantage that lets most of the columnar databases beat Exadata in a fair fight.

But Teradata and Greenplum stop here. After data is projected and selected the data is decompressed into rows and processed using their conventional row-based database engines. The gains from more maturity are significant.

The true column stores read compressed columnar data into memory and then operate of the columnar data directly. This provides distinct advantages:

  • Since data remains compressed DRAM is used more efficiently
  • Aggregations against a single column access data in contiguous memory improving cache utilization
  • Since data remains compressed processor caches are used more efficiently
  • Since data is stored in bit maps it can be processed as vectors using the super-computing instruction sets available in many CPUs
  • Aggregations can be executed using multiplication instead of table scans
  • Distinct query optimizations are available when columnar dictionaries are available
  • Column structures behave as built-in indexes, eliminating the need for separate index structures

These advantages can provide 10X-50X performance improvements over the previous level of maturity.

Summary

  • Column Compression provides approximately a 4X performance advantage over row compression (10X instead of 2.5X). This is Column Maturity Level 1.
  • Columnar Projection includes the advantages of Column Compression and provides a further 5X-10X performance advantage (if your queries touch 1/5-1/10 of the columns). This is Column Maturity Level 2.
  • Columnar Processing provides a 10X+ performance improvement over just compression and projection. This is Column Maturity Level 3.

Of course your mileage will vary… If your workload tends to touch more than 80% of the columns in your big fact tables then columnar projection will not be useful… and Exadata may win. If your queries do not do much aggregation then columnar processing will be less useful… and a product at Level 2 may win. And of course, this blog has not addressed the complexities of joins and loading and workload management… so please do not consider this as a blanket promotion for Level 3 column stores… but now that you understand the architecture I hope you will be better able to call BS on the marketing…

Included is a table that outlines the maturity level of several products:

Product

Columnar Maturity Level

Notes

Teradata

2

 Columnar tables, Row Engine
Exadata

1

 PAX only
HANA

3

 Full Columnar Support
Greenplum

2

 Columnar tables, Row Engine
DB2

3

 BLU Hybrid
SQL Server

2

 I think… researching…
Vertica

3

 Full Columnar Support
Paraccel

3

 Full Columnar Support
Netezza

n/a

 No Columnar Support
Hadapt

2

 I think… researching…

How Good Is Teradata’s Intelligent Memory?

English: On December 17, 2009 30 feet chunk of...
A 30 feet chunk of the cliff below the apartment building fell to Pacific Ocean. (Photo credit: Wikipedia)

Jason asked a great question in the comment section here… he asked… does Teradata’s Intelligent Memory erode HANA’s value proposition?  Let me answer here in a more general way that is applicable to the general database space…

Every time a vendor puts more silicon between the CPU and the disk they will improve their performance (and increase their price). Does this erode HANA’s value proposition? Sure. Every advance by any vendor erodes every other vendor’s position.

To win business a new database product has to be faster than the competition. In my experience you have to be at least 30% faster to unseat the incumbent. If you are 50% faster you will win a lot of business. If you are 2x, 100%, faster you win nearly every time.

Therefore the questions are:

  • Did the Teradata announcement eliminate a set of competitors from reaching these thresholds when Teradata is the incumbent? Yup. It is very smart.
  • Does Intelligent Memory allow Teradata to reach these thresholds when they compete against another incumbent. Yup.
  • Did it eliminate HANA from reaching these thresholds when competing with Teradata? I do not think so… in fact I’m pretty sure it is not the case… HANA should still be way over the 2x threshold… but the reasons why will require a deeper dive… stay tuned.

In the picture attached a 30 foot chunk eroded… but Exadata still stands. Will it be condemned?

Note: Here is a commercial post on the SAP HANA blog site that describes at a high level why I think HANA retains a distinct architectural advantage.

Memory Trends and HANA

If the Gartner estimates here are correct… then DRAM prices will fall 50% per year per year over the next several years… and then in 2015 non-volatile RAM (see the related articles below) will become generally available.

It has been suggested that memory prices will fall slower than data warehouses will grow (see here). That does not seem to be the case… and the combination of cheaper memory and then non-volatile memory will make in-memory databases like SAP HANA ever more compelling. In fact, as I predicted… and to their credit, Teradata is adding more memory (see here).

Related articles

Wondering About Netezza… and A Teradata Prediction Comes True…

Magic 8 Ball
Magic 8 Ball (Photo credit: Wikipedia)

If you missed the tweet… 2+ years ago I predicted here that Teradata would go away from ByNet… and lo and behold they did (see here).

In the same post I predicted that Netezza would go away from FPGAs. This has not come to pass. But I wonder if it might… or if there is a bigger change possible?

With the recent announcements of DB2 BLU and column store I suspect that DB2 will outperform Netezza when the query mix does not fall directly in Netezza’s sweet spot.

I also have a suspicion that the Netezza architecture, with its execution engine split across two different processors, is just hard to engineer. I cannot think of another reason features come so slowly there. Why, for example, is there no columnar support? Greenplum built it on the same Postgres base with less than a handful of engineers in a year. Teradata now offers columnar tables as well.

These concerns… combined with some previous notes on Netezza add up as follows:

  1. FPGAs no longer provide a performance advantage (per my link above)
  2. FPGAs limit the ability of the DBMS to use more cores (see here)
  3. FPGAs limit the ability of the DBMS to manage workload (see here… and especially the comments)
  4. FPGAs and having a 2-phase split execution environment limits the ability to extend and enhance the code base (a new conjecture)
  5. Zone Maps and CBTs provide a limited ability to solve for a wide range of queries… they are just an index (see here)
  6. DB2 Column Store provides a performance boost equal to or greater than zone maps and CBTs (a new conjecture)
  7. DB2 BLU provides a performance boost well in excess of what Netezza can provide (see here)

The Netezza architecture with FPGAs provided a distinct advantage in 2000 when CPU was the scarce commodity. But multi-core systems and the advance of Moore’s Law soon made processing abundant… and the advantage of FPGA co-processing diminished. Without a distinct advantage the split execution architecture became a disadvantage… and the complexity of that design kept Netezza from developing the advances on top of the Postgres base that were very easy to develop by others.

Architecture counts… and DB2 is a strong product. If, as I suspect, DB2 is now a more capable product than Netezza… I wonder what path IBM may take?

MPP, IMDB and Moore’s Law

In the post here I listed the units of parallelism (UoP) applied by various products on a single node. Those findings are summarized in the table below.

Product

Version/HW

Cores per Node

UoP per Node

Notes

Teradata EDW 6700H

16

32

Uses hyper-threads.
Greenplum DCA UAP Edition

16

8

Recommends 1 Segment for each 2 cores. Maybe some multi-threading per query so it could be greater than 8 on the average… and could be 16 with hyper-threads… but not more than 32 for sure.
Exadata X3

12

12-24

Maybe only 12… cannot find if they use hyper-threads.
Netezza Striper

16

16

May use hyper-threads but limited by 16 FPGAs.
HANA Any Xeon E7-4800

40

80

Uses hyper-threads.

A UoP is defined as the maximum number of  instructions that can execute in parallel on a single node for a single query. Note that in the comments there was a lively debate where some readers wanted to count threads or processes or slices that were “active” but in a wait state. Since any program can start threads that wait I do not count these as UoP (later we might devise a new measure named units of waiting that would gauge the inefficiency in any given design by measuring the amount of waiting around required to keep the CPUs fed… maybe the measure would be valuable in measuring the inefficiency of the queue at your doctor’s office or at any government agency).

On some CPUs vendors such as Intel allow two threads to execute instructions in-parallel in a core. This is called hyper-threading and, if implemented, it allows for two UoP on a single core. Rather than constantly qualify the statements for the rest of this blog when I refer to cores I mean to imply hyper-threads.

The lively comments in the blog included some discussion of the sort of techniques used by vendors to try and keep the cores in the CPU on each node fed. It is these techniques that lead to more active I/O streams than cores and more threads than cores.

For several years now Intel and the other CPU manufacturers have been building ever more cores into their products. This has allowed them to continue the trend known as Moore’s Law. Multi-core is now a fact of life and even phones, tablets, and personal computers have multi-core chips.

But if you look at the table  you can see that the database products above, even the newly announced products from Teradata and Netezza, are using CPUs with relatively few cores. The high-end Intel processors have 40 cores and the databases, with the exception of HANA, use Intel products with at most 16 cores. Further, Intel will deliver Ivy Bridge processors to the market this year with 120 cores. These vendors know this… yet they have chosen to deliver appliances with the previous generation CPUs. You might ask why?

I believe that there is an architectural reason for this (also a marketing reason covered here).

It is very hard to keep 80 cores fed with data when you have to perform block I/O. It will be nearly impossible to keep the 240 cores coming with Ivy Bridge fed. One solution is to deploy more nodes in a shared-nothing configuration with fewer cores per node… but this will be expensive requiring more power, floorspace, administration, etc. This is the solution taken by most of the vendors above. Another solution is to solve the problem without I/O with an in-memory database (IMDB) architecture. This is the solution taken by SAP with HANA.

Intel, IBM, and the rest will continue to build out using the multi-core approach for the foreseeable future. IMDB products will be able to fully utilize this product. Other products will struggle to take full advantage as we can see already… they will adapt and adjust and do what they can… but ultimately IMDB will win, I think… because there is just no other way to keep up as Moore’s Law continues to drive technology… no other way to feed the CPU engines with data fast enough.

If I am right then you will see more IMDB offerings from more vendors, including from the major vendors in the near future (note that this does not include the announcements of “database in memory” from Oracle which is not by any measure an in-memory database).

This is the underlying reason why Donald Feinberg (and Timo Elliott) are right on here. Every organization will be running in-memory… and soon.

MPP on HANA, Exadata, Teradata, and Netezza

6 May… There is a good summary of this post and on the comments here.  – Rob

17 April… A single unit of parallelism is a core plus a thread/process to feed it instructions plus a feed of data. The only exception is when the core uses hyper-threading… in which case 2 instructions can execute more-or-less at the same time… then a core provides 2 units of parallelism. All of the other stuff: many threads per core and many data shards/slices per thread are just techniques to keep the core fed. – Rob

16 April… I edited this to correct my loose use of the word “shard”. A shard is a physical slice of data and I was using it to represent a unit of parallelism. – Rob

I made the observation in this post that there is some inefficiency in an architecture that builds parallel streams that communicate on a single node across operating system boundaries… and these inefficiencies can limit the number of parallel streams that can be deployed. Greenplum, for example, no longer recommends deploying a segment instance per core on a single node and as a result not all of the available CPU can be applied to each query.

This blog will outline some other interesting limits on the level of parallelism in several products and on the definition of Massively Parallel Processing (MPP). Note that the level of parallelism is directly associated with performance.

On HANA a thread is built for each core… including a thread for each hyper-thread. As a result HANA will split and process data with 80 units of parallelism on a high-end 40-core Intel server.

Exadata deploys 12 cores per cell/node in the storage subsystem. They deploy 12 disk drives per node. I cannot see it clearly documented how many threads they deploy per disk… but it could not be more than 24 units of parallelism if they use hyper-threading of some sort. It may well be that there are only 12 units of parallelism per node (see here).

Updated April 16: Netezza deploys 8 “slices” per S-Blade… 8 units of parallelism… one for each FPGA core in the Twin times four (2X4) Twinfin architecture (see here). The next generation Netezza Striper will have 16-way parallelism per node with 16 Intel cores and 16 FPGA cores…

Updated April 17: Teradata uses hyper-threading (see here)… so that they will deploy 24 units of parallelism per node on an EDW 6700C (2X6X2) and  32 units of parallelism per node on an EDW 6700H (2X8X2).

You can see the different definitions of the word “massive” in these various parallel processing systems.

Note that the next generation of Xeon processors coming out later this year will have 8X15 processors or 120 cores on a fat node:

  • This will provide HANA with the ability to deploy 240 units of parallelism per node.
  • Netezza will have to find a way to scale up the FPGA cores per S-Blade to keep up. TwinFin will have to become QuadFin or DozenFin. It became HexadecaFin… see above. – Rob
  • Exadata will have to put 120 SSD/disk drive combos in each node instead of 12 if they want to maintain the same parallelism-to-disk ratio with 120 units of parallelism.
  • Teradata will have to find a way to get more I/O bandwidth on the problem if they want to deploy nodes with 120+ units of parallelism per node.

Most likely all but HANA will deploy more nodes with a smaller number of cores and pay the price of more servers, more power, more floor space, and inefficient inter-node network communications.

So stay tuned…

Some Unaudited HANA Performance Numbers

Fast
Fast (Photo credit: Allie’s.Dad)

The following performance numbers are being reported publicly for HANA:

  • HANA scans data at 3MB/msec/core
    • On a high-end 80-core server this translates to 240GB/sec per node
  • HANA inserts rows at 1.5M records/sec/core
    • Or 120M records/sec per node…
  • Aggregates 12M records/sec/core
    • Or 960M records per node…

These numbers seem reasonable:

  • A 100X improvement over disk-based scan (The recent EMC DCA announcement claimed 2.4GB/sec per node for Greenplum)…
  • Sort of standard OLTP insert speeds for a big server…
  • Huge performance gains for in-memory aggregation using columnar orientation and SIMD HPC instructions…

Note that these numbers are the basis for suggesting that there is a new low-TCO approach to BI that eliminates aggregate tables, materialized views, cubes, and indexes… and eliminates the operational overhead of computing these artifacts… and still provides a sub-second response for all queries.