The TwinFin Surf Board
The TwinFin Surf Board (Photo credit: tvanhoosear)

Since my blogs tend to be in response to some stimulus they may not reflect a holistic view on any particular product. The “My 2 Cents” series will try to provide a broader view…

Please consider this as you read on…

Summary

Netezza put a new spin on data warehousing… they made it easy. The Netezza software includes a unique clustered index feature called a zone map that is powerful and easy to use. They also use a FPGA co-processor to augment the CPUs, offloading data compression and projection. When both of these innovations combine Netezza is hard to beat.

Zone maps are powerful when they can be used in a query plan… but the hardware is only good, not great, when zone maps are not in the plan. FPGAs provided a huge boost when Netezza first came on the scene… but as discussed here they do not provide the same boost today. In addition, FPGAs may limit the ability of a Netezza cluster to handle concurrent queries (see here and especially the comments).

The IBM acquisition has opened up a market of Blue shops to Netezza… so they are selling… and as a result Netezza is here to stay.

Where They Win

Of course, Netezza will win in all-Blue shops.

Netezza wins when there is a naturally sequenced field in each big table that is also used in the predicate for most queries. For example, if data is naturally in date/time sequence and every query has a date/time constraint then Netezza is hard to beat. This is the case most often for focussed data marts or single application databases… so look for Netezza for these sort of problems.

Netezza wins when there are a relatively small number of concurrent queries… and they can win when the queries are complex… as long as the zone map is in the plan.

Netezza can win when the POC is designed such that zone maps may be used in the POC… for example when the POC models only a single data load and the data is pre-sorted… even when the real application would fragment the data (for example… data will not naturally enter the warehouse sequentially by customer number… the same customer will be represented time and again… but if you load once only for a POC then you can sort by customer number and use it in the query predicates).

Note that I am not saying that Netezza is a poor performer when zone maps are not used… it is good… but they would never win a POC if no queries used the zone map.

Where They Lose

Guess what? Netezza loses when the zone maps cannot be used or can be used for only a small fraction of the query workload. Note again that the use of a zone map depends on two factors: the data has to be in sequence over all time, and the queries must use the columns mapped in the predicate. If data enters the system out of sequence then the zone map fragments and eventually loses the ability to speed up queries (a few random out of sequence rows are OK).

This constraint makes it hard for Netezza to service data warehouses where, by definition, lots of different user constituencies come at the data from lots of different directions… rather than always using the path grooved with a zone map.

Netezza was designed when only Sybase IQ had columnar oriented tables… today columnar is in nearly every DW database and this allowed the competition to cut deeply into Netezza’s competitive, zone-map enabled, edge. Teradata columns, Greenplum columns, or the natural column stores can win even when zone maps are on target.

Bottom line: do a POC…

In the Market

I spend most of my time in the general market for data warehousing. You won’t see me offer much of an opinion on HANA for BW, for example… even though there are ten thousand plus BW warehouses I just do not see them in the places I work.

Before Netezza was acquired by IBM they were everywhere… in nearly every POC. Now… not so much. To a very large extent they seem to have been directed into the Blue-only customer base (now that I think about it the same thing happened to the Ascential Data Stage suite of ETL products).

My Guess at the Future

As I noted in the reference above… I think that Netezza will eventually go away from the co-processor strategy.

There have been rumors for several years of design that allowed multiple zone maps. This would be very important… but loading out-of-sequence data, which is the necessary the result, could be very slow.

Netezza has lost some of its edge as other technologies added columnar capabilities to their technologies… and Netezza is surely looking at this… but their architecture which includes an execution engine on the server and on the FPGA makes this more complex than you might suspect. Zone maps and two-stage optimization (one in the server and once in the FPGA) is cool… but a tight coupling of the tricks makes for a difficult time extending and adding new features.

If I were the King of Netezza and I could not find a reasonable way to extend beyond the two tricks that got me here I would go with the flow… I would position Netezza as an extremely easy-to-deploy data mart appliance and hook it tightly (i.e. build in some integration) along-side DB2 and Hadoop… and I would cede the EDW space to DB2 and the Big Data space to Hadoop.

Next up… my 2 Cents on Greenplum

May 1, 2013: Here is an update, or maybe a summary, of my view on Netezza… – Rob