How DBMS Vendors Admit to an Architectural Limitation: Part 3 – EDW on IBM z/OS

This is the 3rd and final example of a vendor admitting, without admitting, to an architectural limitation. The first two parts on Exadata and Teradata are here and here.

Teradata started to get real traction in the EDW space with a shared-nothing architecture in the late 1980’s. At that time the only real competition was DB2 on an IBM mainframe. From those days until just a couple of years ago IBM insisted that for MVS, then z/OS, customers should stick to the mainframe for their data warehouses and marts. There was some dabbling with sharded data in DB2 for z/OS… and Teradata made some in-roads… as did Netezza… but IBM insisted that there was no reason not to stay Blue. DB2 on AIX and then LINUX appeared… and both offered a better price/performance option than DB2 ob Z/OS… but the faithful stayed faithful for the most part.

Then IBM bought Netezza, a pure shared-nothing microprocessor-based machine, and the recommendation changed. Today IBM recommends the Analytics Accelerator, based on Netezza, to mainframe users who want to deploy an EDW. This is an admission, with no admission at all, that there was all along an architectural advantage to shared-nothingness.

If you search this blog for “Netezza” you can get my perspective on that technology. But to be blunt, the Analytics Accelerator is not IBM’s best EDW platform… DB2 LUW is by far… and with BLU LUW is better still.

I have made it clear in my previous posts that I consider it lazy for an IT shop to commit to a vendor or to a product. As engineers we need to embrace change. For IBM z/OS shops this means a realistic look at non-z/OS alternatives to deploy or to re-deploy an EDW. It makes no sense to build a data warehouse or a data mart directly on z/OS. Use the Analytics Accelerator or, better still, open the competition to better products like DB2 LUW, Teradata, Vertica, etc.

References

Database Fog Blog

Other

Netezza Zone Maps and I/O Avoidance

A reader recently wrote to me and asked about Netezza: “why does everyone insist that these (zone maps) tell you where ‘not to look’ when hunting for data?”. I’ll provide a direct answer… and a more meaningful answer.

Imagine that you have a list of data blocks with some metadata for each block that tells you the range of data in each block for a given column: FOO as follows:

  • Block 1: FOO 0-173
  • Block 2: FOO 174-323
  • Block 3: FOO 324-500

and a query that selects WHERE FOO=42.

If Netezza scanned the metadata and sent its read routine the list of blocks to not read… it would send 2,3. This is clearly not the case… if there are a million blocks it would not send a list of 999999 block numbers to not read… and force the read routine to figure out what was left to read. So clearly Netezza does not really tell you where ‘not to look’. This is a clever turn-of-phrase.

But I like this particular cleverness. Every DBMS is built with features designed to avoid I/O:

  • indexes are metadata that point to blocks to reduce reading unnecessary blocks;
  • partitions contain metadata to reduce reading unnecessary partitions;
  • column stores are designed to reduce reads of unnecessary columns; and
  • caches hold hot data blocks to avoid re-reading those blocks.

In fact, the highest performance DBMS will almost always be the one that most effectively minimizes I/O. This is why in-memory databases always have the highest performance.

So while zone maps do not really tell the system directly what not to read… the effect is ‘not to look’ at unnecessary data.