A reader recently wrote to me and asked about Netezza: “why does everyone insist that these (zone maps) tell you where ‘not to look’ when hunting for data?”. I’ll provide a direct answer… and a more meaningful answer.
Imagine that you have a list of data blocks with some metadata for each block that tells you the range of data in each block for a given column: FOO as follows:
- Block 1: FOO 0-173
- Block 2: FOO 174-323
- Block 3: FOO 324-500
and a query that selects WHERE FOO=42.
If Netezza scanned the metadata and sent its read routine the list of blocks to not read… it would send 2,3. This is clearly not the case… if there are a million blocks it would not send a list of 999999 block numbers to not read… and force the read routine to figure out what was left to read. So clearly Netezza does not really tell you where ‘not to look’. This is a clever turn-of-phrase.
But I like this particular cleverness. Every DBMS is built with features designed to avoid I/O:
- indexes are metadata that point to blocks to reduce reading unnecessary blocks;
- partitions contain metadata to reduce reading unnecessary partitions;
- column stores are designed to reduce reads of unnecessary columns; and
- caches hold hot data blocks to avoid re-reading those blocks.
In fact, the highest performance DBMS will almost always be the one that most effectively minimizes I/O. This is why in-memory databases always have the highest performance.
So while zone maps do not really tell the system directly what not to read… the effect is ‘not to look’ at unnecessary data.