Exalytics – Database Fog Blog

CPUs from retired computers waiting for recycl... — (Photo credit: Wikipedia)

This short post is intended to provide a quick warning regarding in-memory columnar and cpu requirements… with a longer post to follow.

When a row is inserted or bulk-loaded into a DBMS, if there are no indexes, the amount of cpu required is very small. The majority of the time is spent committing a transaction is the time to write a log record to persist the data.

When the same record is reformatted into a column the amount of processing required is significantly higher. The data must be parsed into columns, the values must be compressed, dictionaries may be updated, and the breadcrumbs that let the columnar data be regenerated into rows must be laid. Further, if the columnar structure are to be optimized then the data must be ordered… with a sort or some kind of index structure. I have seen academic papers that suggest that for an insert columnar processing may be 100X more than row processing… and you can see why this could be true (I apologize for not finding the reference… I’ll dig it up… as I recall I read it in a post some time back by Daniel Abadi).

Now let’s think about this… several vendors are suggesting that you can deploy their columnar features with no changes required… no new hardware… in-place. But this does not ring true if the new columnar feature requires 100X extra CPU cycles per row… or 50X… or 10X… unless you are running your database on an empty server.

This claim is a shot at SAP who, more honestly, suggests new hardware with high-end processors for their in-memory columnar product… but methinks it is marketing, not architecture, from these other folks.

Thoughts on in-memory columnar add-ons (dbms2.com)
Simple and fast with IBM BLU Acceleration (ibm.com)
Software Controls Cache Memory to Speed CPUs (spectrum.ieee.org)
Major in memory performance boosts for Oracle 12c (techcentral.ie)

In many of my posts I refer to the issues associated with building “extra” data structures to meet performance goals (see one of my first posts ever here). These extra structures are always a trade-off… slowing the performance of one function in order to speed up another. I thought that it might be helpful to be very clear about where I stand on this.

Indexes improve the performance of queries that address a small set of data. They also can improve join performance if your favorite optimizer can apply an index intersection to the execution plan for your queries. Indexes dramatically slow the performance of inserts, updates, and bulk data loads as they have to be maintained when data changes. You can mitigate the cost and update indexes in the background… the trade-off does not go away. Indexes are probably required for OLTP applications that pick out single rows.

Wouldn’t it be great if your favorite DBMS could resolve every query very fast without the overhead and operational effort associated with maintaining indexes? Certainly we should aspire to a read-optimized database, a data warehouse DBMS, that does not require indexes.

Vertica projections provide an optimized, materialized, view that improves the performance for a set of queries. The Vertica optimizer automatically selects the optimal projection. Vertica provides a very slick tool that builds projections based on the query set provided. I worded my post on Vertica a little vague… so let me be sure here to point out that every Vertica query runs against a projection… so it is possible to have only one. In this case there is no additional overhead. Adding projections slows the data load process and increases the storage requirements. This is the trade-off.

Other databases offer materialized views. They make the same trade-off as above.

An OLAP cube is a physical structure that pre-aggregates data so that your query workload can avoid the aggregation. The best implementations of this express the cube as a materialized view so that queries can use the pre-aggregated data without explicitly pointing at a cube structure… the optimizer picks it for you. In addition the best implementations let you drill out of the cube to the detail records. These products have the update/delete/load issues of an index plus add an extra data latency issue as the data has to be aggregated on some interval… usually hours or days. Many products do not allow joins from a cube. You can see the trade-off. The Oracle Exalytics product materializes the aggregated cube on a separate server in-memory. This provides even more performance but adds the system and operational overhead of moving data across system boundaries.

Wouldn’t it be nice if you could query raw data and perform aggregation so fast that even against terabytes of data you could run any query with 3 second or less response without the overhead of building cubes?

You may build specialized table structures and pre-join, pre-aggregate, or pre-compute data to make a set of queries run fast. The cost of building and maintaining this sort of implementation versus just querying the base tables is the trade-off. Further, this approach is sort of a trap. You cannot build these structures for every query… if you did the business would conceive another critical query the next day that required work.

You can add indexes to the structures built using the technique above and provide very fast application-specific performance to a small set of queries. This is currently the favored approach when companies build iOS or Android apps as it provides the best possible performance… at a significant price.

Wouldn’t it be great if this was unnecessary… you could just scan so fast that mobile response service levels could be met from the base data regardless of the query.

You can deploy redundant data in operational data stores, data marts, cube servers, analytic data stores, and so on… with each specialized store providing performance for some limited set of queries at the cost of development and support ongoing. Each of these copies could deploy specialized database products that speed up that set of queries a little more. Again, this surround-the-EDW approach is a trap that leads to the proliferation of data marts and of database technologies.

Please do not take that last paragraph the wrong way… I believe that the worst possible approach is to blindly standardize on one or two database products. This trade-off makes life convenient for the IT department at the expense of performance and agility in the business. It is OK to have one or two favored products but IT must always serve the business to the best of their ability as a first priority… and sometime the new start-up has just the thing (remember that once Teradata was a start-up and DB2 on the mainframe was the IT standard…).

What I wish was that one or two products could solve all of the performance and functionality problems without the cost of building “extra” stuff… one product would be better that two. I like products that make the extra stuff “free”. Netezza does a nice job of making zone maps “free”, for example. Teradata and Greenplum provide the option of row store or column store for “free”. Vertica automatically build extra projections for “cheap”… and while there is a cost to the projection it at least does not require staff to tune it up. Oracle materialized views are “cheap”.

What I dislike are products that require DBAs to work harder and harder to apply all of the techniques above to meet performance SLAs. Each of these techniques trades off performance for development and operational expense.

As I have noted before… the performance SLAs for BI are about to become severe as companies try to support BI on mobile devices. The development and operational costs of tuning up; that is the TCO; will be significant unless better, faster, software infrastructure becomes available.

The TCO for a database that could eliminate these extra constructs and could eliminate the cost of developing and maintaining them; and could eliminate the architectural fragility these approaches imply… and replace this with a DBMS that holds base data which could satisfy all queries in seconds; delivering the business agility this implies… the TCO would be compelling.

I actually believe that the answer is available in the market today… this is no longer a pipe dream… more later…

There seems to be a sort of odd tradition for bloggers to look back at the past year as the New Year starts to unfold. Here is my review of my posts and some presents

…

My Favorite Post

I liked this the best… ’nuff said: What is Big Data?

OK, here is my 2nd favorite: A Quick Five Minute Rule Update for In-memory Databases, but you probably need to read the prequel first: The Five Minute Rule and In-memory Databases

These papers and the underlying thinking by smarter folks than I will inform you about the definition of Hot Data from the point of pure IT economics.

The Most Under-rated Post

This is the post I thought was the most important… as it might strongly influence data warehouse platform buying decisions over the next few years… And it might even influence the stocks you pick: The Future of Hadoop and Big Data DBMSs

Some Other Posts to Read

Here are two posts that informed me:

The Five Minute Rule… This will point you to a Wikipedia article that will point you to the whole series of papers.

What Every Programmer Should Know About Memory… This paper goes into gory detail about how memory works inside a processor. It is hardware-centric for you software folks… but provides the basis for understanding why in-memory DBMSs are fast and why Exadata is not an in-memory DBMS.

And some other Good Stuff

Kevin Closson on Exadata

Google Research

Thank you for your attention last year. I hope that each of you has a safe, prosperous, and happy new year…

– Rob

I posted a blog on the SAP site here that discussed the implications of mobile clients. I want to re-emphasize the issue as it is crucial.

While at Greenplum we routinely replaced older EDW platforms and provided stunning performance. I recall one customer in particular where we were given a query that ran in 7 hours and Greenplum executed the query in seven seconds. This was exceptional… more typical were cases where we reduced run-times from several hours to under 30 minutes… to 10 minutes… to 5 minutes. I’m sure that every major competitor: Teradata, Greenplum, Netezza, and Exadata has similar stories to tell.

But 5 minutes will not cut it if you are servicing a mobile client where sub-second response to the device is a requirement… and 10 minutes is out of the question. It does not matter if it ran in 10 hours before… 10 minute response is not acceptable to a mobile device.

Today we see sub-second response delivered to our phones by custom applications built on special high-performance platforms designed specifically to service a mobile client: iPhones, iPads, and Android devices.

But what will we do about the BI applications built on commercial platforms which have just used every trick in the book to become one of the 5 minute stories mentioned above?

I think that there are only a couple of architectural choices.

We can rewrite the high-value queries as custom applications using specialized infrastructure… at great expense… and leaving the vast majority of queries un-serviced.
We can apply the 80/20 rule to get the easiest queries serviced with only 20% of the effort. But according to Murphy the 20% left will be the highest value queries.
We can tack on expensive, specialized, accelerators to some queries… to those that can be accelerated… but again we leave too much behind.
Or we can move to a general purpose high performance computing platform that can service the existing BI workload with sub-second response.

In-memory computing will play a role… Exalytics provides option #3… HANA option #4.

SSD devices may play a role… but the performance improvements being quoted by vendors who use SSD as a block I/O device is 10X or less. A 10X improvement applied to a query that was just improved to 10 minutes yields a 1 minute query… still not the expected level of service.

IT departments will have to evaluate the price/performance, not just the price, as they consider their next platform purchases. The definition of adequate response is changing… and the old adequate, at the least cost, may not cut it. Mobile clients are here to stay. The productivity gains expected from these devices is significant. High performance BI computing is going to be a requirement.

As you may have noticed I’m looking at in-memory databases (IMDB) these days… Here are some quick architectural observations on VMWare‘s SQLFire, Oracle’s Exalytics and TimesTen offerings, and SAP HANA.

It is worth noting up front that I am looking to see how these products might be used to build a generalized data mart or a data warehouse… In other words I am not looking to compare them for special case applications. This is important because each of these products has some extremely cool features that allow them to be applied to application-specific purposes with a narrow scope of data and queries… maybe in a later blog I can try to look at some narrow use-cases.

Further, to make this quick blog tractable I am going to assume that the mart/dw problem to be solved requires more data than can fit on one server node… and I am going to ignore features that let queries access data that resides on disk… in-memory or bust.

Finally I will assume that the SQL dialect supported is sufficient and not drill into details there. I will look at architecture not SQL features…

Simply put I am going to look at a three characteristics:

Will the architecture support ad hoc queries?
Does the architecture support scale-out?
Can we say anything with regards to price/performance expectations?

Exalytics is a smart-aggregate store that sits over an Oracle database to offload aggregate query workload (see my previous post here or the Rittman Mead post here which declares: “Oracle Exalytics uses a specially enhanced version of Oracle TimesTen, Oracle’s in-memory database, to cache commonly used aggregates used in dashboards, analyses and other BI objects.” Exalytics does not support a scale-out shared-nothing architecture but it can scale up by adding nodes with new aggregate data. Queries access data within the aggregate structure and it is not possible to join to data off the Exalytics node… so ad hoc is out. Within these limits, which preclude Exalytics from being considered as a general platform for a mart or warehouse, Exalytics provides dictionary-based compression which should provide around 5X compression to reduce the amount of memory required and reduce the amount of hardware required.

TimesTen can do more. It is a general RDBMS. But it was designed for OLTP. I assume that the reason that Oracle has not rolled it out as a general-purpose data mart or data warehouse has to do with constraints that grow from those OLTP architectural roots. For example, BI queries run longer and require more data than a OLTP query… and even with data in-memory temporary storage is required for each query… and memory utilization is a product of the amount of data required and the amount of time the data has to inhabit memory… so BI queries put far more pressure on an in-memory DBMS. There are techniques to mitigate this… but you have to build the techniques in from the ground up.

I imagine that this is why TimesTen works for Exalytics, though. A OLAP query against a pre-aggregated cube does not graze an entire mart or warehouse. It is contained and “small data” (for my wacky take re: Exalytics and Exadata see here).

TimeTen is not sharded… so scalability is an issue. Oracle gets around this nicely by allowing you to partition data across instances and have the application route queries to the appropriate server. But this approach will not support joins across partitions so it severely limits scalability in a general-purpose mart or warehouse.

SQLFire is a very interesting new product built on top of Gemfire… and therefore mature from the start. SQLFire is more scalable than TimesTen/Exalytics. It supports sharded data in a cluster of servers. But SQLFire has the limitation that it cannot join data across shards (they call them partitions… see here) so it will be hard to support ad hoc queries… They provide the ability to replicate tables to support any sort of joins. If, for example, you replicate small dimension tables to coexist with sharded fact tables all joins are supported. This solution is problematic if you have multiple fact tables which must be joined… and replication of data uses more memory… but SQLFire has the foundation in place to become BI-capable over time.

Performance in an in-memory database comes first and foremost from eliminating disk I/O. All three IMDB product provide this capability. Then performance comes from the efficient use of compression. TimeTen incorporates Oracles dictionary-based “columnar” compression (I so hate this term… it is designed to make people think that Oracle products are sort-of columnar… but so far they are not). Then performance comes from columnar projection… the ability to avoid touching all data in a row to process a query. Neither TimesTen nor SQLFire are columnar databases. Then performance comes from parallel execution. Neither TimesTen nor SQLFire can involve all cores on a single query to my knowledge.

Price comes from compression as well. The more highly compressed the data is the less memory required to store it. Further, if data can be used without decompressing it, then less working memory is required. As noted, TimesTen has a compression capability. SQLFire does not appear to compress data. Neither can use compressed data. Note that 2X compression cuts the amout of memory/hardware required in half or more… 4X cuts it to a quarter… and so on. So this is significant.

Now for some transparency… I started the research for this blog, and composed a 1st draft, last Spring while I was at EMC Greenplum. I am now at SAP working with HANA. So… I will not go into HANA at great length… but I will point out that: HANA fully supports a shared-nothing architetcture… so it is fully scalable; HANA is fully parallel and able to use all cores for each query; HANA fully supports columnar tables so it provides deep compression and the ability to use the compressed data in execution. This is not remarkable as HANA was designed from the bottom up to support both BI and OLTP workloads while TimesTen and SQLFire started from a purely OLTP architectural foundation.

References:

vFabric SQLFire User’s Guide

Oracle Times Ten In-Memory Database Architectural Overview

English: Hawaiian spear fisherman near Hana; Maui, Hawai‘i. ca. 1890. (Photo credit: Wikipedia)

Here are links to several commercial posts on the Experience HANA Blog FYI…

The Five Minute Rule and HANA: This is a rehash of my posts here applying the famous Five Minute Rule to in-memory databases.

HANA & Exalytics: There is Barely Any Comparison: This is a rehash of my post here pointing out that Exalytics and HANA do not really compete.

HANA vs. Teradata – Part 1: This is a response to some poor thinking posted by Teradata. There is some new content that could be worth a look.

HANA vs. Teradata – Part 2: This continues the response… but it is a rehash of the post here on the rational economics of in-memory databases. Frankly, I had just reread the Teradata posts and wrote this while still annoyed… as a result it is a little flip and despite the junk posted by Teradata I might have shown them a little more respect…

Exalytics vs. Exadata: This post suggests some oddness in Oracle’s positioning of Exalytics and Exadata… maybe worth a look.

I was recently reminded of a couple of papers written by Jim Gray and Gianfranco Putzolu that calculated the cost of keeping data in memory vs the cost of paging it in from disk. I was happy to see that the thread was being kept alive by Goetz Graefe.

These papers used the cost of each media to determine how “hot” data needed to be to be cost-effectively stored in-memory. The 1987 five minute rule (click here to reference the original papers) was so named because at that time and based on the relative costs of CPU, Memory, and Disk; a 1KB record that was accessed every five minutes could be effectively stored in memory and a 4KB block of data broke-even at two minutes.

In 2009, with CPU prices coming down but the number of instructions executed per second going up, and with memory and prices down, the break-even point between keeping 4KB in memory or on a SATA disk was 90 minutes.

Let’s be clear about what this means. Based solely on the cost of CPUs, RAM, and SATA drives; any data that is accessed more frequently than each 90 minutes should be kept in memory. This does not include any ROI based on the business benefits of a speedy response. It does not adjust for data compression which allows more than 4KB of user data to use 4KB of RAM. Just pure IT economics gets us to this point.

So… if you have data in a data warehouse or a mart that is touched by a query at least once every 90 minutes… it is wasteful to store it on disk. If you have an in-memory database than can compress the data 2X and use it in its compressed form, then the duration goes up to 180 minutes. You do not have to look any further than this to find the ROI for an in-memory data base (IMDB).

OLAP searches a set of pre-aggregated data… a cube. If the cube is large enough that you don’t bump into the edges you might think that your search is ad hoc… but that is an illusion. The set is prescribed not ad hoc.

In the 1980’s we sent paper reports out… they were moved on a pallet with a fork-lift. The reports aggregated key metrics to many levels in a hierarchy sliced and diced across many dimensions. Today we take the lines off the reports and store them digitally in a cube and provide tools to let users navigate the cube to build their reports. What they build looks, to a large extent, like the reports from the 80’s.

Data warehousing provides more data and better data… so there are more cubes, more dimensions, more reports… and hopefully more business intelligence. But these reports provide 1980’s quality business intelligence on a screen instead of on paper… bounded by the OLAP cube.

When you hear folks talk about data science and data mining and advanced analytics and optimization… they are talking about advanced mathematical treatment of the data… know that this is going to require technology that is beyond the capabilities of a OLAP engine.

Exalytics is a OLAP engine. Here are some Exalytics use cases from a proponent. They are about OLAP dashboards… good stuff… but hardly advanced analytics. Oracle says that Exalytics is engineered for Extreme Analytics. If we agree that “extreme” analytics is not in any way advanced… then I agree.

In the previous blogs on this topic (Part 1, Part 2, Part 3) I suggested that:

Shared-nothing is required for an EDW,
An EDW is not usually under-utilized,
There are difficulties in re-distributing sharded, shared-nothing data to provide elasticity, and
A SAN cannot provide the same IO bandwidth per server as JBOD… nor hit the same price/performance targets.

Note that these issues are tied together. We might be able to spread the EDW workload over so many shards and so many SANs that the amount of I/O bandwidth per GB of EDW data is equal to or greater than that provided on a DW Appliance. This introduces other problems as there are typically overhead issues with a great many nodes. But it could work.

But what if we changed the architecture so that I/O was not the bottleneck? What if we built a cloud-based shared-nothing in-memory database (IMDB)? Now the data could live on SAN as it would only be read at start-up and written at shut-down… so the issues with the disk subsystem disappear… and issues around sharing the SAN disappear. Further, elasticity becomes feasible. With an IMDB we can add and delete nodes and re-distribute data without disk I/O… in fact it is likely that a column store IMDB could move column-compressed data without re-building rows. IMDB changes the game by removing the expense associated with disk I/O.

There is evidence emerging that IMDB technology is going to change the playing field (see here).

Right now there are only a few IMDB products ready in the market:

TimeTen: which is not shared-nothing scalable, nor columnar, but could be the platform for a very small, 400GB or less (see here), cloud-based EDW;
SQLFire: which is semi-shared-nothing scalable (no joins across shards), not columnar, but could be the platform for a larger, maybe 5TB, specialized EDW;
ParAccel: which is shared-nothing scalable, columnar, but not fully an IMDB… but could be (see C. Monash here); or
SAP HANA: which is shared-nothing, IMDB, columnar and scalable to 100TB (see here).

So it is early… but soon enough we should see real EDWs in the cloud and likely on Amazon EC2, based on in-memory database technologies.

I found myself wondering where did the rule-of-thumb for Exalytics that suggests that TimesTen can use 800GB of a 1TB memory space… and requires 400GB of that space for work tables leaving room for 400GB of user data… come from (it is quoted everywhere… here is an example… see question #13).

Sure enough, this rule has been around for a while in the TimesTen literature… in fact it predates Exalytics (see here).

Why is this important? The workspace per query for a TPC-A transaction is very small and the amount of time the memory is held by a TPC-A transaction is very short. But the workspace required by a TPC-H query is at least 10X the space required by a TPC-A query and the duration of a TPC-H query is at least 10X the duration of a TPC-A query. The result is at least 100X more pressure on memory utilization.

So… I suspect that the 600GB of user data I calculated here may be off by more than a little. Maybe Exalytics can support 300GB of user data or 100GB of user data or maybe 60GB?

Note that this is not bad… all of this pressure on memory is still moved to Exalytics from the Exadata RAC subsystem… where memory is dear.

As a side note… it is always important to remember that the pressure on memory is the amount of memory utilized times the duration of the utilization. This is why the data flow architecture used in modern databases like Greenplum are effective. Greenplum uses more memory per transaction but it holds the memory for less time by never (almost) writing it to disk. This is different from older database architectures like Teradata and Oracle which use disk to store intermediate results… lowering the overall amount of memory required but increasing the duration of the query. More on this here…

Tag: Exalytics

CPUs and HW for HANA, BLU, Hekaton, and Oracle 12c

Indexes are not a good thing… A blog on TCO

A Look Back at 2012