More on the Oracle 12c In-Memory Option

English: a human brain in a jar
In-memory DBMS in a jar (Photo credit: Wikipedia)

I recently listened to a session by Juan Loalza of Oracle on the 12c In-memory option. Here are my notes and comments in order…

There is only one copy of the data on disk and that is in row format (“the column format does not exist on disk”). This has huge implications. Many of the implications come up later in the presentation… but consider: on start-up or recovery the data has to be loaded from the row format and converted to a columnar format. This is a very expensive undertaking.

The in-memory columnar representation is a fully redundant copy of the row format OLTP representation. Note that this should not impact performance as the transaction is gated by the time to write to the log and we assume that the columnar tables are managed by MVCC just like the row tables. It would be nice to confirm this.

Data is not as compressed in-memory as in the hybrid-columnar-compression case. This is explained in my discussion of columnar compression here.

Oracle claims that an in-memory columnar implementation (level 3 maturity by my measure see here) is up to 100X faster than the same implementation in a row-based form. Funny, they did not say that the week before OOW? This means, of course, that HANA, xVelocity, BLU, etc. are 100X faster today.

They had a funny slide talking about “reporting” but explained that this is another word for aggregation. Of course in-memory vector aggregation is faster in-memory.

There was a very interesting discussion of Oracle ERP applications. The speaker suggested that there is no reporting schema for these apps and that users therefore place indexes on the OLTP database to provide performance for reporting and analytics. It was suggested that a typical Oracle E-Business table would have 10-20 indexes on it and it was not unusual to see 30-40 indexes. It was even mentioned that the Siebel application main table could require 90 indexes. It was suggested that by removing these indexes you could significantly speed up the OLTP performance, speed up reporting and analytics, cure cancer and end all wars (OK… they did not suggest that you could cure cancer and end war… this post was just getting a little dry).

The In-memory option is clusterable. Further, because a RAC cluster uses shared disk and the im-memory data is not written to disk there is a new shared-nothing implementation included. This is a very nice and significant architectural advance for Oracle. It uses the direct-to-wire infiniband protocol developed for Exadata to exchange data. Remember when Larry dissed Teradata and shared-nothingness… remember when Larry dissed in-memory and HANA… remember when Teradata dissed HANA as nonsense and said SAP was over-reacting to Oracle. Gotta smile :).

Admins need to reserve memory from the SGA for the in-memory option. This is problematic for Exadata as the maximum memory on a RAC node is 256GB…  My bad… Exadata X3-8 supports 2TB of memory per node… this is only problematic for Exadata X3-2 and below… – Rob

It is possible to configure a table partition into memory while leaving other partitions in row format on disk.

It is suggested again that analytic indexes may be dropped to speed up OLTP.

The presenter talked about how the architecture, which does not change the row-based tables in any way, allows all of the high-availability options available with Oracle to continue to exist and operate unchanged.

But the beautiful picture starts to dull a little here. On start-up or first access… i.e. during recovery… the in-memory columnar data is unavailable and loaded asynchronously from the row format on disk. Note that it is possible to prioritize the order tables are loaded to get high-priority data in-memory first… a nice feature. During this time… which may be significant… all analytic queries are run against the un-indexed row store. Yikes! This kills OLTP performance and destroys analytic performance. In fact, the presenter suggested that maybe you might keep some indexes for reports on your OLTP tables to mitigate this.

Now the story really starts to unravel. Indexes are required to provide performance during recovery… so if your recovery SLA’s cannot be met without indexes then you are back to square one with indexes that are only used during recovery but must be maintained always, slower OLTP, and the extra requirement for a redundant in-memory columnar data image. I imagine that you could throttle some reports after an outage until the in-memory image is rebuilt… but the seamless operations during recovery using standard Oracle HA products is a bit of a stretch.

Let me raise again the question I asked in my post last week on this subject (here)… how are joins processed between the row format and the columnar format. The presenter says that joins are no problem… but there are really only two ways… maybe three ways to solve for this:

  1. When a row-to-columnar join is identified the optimizer converts the columnar table to a row form and processes the join using the row engine. This would be very very slow as there would be no indexes on the newly converted columnar data to facilitate the join.
  2. When a row-to-columnar join is identified the optimizer pushes down aggregation and projection to the columnar processing engine and converts the columnar result to a row form and processes the join using the row engine. This would be moderately slow as there would be no indexes on the newly converted columnar data to facilitate the join (i.e. the columnar fact would have no indexes to row dimensions or visa versa).
  3. When a row-to-columnar join is identified the optimizer converts the row table to a columnar form and processes the join using the columnar engine. This could be fast.

Numbers two and three are the coolest options… but number three is very unlikely due to the fact that columnar data is sharded in a shared-nothing manner and the row data is not… and number two is unlikely because if it were the case Oracle would surely be talking about it. Number one is easy and the likely state in release 1… but these joins will be very slow.

Finally, the presenter said that this columnar processing would be implemented in Times Ten and then in the Exalytics machine. I do not really get the logic here? If a user can aggregate in their OLTP system in a flash why would they pre-aggregate data and pass it to another data stovepipe? If you had to offload workload from your OLTP system why wouldn’t you deploy a small, standard, Oracle server with the in-memory option and move data there where, as the presenter suggested, you can solve any query fast… not just the pre-aggregated queries served by Exalytics. Frankly, I’ve wondered why SAP has not marketed a small HANA server as an Exalytics replacement for just this reason… more speed… more agility… same cost?

There you have it… my half a cents… some may say cents-less evaluation.

I will end this with a question for my audience (Ofir… I’ll provide a link to your site if you post on this…)… how do we suppose the in-memory option supports bulk data load? This has implications for data warehousing…

Here, of course, is the picture I should have used above… labeled as the in-memory database of your favorite vendor:

abbyn

Mobile Clients Require High Performance BI Computing

English: An image of an iPad 2.
(Photo credit: Wikipedia)

I posted a blog on the SAP site here that discussed the implications of mobile clients. I want to re-emphasize the issue as it is crucial.

While at Greenplum we routinely replaced older EDW platforms and provided stunning performance. I recall one customer in particular where we were given a query that ran in 7 hours and Greenplum executed the query in seven seconds. This was exceptional… more typical were cases where we reduced run-times from several hours to under 30 minutes… to 10 minutes… to 5 minutes. I’m sure that every major competitor: Teradata, Greenplum, Netezza, and Exadata has similar stories to tell.

But 5 minutes will not cut it if you are servicing a mobile client where sub-second response to the device is a requirement… and 10 minutes is out of the question. It does not matter if it ran in 10 hours before… 10 minute response is not acceptable to a mobile device.

Today we see sub-second response delivered to our phones by custom applications built on special high-performance platforms designed specifically to service a mobile client: iPhones, iPads, and Android devices.

But what will we do about the BI applications built on commercial platforms which have just used every trick in the book to become one of the 5 minute stories mentioned above?

I think that there are only a couple of architectural choices.

  1. We can rewrite the high-value queries as custom applications using specialized infrastructure… at great expense… and leaving the vast majority of queries un-serviced.
  2. We can apply the 80/20 rule to get the easiest queries serviced with only 20% of the effort. But according to Murphy the 20% left will be the highest value queries.
  3. We can tack on expensive, specialized, accelerators to some queries… to those that can be accelerated… but again we leave too much behind.
  4. Or we can move to a general purpose high performance computing platform that can service the existing BI workload with sub-second response.

In-memory computing will play a role… Exalytics provides option #3… HANA option #4.

SSD devices may play a role… but the performance improvements being quoted by vendors who use SSD as a block I/O device is 10X or less. A 10X improvement applied to a query that was just improved to 10 minutes yields a 1 minute query… still not the expected level of service.

IT departments will have to evaluate the price/performance, not just the price, as they consider their next platform purchases. The definition of adequate response is changing… and the old adequate, at the least cost, may not cut it. Mobile clients are here to stay. The productivity gains expected from these devices is significant. High performance BI computing is going to be a requirement.

SQLFire, Exalytics, TimesTen, and HANA… a quick comparison

Gemfire

As you may have noticed I’m looking at in-memory databases (IMDB) these days… Here are some quick architectural observations on VMWare‘s SQLFire, Oracle’s Exalytics and TimesTen offerings, and SAP HANA.

It is worth noting up front that I am looking to see how these products might be used to build a generalized data mart or a data warehouse… In other words I am not looking to compare them for special case applications. This is important because each of these products has some extremely cool features that allow them to be applied to application-specific purposes with a narrow scope of data and queries… maybe in a later blog I can try to look at some narrow use-cases.

Further, to make this quick blog tractable I am going to assume that the mart/dw problem to be solved requires more data than can fit on one server node… and I am going to ignore features that let queries access data that resides on disk… in-memory or bust.

Finally I will assume that the SQL dialect supported is sufficient and not drill into details there. I will look at architecture not SQL features…

Simply put I am going to look at a three characteristics:

  • Will the architecture support ad hoc queries?
  • Does the architecture support scale-out?
  • Can we say anything with regards to price/performance expectations?

Exalytics is a smart-aggregate store that sits over an Oracle database to offload aggregate query workload (see my previous post here or the Rittman Mead post here which declares: “Oracle Exalytics uses a specially enhanced version of Oracle TimesTen, Oracle’s in-memory database, to cache commonly used aggregates used in dashboards, analyses and other BI objects.” Exalytics does not support a scale-out shared-nothing architecture but it can scale up by adding nodes with new aggregate data. Queries access data within the aggregate structure and it is not possible to join to data off the Exalytics node… so ad hoc is out. Within these limits, which preclude Exalytics from being considered as a general platform for a mart or warehouse, Exalytics provides dictionary-based compression which should provide around 5X compression to reduce the amount of memory required and reduce the amount of hardware required.

TimesTen can do more. It is a general RDBMS. But it was designed for OLTP. I assume that the reason that Oracle has not rolled it out as a general-purpose data mart or data warehouse has to do with constraints that grow from those OLTP architectural roots. For example, BI queries run longer and require more data than a OLTP query… and even with data in-memory temporary storage is required for each query… and memory utilization is a product of the amount of data required and the amount of time the data has to inhabit memory… so BI queries put far more pressure on an in-memory DBMS. There are techniques to mitigate this… but you have to build the techniques in from the ground up.

I imagine that this is why TimesTen works for Exalytics, though. A OLAP query against a pre-aggregated cube does not graze an entire mart or warehouse. It is contained and “small data” (for my wacky take re: Exalytics and Exadata see here).

TimeTen is not sharded… so scalability is an issue. Oracle gets around this nicely by allowing you to partition data across instances and have the application route queries to the appropriate server. But this approach will not support joins across partitions so it severely limits scalability in a general-purpose mart or warehouse.

SQLFire is a very interesting new product built on top of Gemfire… and therefore mature from the start. SQLFire is more scalable than TimesTen/Exalytics. It supports sharded data in a cluster of servers. But SQLFire has the limitation that it cannot join data across shards (they call them partitions… see here) so it will be hard to support ad hoc queries… They provide the ability to replicate tables to support any sort of joins. If, for example, you replicate small dimension tables to coexist with sharded fact tables all joins are supported. This solution is problematic if you have multiple fact tables which must be joined… and replication of data uses more memory… but SQLFire has the foundation in place to become BI-capable over time.

Performance in an in-memory database comes first and foremost from eliminating disk I/O. All three IMDB product provide this capability. Then performance comes from the efficient use of compression. TimeTen incorporates Oracles dictionary-based “columnar” compression (I so hate this term… it is designed to make people think that Oracle products are sort-of columnar… but so far they are not). Then performance comes from columnar projection… the ability to avoid touching all data in a row to process a query. Neither TimesTen nor SQLFire are columnar databases. Then performance comes from parallel execution. Neither TimesTen nor SQLFire can involve all cores on a single query to my knowledge.

Price comes from compression as well. The more highly compressed the data is the less memory required to store it. Further, if data can be used without decompressing it, then less working memory is required. As noted, TimesTen has a compression capability. SQLFire does not appear to compress data. Neither can use compressed data. Note that 2X compression cuts the amout of memory/hardware required in half or more… 4X cuts it to a quarter… and so on. So this is significant.

Now for some transparency… I started the research for this blog, and composed a 1st draft, last Spring while I was at EMC Greenplum. I am now at SAP working with HANA. So… I will not go into HANA at great length… but I will point out that: HANA fully supports a shared-nothing architetcture… so it is fully scalable; HANA is fully parallel and able to use all cores for each query; HANA fully supports columnar tables so it provides deep compression and the ability to use the compressed data in execution. This is not remarkable as HANA was designed from the bottom up to support both BI and OLTP workloads while TimesTen and SQLFire started from a purely OLTP architectural foundation.

References:

vFabric SQLFire User’s Guide

Oracle Times Ten In-Memory Database Architectural Overview

Commercial Post Update: HANA and Exalytics and Teradata and IMDB Economics

English: Hawaiian spear fisherman near Hana; M...
English: Hawaiian spear fisherman near Hana; Maui, Hawai‘i. ca. 1890. (Photo credit: Wikipedia)

Here are links to several commercial posts on the Experience HANA Blog FYI…

The Five Minute Rule and HANA: This is a rehash of my posts here applying the famous Five Minute Rule to in-memory databases.

HANA & Exalytics: There is Barely Any Comparison: This is a rehash of my post here pointing out that Exalytics and HANA do not really compete.

HANA vs. Teradata – Part 1: This is a response to some poor thinking posted by Teradata. There is some new content that could be worth a look.

HANA vs. Teradata – Part 2: This continues the response… but it is a rehash of the post here on the rational economics of in-memory databases. Frankly, I had just reread the Teradata posts and wrote this while still annoyed… as a result it is a little flip and despite the junk posted by Teradata I might have shown them a little more respect…

Exalytics vs. Exadata: This post suggests some oddness in Oracle’s positioning of Exalytics and Exadata… maybe worth a look.

OLAP is not advanced analytics

OLAP searches a set of pre-aggregated data… a cube. If the cube is large enough that you don’t bump into the edges you might think that your search is ad hoc… but that is an illusion. The set is prescribed not ad hoc.

In the 1980’s we sent paper reports out… they were moved on a pallet with a fork-lift. The reports aggregated key metrics to many levels in a hierarchy sliced and diced across many dimensions. Today we take the lines off the reports and store them digitally in a cube and provide tools to let users navigate the cube to build their reports. What they build looks, to a large extent, like the reports from the 80’s.

Data warehousing provides more data and better data… so there are more cubes, more dimensions, more reports… and hopefully more business intelligence. But these reports provide 1980’s quality business intelligence on a screen instead of on paper… bounded by the OLAP cube.

When you hear folks talk about data science and data mining and advanced analytics and optimization… they are talking about advanced mathematical treatment of the data… know that this is going to require technology that is beyond the capabilities of a OLAP engine.

Exalytics is a OLAP engine. Here are some Exalytics use cases from a proponent. They are about OLAP dashboards… good stuff… but hardly advanced analytics. Oracle says that Exalytics is engineered for Extreme Analytics. If we agree that “extreme” analytics is not in any way advanced… then I agree.

Cloud Computing and Data Warehousing: Part 4 – IMDB Data Warehouse in a Cloud

In the previous blogs on this topic (Part 1, Part 2, Part 3) I suggested that:

  1. Shared-nothing is required for an EDW,
  2. An EDW is not usually under-utilized,
  3. There are difficulties in re-distributing sharded, shared-nothing data to provide elasticity, and
  4. A SAN cannot provide the same IO bandwidth per server as JBOD… nor hit the same price/performance targets.

Note that these issues are tied together. We might be able to spread the EDW workload over so many shards and so many SANs that the amount of I/O bandwidth per GB of EDW data is equal to or greater than that provided on a DW Appliance. This introduces other problems as there are typically overhead issues with a great many nodes. But it could work.

But what if we changed the architecture so that I/O was not the bottleneck? What if we built a cloud-based shared-nothing in-memory database (IMDB)? Now the data could live on SAN as it would only be read at start-up and written at shut-down… so the issues with the disk subsystem disappear… and issues around sharing the SAN disappear. Further, elasticity becomes feasible. With an IMDB we can add and delete nodes and re-distribute data without disk I/O… in fact it is likely that a column store IMDB could move column-compressed data without re-building rows. IMDB changes the game by removing the expense associated with disk I/O.

There is evidence emerging  that IMDB technology is going to change the playing field (see here).

Right now there are only a few IMDB products ready in the market:

  • TimeTen: which is not shared-nothing scalable, nor columnar, but could be the platform for a very small, 400GB or less (see here), cloud-based EDW;
  • SQLFire: which is semi-shared-nothing scalable (no joins across shards), not columnar, but could be the platform for a larger, maybe 5TB, specialized EDW;
  • ParAccel: which is shared-nothing scalable, columnar, but not fully an IMDB… but could be (see C. Monash here); or
  • SAP HANA: which is shared-nothing, IMDB, columnar and scalable to 100TB (see here).

So it is early… but soon enough we should see real EDWs in the cloud and likely on Amazon EC2, based on in-memory database technologies.

More on Exalytics Capacity…

I found myself wondering where did the rule-of-thumb for Exalytics  that suggests that TimesTen can use 800GB of a 1TB memory space… and requires 400GB of that space for work tables leaving room for 400GB of user data… come from (it is quoted everywhere… here is an example… see question #13).

Sure enough, this rule has been around for a while in the TimesTen literature… in fact it predates Exalytics (see here).

Why is this important? The workspace per query for a TPC-A transaction is very small and the amount of time the memory is held by a TPC-A transaction is very short. But the workspace required by a TPC-H query is at least 10X the space required by a TPC-A query and the duration of a TPC-H query is at least 10X the duration of a TPC-A query. The result is at least 100X more pressure on memory utilization.

So… I suspect that the 600GB of user data I calculated here may be off by more than a little. Maybe Exalytics can support 300GB of user data or 100GB of user data or maybe 60GB?

Note that this is not bad… all of this pressure on memory is still moved to Exalytics from the Exadata RAC subsystem… where memory is dear.

As a side note… it is always important to remember that the pressure on memory is the amount of memory utilized times the duration of the utilization. This is why the data flow architecture used in modern databases like Greenplum are effective. Greenplum uses more memory per transaction but it holds the memory for less time by never (almost) writing it to disk. This is different from older database architectures like Teradata and Oracle which use disk to store intermediate results… lowering the overall amount of memory required but increasing the duration of the query. More on this here