I changed the picture to show you the billboard SAP bought on US101N right across from the O HQ…
Larry Ellison announced a new in-memory capability for Oracle 12c last night. There is little solid information available but taken at face value the new feature is significant… very cool… and fairly capable.
In short it appears that users have the ability to pin a table into memory in a columnar format. The new feature provides level 3 (see here) columnar capabilities… data is stored compressed and processed using vector and SIMD instruction sets. The pinned data is a redundant copy of the table in-memory… so INSERT/UPDATE/DELETE and data loads queries will use the row store and data is copied and converted to the in-memory columnar format.
As you can imagine there are lots of open questions. Here are some… and I’ll try to sort out answers in the next several weeks:
- It seems that data is converted row-to-columnar in real-time using a 2-phased commit. This will significantly slow down OLTP performance. LE suggested that there was a significant speed-up for OLTP based on performance savings from eliminating indexes required for analytics. This is a little disingenuous, methinks… as you will most certainly see a significant degradation when you compare OLTP performance without indexes (or with a couple of OLTP-centric indexes) and with the in-memory columnar feature on to OLTP performance without the redundant copy and format to columnar effort. So be careful. The use case suggested: removing analytic indexes and using the in-memory column store is solid and real… but if you have already optimized for OLTP and removed the analytic indexes you are likely to see performance drop.
- It is not clear whether the columnar data is persisted to disk/flash. It seems like maybe it is not. This implies that on start-up or recovery data is read from the row store on-disk tables and logs and converted to columnar. This may significantly impact start-up and recovery for OLTP systems.
- It is unclear how columnar tables are joined to row tables. It seems that maybe this is not possible… or maybe there is a dynamic conversion from one form to another? Note that it was mentioned that is possible for columnar data to be joined to columnar data. Solving for heterogeneous joins would require some sophisticated optimization. I suspect that when any row table is mentioned in a query that the row join engine is used. In this case analytic queries may run significantly slower as the analytic indexes will have been removed.
- Because of this and of item #2 it is unclear how this feature plays with Exadata. For lots of reasons I suspect that they do not play well and that Exadata cannot use the new feature. For example, there is no mention of new extended memory options for the Exadata appliance… and you can be sure that this feature will require more memory.
There was a new hardware system announced that uses this in-memory capability… If you add all of this up it may be that this is a system designed to run SAP applications. In fact, before the presentation on in-memory there was a long (-winded) presentation of a new Fujitsu system and the SAP SD benchmark was specifically mentioned. This was not likely an accident. So… maybe what we have is a counter to HANA for SAP apps… not a data warehouse at all.
As I said… we’ll see as the technical details emerge. If the architectural constraints 1-4 above hold then this will require some work for Oracle to compete with HANA for SAP apps or for data warehouse workloads…
22 thoughts on “Oracle 12c IMDB Announcement at OOW13”
Hi Robb, some info I’ve received from the folks on the show floor.
On point 2) the data in the columnar is not persisted to disk. It is in memory only. Which immediately brings up some concerns to me. A) If I have to recycle the instance I lose what is inmemory and I suspect recreating these things is very I/O intensive when I come back online (and CPU intensive to strip the rows out into columns. B) What if I have a pure analytics database from multiple OLTP sources. I still need row and in memory columnar tables. Kind of a waste.
The columnar tables does have logs but those logs are apparently kept in memory only and would be used to ensure queries see the current data. But since these are in memory only if the db crashes I don’t need those logs becuase I have to recreate the in memory columnar anyway from scratch.
I agree with you on the OLTP side I think is a bit of a stretch. I already make sure I don’t have too many indexes in my highly tuned OLTP system and in fact many of those indexes are used for tuning my Batch processes which manipulate my row data nightly. So I can’t get rid of those indexes because the queries on the columnstore aren’t likely used to help me update the row stores in my database.
What they are doing is really like creating MQT (materialized query tables) in columnar format in DB2 with BLU Acceleration. In this with DB2 you would have the row and the column but in this case the columnar is persisted and not ALL in memory necessarily. For 12c it seems like all the active partitions need to be in memory (that’s a lot of memory potentially).
Thanks, Chris… Great report from the floor.
I had two other things on my list that didn’t make it in the blog… you mentioned one:
5) Can a single partitioned table span from columnar in-memory to row-on-disk? I suspect not which means that big partitioned tables will not usually fit.
6) Can an in-memory columnar table span nodes and scale-out in a shared-nothing scheme?
Not sure on 6 but on 5 they showed you as a DBA specify what tables or partitions you want to have duplicated in columnar in memory. So yes you can have one partition in columnar memory. Down side is you have to decide 🙂 With HANA or DB2 BLU the tables can persist as only columnar and then pulled into memory as needed. With DB2 BLU you don’t have to have all active data fit in memory. With HANA and Oracle you do…but I think Oracle is worse because you have to decide what should go in memory and so if you choose wring your performance will be terrible (especially if you take their advice and drop a lot of indexes 🙂
The constraint on all-data-in-memory with HANA will go away soon. Thanks again, Chris…
I thought I understood from Larry’s presentation that the column store would be used as ‘indexes’ when joining to row-store tables. It makes sense since when looked at in a certain light, the column store is a bunch of indexes. However, it does mean you will be without those indexes when you restart your instance.
I do not believe this is what he said, Joel. And certainly the follow-on session with the expert did not suggest this at all.
The idea is that with a redundant, columnar, table you can run analytics against that table faster than having indexes on the table. This does not mean that the columnar table performs the duties of an index when joining to a table with no columnar copy.
Hi Chris, I think the manual currently states that CREATE TABLE as MQT is not supported for DB2 column-organized tables? Is that wrong or are you saying that this restriction has been lifted?
Perhaps column-organized MQT, refreshed incrementally (say) every 15 minutes, could provide near real time analytics plus OLTP in the same database, with a degree of loose coupling that might be desirable from an OLTP throughout viewpoint?
You are correct Jeremy that this restriction has not *yet* been lifted. But of course the comparison here is to something Oracle does not have yet either (it’s something some time in 2014 they have said). I can’t make any comments here on IBM’s future plans but I think in my explanation above in the comparison folks perhaps can read between the lines.
I have also posted a more detailed blog comparison to Oracle 12c In memory today here http://bit.ly/19rrt27
Another question: I have heard mixed reports on when the 12c in-memory stuff will be GA. There are rumors that is is still a ways out? Can anyone confirm?
OK… the official word from Andy Mendelsohn of O is that the in-memory feature is “pre-beta” and will be available “sometime next year”.
Short and sweet: “columnar” is in some implementations just a collection of single-column rows.
If the table is in-memory only, must it be populated manually? And this raises the next question: is this really a (useful) use case? Or does it bring in even more maintenance madness?
Slide said DBA marks which tables or table partitions are eligable for in memory columnar. These new objects are materialized in memory either at startup or first access (another DBA choice). Will be interesting to see how long these things take to create in memory.
So it has to read all the rows of a regular table to create a in-memory columns.
Kinda loses the advantage of columnar store of reading only the columns used in a query.
Also, what about the tables that do not fit into memory ? Not everyone can buy a monster box with terabytes of memory. The whole thing looks like it is not ready for prime-time..
O has to read all of the rows once to initialize the in-memory columns. From there it uses column projection… this is a very good thing and Oracle deserves kudos.
I will post something next week on how tables bigger than the available memory work. This will still be pretty preliminary… but should be close and expose some issues. We just won’t know what is real until the product is GA.
It is so not ready for prime time that it is not available. The engineering VP in charge of the product said it was in pre-beta testing… an Oracle euphemism for alpha testing… and would be GA in 2014… which means that it is not really cooked yet. Anyone who knows about software-engineering-speak knows that “available in 2014” means available December 31, 2014. Note that they would likely have said available Q3-2014 or Q2-2014 if that was the expected case.
This is a marketing, not a product, announcement designed to freeze Oracle fanbois from moving to another in-memory product… smart marketing but not much about architecture.
I assume that an in-memory columnar table may be built from a persistent row table. Per Chris above (or maybe below… not sure how these will be ordered) and my speculations… the time it takes to generate a columnar table may be prohibitive as it will slow the start-up of an analytics systems and slow both start-up and recovery of an associated OLTP system.
Assumption is correct but in fact it’s not “may be built” but rather “is built”. The in memory version of the table is the same table (or partition) just in columnar in memory format (or as Kevin says – in single column row format – love that statement 🙂
I do agree the time taken (and resources consumed) may make it prohibitive for many use cases. I suspect next year you will see sessions on how to improve restart time by turning off in memory materialized columnar objects at startup and then brining them online later after your OLTP system is running again.
I suspect down the road (2015 or later maybe) they will start to persist these object to disk in some way so that they can solve the start-up issue with in memory only object.
good discussions here.
Great discussion here. Absolutely love the picture Rob. Classic.
Derek Oats – IBM SAP HANA COE
Rob – great post and great discussion… One interesting side effect of “analytics queries will run 100x faster” is – who needs Exadata? Why pay for specialized storage “with brains” if SQL is in-memory? I wrote a bit more in my Oracle In-Memory Option – the good, the bad, the ugly
BTW – Oracle claims that In-Memory is 10x faster than Exadata 🙂
Comments are closed.