I recently listened to a session by Juan Loalza of Oracle on the 12c In-memory option. Here are my notes and comments in order…
There is only one copy of the data on disk and that is in row format (“the column format does not exist on disk”). This has huge implications. Many of the implications come up later in the presentation… but consider: on start-up or recovery the data has to be loaded from the row format and converted to a columnar format. This is a very expensive undertaking.
The in-memory columnar representation is a fully redundant copy of the row format OLTP representation. Note that this should not impact performance as the transaction is gated by the time to write to the log and we assume that the columnar tables are managed by MVCC just like the row tables. It would be nice to confirm this.
Data is not as compressed in-memory as in the hybrid-columnar-compression case. This is explained in my discussion of columnar compression here.
Oracle claims that an in-memory columnar implementation (level 3 maturity by my measure see here) is up to 100X faster than the same implementation in a row-based form. Funny, they did not say that the week before OOW? This means, of course, that HANA, xVelocity, BLU, etc. are 100X faster today.
They had a funny slide talking about “reporting” but explained that this is another word for aggregation. Of course in-memory vector aggregation is faster in-memory.
There was a very interesting discussion of Oracle ERP applications. The speaker suggested that there is no reporting schema for these apps and that users therefore place indexes on the OLTP database to provide performance for reporting and analytics. It was suggested that a typical Oracle E-Business table would have 10-20 indexes on it and it was not unusual to see 30-40 indexes. It was even mentioned that the Siebel application main table could require 90 indexes. It was suggested that by removing these indexes you could significantly speed up the OLTP performance, speed up reporting and analytics, cure cancer and end all wars (OK… they did not suggest that you could cure cancer and end war… this post was just getting a little dry).
The In-memory option is clusterable. Further, because a RAC cluster uses shared disk and the im-memory data is not written to disk there is a new shared-nothing implementation included. This is a very nice and significant architectural advance for Oracle. It uses the direct-to-wire infiniband protocol developed for Exadata to exchange data. Remember when Larry dissed Teradata and shared-nothingness… remember when Larry dissed in-memory and HANA… remember when Teradata dissed HANA as nonsense and said SAP was over-reacting to Oracle. Gotta smile :).
Admins need to reserve memory from the SGA for the in-memory option. This is problematic for Exadata as the maximum memory on a RAC node is 256GB… My bad… Exadata X3-8 supports 2TB of memory per node… this is only problematic for Exadata X3-2 and below… – Rob
It is possible to configure a table partition into memory while leaving other partitions in row format on disk.
It is suggested again that analytic indexes may be dropped to speed up OLTP.
The presenter talked about how the architecture, which does not change the row-based tables in any way, allows all of the high-availability options available with Oracle to continue to exist and operate unchanged.
But the beautiful picture starts to dull a little here. On start-up or first access… i.e. during recovery… the in-memory columnar data is unavailable and loaded asynchronously from the row format on disk. Note that it is possible to prioritize the order tables are loaded to get high-priority data in-memory first… a nice feature. During this time… which may be significant… all analytic queries are run against the un-indexed row store. Yikes! This kills OLTP performance and destroys analytic performance. In fact, the presenter suggested that maybe you might keep some indexes for reports on your OLTP tables to mitigate this.
Now the story really starts to unravel. Indexes are required to provide performance during recovery… so if your recovery SLA’s cannot be met without indexes then you are back to square one with indexes that are only used during recovery but must be maintained always, slower OLTP, and the extra requirement for a redundant in-memory columnar data image. I imagine that you could throttle some reports after an outage until the in-memory image is rebuilt… but the seamless operations during recovery using standard Oracle HA products is a bit of a stretch.
Let me raise again the question I asked in my post last week on this subject (here)… how are joins processed between the row format and the columnar format. The presenter says that joins are no problem… but there are really only two ways… maybe three ways to solve for this:
- When a row-to-columnar join is identified the optimizer converts the columnar table to a row form and processes the join using the row engine. This would be very very slow as there would be no indexes on the newly converted columnar data to facilitate the join.
- When a row-to-columnar join is identified the optimizer pushes down aggregation and projection to the columnar processing engine and converts the columnar result to a row form and processes the join using the row engine. This would be moderately slow as there would be no indexes on the newly converted columnar data to facilitate the join (i.e. the columnar fact would have no indexes to row dimensions or visa versa).
- When a row-to-columnar join is identified the optimizer converts the row table to a columnar form and processes the join using the columnar engine. This could be fast.
Numbers two and three are the coolest options… but number three is very unlikely due to the fact that columnar data is sharded in a shared-nothing manner and the row data is not… and number two is unlikely because if it were the case Oracle would surely be talking about it. Number one is easy and the likely state in release 1… but these joins will be very slow.
Finally, the presenter said that this columnar processing would be implemented in Times Ten and then in the Exalytics machine. I do not really get the logic here? If a user can aggregate in their OLTP system in a flash why would they pre-aggregate data and pass it to another data stovepipe? If you had to offload workload from your OLTP system why wouldn’t you deploy a small, standard, Oracle server with the in-memory option and move data there where, as the presenter suggested, you can solve any query fast… not just the pre-aggregated queries served by Exalytics. Frankly, I’ve wondered why SAP has not marketed a small HANA server as an Exalytics replacement for just this reason… more speed… more agility… same cost?
There you have it… my half a cents… some may say cents-less evaluation.
I will end this with a question for my audience (Ofir… I’ll provide a link to your site if you post on this…)… how do we suppose the in-memory option supports bulk data load? This has implications for data warehousing…
Here, of course, is the picture I should have used above… labeled as the in-memory database of your favorite vendor:
13 thoughts on “More on the Oracle 12c In-Memory Option”
There is only one way to load the data files into columnar in-memory structures : parsing the files and concatenating the column values.
However my bet would be on Oracle not supporting bulk load for in-memory columnar option the first few iterations of the product. Instead customers would be directed to bulk load the data into row store and go from there.
I don’t think that anyone will parse and load, Antony. You may be right about bulk loading to the row store and then refreshing the column store. This would be my first guess. The capability sort of exists already when you cold start the column store and it loads.
As Ofir points out… Disabling bulk loads would mean that existing processes would fail. I doubt that would be their approach…
Rob – you are giving me homework 🙂 I’ll comment here for now:
Anthony – that would break the “just flip a switch and it all works as is” promise…
But it is worse, how will they support standard DDL – like a single UPDATE statement updating thousands of non-sequential rows (several or many columns), while keeping both row and column format in-sync? If columnar data is compressed, updates could make “OLTP” stall. If columnar data is uncompressed (as you heard) – that’s crazy talk, you can’t implement columnar these days without at least some columnar encoding (dictionary, RLE, delta etc) with optional compression on top.
Maybe they will have some write optimized row-store that in the background periodically merge into the column store, and have queries access them both transparently,,, That would be innovative –> copying from Vertica and Hana.
Interesting stuff…slight correction, or rather an expansion, on the RAC memory limitation…I believe 256GB is for an X3-2, whereas an X3-8 has 2TB of memory per RAC node, according to the specs.
Also, did I miss something (I’m only just reading about this stuff, so be gentle please) but the columnar table is a redundant form of the row based table, so why would the data need to be converted back from columnar to row based when a join is needed…it’s already there isn’t it? And if it’s a separate object, presumably it could have it’s own analytic indexes – if they are needed, of course.
I’ll correct the Exadata memory stuff right away… thanks.
Inside Oracle, or any DBMS, is a join engine. The classic Oracle join engine would expect a physical row as input from both sides of the join. It is possible to develop a join engine that works on row formatted tables and columnar tables… but I doubt this is in the current Oracle release as it is a very cool feature and Oracle would surely point out this capability if it exists… the Universal Data Join Engine or some such flubber. In fact, they are very silent on joining across formats… so we might assume a simple first implementation. This simple implementation likely converts columns to rows and uses the existing join engine.
Remember… I am speculating… but given no other information from Oracle I think this might be a good educated guess.
Not sure I’m making myself clear on the join stuff…I’ll try again…
As i understand it, Table X_ROW is in Row format and is stored on disk and, some of it, perhaps is also in memory, depending on how hot parts of it are. Table X_ROW may have some indexes on it, for specific performance reasons.
Table X_ROW can also be represented as a Columnar format, in memory only – lets call that X_COLUMNAR.
Now, you suggest a scenario where one would like to join another row based table Y_ROW, to the content in X_COLUMNAR and that there are these three approaches for the join processing…what I’m saying is why would one be joining Y_ROW to X_COLUMNAR when X_ROW already exists – just join to X_ROW and then no conversions are necessary.
In simple terms, I’m saying approach 1 is not really an option as it is unnecessary.
Approach 2 might be quicker at doing the projection/aggregation on the X_COLUMNAR, but as you say, the conversion to row format for the join to Y_ROW might be onerous and you’d lose the indexes – it would be a tradeoff as to whether that is ultimately quicker than doing the projection and aggregation directly on the available X_ROW table, with it’s performance indexes.
We are not disagreeing in the end… There is no columnar join… And if you have eliminated indexes as suggested… The the row joins will be very very slow. If you have kept the indexes then the rows joins will be faster but not benefit from the in-memory speed up. It is a huge hole in the implementation.
I think your 3 options are too technique-oriented and they are insignificant in real cases. Use oramoss’ example, if the 2 table are so important to join together for OLAP, why not create 2 columnar indexes for both? Or an in-memory join index (projection) so that no actual joins are needed any more? Yes, you will talk about how to synchornize btw row-store and column-store. Based on my programming expericence, if you do not think about persisting columnar data, then using bitmap or other in-memory structurture will have no significant performance overhead.
The real important point is the difference in fundmental idea btw HANA and Oracle in-memory option: HANA is pure in memory, while oracle just adds an option. But thinking in real use cases, which one is more suitable for the market? Do thinking it is necessary for SAP ERP puts all its 40 thousands tables in memory, or just MSEG, BSEG, and so on? Do you think SAP ERP has resolved its OLTP performance problem?
Maybe pure in-memory is a good option for OLAP. But in-memory can not store very huge data, which I think commerical column-store is more suitable to to this. In-memory is only for realtime analysis and can not( or do not necessary to deal )such a big data. And we all see it is good to combine OLTP and OLAP into one DBMS. Oracle is on the right way!
You cannot have it both ways, methinks?
In your first paragraph you suggest that if you need to get around the difficulties of joining columnar to row data in Oracle you suggest complete redundancy: add column tables for every table… or worse, build pre-joined projections. Note that there is no way to build these today without building a redundant pre-joined row projection. In other words you suggest that you should put everything in-memory. In your last paragraph you want to ding HANA for suggesting all tables be in-memory.
By the way… let me say again… HANA does not require all data to be in-memory.
You twice mention real use cases… but I do not see them? Oracle mentions a real use case: ERP systems where OLTP and analytics run against the same table instance. They suggest that you can remove all of the indexes required for analytics, add an in-memory column store table, and speed up both OLTP and analytics. I suggest that in this real use case your OLTP performance will suffer unless there are multiple, maybe 10 or more, analytic indexes to drop. This is because there is a cost to inserting and updating and committing both the row and the column copies of the data in a single OLTP transaction. I suggest that there will be a huge performance hit for reports whenever the analytics indexes are dropped and joins must occur… if there is no in-memory columnar table. You suggest, and I agree, that the solution is to put every table both in-memory columnar and row. But the cost of memory for a large global shared are to make OLTP fast… plus memory to keep a redundant columnar copy will be impossible.
I might agree that in-memory cannot store “very huge data” at a fair price. But keeping a single copy has to be less expensive than keep two full copies… one row store and one columnar?
I think that you have missed the point about in-memory, Vincent. The reason that Oracle built this feature is because the price/performance of in-memory is significantly better than the price/performance of flash-based or disk-based systems… and the price is falling with Moore’s Law. Oracle understands that non-volatile memories are coming and the price performance of in-memory will leap further ahead. Oracle understands that in-memory is not just for real-time… it is a required technology driven by the economics of the hardware. In-memory is as important to Oracle as it is to SAP… so much so that they announced, but did not release, an in-memory tack-on with some serious constraints rather than wait and announce a more solid, ready-to-go, offering. Over time they will fix the issues… I am sure. But their announced product… with R1 due out in a year… will not be up-to-par with today’s offerings from other players: IBM, Microsoft, and SAP.
Oracle is on the right way… but not in the lead.
I didn’t mean “put everything in-memory”. Say if the 2 tables together reflect a real business model, and joins would be happenning btw them, they should be put in-memory together with columnar format. I don’t think there is any controdiction within my last comments. HANA tries to put a collections of system wide tables in-memory, while I mean we just put a bunch of tables that reflect a real business analysis model in-memory.
You said “Note that there is no way to build these today without building a redundant pre-joined row projection.”, I should say it is possible to creat a columnar join-index without pre-joined row projection. Using SAP ERP table for an example: MKPF and MSEG. Column “MKPF-BLDAT” can be easily added to MSEG columnars if you only perfmits the right order reflection with value&positions pairs. Column “MKPF-BLDAT” can use Run-Length-Encoding which can avoid redundancy. Column store usually avoids joins between different projections. Becasue actually they prefer joins between columns, or you can say they are good at gluing columns together to get a materialize view. For this reason, column store does not suggest involving too many columns in an analysis case, as gluing too many columns will have performance overhead.
You said ” HANA does not require all data to be in-memory”. This is somehow correct, HANA can set some of tables in “LAZY” mode so that these tables are not loaded in-memory when DB started. But when the tables (or partitions) are accessed, they are loaded into memory either whole or partitions. This mechanism can save memory space while does not affect the idea that HANA is a pure in-memory database. But the drawback is: When you do a single row select/update, it loads the whole table, and poor, most of tables are columnar format. With long time running, it’s for sure the majority of tables are resides in-memory. HANA tries to persist the columnar format to disks, so it has to convert the in-memory columnar format into b-tree like disk columnar format. And this is the real performane overhead for column store update. I have to emphasize again “ORACLE DOES NOT PERSIST COLUMNAR FORMAT DATA”, so update in-memory columnar format doesn’t have significant performance overhead. The performance overhead is moved to the DB start moment, as it has to convert row-store to an in-memory column store. But think about how occasionally a DB restarts when it is running steady? And also remember HAHA has the same performance overhead in restarting.
You said “…that the solution is to put every table both in-memory columnar and row.” I want to remind that morden row-store RDBMS already has good CACHE mechansims. Nothing was changed for the row-store as I have confirmed with my firend who is working in ORACLE. when you do OLTP, the same row-store will serve with less BTREE indexes. So it is not necessary to put the whole row table in-memory, but just follow the exsiting CACHE mechansim. You should know SAP has suffered the performance drawback when running ERP on HANA which shows 4 times slower in the benchmark on its’ SD module. The reasion is too many “select single” and “update where” statements. Although they have improved the ABAP codes, but the speed is still 2 times slower than other DBs. And remember, the ABAP improvements have the same effect on other DBs. All column store suffers the OLTP drawback that is why it is not popular in past years, and the situtation is not changed. Just forget ORACLE, but think about the more realistic solution: OLTP on row-store, and OLAP on columnar indexes, making columnar indexes an option in a matrue row-store.
You said “that non-volatile memories are coming and the price performance of in-memory will leap further ahead.”. If non-volatile memories are coming, how much difference btw “MYSQL” and “HANA”? MYSQL does not need modify much of its code to enjoy the performance boom on the non-volatile memories. Even the DRAM is becoming more and more cheaper, you still have to solve the volatile problem if we assume: “volatile memory always faster than non-volatile memory”. At the same time, we should still think about how the real business is now, and what’s the real requirements of company now?
To a conclusion: I think SAP is pioneer in in-memory DB, but pioneer is not always good. HANA is a product of metaphysic, while oracle’s way is more realistic. And MS SQL is following the oracle way. Finally, just recollect the story of Netscape and MicroSoft.
I’m not really sure where you are going here? You started out suggesting that the Oracle in-memory option had some advantages over HANA… I’ll assume this is still your point and address your reply in that vein.
First, you did not pick up my comment that HANA loads columns, not tables, as they are referenced. It is possible to load columns on startup rather than upon reference. The Oracle feature requires you to name columns to be loaded into memory… and those columns are loaded from the row store format upon startup or during recovery. Queries use the row store while the in-memory copy is loaded which will severely impact performance if, as recommended, the indexes to support analytics are removed. I do not see how you could describe the Oracle implementation as better?
You suggest that it is possible to build a “columnar join index” on the in-memory columns. Could you provide a reference for this? I cannot see where this is supported.
You state that HANA has to convert the in-memory column format into a “B-Tree like disk columnar format”. Could you provide a reference for this statement? It is not how I understand HANA to work.
I pointed out that Oracle was embracing in-memory as a strategic imperitive… and that this was a good thing as memory prices fall and non-volatile memory becomes prevalent. You bring up MySQL and suggest that no change is required. In my opinion there will be major architectural changes required to fully utilize NVM… I guess we won’t know until it appears?
I’ll leave my readers to ponder your closing paragraph… and allow you to have the last word if you choose to respond to these comments.
Rob as a TD guy how does Teradata stack up in the in memory database world? I have rad a little about the new in memory features with 14.10 but want to hear more
Have a look at the “Who is How Columnar?” series and at my note on the Teradata feature. The bottom line: Teradata uses columns to eliminate I/O… a good thing… and uses in-memory to eliminate I/O… more goodness. But they do not have a columnar engine so they convert columns to rows to process them. The conversion is expensive; when combined with the inability to effectively feed the processor cache… and the inability to use supercomputer instructions, AVX and SIMD, methinks Teradata will be significantly slower than HANA, xVelocity, or BLU now… and slower than Oracle when that feature comes out.
Comments are closed.