Last year two associates from Greenplum suggested that I read a very smart academic paper titled “Efficiently Compiling Efficient Query Plans for Modern Hardware” by Thomas Neumann. Having reiterated the idea of database supercomputing in my last blog (here)… I can now suggest this paper to you.
In short this paper suggests that the classic approach to building query plans using an iterator, an approach that assumed that I/O was the bottleneck, misses a significant opportunity for optimization at the hardware level. He suggests that an approach designed to keep data in the hardware registers as long as possible and push instructions to the data provides a significant performance boost. Further, he suggests that this approach extends the advantages of vector processing. The paper is here and a set of slides on the topic are here.
My friends subsequently started-up a little company named Vitesse Data (vitessedata.com) and implemented the technique over Postgres. Check out their site… the benchmark numbers are pretty cool and they certainly prove the paper. My guess is that this might be a next step in the database architecture race.
FYI… here is a link to more information on the LLVM compiler framework… another very cool bit of software.
One last note… some folks see optimization to the bare metal as an odd approach in a cloudy world where even the database is abstracted away from the bare metal by virtualization. But this thinking misses the point. At some point the database program executes in real hardware and these optimizations matter. What really happens is that bare metal optimization exposes more of the inefficiency of virtualization.
We are already starting to see the emergence of clouds that deploy on bare metal. I expect that we will shortly get to the point where things like databases are deployed on bare-metal cloud IaaS to squeeze every drop of performance out… while other programs are deployed in virtualized IaaS…
Today I am going to focus on a topic that I’ve suggested previously without the right emphasis: the new database architecture that uses vector processing on compressed columns to significantly accelerate performance.
The term “super-computing” was coined to describe the extreme hardware and software optimization developed to crunch numbers in scientific applications. As these technologies developed super-computer hardware evolved to leverage parallel microcomputers, software evolved to better leverage parallelism. Recently, microcomputers have started to incorporate the specialized instructions that support advanced mathematical applications. These super-computer instructions directly support vector algebra by manipulating strings of bits, vectors, in a single instruction. Finally, application developers recognized that these bit strings, these vectors, could be loaded into the microprocessors in a more effective manner to optimize their applications to the bare metal.
The effect of these optimizations accumulate for these applications as vectors compress and use memory more effectively, vectors load into processor cache more effectively, and vector instructions dramatically outperform integer instructions. The cumulative effect is that super-computer programs may be 10X-100X faster than commercial applications that provide the same result.
As this evolution progressed there was a similar evolution changing the architecture of database technology. Databases actually leveraged microcomputers before the high performance space made the move. But databases focused on the benefits of massively parallel I/O more than on the benefits of parallel compute. The drive to minimize the cost of I/O eventually led database developers to implement column store and then a very interesting discovery was made. Engineers recognized that a highly compressed column, a string of bits, could be processed as a vector.
Let’s see if we can make this 10X-100X number more than marketing foam. We can do this by roughly comparing the low-level processing of a chunk of data in integer and then in vector formats.
Let’s skip I/O processing and just focus on internals. This simplification greatly favors our integer DBMS. Keep in mind that the vector DBMS will process compressed vector data directly while the integer DBMS will expend resources to uncompress data and then take up 4X or more memory. This less efficient memory utilization will increase the chance that an I/O may be required and I/O is very expensive in the scenario we will discuss. Even an I/O on 1% of the time by the integer DBMS will provide a 1000X-100,000X advantage to the vector DBMS (see Figure 8 to gauge the latency to SSD or to disk).
So we’ll start with uncompressed integer data versus compressed vector data. We can assume that both databases are effective at populating cache. But the 4X compression advantage means that the vector processor is more likely to find data in the fast Level 1 cache and in the mid-range L2 cache. Given the characteristics outlined in Figure 8 we might suggest that the vector database is 4X more likely of finding data in cache than the integer database and that if we assume the latency of L2 cache as an estimate this results in a 15X-200X performance advantage.
Since data is in a vector form we can perform relational algebra and basic mathematics using vector algebra and vector addition. This provides another 8X-50X boost to the vector side
When we combine these advantages we see that a 10X-100X advantage is conservative. The bottom line is clear. A columnar database that effectively manages vectors into cache and further utilizes super-computing instructions will significantly out-perform an integer-based product.
This is the 3rd and final example of a vendor admitting, without admitting, to an architectural limitation. The first two parts on Exadata and Teradata are here and here.
Teradata started to get real traction in the EDW space with a shared-nothing architecture in the late 1980’s. At that time the only real competition was DB2 on an IBM mainframe. From those days until just a couple of years ago IBM insisted that for MVS, then z/OS, customers should stick to the mainframe for their data warehouses and marts. There was some dabbling with sharded data in DB2 for z/OS… and Teradata made some in-roads… as did Netezza… but IBM insisted that there was no reason not to stay Blue. DB2 on AIX and then LINUX appeared… and both offered a better price/performance option than DB2 ob Z/OS… but the faithful stayed faithful for the most part.
Then IBM bought Netezza, a pure shared-nothing microprocessor-based machine, and the recommendation changed. Today IBM recommends the Analytics Accelerator, based on Netezza, to mainframe users who want to deploy an EDW. This is an admission, with no admission at all, that there was all along an architectural advantage to shared-nothingness.
If you search this blog for “Netezza” you can get my perspective on that technology. But to be blunt, the Analytics Accelerator is not IBM’s best EDW platform… DB2 LUW is by far… and with BLU LUW is better still.
I have made it clear in my previous posts that I consider it lazy for an IT shop to commit to a vendor or to a product. As engineers we need to embrace change. For IBM z/OS shops this means a realistic look at non-z/OS alternatives to deploy or to re-deploy an EDW. It makes no sense to build a data warehouse or a data mart directly on z/OS. Use the Analytics Accelerator or, better still, open the competition to better products like DB2 LUW, Teradata, Vertica, etc.
I composed the video below on a contract for Intel… but they were kind enough to let me tell the story with only a lite promotional touch. I think that you will find the story interesting as it describes 20+ years of systems architecture and suggests where we may well be headed in the next 5 years…
The bottom line here is that we developed a fully distributed systems architecture over the course of 15 years in order to use the economics of microprocessors. The distributed architecture was required because no micro-based server, and no small cluster of micro-based servers, could manage an enterprise-sized workload. We had to gang micro-processors together to solve the problem. Today we can very nearly solve for an enterprise workload on a small cluster of 32-core or 64-core processors… so distribution may no longer be a driving requirement.
I’ll post a couple of more notes on this video over the next few weeks. There are two possible endings to the video and we’ll explore these future states.
About three years ago I started with SAP and early in my second week I was asked to appear before Hasso Platner and Vishal Sikka. In the five minutes before I walked in I was informed that the topic was a book they wanted me to ghost-write for them. I was flabbergasted… I had never written a book.. but so it goes. In the meeting I was told that the topic was “HANA for CIOs” and I was handed a list of forty or fifty key words… topics to be included in the narrative. We agreed that we would meet again to consider content more fully. Despite several requests… that was the last meeting I had on this subject and the project dissolved.
In the month or so before it became clear that there was no real interest in the project I struggled to figure out how to tell a story about HANA that would be compelling… rather than make the book a list of technical features. The story in the video, with the HANA ending that I will post next, was to be the story that opened the book.
This is the second post (see Part 1 here) on how vendors adjust their architecture without admitting that the previous architecture was flawed. This time we’ll consider Teradata and in-memory….
When SAP HANA appeared Teradata went on the warpath with a series of posts and statements that were pointed but oddly miscued (see the references below). According to the posts in-memory was unnecessary and SAP was on a misguided journey.
Then Teradata announced Intelligent Memory and in-memory was cool. This is pretty close to an admission that SAP was right and Teradata was wrong. The numbers which drove Teradata here are compelling… 100K-200K ns to access an SSD device or 100 ns to access DRAM… a 1000X reduction… and the latency to disk is 100X worse than SSD.
Intelligent Memory was announced shortly after the release of Teradata’s columnar table type. Column-orientation is important because you need a powerful approach to compression to effectively use an expensive memory resource… and columnar provides this. But Teradata, like Greenplum, extended a row-based engine to support columns in order to get to market quick… they hoped to get 80% of the effectiveness of in-memory with only 20% of the engineering effort. The other 20% comes when you develop a new engine that fully exploits the advantages of a columnar architecture. These advanced exploits allow HANA, DB2 BLU, and Oracle 12c to execute directly on columnar data thereby avoiding decompression, fully utilizing the processor caches, and allowing sets to be operated on by super-computing vector-processing instructions. In fact, Teradata really applied the 50/20 rule… they gained 50%, maybe only 40%, of the benefits with their columnar and Intelligent Memory features… but it was easy to deploy what is in-effect an in-memory cache over their existing relational engine.
Please don’t jump to the wrong conclusion here… Intelligent Memory is a strong product. If you were to put hot data in memory, cool data in Teradata-on-SSD-or-Disk, and cold data in Hadoop and manage them as one EDW you could deploy a very cost-effective platform (see here).
Still, Teradata with Intelligent Memory is not likely to compete effectively against HANA, BLU, or 12c for raw performance… so there will be some marketing foam attached and an appeal for Teradata shops to avoid database apostasy and stick with them. You can see some of the foam in the articles below.
A quick aside here… generally a DBMS should win or lose based on price/performance. The ANSI standard makes a products features nearly, not completely but nearly, irrelevant. If you cannot win on price/performance then you blow foam. When any vendor starts talking about things like TCO you should grab your wallets… it is an appeal to foaminess to hide a weakness. I’m not calling out Teradata here… this is general warning that applies to every software vendor.
Intelligent Memory is a smart move. While it may not win in a head-to-head POC… it will be close-ish… close enough to keep the congregation in their pews. As readers know, I am not a big fan of technical religiosity… being a “Teradata-shop” is lazy… engineers we should pick the best solution and learn it. The tiered approach mentioned three paragraphs up is a good solution and non-Teradata shops should be considering it… but Teradata shops should be open to new technology as well. Still, we should pick new technology with a sensitivity to the cost of a migration… and in many cases Intelligent Memory will save business for Teradata by getting just close enough to make migration a bad trade-off. This is why it was so smart.
Back to the theme of these posts… Teradata back-tracked on the value of in-memory… and in the process admitted-without-admitting a shortcoming in their architecture. So it goes…
Next we will consider whether you should be building data warehouses on z/OS using DB2 or the DB2 Analytics Accelerator aka Netezza.
Yesterday I provided a model for how business sees open source as a means to be profitable (here). This is the game Pivotal seems to be playing with their release of Hadoop, Gemfire, HAWQ, and Greenplum into open source. I do not know their real numbers… so they may need more or fewer additional customers than the mythical company to get back to break-even. But it is unlikely that any company can turn the corner from a license-based revenue stream to a recurring revenue stream in a year… so Pivotal must be looking at a loss. And when losses come it is usual to cut costs… to cut R&D.
There has already been a brain-drain out of the database ranks at Pivotal as they went “all in” on Hadoop. They likely hope for an open source community to pick up the slack… but there is not a body of success I can see in building a community to engineer a commercial product-turned-open. This is especially problematic for Gemfire, an old technology that has been in the commercial space for a very long time. HAWQ has to compete for database resources with the other Hadoop RDBMS technologies… that will be difficult. Greenplum has a chance as it is based on PostgreSQL… but it is a long way away from the current PostgreSQL code base these days. There is danger here.
The bottom line… Greenplum and HAWQ and Gemfire have become risky propositions for both the current customer base and for new customers. I’ll leave it to you to evaluate the risk as this story unfolds. Still, with the risk comes reward… the cost of acquiring Greenplum will drop dramatically and today Greenplum is a competitive product. In addition, if Greenplum gains some traction, it will put price pressure on the other database products. Note that HAWQ was already marked down to open source price levels… and part of Pivotal’s problem was that HAWQ was eating at the Greenplum market. With these products priced at similar levels there becomes some weirdness in choosing… but the advantage is to customers looking at Greenplum.
One great outcome comes for Pivotal Hadoop customers… the fact that Hortonworks will more-or-less subsume Pivotal Hadoop leaves those folks in a better place than before.
If you consider the thought experiment you would have to ask yourself why a company that was breaking even would take this risky route? It could be that they took the route because they were not breaking even and this was a possible path to get even. Also consider… open sourcing code is the modern graceful way to retire an unprofitable product line.
This is sound thinking by Pivotal… during the creation, EMC gave Pivotal several unprofitable troubled assets and these announcements give Pivotal a path forward. If the database product line cannot carry their weight then they will go into maintenance mode and slowly fade. Too bad… as you know I consider Greenplum a solid product whose potential was wasted. But Pivotal has a very nice product in Cloud Foundry… and they clearly see this as their route to profitability and to an IPO… a route that no longer includes a significant contribution from database products.
This post is more about the technology business than about technology… but it may be relevant as you try to sort out winners and losers… and this sort of sorting is important if you consider new companies who may, or may not, succeed in the long run.
To make my point let us do a little thought experiment. Imagine a company doing $100M in revenue with a commercial, not open source, database product. They win the $100M in revenue by competing with Oracle, IBM, Microsoft, Teradata, et cetera… and maybe competing a little here and there with some open source products.
Let’s assume that they make 50% of their revenue from services and support, and that their average sale is $2M… so they close 25 deals a year competing in this market. Finally, let’s assume that they break-even each year and spend 20% of their revenues on R&D. The industry average for support services is 20%.. so with each $2M sale they add $400K in recurring revenue.
They are considering making their product open source. Let’s assume that they make the base product free… and provide some value-added offering that costs $200K for the average buyer. Further, they offer a support package for the same $400K/year customers currently pay. How does the math work out?
Let’s baseline against the 25 deals/year…
If they make 25 sales and every buyer buys both the support package and the value-added offer the average sale drops from $2M to $200K, sales revenue drops from $50M to $5M, the annual revenue drops from $100M to $55M… and the company loses $45M. So… starting off they need to make 225 more sales just to break even. But now it gets complicated… if they sell 5 extra deals then in the next year they earn $2M extra in support fees… so if they sell 113 extra deals in year one then in year two they have made up the entire $45M difference and they are back to break-even going forward. If it takes them 2 years to get the extra recurring revenue then they lose money in year two… but are back to break-even in year three.
From here it gets even more complicated. The mythical company above sells the baseline of 25 new copies a year with an enterprise sales force that is expensive. There is no way that the same sales force that services 25 sales/year could service 100+ extra deals. So either costs go up or the 100+ extra customers becomes unattainable. We might hope that the cost of sales will drop way off as the sales price moves to $200K. This is not unreasonable… but certainly not guaranteed. Further, if you are one of the existing sales-staff then you have to sell 10X just to make the same commission. Finally these numbers assume that every customer buys the value-add and gets enterprise-level support. Reality will be something less than this.
We might ask: is it even possible to sell 100+ more with the same product in the same market? Let us be clear that the market the database product plays in has not changed. Open Source is not a market. All we have done is reduced the sales price for the product with some hope that price is a significant driver in the market.
This is not meant as an academic exercise. Tomorrow we will consider how this thought experiment applies to Pivotal’s announcements last week… and to the future of Pivotal’s database assets (here).
In January Greenplum rolled out a new query optimizer. This is very cool and very advanced stuff.
Query optimization is a search problem… in a perfect world you would search through the space of all possible plans for any query and choose the least expensive plan. But the time required to iterate through all possible plans would take more time than most queries… so optimizers use rules to cut down the space searched. The rules have been built up over the years and are designed to prune the space quickly to keep performance high for simple queries. But these rules can break down when complex queries are introduced… so Greenplum made the significant investment to build a new optimizer from scratch.
Florian Waas, the leader of this program for Greenplum (now off on another venture) explained it to me this way. If the large rectangle in Figure 1 represents the total search space for a query, a modern query optimizer only searches the area in the small gray square… it looks for the best plan in that small space.
You may be surprised to learn that the optimizers used by every major DBMS product are single-threaded… they use only one core of a multi-core processor to search the space and produce a plan. There is no way to effectively search more with a faster single processor (even though you could search more the amount of time you spend as a percentage of the query execution time would stay the same… because the query execution would speed up as well)… so if the optimizer is to search more of the space it will have to use multiple cores and search the space in parallel… and this is exactly what Greenplum has accomplished.
The benchmark results for this are impressive (see here)… several queries in the TPC-DS suite run hundreds of times faster.
ORCA is available to early support customers now and the results map to the benchmark… some queries see an extreme performance boost, while others run significantly slower. This is to be expected from any first release optimizer.
But Greenplum have built another advanced technology into ORCA to reduce the time it will take to mature the software. ORCA includes AMPERe, an optimizer debugging facility that captures the state necessary to recreate problems and fix them. Together these capabilities: parallel search and specialized debugging have advanced the state of the art significantly.
What does it mean to you? It will take some time to shake out ORCA… and HAWQ is still very slow when compared to other analytic databases… and very very slow when compared to the in-memory databases available… and in-memory products like Spark are coming to the Hadoop eco-system. But at the price point HAWQ is a bargain. If you need an inexpensive batch engine that crunches numbers offline then in the next year, as ORCA matures, it may be worth a look.
As a side note… this topic introduces one of the issues related to in-memory databases… when even a very complex query completes with a sub-optimal plan in under a second how much time can you spend searching the plan space? I suspect that applying the parallel optimization principles developed by the Greenplum team will yield similar or even better improvements for in-memory… and these techniques will be a requirement very soon in that space.
This post will consider the implications of a full database federation as would be required by a Logical Data Warehouse. I’ll build on the concepts introduced in the posts on RDBMS-Hadoop integration (Part1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8).
Figure 1 summarizes those earlier concepts from simple to advanced.
But the full federation required to implement a logical data warehouse requires a significant step up from this. Simple federation will be a disaster and Basic federation will not be much better. Here is why.
Let’s add a database and use Figure 2 to consider the possibilities when we submit a query that joins Table A.One to A.Two to B.One to C.One. Note that in this picture we have included a Governor to execute the federated queries that is independent of any of the DBMSs… this is the usual case for federation.
In the simple case where the Governor executes the entire plan all of the data must come to the Governor. This is clearly unacceptable. Consider the worse case where a SELECT is issued against only one table… still all of the data must bubble up.
In the Basic case the problem is partially mitigated… less data moves after the predicates are resolved but the overhead will still kill query performance. A Governor with basic capabilities provides the minimal features to make this work. It is useful where slow federation is better than data replication… but that is about all.
However, the advanced case becomes seriously more complicated. The optimizer now has to decide if table B.One should move to C to join the data or should it move to A… or should it move data to the Governor.
The problem is further complicated by any resource shortage on any node or any functional capability differences. If the cost of data movement would suggest moving B data to C… but there is no CPU resource available on C then maybe a different decision should be made? If C.One is a big table but C is a column-store and the cost of the SELECT is small because a minimum of columns are required and the cardinality of those columns is small so the data might be fetched from the dictionary then we might make a different decision. If B is a fast in-memory database but there is no memory available then the cost changes. Finally, if there are twenty databases in your logical DW then the problem increases exponentially.
The point here is clear… data federation over n-tiers is a hard problem. There will be severe performance issues when the optimizer picks wrong. This is why the independent governor model is so attractive… Many of the variables around CPU resources and database capabilities are removed… and while the performance will be poor it will be predictably poor. You should consider the implications carefully… it is just not clear that a high-performance logical data warehouse is feasible simply laid over an existing architecture. And if you build on a model with a Governor you must be sure that the Governor, and the Provincial databases can handle the load. I suspect that the Governor will have to run on a cluster and use a shared-nothing architecture to handle a true enterprise-sized logical EDW.
HANA has a twist on this that is interesting. The Governor lives inside one of the database nodes… so for data in HANA there is no data movement cost unless the optimizer decides to send the data to another node. Further, HANA is very fast… and the performance will mitigate some of the slowness inherent in federation. Finally, HANA is a shared-nothing DBMS… so it is not a problem to move lots of data to HANA in support of big tables and/or thousands of concurrent queries.
I’ll try to use this thinking: simple, basic, advanced federation over some governed federator on a an in-memory or fast shared-nothing architecture to evaluate the products on the market that provide federation. This may prove interesting as the Logical Data Warehouse concept catches on and as products like Teradata’s QueryGrid come to market.
Now for Greenplum & Hadoop… to continue this thread on RDBMS-Hadoop integration (Part 1, Part 2,Part 3,Part 4, Part 5,Part 6) I have suggested that we could evaluate integration architecture using three criteria:
How parallel are the pipes to move data between the RDBMS and the parallel file system;
Is there intelligence to push down predicates; and
Is there more intelligence to push down joins and other relational operators?
The Greenplum interface is architecturally similar to the Teradata interface described in Part 4. Hadoop files are defined to the DBMS as external tables and there are capable parallel pipes to effectively move data from the HDFS side to GPDB. In addition Greenplum uses their Scatter-Gather method to load data into the GPDB effectively.
There is no ability to push down predicates. When a query executes all of the relevant data is sucked through the parallel pipes into the database segments for processing. This is very inefficient and there is not even the crude capability to push down processing provided by Teradata.
Finally, there is no ability to push down joins or aggregation.
Greenplum’s offering is not very advanced. To perform with Greenplum analytics data must move between the two storage layers with no intelligence to mitigate the cost.
On to the last post in the series Part 8 on SQL Server and Polybase.