What is a Cloud-native Database?

Before this series is complete, I plan on defining in some detail what the various levels of Cloud-nativeness might be to allow readers to classify products based on architecture, not marketing. In this post, I’ll lay out some general concepts. First, let’s be real about cloud-things that are not cloud-native.

Any database that runs on physical hardware can run on virtual machines and, therefore, can run on virtual machines in the cloud. Databases with no cloud capabilities other than the ability to run on a VM on a cloud-provider are not cloud-native. Worse, there are lots of anecdotal stories that suggest that there are no meaningful savings to be had from moving a database from an on-premise server or VM to a cloud VM with no other change to take advantage of cloud elasticity.

So here are two general definitions for your consideration:

1) A cloud-native database will have one or more features that utilize capabilities found only on a cloud-computing platform, and

2) A cloud-native database will demonstrate economic benefits derived from those cloud-specific features.

Note that the way you pay for services via capital expenses (CapEx) or as operating expenses (OpEx) does not provide economic benefit. If the monetary costs of a subscription are more-or-less equal to the financial costs of a license, then savings are tied to tax law, not to economics. Beware of cloudy subscriptions that change how you pay without clearly adding beneficial cloudy features. It is these subscriptions that often are the source of the no-savings anecdotes mentioned above.

This next point is about the separation of storage from compute. Companies have long ago disconnected their databases from just-a-bunch-of-disks (JBOD) to shared storage such as SAN or S3. Any database today can use shared storage. It is not useful to say that any database that can use shared storage has separated storage from compute. Using the idea that there must be features, not marketing, that allow compute to scale separately and more-or-less dynamically from storage as the definition, we will be able to move forward in this area.

So:

3) For storage to be appropriately separated from Compute, it must be possible to scale Compute up and down dynamically.

Next, when compute scales, it scales at different granularity. An application or database that automatically adds and subtracts virtual machines provides different economics than a database that scales using containers. Apps that add and subtract containers have different economics than applications that use so-called serverless containers to scale. In this dimension, we will try to characterize granularity to account for the associated cloud economics. This topic will be covered in more detail later, so I’ll save the rule for that post.

Note that it is possible for database vendors to develop a granular architecture and to use the associated economics to their advantage. They may charge you for the time when any part of your database is running but be billed by their provider for smaller chunks. This is not an issue unless their overall costs become uncompetitive.

Last, and in some ways least, different products may charge for time in smaller or larger chunks. You might be charged by the hour, by the minute, by the second, or in smaller increments. Think about the scaling economics I suggested in the first posts of this thread. If you are charged by the hour, then there is no financial incentive to scale up to finish jobs to the minute. You will be charged the same for ten minutes or fifty minutes when you are charged by the hour. The rule:

4) Cloud databases that charge in smaller time increments are more economical than those that charge in larger increments.

The rationale here is probably obvious, but I’ll cover it in-depth in a later post.

With these concepts in place, we can discuss how architectural changes affect each aspect of the economics of databases in the cloud.

Note that this last sentence was written assuming that a British computer scientist with an erudite accent would speak it when they create the PBS series from these posts. Not.

By the way, a few posts from now, I am going to go back to some ideas I shared five years ago around the relationship between database processing and the underlying hardware platform. I’ll update this thinking with cloud computing in mind. You can find this thinking here which originally came from Jeff Dean and Peter Norvig (displayed in lots of places but here is one).

Hadoop and Company Financial Performance

I have posted several times about the impact of the Hadoop eco-system on a several companies (here, here, here, for example). The topic cam up in a tweet thread a few weeks back… which prompts this quick note.

Fours years ago the street price for a scalable, parallel, enterprise data warehouse platform was $US25K-$US35K per terabyte. This price point provided vendors like Teradata, Netezza, and Greenplum reasonable, lucrative, margins. Hadoop entered the scene and captured the Big Data space from these vendors by offering 20X slower performance at 1/20th the price: $US1K-$US5K per terabyte. The capture was immediate and real… customers who were selecting these products for specialized, very large, 1PB and up deployments switched to Hadoop as fast as possible.

Now, two trends continue to eat at the market share of parallel database products.

First, relational implementations on HDFS continue to improve in performance and they are now 4X-10X slower than the best parallel databases at 1/10th-1/20th the street price. This puts pressure on prices and on margins for these relational vendors and this pressure is felt.

In order to keep their installed base of customers in the fold these vendors have built ever more sophisticated integration between their relational products and Hadoop. This integration, however, allows customers to significantly reduce expense by moving large parts of their EDW to an Annex (see here)… and this trend has started. We might argue whether an EDW Annex should store the coldest 80% or the coldest 20% of the data in your EDW… but there is little doubt that some older data could satisfy SLAs by delivering 4X-10X slower performance.

In addition, these trends converge. If you can only put 20% of your old, cold data in an Annex that is 10X slower than your EDW platform then you might put 50% of your data into an Annex that is only 4X slower. As the Hadoop relational implementations continue to add columnar, in-memory, and other accelerators… ever more data could move to a Hadoop-based EDW Annex.

I’ll leave it to the gamblers who read this to guess the timing and magnitude of the impact of Hadoop on the relational database markets and on company financial performance. I cannot see how it cannot have an impact.

Well, actually I can see one way out. If the requirement for hot data that requires high performance accelerates faster than the high performance advances of Hadoop then the parallel RDBMS folks will hold their own or advance. Maybe the Internet of Things helps here…. but I doubt it.

More Database Supercomputing Technology

Last year two associates from Greenplum suggested that I read a very smart academic paper titled “Efficiently Compiling Efficient Query Plans for Modern Hardware” by Thomas Neumann. Having reiterated the idea of database supercomputing in my last blog (here)… I can now suggest this paper to you.

In short this paper suggests that the classic approach to building query plans using an iterator, an approach that assumed that I/O was the bottleneck, misses a significant opportunity for optimization at the hardware level. He suggests that an approach designed to keep data in the hardware registers as long as possible and push instructions to the data provides a significant performance boost. Further, he suggests that this approach extends the advantages of vector processing. The paper is here and a set of slides on the topic are here.

My friends subsequently started-up a little company named Vitesse Data (vitessedata.com) and implemented the technique over Postgres. Check out their site… the benchmark numbers are pretty cool and they certainly prove the paper. My guess is that this might be a next step in the database architecture race.

FYI… here is a link to more information on the LLVM compiler framework… another very cool bit of software.

One last note… some folks see optimization to the bare metal as an odd approach in a cloudy world where even the database is abstracted away from the bare metal by virtualization. But this thinking misses the point. At some point the database program executes in real hardware and these optimizations matter. What really happens is that bare metal optimization exposes more of the inefficiency of virtualization.

We are already starting to see the emergence of clouds that deploy on bare metal. I expect that we will shortly get to the point where things like databases are deployed on bare-metal cloud IaaS to squeeze every drop of performance out… while other programs are deployed in virtualized IaaS…

Database Super-computing

Today I am going to focus on a topic that I’ve suggested previously without the right emphasis: the new database architecture that uses vector processing on compressed columns to significantly accelerate performance.

The term “super-computing” was coined to describe the extreme hardware and software optimization developed to crunch numbers in scientific applications. As these technologies developed super-computer hardware evolved to leverage parallel microcomputers, software evolved to better leverage parallelism. Recently, microcomputers have started to incorporate the specialized instructions that support advanced mathematical applications. These super-computer instructions directly support vector algebra by manipulating strings of bits, vectors, in a single instruction. Finally, application developers recognized that these bit strings, these vectors, could be loaded into the microprocessors in a more effective manner to optimize their applications to the bare metal.

The effect of these optimizations accumulate for these applications as vectors compress and use memory more effectively, vectors load into processor cache more effectively, and vector instructions dramatically outperform integer instructions. The cumulative effect is that super-computer programs may be 10X-100X faster than commercial applications that provide the same result.

As this evolution progressed there was a similar evolution changing the architecture of database technology. Databases actually leveraged microcomputers before the high performance space made the move. But databases focused on the benefits of massively parallel I/O more than on the benefits of parallel compute. The drive to minimize the cost of I/O eventually led database developers to implement column store and then a very interesting discovery was made. Engineers recognized that a highly compressed column, a string of bits, could be processed as a vector.

Let’s see if we can make this 10X-100X number more than marketing foam. We can do this by roughly comparing the low-level processing of a chunk of data in integer and then in vector formats.

Let’s skip I/O processing and just focus on internals. This simplification greatly favors our integer DBMS. Keep in mind that the vector DBMS will process compressed vector data directly while the integer DBMS will expend resources to uncompress data and then take up 4X or more memory. This less efficient memory utilization will increase the chance that an I/O may be required and I/O is very expensive in the scenario we will discuss. Even an I/O on 1% of the time by the integer DBMS will provide a 1000X-100,000X advantage to the vector DBMS (see Figure 8 to gauge the latency to SSD or to disk).

Figure 8. Some Latency Metrics
Figure 8. Some Latency Metrics

So we’ll start with uncompressed integer data versus compressed vector data. We can assume that both databases are effective at populating cache. But the 4X compression advantage means that the vector processor is more likely to find data in the fast Level 1 cache and in the mid-range L2 cache. Given the characteristics outlined in Figure 8 we might suggest that the vector database is 4X more likely of finding data in cache than the integer database and that if we assume the latency of L2 cache as an estimate this results in a 15X-200X performance advantage.

Since data is in a vector form we can perform relational algebra and basic mathematics using vector algebra and vector addition. This provides another 8X-50X boost to the vector side

When we combine these advantages we see that a 10X-100X advantage is conservative. The bottom line is clear. A columnar database that effectively manages vectors into cache and further utilizes super-computing instructions will significantly out-perform an integer-based product.

The era of database super-computing has begun.

Dynamic Late Binding Schemas on Need

I very much like Curt Monash’s posts on dynamic schemas and schema-on-need… here and here are two examples. They make me think… But I am missing something… I mean that sincerely not just as a setup for a critical review. Let’s consider how dynamism is implemented here and there…so that I can ask a question of the audience.

First imagine a simple unschema’d row:

Rob KloppDatabase Fog Bloghttp://robklopp.wordpress.com42

A human with some context could see that there is a name string, a title string, a URL string, and an integer string. If you have the right context you would recognize that the integer holds the answer to the question: “What is the meaning of Life, the Universe, and Everything?”… see here… otherwise you are lost as to the meaning.

If you load this row into a relational database you could leave the schema out and load a string of 57 characters… or load the data parsed into a schema with Name, Title, URL, Answer. If you load this row into a key-value pair you can load it into an unschema’d row with the Key = Row and the Value equal to the string… or parse the data into four key-value pairs.

In any case you have to parse the data… If you store the data in an unschema’d format you have to parse the data and bind value to keys to columns late… if you store the data parsed then this step is unnecessary. To bind the data late in SQL you might create a view from your program… or more likely you would name the values parsed with SQL string functions. To parse the data into key-value pairs you must do the equivalent. The same logic holds true for more complex parsing. A graph database can store keys, values, and relationships… but these facets have to be known and teased out of the data either early or late. An RDBMS can do the same.

So what is the benefit of a database product that proclaims late binding as an advantage? Is it that late binding is easier to do than in an RDBMS? What am I missing?

Please do not respond with a list of other features provided by NewSQL and NoSQL databases… I understand many of the trade-offs… what I want to know is:

  • What can they do connected to binding values to names that an RDBMS cannot? And if there is no new functionality…
  • Is there someway they allow for binding that is significantly easier?

By the way, the Hitchhiker’s Guide is silent on the question of whether 42 is a constant or ever-changing. I think that I’ll ask Watson.

Database Computing is Supercomputing… Some external reading: May 2013

Superman: Doomsday & Beyond
Superman: Doomsday & Beyond (Photo credit: Wikipedia)

I would like to recommend to you John Appleby’s post  here on the HANA blog site. While the title suggests the article is about HANA, in fact it is about trends in computing and processors… and very relevant to posts here past, present, and upcoming…

I would also recommend Curt Monash’s site. His notes on Teradata here mirror my observation that a 30%-50% performance boost per release cycle is the target for most commercial databases… and what wins in the general market. This is why the in-memory capabilities offered by HANA and maybe DB2 BLU are so disruptive. These products should offer way more than that… not 1.5X but 100X in some instances.

Finally I recommend “What Every Programmer Should Know About Memory” by Ulrich Drepper here. This paper provides a great foundation for the deep hardware topics to come.

Database computing is becoming a special case, a commercial case, of supercomputing… high-performance computing (HPC) to those less inclined to superlatives. Over the next few years the differentiation between products will increasingly be due to the use of high-performance computing techniques: in-memory techniques, vector processing, massive parallelism, and use of HPC instruction sets.

This may help you to get ready…

The Future of Hadoop and of Big Data DBMSs

Image representing Hadoop as depicted in Crunc...

About four years ago Michael McIntire and I were pondering the rise of Hadoop. This blog will share bits of that conversation, provide an update based on the state of Hadoop today, and suggest a future state…

Briefly… we believed that the Hadoop eco-system was building all of the piece-parts of a very large database management system. You could see the basics: a distributed file system in HDFS, a low-level query engine in Map/Reduce with an abstraction in Pig, and the beginnings of optimization, SQL, availability, backup & recovery, etc.

We wondered why this process was underway… why would enterprises go to Hadoop when there were perfectly good relational VLDBs that could solve most of the problems… and where they could not… extending a mature RDBMS would be easier than the giant start-from-scratch of Hadoop.

We saw two reasons for the Hadoop project, process, and progress:

  • Michael pointed out that the RDBMS vendors just did not understand how to price their products on “Big Data” (to be fair that term was not in use then)… if you have 7PB of data, as Michael did… then at the current $35K/TB list price the bill would be $245M. Even if you discounted to $1K/TB the tab would be $7M. The DBMS vendors were giving the big guys a financial incentive to Build instead of Buy… and so Google Built and Yahoo Built and Hadoop emerged.
  • I pointed out that the academic community would support this… the ability to write a thesis based on new work in the DBMS space was becoming harder… but it was possible to sponsor papers that applied DBMS concepts to Hadoop and keep the PhD pipeline filled.

So there was funding, research, and development.

The narrative from here on is my own… Michael is off the hook..

Today Hadoop has a first release in the public domain. Dozens of companies are working to extend the core… some as contributors, some with a commercial interest, many with both incentives. The stack is maturing… and we now easily imagine a day when Hadoop will rival Teradata, Exadata, Netezza, and Greenplum in VLDB performance… with some product maturity and a rich set of features. And if Hadoop gets close and the price is free (or nearly so…) then the price/performance of Hadoop will make it unbeatable for “Big Data”.

In fact, the trigger for writing this piece now was the news a few weeks back from one of our Hadoop partners that HIVE was POC’d against one of the databases mentioned above on a big data problem and came close. The main query ran in 35 minutes on the DBMS and in 45 minutes with HIVE. The end is in sight… and sooner than expected.

What might this mean for the future?

Imagine a market where Hadoop can solve for big data problems… let’s say problems over 500TB just to draw a line… with the same performance as the best RDBMS, in a write-once/read-many use case like a data warehouse… for free. For FREE… plus the cost of the hardware. Hadoop wins… no contest.

Let’s suggest a market from 50TB to 500TB where a conventional RDBMS can out-perform Hadoop by 2X more-or-less… but Hadoop is free… so only applications where the performance matters can pay the price premium.

And let’s suggest a high performance in-memory database (IMDB) market that beats disk-based and SSD-based RDBMS by 50X for a 50% premium (based on new technologies like phase-change memory see here…) and can beat Hadoop by 1000X but at a higher cost.

You can see the squeeze:

  • IMDB will own the high performance market… most-likely in the 100TB and under space…
  • Hadoop will own the big data 500TB+ low-cost market…
  • and the conventional DBMS vendors will fight it out for adequate-performance/medium-priced applications from 100TB to 500TB… with continued pressure from the top and the bottom.

Economics will drive this. The conventional DBMS vendors are moving to SSD’s… which increases their price in the direction of an IMDB… and increases their price/performance in the same good direction. But the same memory in SSD’s will soon be generally available as primary memory. So the IMDB prices and the conventional DBMS prices will converge… but the IMDB products will retain a 50X-100X performance advantage by managing the new memory as memory instead of as a peripheral device. Hadoop may or may not leverage SSD’s… but it will be free.

Squeezed, methinks…

Numbers Everyone Should Know

Some of you have seen me build simple models to do a reality-check on architecture (see here, for example). Here are some metrics from a great presentation by Jeff Dean, a Google fellow.

Numbers Everyone Should Know

L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns
Mutex lock/unlock 25 ns
Main memory reference 100 ns
Compress 1K bytes with Zippy 3,000 ns
Send 2K bytes over 1 Gbps network 20,000 ns
Read 1 MB sequentially from memory 250,000 ns
Round trip within same datacenter 500,000 ns
Disk seek 10,000,000 ns
Read 1 MB sequentially from disk 20,000,000 ns
Send packet CA->Netherlands->CA 150,000,000 ns

Of note is the 120X difference between the cost of reading 1MB from memory and the cost of reading 1MB from disk…

The entire presentation may be found here: http://www.odbms.org/download/dean-keynote-ladis2009.pdf

Happy Modeling…

Who is Massively Parallel? HANA vs. Teradata and (maybe) Oracle

I have promised not to promote HANA heavily on this site… and I will keep that promise. But I want to share something with you about the HANA architecture that is not part of the normal marketing in-memory database (IMDB) message: HANA is parallel from its foundation.

What I mean by that is that when a query is executed in-memory HANA dynamically shards the data in-memory and lets each core start a thread to work on its shard.

Other shared-nothing implementations like Teradata and Greenplum, which are not built on a native parallel architecture, start multiple instances of the database to take advantage of multiple cores. If they can start an instance-per-core then they approximate the parallelism of a native implementation… at the cost of inter-instance communication. Oracle, to my knowledge, does not parallelize steps within a single instance… I could be wrong there so I’ll ask my readers to help?

As you would expect, for analytics and complex queries this architecture provides a distinct advantage. HANA customers are optimizing price models sub-second in-real-time with each quote instead of executing a once-a-week 12-hour modeling job.

June 11, 2013: You can find a more complete and up-to-date discussion of this topic here… – Rob

As you would expect HANA cannot yet stretch into the petabyte range. The current HANA sweet spot is for warehouses or marts is in the sub-TB to 20TB range.

More on Exalytics Capacity…

I found myself wondering where did the rule-of-thumb for Exalytics  that suggests that TimesTen can use 800GB of a 1TB memory space… and requires 400GB of that space for work tables leaving room for 400GB of user data… come from (it is quoted everywhere… here is an example… see question #13).

Sure enough, this rule has been around for a while in the TimesTen literature… in fact it predates Exalytics (see here).

Why is this important? The workspace per query for a TPC-A transaction is very small and the amount of time the memory is held by a TPC-A transaction is very short. But the workspace required by a TPC-H query is at least 10X the space required by a TPC-A query and the duration of a TPC-H query is at least 10X the duration of a TPC-A query. The result is at least 100X more pressure on memory utilization.

So… I suspect that the 600GB of user data I calculated here may be off by more than a little. Maybe Exalytics can support 300GB of user data or 100GB of user data or maybe 60GB?

Note that this is not bad… all of this pressure on memory is still moved to Exalytics from the Exadata RAC subsystem… where memory is dear.

As a side note… it is always important to remember that the pressure on memory is the amount of memory utilized times the duration of the utilization. This is why the data flow architecture used in modern databases like Greenplum are effective. Greenplum uses more memory per transaction but it holds the memory for less time by never (almost) writing it to disk. This is different from older database architectures like Teradata and Oracle which use disk to store intermediate results… lowering the overall amount of memory required but increasing the duration of the query. More on this here