An Elastic Shared-Nothing Architecture

In this post we will consider again the implications of implementing a shared-nothing architecture in the cloud. That is, we will start wondering about how to extend a static shared-nothing cluster deployed into an elastic hardware environment.

This is the first of three posts inspired by a series of conversations with the folks at Bityota ( After seeing the topics they asked if they could use the content in their marketing… so to be transparent… this is sort of a commercial post… but as you will see there is no promotional foam in the narrative.

– Rob

There is an architectural mismatch between Cloud Computing and a shared-nothing architecture.

In the Cloud: compute, processors and memory, scale independently of storage, disk and I/O bandwidth. This independence allows for elasticity: more compute can be dynamically added with full access to data on a shared disk subsystem. Figure 1 shows this relationship and depicts the elasticity that makes the Cloud so compelling.

Figure 1. Elastic Compute

In a shared-nothing architecture, compute and storage scale together as shown in Figure 2. This tight connection ensures that I/O bandwidth, the key to read performance, is abundant. But, in the end scalability is more about scaling I/O than about scaling compute. And this fact is due to the imbalance Moore’s Law injects into computer architecture… compute performance has far outstripped I/O performance over the years creating an imbalance.

Figure 2. Shared-nothing Bundles
Figure 2. Shared-nothing Bundles

To solve for this imbalance database engineers have worked very hard to avoid I/O. They invented indexing and partitioning and compression and column-store all with the desire to avoid I/O. When they could not avoid I/O they worked hard to minimize the cost by pre-fetching data into memory and, once fetched, by keeping data in memory as long as possible.

For example, one powerful and little understood technique is called the data flow architecture. Simply put data flow moves rows through each step of a query execution plan and out without requiring intermediate I/O. The original developers of Postgres, Sybase, SQL Server, Teradata, DB2, and Oracle did not have enough memory available to flow rows through so they spill data to the storage layer in between each step in the plan. Figure 3 shows how classic databases spill and Figure 4 shows how a more modern data flow architecture operates.

Figure 3. Classic Query Plan
Figure 3. Classic Query Plan
Figure 4. Data Flow Query Plan
Figure 4. Data Flow Query Plan

Why is this relevant? In a classic RDBMS the amount of I/O bandwidth available per GB of data is static. You cannot add storage without redistributing the data. So even though your workload has peaks and valleys your database is bottlenecked by I/O and this cannot flex. In a modern RDBMS most of the work is performed in memory without intermediate I/O… and as we discussed, compute and memory can elastically flex in a Cloud.

Imagine an implementation as depicted in Figure 5. This architecture provides classic static shared-nothing I/O scalability to read data from disk. However, once the read is complete and a modern data flow takes over the compute and memory is managed by a scalable elastic layer. The result is an elastic shared-nothing architecture that is well suited for the cloud.

Figure 5, Flowing to a Separate Compute Node
Figure 5, Flowing to a Separate Compute Node

In fact you can imagine how this architecture might mature over time. In early releases a deployment might look like Figure 5 where the advantage of the cloud is in devising a cost-effective flexible configuration. As the architecture matures you could imagine a cloud deployment such as in Figure 6 where the 1:1 connection between storage nodes and compute nodes is broken and compute can scale dynamically with the workload.

Figure 6. Elastic Compute on a Shared-nothing Architecture
Figure 6. Elastic Compute on a Shared-nothing Architecture

Cloud changes everything and it will significantly change database systems architecture.

It is strange to say… but the torch that fires innovation has been passed from the major database vendors to a series of small start-ups. Innovation seems to occur exclusively in these small firms… with the only recent exception being the work done at SAP on HANA.

More thinking on Specialized Databases

Recently I posted (here) some thinking that suggested that the cost of replicating data into specialized databases might outweigh the benefits of specialization. This post will present a counter view and try to sort out when a specialized database might make sense.

In the ZDNet post here: “Look at What Google and Amazon are doing with Databases: That’s your future” Toby Wolpe and Neo Technology CEO Emil Eifrem suggest that:

“The era of the one-size-fits-all database is over. It used to be when I grew up as a developer that for the architect in the project, when it came to choosing the bottom layer of the stack — the persistence layer — the choice was Microsoft, or IBM, or Oracle, or Sybase. It was a vendor choice.

They were all the same type of database. But that era has gone forever and it will never come back because data is just so big and so irregularly shaped now that you’re always going to be able to get a hundred times improvement, a thousand times improvement, a million times improvement if you get a data technology that is shaped like the shape of your data.”

While I have suggested that a swiss army knife DBMS that solves many problems from a single data source… thereby eliminating the cost and complexity of data replication and data synchronization… might provide a sensible choice for most commercial applications.

Actually I agree with Eifrem and Wolpe in many respects… but there is a difference in our starting assumptions. Let me be clear first about where I strongly agree.

When data volumes grow to web-scale… to Google-scale or Amazon-scale… then the inefficiencies of one-size-fits-all amplify and become intractable… so with a specialized DBMS you might indeed see 100X, 1000X, or more performance advantage and gain a competitive edge from replication and specialization.

But a lot of core data is not Big Data. This is where we do not seem to agree. While our company’s all aspire to have a customer database that is in the petabyte range… it is just not usually the case. Likewise we aspire to have a transaction database requiring petabyte scale… but it just is not the case in most businesses even if you keep years and years of history.

Let’s consider graph databases… Maybe customer data should be in a graph database to specialize it for processing relationships. But this is likely to make it sub-optimal for many other processes… in fact it is the thesis of the ZDNet article that it will make it sub-optimal for many other processes… and so replication to more specialized databases is the only alternative.

How might we handle this relationship problem in a generalized DBMS? HANA, for example, can form graphs in-memory from data shaped into columns… unfolding the graphDB blade from the swiss army knife when required but storing the data in a generalized shape otherwise.

It may be true that there could be orders of magnitude advantage for big data shaped into a specialized graphDB form… But if your customer database is in the terabyte range or less, then the advantage may be negligible… or at least the advantage may not justify the cost of replication into two forms.

And think about the implications of specializing big data. Google replicates tens of petabytes of data into multiple shapes to gain competitive advantage… and ten petabytes specialized and replicated ten times is really really big data.

So I agree with parts of the ZDNet post… big data companies are likely to be pushed by the competition to store the data multiple times in specialized replicated big databases… and for this you will look to Google, Amazon, Netflix, and the like for database technology. But most enterprises will be able to store core data in generalized databases… and will extend into big data realms only as machine-to-machine transactions and/or the Internet of Things drive them there… and then they will extend their data architectures rather than replicate again and again.

Cloud DBMS < High Performance DBMS

English: Cloud
English: Cloud (Photo credit: Wikipedia)

In my post here I suggested that database computing was becoming a special case of high-performance computing. This trend will bump up against the trend towards cloud computing and the bump will be noisy.

In the case of general commercial computing customers running cloudy virtualized servers paid a 5%-20% performance penalty… but the economics still worked for the cloud side.

For high-performance database computing it is unclear how much the penalty will be? If a virtualized, cloudy, database gives up performance because SIMD becomes problematic, priming the cache becomes hard, CPU stalls become more common, and there is a move from a shared nothing architecture to SANs or SAN-like shared data devices, then the penalty may be 300%-500% and the cloud databases will likely lose.

As I noted in the series starting here, there are lots of issues around high-performance database computing in the cloud. It will be interesting to see how the database vendors manage the bump and the noise. So keep an eye out. If your database of choice starts to look cloudy… if it becomes virtualized and it starts moving from a shared-nothing cluster to a SAN… then you will know which side of the bump they are betting on. And if they pick the cloudy side then you need to ask how they plan to architect the system to hold the penalty to under 20%…

I also mentioned in that series that in-memory databases had an advantage over peripheral-based databases as they did not have to pay a penalty for de-coupling the IO bandwidth that is part of a shared-nothing cluster. But even those vendors have to manage the fact that the database is abstracted… virtualized… away from the hardware.

If I were King I would develop a high-performance database that implemented the features of a cloud database: elasticity, easy provisioning, multi-tenancy; over bare metal. Then you might get the best of both worlds.

Thoughts on AWS Redshift…

English: In visible light, 4C 71.07 is less th...
English: In visible light, 4C 71.07 is less than impressive, just a distant speck of light. It’s in radio and in X-rays – and now, gamma rays – that this object really shines. (Photo credit: Wikipedia)

The shared-nothing architecture has, from the beginning, offered the promise of using hardware to solve performance problems rather than applying staff and tuning. By this I mean… if you can add nodes and scale out to improve query response then why not throw hardware at performance problems rather than build a fragile infrastructure of aggregate tables, cubes, pre-joined/de-normalized marts, materialized views, indexes, etc. Each of these performance workarounds are both expensive to build and expensive to operate.

There are several reasons, I think tuning has been more popular than scaling. Not in any particular order:

First, hardware vendors made it too hard to order/provision new nodes. You could not just press a button and buy capacity. Vendors wanted to charge you for terabytes when all you wanted might be CPU and Memory to fix the problem (see here, sigh). You had to negotiate a deal with a rep, work through your procurement group, wait weeks for delivery. Then, the hardware you have might not match the hardware for sale. New models could not be mixed with old nodes… so you had to consider a whole new cluster. The process was so not-agile. There have been attempts to fix this… and some of them are credible… but none are popular.

Next, the process to install the new nodes was moderately difficult… not rocket science but not seamless to be sure. Data had to move. Backups had to be reconfigured and sometimes old backups could not be easily restored to the new configuration. There was no easy way to burn in the new hardware and if it failed early there were issues reversing the process. It just was not considered an everyday operational process… it was the exception and that made it tough. This process too has improved over time but it never became a no-brainer.

Finally, buying hardware is a capital expense (CAPEX). Even if you had to pay more in people costs to do the hard work of tuning those were operational expenses… and funding was easier to get.

Redshift changes the game here. Even if the Paraccel database is just OK (see here)… and if the overhead of running in the virtualized AWS environment makes it worse… it is still OK. You can provision new hardware in a couple of minutes. If Teradata is 25% faster than Paraccel for your query set… so what? You can add 25% more Redshift for a fraction of the extra cost of Teradata. Need more performance? Dial it in. Need permission? No problem because it is all OPEX dollars.

Redshift will deliver the flexibility to make scale out less expensive than tune it out. The TCO reductions from running a simple system where hardware solves performance problems instead of ETL and staff will be significant. This is how it always should have been.

The issue for Redshift will be… given the trend to reduce the data latency from operations to BI… can you move significant amounts of data from on-premise into the cloud fast enough to meet service level agreements?

Do not overlook Redshift… Amazon could be a player in the EDW space… But look for other databases to make inroads here as well. In-memory databases could work well in the cloud as they avoid some of the hardware abstraction required to access disks.

%d bloggers like this: