In this post we will consider again the implications of implementing a shared-nothing architecture in the cloud. That is, we will start wondering about how to extend a static shared-nothing cluster deployed into an elastic hardware environment.
This is the first of three posts inspired by a series of conversations with the folks at Bityota (Bityota.com). After seeing the topics they asked if they could use the content in their marketing… so to be transparent… this is sort of a commercial post… but as you will see there is no promotional foam in the narrative.
There is an architectural mismatch between Cloud Computing and a shared-nothing architecture.
In the Cloud: compute, processors and memory, scale independently of storage, disk and I/O bandwidth. This independence allows for elasticity: more compute can be dynamically added with full access to data on a shared disk subsystem. Figure 1 shows this relationship and depicts the elasticity that makes the Cloud so compelling.
In a shared-nothing architecture, compute and storage scale together as shown in Figure 2. This tight connection ensures that I/O bandwidth, the key to read performance, is abundant. But, in the end scalability is more about scaling I/O than about scaling compute. And this fact is due to the imbalance Moore’s Law injects into computer architecture… compute performance has far outstripped I/O performance over the years creating an imbalance.
To solve for this imbalance database engineers have worked very hard to avoid I/O. They invented indexing and partitioning and compression and column-store all with the desire to avoid I/O. When they could not avoid I/O they worked hard to minimize the cost by pre-fetching data into memory and, once fetched, by keeping data in memory as long as possible.
For example, one powerful and little understood technique is called the data flow architecture. Simply put data flow moves rows through each step of a query execution plan and out without requiring intermediate I/O. The original developers of Postgres, Sybase, SQL Server, Teradata, DB2, and Oracle did not have enough memory available to flow rows through so they spill data to the storage layer in between each step in the plan. Figure 3 shows how classic databases spill and Figure 4 shows how a more modern data flow architecture operates.
Why is this relevant? In a classic RDBMS the amount of I/O bandwidth available per GB of data is static. You cannot add storage without redistributing the data. So even though your workload has peaks and valleys your database is bottlenecked by I/O and this cannot flex. In a modern RDBMS most of the work is performed in memory without intermediate I/O… and as we discussed, compute and memory can elastically flex in a Cloud.
Imagine an implementation as depicted in Figure 5. This architecture provides classic static shared-nothing I/O scalability to read data from disk. However, once the read is complete and a modern data flow takes over the compute and memory is managed by a scalable elastic layer. The result is an elastic shared-nothing architecture that is well suited for the cloud.
In fact you can imagine how this architecture might mature over time. In early releases a deployment might look like Figure 5 where the advantage of the cloud is in devising a cost-effective flexible configuration. As the architecture matures you could imagine a cloud deployment such as in Figure 6 where the 1:1 connection between storage nodes and compute nodes is broken and compute can scale dynamically with the workload.
Cloud changes everything and it will significantly change database systems architecture.
It is strange to say… but the torch that fires innovation has been passed from the major database vendors to a series of small start-ups. Innovation seems to occur exclusively in these small firms… with the only recent exception being the work done at SAP on HANA.