DW Cloud Economics – The Model

This post describe the model used to generate the content from the DW Cloud Economics – A Do Over post incase you want to check the numbers…

First, keep in mind that the numbers in this model are relative and indicative, not absolute. Here are the fixed parameters I used to generate these indicative numbers:

Cost per server per hour: $4.00 This could be any number or any currency and it would not matter.

Runtime per job with no contention: 3 hours This could be any duration and it would not change the relative results. If the number was well below an hour, it would increase the hour-based billing costs.

Jobs per Workload: 6 Again any number works.

Baseline Configuration: 24 Servers

Contention between 6 jobs on 24 Servers: 50% This sets the average amount of time a job will wait for resources. So, a job that runs for 3 hours with no contention will run in 4.5 hours with 50% contention. As the number of servers goes up the contention comes down for a fixed workload.

Baseline Performance: 2000 A completely arbitrary number that allows the model to calculate a relative price/performance. Relative “Performance” is calculated by dividing the Baseline Performance by the actual runtime. This means that a workload that takes 5 hours will yield a performance of 400 and a workload that take 2 hours will yield a performance number of 1000.

There are also a series of calculated values:

Total Servers / Workload is equal to Servers/Job for the Multi-tasking and Serial Execution configurations. It is equal to Servers/Job * Number of Jobs for the dedicated server per job configuration.

Scaleup is the ratio between the number of Total Servers / Workload and the Baseline Configuration. It is used to scale down the runtime and the contention.

Runtime / Job is the number of hours per job with no contention scaled down based on the Scaleup plus the time the Job is waiting due to contention. Contention scales down as the number of servers scales up.

Actual Runtime / Workload is Runtime / Job plus contention. Note that in the Serial Configuration and the Dedicated Server configuration there is no contention.

Paid Runtime / Workload is the Actual Runtime / Workload rounded to the nearest hour for the hourly billing table and equal to the Actual Runtime / Workload for the billing per minute table.

Cost / Workload is the Cost per Server per Hour parameter times the Paid Runtime / Hour value times the Total Servers / Workload value.

Price / Performance is the Cost / Workload divided by the Actual Runtime / Workload times the Baseline Performance parameter divided by the Cost / Workload value as noted above.

DB Cloud Economics – A Do Over

Here is a better model that demonstrates how cloudy databases can provide dramatic performance increases without increasing costs.

Mea culpa… I was going to extend the year-old series of posts on the economics of cloud computing and recognized that I poorly described the simple model I used, and worse, made some mistakes in the model. It was very lame. I’m so sorry… That said, I hope that you will like the extension to the model and the new conclusions.

I’ll get to those conclusions in a few short paragraphs and add a new post to explain the model.

If we imagine a workload consisting of 6 “jobs” that would execute on a dedicated cluster of 24 servers in 3 hours each with no contention, we can build a simple model to determine the price performance of the workload for three different configurations:

  1. M1: The usual configuration where all six jobs run together and share the configuration, multitasking and contending for the resources. As the configuration scales up the cost goes up, the runtime goes down, and the contention drops.
  2. M2: A silly model where each job runs serially on the configuration with no contention. As the configuration scales up the costs go up, the runtime goes down.
  3. M3: A model where each job runs independently, simultaneously, on its own cluster with no contention.

If we pay for servers by the hour the model shows that there is an opportunity to deploy many more servers, as many as one per “job”, to reduce the runtime and the cost. Here are the results:

ConfigServers DeployedActual Runtime / Workload (Hours)Paid Runtime / Workload
(Hours)
Cost / WorkloadPrice / Performance
Multitasking244.55$480444
 483.84$768533
 963.44$1,536593
Serial Execution2418.018$1,728111
 489.09$1,728222
 964.54.5$1,920444
Dedicated Server per Job1440.51$5764000
 2880.31$1,1528000
 5760.11$2,30416000
Table 1. Costs with Hourly Billing

Note that the price/performance metric is relative across the models. You can see the dramatic performance and price/performance increases using a configuration where each unit of work gets a dedicated set of resources. But the more dramatic picture is exposed in Table 2.

ConfigurationServers DeployedActual Runtime / Workload (Hours)Paid Runtime / Workload (Hours)Cost / WorkloadPrice / Performance
Multitasking244.54.5$432444
 483.83.8$720533
 963.43.4$1,296593
Serial Execution2418.018.0$1,728111
 489.099.0$1,728222
 964.54.5$1,728444
Dedicated Server per Job1440.50.5$2884000
 2880.30.3$2888000
 5760.10.1$28816000
Table 2. Costs with per Minute Billing

Here the granular billing reduces the total cost with the same dramatic price and price/performance increases. This was the point I was trying to make in my earlier posts. Note that this point, that fine-grained billing can be used to significantly reduce costs, is why deploying work in containers, or better still as serverless transactions, is so cost-effective. It is why deploying virtual machines in the cloud misses the real cost savings. In other words, it is why building cloud-native implementations is so important and why cloud-native databases will quickly overcome databases that cannot get there.

I also was trying to show that work deployed across these configurations, what I call a “job”, can be ETL jobs or single queries or a set of queries or a Spark job. If you have lots of smaller work, it may be best to run them in a multitasking configuration to avoid the cost of tearing down and starting up new configurations. But even here there is a point where the tear down cost can be mitigated across multiple semi-dedicated configurations.

What is a Cloud-native Database?

Before this series is complete, I plan on defining in some detail what the various levels of Cloud-nativeness might be to allow readers to classify products based on architecture, not marketing. In this post, I’ll lay out some general concepts. First, let’s be real about cloud-things that are not cloud-native.

Any database that runs on physical hardware can run on virtual machines and, therefore, can run on virtual machines in the cloud. Databases with no cloud capabilities other than the ability to run on a VM on a cloud-provider are not cloud-native. Worse, there are lots of anecdotal stories that suggest that there are no meaningful savings to be had from moving a database from an on-premise server or VM to a cloud VM with no other change to take advantage of cloud elasticity.

So here are two general definitions for your consideration:

1) A cloud-native database will have one or more features that utilize capabilities found only on a cloud-computing platform, and

2) A cloud-native database will demonstrate economic benefits derived from those cloud-specific features.

Note that the way you pay for services via capital expenses (CapEx) or as operating expenses (OpEx) does not provide economic benefit. If the monetary costs of a subscription are more-or-less equal to the financial costs of a license, then savings are tied to tax law, not to economics. Beware of cloudy subscriptions that change how you pay without clearly adding beneficial cloudy features. It is these subscriptions that often are the source of the no-savings anecdotes mentioned above.

This next point is about the separation of storage from compute. Companies have long ago disconnected their databases from just-a-bunch-of-disks (JBOD) to shared storage such as SAN or S3. Any database today can use shared storage. It is not useful to say that any database that can use shared storage has separated storage from compute. Using the idea that there must be features, not marketing, that allow compute to scale separately and more-or-less dynamically from storage as the definition, we will be able to move forward in this area.

So:

3) For storage to be appropriately separated from Compute, it must be possible to scale Compute up and down dynamically.

Next, when compute scales, it scales at different granularity. An application or database that automatically adds and subtracts virtual machines provides different economics than a database that scales using containers. Apps that add and subtract containers have different economics than applications that use so-called serverless containers to scale. In this dimension, we will try to characterize granularity to account for the associated cloud economics. This topic will be covered in more detail later, so I’ll save the rule for that post.

Note that it is possible for database vendors to develop a granular architecture and to use the associated economics to their advantage. They may charge you for the time when any part of your database is running but be billed by their provider for smaller chunks. This is not an issue unless their overall costs become uncompetitive.

Last, and in some ways least, different products may charge for time in smaller or larger chunks. You might be charged by the hour, by the minute, by the second, or in smaller increments. Think about the scaling economics I suggested in the first posts of this thread. If you are charged by the hour, then there is no financial incentive to scale up to finish jobs to the minute. You will be charged the same for ten minutes or fifty minutes when you are charged by the hour. The rule:

4) Cloud databases that charge in smaller time increments are more economical than those that charge in larger increments.

The rationale here is probably obvious, but I’ll cover it in-depth in a later post.

With these concepts in place, we can discuss how architectural changes affect each aspect of the economics of databases in the cloud.

Note that this last sentence was written assuming that a British computer scientist with an erudite accent would speak it when they create the PBS series from these posts. Not.

By the way, a few posts from now, I am going to go back to some ideas I shared five years ago around the relationship between database processing and the underlying hardware platform. I’ll update this thinking with cloud computing in mind. You can find this thinking here which originally came from Jeff Dean and Peter Norvig (displayed in lots of places but here is one).