Michael Stonbreaker has suggested several times… and again in this interview… that databases will become more specialized and that “one size will fit none”. I’m sure that his argument is more nuanced than the sound bites in the interview, but in this post I’ll suggest a line of thinking that may lead to a different conclusion.
First, let’s agree that the word “fit” means the best price for the performance required to meet your company’s service level requirements.
Then let’s agree with the basic premise behind Dr. Stonebreaker’s argument… we agree that in any single-purpose application a specialized single-purpose DBMS can be developed that will out perform a generalized DBMS. This means that one-half of the fit, performance, is likely.
We would also agree that between the growth of open source databases and the general growth in the database space that it is likely that someone can and will develop specialized databases and bring them to market in cases where there is enough market to make it worthwhile. But it is important to note that specialization will be not become infinitely narrow… there has to be enough market for the special case to generate an attractive product.
So where is the disagreement: I do not believe that data is ever used in a single specialized business context. Not ever.
Let’s imagine that we have a business requirement for an extremely high volume OLTP application… and let us assume that the performance and/or scalability requirements are beyond what any general DBMS can provide and that the business ROI is significant… in other words, let us imagine that we are Google or Facebook. In this case we have no choice but to select or to develop an extreme, specialized, DBMS to solve the problem and extract the return.
But in this case, once the OLTP transactions are recorded… what do we do with the data? We need to use the data elsewhere in the business as the basis for deep analytics or for basic business intelligence… so we have to replicate the data to a second database. Since the second DBMS is also sort of specialized… it does not have to support OLTP… we select a second specialized, data warehousish product.
And then come new requirements for doing light queries in near real time to support operational analytics… so we build some sort of operational data store. Again we can select a product with a narrow technical sweet spot… but we have to replicate the data a third time.
In other words… given my premise… that data is never used in a single specialized context… specialized databases force replication… and replication allows for further specialization.
But what if the requirements are not so extreme? Then we might use a single conventional RDBMS for the EDW and for the ODS problems. If a more generalized DBMS product exists that could handle both the operational reporting and the analytic reporting requirements in a single image of the data then we could eliminate one replica and one DBMS. In other words, if the problem is not so extreme then a generalized solution might provide a solution in a single instance of the data avoiding replication.
Now the issue becomes: is the cost of a specialized system plus replication plus a second specialized system plus the cost of operating these systems less than the cost of a single generalized system? I believe that the answer will often be in favor of a single system even when the specialized systems are low-cost open source. Since “fit” is about cost maybe one size does fit now and again.
This suggests the strategy of the OldSQL vendors. They are offering a Swiss Army Knife product that serves multiple requirements. Their feature sets have grown over 30 years and they are pretty capable across a wide array of business problems… and with the columnar and in-memory features being added they continue to cover ever more extreme uses cases… not the most extreme use cases… but they cover more ground each year.
The strategy of the NewSQL vendors is to focus tight and hard. They might develop an extreme OLTP DBMS with no ability to do a join… a product with extreme scalability and no performance… or a graph database to solve for an important, narrow, set of queries… or a columnar product that performs analytics but support no OLTP. This trend feeds the specialize and replicate meme advocated by Dr. Stonebreaker.
HANA is a horse of a different color… neither NewSQL nor OldSQL. It is a new code base designed to solve for a very wide set of uses cases in a single instance of the data. We certainly agree in this blog with Dr. Stonebreaker’s contention that the 30 year old legacy code base has to be retired. But SAP contends that you can build a new, generalized, DBMS that solves for all but the most extreme cases.
This is a great spot to end the year… having laid out the battle we will cover in this blog ongoing… with the legacy OldSQL vendors trying to tack on to their legacy code base… and doing pretty well at it… with the NewSQL vendors trying to specialize and replicate… and with HANA offering a new code base designed to solve for the the whole picture. 2014 will be great fun to watch. This also sets the stage to ask next year whether “big data” applications are so extreme as to force users to specialize-replicate-specialize.
Have a great holiday season… and my best wishes. Thank you all for reading the Database Fog Blog in 2013… I hope for your continued attention in the New Year…