I have posted several times about the impact of the Hadoop eco-system on several companies (here, here, here, for example). The topic came up in a tweet thread a few weeks back… which prompts this quick note.
Four years ago the street price for a scalable, parallel, enterprise data warehouse platform was $US25K-$US35K per terabyte. This price point provided vendors like Teradata, Netezza, and Greenplum with reasonable, lucrative margins. Hadoop entered the scene and captured the Big Data space from these vendors by offering 20X slower performance at 1/20th the price: $US1K-$US5K per terabyte. The capture was immediate and real… customers who had been selecting these products for specialized, very large deployments of 1PB and up switched to Hadoop as fast as possible.
Now, two trends continue to eat at the market share of parallel database products.
First, relational implementations on HDFS continue to improve in performance; they are now only 4X-10X slower than the best parallel databases at 1/10th-1/20th the street price. This puts pressure on prices and margins for the relational vendors, and that pressure is being felt.
In order to keep their installed base of customers in the fold, these vendors have built ever more sophisticated integration between their relational products and Hadoop. This integration, however, allows customers to significantly reduce expense by moving large parts of their EDW to an Annex (see here)… and this trend has started. We might argue whether an EDW Annex should store the coldest 80% or the coldest 20% of the data in your EDW… but there is little doubt that some older data could satisfy its SLAs while delivering 4X-10X slower performance.
In addition, these trends converge. If you can put only 20% of your old, cold data in an Annex that is 10X slower than your EDW platform, then you might put 50% of your data into an Annex that is only 4X slower. As the Hadoop relational implementations continue to add columnar, in-memory, and other accelerators… ever more data can move to a Hadoop-based EDW Annex.
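To make the economics of that convergence concrete, here is a back-of-envelope sketch in Python. The prices are the midpoints of the street prices quoted above; the movable fractions at each slowdown factor (20% at 10X, 50% at 4X) are illustrative assumptions, not measured figures.

```python
# Back-of-envelope Annex economics (illustrative assumptions only).
EDW_PRICE_PER_TB = 30_000   # midpoint of the $US25K-$US35K/TB street price
ANNEX_PRICE_PER_TB = 3_000  # midpoint of the $US1K-$US5K/TB Hadoop price

def annex_savings(total_tb, movable_fraction):
    """Savings from moving the coldest `movable_fraction` of an EDW to an Annex."""
    moved_tb = total_tb * movable_fraction
    return moved_tb * (EDW_PRICE_PER_TB - ANNEX_PRICE_PER_TB)

# Assume: at 10X slower only the coldest 20% of data still meets its SLAs;
# at 4X slower, 50% does. Model a 1 PB (1000 TB) warehouse.
for slowdown, fraction in [(10, 0.20), (4, 0.50)]:
    print(f"{slowdown}X slower Annex: move {fraction:.0%}, "
          f"save ${annex_savings(1000, fraction):,.0f}")
```

Under these assumed numbers the savings on a 1 PB warehouse jump from $5.4M to $13.5M as the Annex closes the performance gap, which is the pressure on the relational vendors described above.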
I’ll leave it to the gamblers who read this to guess the timing and magnitude of the impact of Hadoop on the relational database markets and on company financial performance. I cannot see how it could fail to have an impact.
Well, actually, I can see one way out. If the demand for hot data that requires high performance grows faster than Hadoop’s performance advances, then the parallel RDBMS folks will hold their own or advance. Maybe the Internet of Things helps here… but I doubt it.
8 thoughts on “Hadoop and Company Financial Performance”
Hi Rob… it’s been a while since I got to see your blogs. I’m one of those who envision that the MPPs will have a much smaller role to play in the coming years as Hadoop catches up as an EDW/staging platform. On the other hand, I see customers wanting to experiment by moving smaller OLTP systems… maybe one-off… and I’m having a tough time convincing them that Hadoop is not meant for that. But a lot of the 10-15 sessions at the Hadoop Summits and Strata World are sending false signals, leaving customers thinking Hadoop is as mature as Oracle :(.
Hi Rafi… I do not see Hadoop as a capable OLTP platform… Not even close. Why not use PostgreSQL if cost is a concern?
Have you looked at NoSQL for smaller OLTP systems? Couchbase (www.couchbase.com) is a great example of an Enterprise NoSQL technology that is being used for new Applications in the digital economy. This is definitely a lower cost solution and will outperform PostgreSQL and other like solutions.
Since the Database Fog Blog strives to represent database architecture, not database marketing… let’s be transparent here. You work for Couchbase and your comment is more about peddling your wares than about architecture.
I appreciate the comment though… as it inspires a new post coming soon.
In that post I will ask, among other things, how a free download of Couchbase could possibly be “definitely” lower cost than a free download of PostgreSQL.
You guessed this 3 years ago. Hadoop is now a formidable force in the data management space. Heck, Microsoft talks about Hadoop at their dev conference!
So, should an OLTP database, or Oracle in particular, never be worried about Hadoop? There is a component in the Hadoop stack for almost every use case (real time, stream processing, etc.), and they are growing. Do you see a case where Hadoop will be used for all sorts of workloads in the future?
You raise an interesting question… At this point in time I do not see the core technologies of HDFS and/or in the various layers on top of HDFS as being capable of replacing Oracle in the OLTP arena. I do not see an emerging Hadoop component with the breadth of Oracle with the required ACIDity and performance. I do not see HDFS as a high performance persistence layer.
It could be that an in-memory component on top could persist data to HDFS and compete in that space in this way… but until in-memory becomes the norm, and everyone deploys servers with big memory, this will not be competitive with Oracle.
But PostgreSQL is competitive with Oracle now… and MariaDB is competitive with both Oracle and PostgreSQL in a very important high performance computing niche.
Why wait for Hadoop?