I have suggested that the big EDW parallel databases (Teradata, Exadata, Greenplum, and Netezza in particular) will be squeezed over time. Colder data will move from those products to Hadoop and hotter data will move in-memory. You can see posts on this here, here, and here.
But there are three products, the Greenplum database (GPDB), HAWQ, and Aster Data, that will be squeezed more quickly as they are positioned either in between the EDW and Hadoop… or directly over Hadoop. In this post I’ll explain what I suspect Pivotal and Teradata are trying to do… why I believe their strategy will not work for long… and why readers of this blog should be careful moving forward.
The Squeeze picture assumes that Hadoop consumes more and more “big data” over time as the giant investment in that open source eco-system matures the software and improves both the performance and the feature base. I think that this is a very safe assumption. But the flip side of this assumption is that we recognize that currently the Hadoop eco-system is not particularly mature and that the performance is not top-notch. It is this flip side that provides the opportunity targeted by Pivotal and Teradata.
Here is the situation… Hadoop, even in its newbie state, is lowering the price point for biggish data. Large EDW implementations, let’s say over 100TB, that had no choice but to pick a large EDW database product 4 years ago are considering and selecting Hadoop more often at a price point 10X-20X less than the lowest street price offered by commercial DBMS vendors. But these choices are painful due to the relatively immature state of the Hadoop eco-system. It is this spot that is being targeted by Aster and GPDB… the “big data” spot where Aster and GPDB can charge a price greater than the cost of Hadoop but less than the cost of the EDW DBMS products… while providing performance and maturity worth the modest premium.
This spot, under the EDW and above Hadoop is a legitimate niche where revenue can be generated. But it is the niche that will be the first to be consumed by Hadoop as the various Hadoop RDBMS features mature. It is a niche that will not be commercially interesting in two years and will be gone in four years. Above is the Squeeze picture updated to position Aster, HAWQ and GPDB.
What would I do? Pivotal has some options. First, as I have stated before, GPDB is a solid EDW DBMS and the majority of its market, even after running from the EDW space, is there. They could move up the food chain back to the EDW space where they started and have an impact. This impact could be greater still if they could find a way to build a truly effective cloud-based EDW DBMS out of the GPDB. But this is not their current strategy and they are losing steam as an EDW both technically and in the market. The window to move back up is closing. Their current strategy, which is “all-in” on Hadoop, will steal business from GPDB for low-margin business around HAWQ and steal business from HAWQ for an even lower-margin business around Pivotal Hadoop. I wonder how long Pivotal can fund this strategy at a loss.
I’m not sure what I would do if I were Teradata. The investment in Aster Data is not likely to pay off before Hadoop consumes the space. Insofar as it is a sunk cost now… and they can leverage the niche described above… their positioning can earn them some revenue and stave off the full effect of the Squeeze for a short time. But Aster was never really a successful EDW play and there is no room for it to move up the food chain at Teradata.
What does this mean? Readers should take note and consider the risk that Hadoop wins in the near term… They might avoid a costly move to Aster or GPDB or HAWQ with a short lifespan. Maybe it is time to bite the bullet now and start introducing Hadoop into your infrastructure?
One final note… it is not my expectation that either the Hadoop DBMS or any NoSQL DBMS product will consume the commercial RDBMS space anytime soon. There are reasons for this… stay tuned and I’ll post on this topic in the new year.
With this post the Database Fog Blog will receive its 100,000th view. I am so grateful for your attention and consideration. And with this last post of my calendar year I wanted to say thanks… to send my regards to all whether you will be celebrating a holiday season or not… and to wish every reader, regardless of what calendar you follow, all the best in the next year…
9 thoughts on “Aster Data, HAWQ, GPDB and the First Hadoop Squeeze”
Enjoy your article as always but on this one I have to disagree. First, from what I have read HANA can’t support over 60 or 90 GB of in-memory space (please correct me if I’m wrong), let alone 1TB. I have been working with clients who on a daily average are loading 35 to 60 GB a day. I don’t see all of that data going in-memory, maybe to SSD. This daily and weekly data is still too hot, and Hadoop BI tools too immature, to provide a viable solution. What I am seeing are these institutions using Hadoop to troll, if you will, the web or their call centers for feedback, then acting on that feedback to bring actionable results to improve their customer’s experience, leveraging their DW and Hadoop platforms.
HANA scales out using the same shared-nothing architecture as Teradata, Netezza, etc. We have published benchmarks with 1PB in-memory (see http://www.saphana.com/servlet/JiveServlet/download/38-8369/SAP%20HANA%20One%20Petabyte%20Performance%20White%20Paper.pdf).
But this post is not about HANA… it is about the trend. I am not surprised to find that there is data too hot for Hadoop and not affordable in-memory. But I am trying to point out that Hadoop will get hotter and in-memory will become more affordable over the next few years and this will change how you mix your data. Further, the post suggests that the first victims as Hadoop matures will be the products under the EDW.
The real solution to the problem may be in the logical EDW concept touted by Gartner… who suggest that what is required is the flexibility to move data from platform to platform as costs and performance characteristics change.
My advice is to not get locked in… and try to pick carefully now while the DB market is in flux.
1) Why Hadoop is a competition to it’s commercial offerings? Wouldn’t they grow as Hadoop grows?
2) Is it really tough for existing EDW vendors to move from I/O based to in-memory?
I do not understand the first question, sorry. Could you re-phrase it for me rather than have me guess at what you are asking? Thanks.
It is not too tough for the existing vendors to move to in-memory… IBM is there now, and Microsoft and Oracle will be there soon. But it is unclear when any of them, besides SAP, will get there with scale. IBM has hinted that they will scale up BLU sometime in the future… but as I understand it neither Microsoft nor Oracle has hinted at or announced the availability of in-memory scale-out over multiple nodes.
So… your question is very fair… can Teradata build an in-memory add-on… yes they can… and I would be very surprised if they don’t have this in the works. They understand the impact of what I have called Level 3 columnar support (see here) and surely are working to get there.
Many products are attempting to support real-time and interactive capabilities by offering interactive SQL on top of Hadoop, such as Hortonworks’ Stinger initiative around Hive, Apache Drill, Apache Tajo, Cloudera’s Impala, Salesforce’s Phoenix (for HBase) and now Facebook’s Presto.
I agree Prabhanjana… this is exactly my point. They are working from the bottom up and will improve the feature set and the performance in each of these offerings over time. As they improve they will eat first into the big data analytics space occupied by GPDB, HAWQ, and Aster… then up into the EDW space. They won’t kill the EDW databases anytime soon… but they will take away the big data high end… first anything over 1PB will go then to 500GB then…
It took 10 years for GPDB to morph into a somewhat usable product in real-world production environments; and GPDB benefited tremendously from a mature database (PostgreSQL). To argue that the slew of new SQL-on-HDFS products will become serious competition in 3 years is overly optimistic at best.
I wish all the SQL-on-HDFS products the best in the world. Serious competition for MPP databases? Bring them on, love them. But I don’t see them getting wide adoption in 3 years. 5-10 years? Absolutely. Then again, who can foretell what MPP databases will evolve into in 5-10 years?
I agree, Dong… and have a draft in the works that will expand on this idea.
As for Greenplum… the question is how will they compete against software at the open source price point and be commercially viable? Competition is not just about product features… it is about being profitable.
I do like open source products, but I don’t discriminate against commercial products just because of the perceived price points. Let us not rehash the talking points of open source vs commercial alternatives. At the end of the day, commercial products do need to justify the license fee by a superior product and/or significant reduction in maintenance and development costs. If SQL-on-Hadoop open source products can roughly match features with GPDB, or any MPP database, then yes, the commercial products are in trouble. But 5-10 years is a long time, and I would not bet any serious money on it.