I want to soften my criticism of Greenplum’s announcement of HAWQ a little. This post by Merv Adrian convinced me that part of my blog here looked at the issue of whether HAWQ is Hadoop too simply. I could outline a long chain of logic that shows the difficulty in making a rule for what is Hadoop and what is not (simply: MapR is Hadoop and commercial… Hadapt is Hadoop and uses a non-standard file format… so what is the rule?). But it is not really important… and I did not help my readers by getting sucked into the debate. It is not important whether Greenplum is Hadoop or not… whether they have committers or not. They are surely in the game, and when other companies start treating them as competitors by calling them out (here) it proves that this is so.
It is not important, really, whether they have 5 developers or 300 on “Hadoop”. They may have been over-zealous in marketing this… but they were trying to impress us all with their commitment to Hadoop… and they succeeded… we should not doubt that they are “all-in”.
This leaves my concern, discussed here, over the technical sense in deploying Greenplum on HDFS as HAWQ… or deploying Greenplum in native mode with the UAP Hadoop integration features, which include all of the same functionality as HAWQ… and 2x-3x better performance.
It leaves my concern that their open source competition more-or-less matches them in performance when queries are run against non-proprietary, native Hadoop data structures… and my concern that the community will match their performance very soon in every respect.
It is worth highlighting the value of HAWQ’s very nearly complete support for the SQL standard against native Hadoop data structures. This differentiates them. Building out the SQL dialect is not a hard technical problem these days. I predict that there will be very nearly complete support for SQL in an open source offering in the next 18-24 months.
These technical issues leave me concerned with the viability of Greenplum in the market. But there are two ways to look at the EMC Pivotal Initiative: it could be a cloud play… in which case Greenplum will be an uncomfortable fit; or it could be an open source play… in which case, here comes the wacky idea, Greenplum could be open-sourced alongside Cloud Foundry and then this whole issue of committers and Hadoopiness becomes moot. Greenplum is, after all, Postgres under the covers.