My contacts from Strata read my post here and provided me with the following information:
- The performance numbers quoted for Greenplum HAWQ versus HIVE and Impala used Greenplum tables implemented over HDFS. In other words, this data is unreadable from outside of the Greenplum database… unreadable by any other program in the Hadoop eco-system… a proprietary format. If the tests were re-run using the same open data structures used by HIVE and Impala you would find the performance of HAWQ to be closer to, or worse than, those Hadoop components.
- The HAWQ performance numbers quoted represent a 2X-3X performance degradation over the same benchmark run on the native Greenplum RDBMS.
Again… this is from a credible source… but please consider this a rumor… and view this report, and the associated Greenplum marketing… with an appropriate measure of engineering skepticism.
Greenplum is a fantastic product… if I assume the report to be true then I do not understand why are they doing this… what use case is solved by a 300% performance degradation accessing proprietary data in HDFS? Remember, you could put Greenplum in the same cluster as Hadoop (UAP) and query everything HAWQ could query without the performance degradation. I just do not see the point? Could someone from GP comment and help my readers and myself here?
- EMC morphs Hadoop elephant into SQL database Hawq (go.theregister.com)
- EMC touts screeching Hawq SQL performance for Hadoop (go.theregister.com)
- EMC launches Hadoop distribution, takes aim at Cloudera (zdnet.com)
- Greenplum HAWQ (dbms2.com)