Vertica is the product I saw the most. In fact before they were acquired they were beginning to pop up a lot. The product is innovative in several dimensions. It is wholly column-oriented with several advanced columnar optimizations. Vertica has an advanced data loading strategy that quickly commits data and then creates the column orientation in the background. This greatly reduces the load time but slows queries while the tuples are transformed from rows to columns. Vertica offers a physical construct called a projection that may be built to greatly speed up query performance. Further, they provide a very sophisticated design workbench that will automatically generate projections.
Paraccel is the company I saw the least. I never saw it win… but I know that it does, in fact, win here and there. My impression, and I use this fuzzy word intentionally as I just don’t know, is that Paraccel is a solid product but it does not possess any fundamental architectural advantages that would allow it to win big. Every product will find an acorn now and again. The question is whether, in a POC with the full array of competitors, there is a large enough sweet spot to be commercially successful. I think that Paraccel cannot win consistently against the full array. Paraccel is now the basis for the Amazon Redshift data warehouse as a service offering. This will keep the product in the game for a while even if the revenues from a subscription model do not help the business much.
Where They Win
Vertica wins when projections are used for most queries. They are not likely to win without projections. This makes them a very effective platform for a single application data mart with a few queries that require fast performance… or for a data mart where the users tend to submit queries that fit into a small number of projection “grooves”.
Paraccel wins in the cloud based on Redshift. They can win based on price. They win now and again when the problem hits their sweet spot. They win when they compete against a small number of vendors (which increases their sweet spot).
Where They Lose
Vertica loses in data warehouse applications where queries cut across the data in so many ways that you cannot build enough projections. Remember that projections are physical constructs with redundant data.
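To make the trade-off concrete, here is a toy sketch in Python. It is not Vertica's storage format or API; the sample rows and the helper names (build_projection, sum_amount_where) are made up for illustration. It assumes only what is said above: a projection is a physical, redundant copy of the data, here kept in a particular sort order so that queries filtering on the sort column touch few rows.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample data: (customer_id, day, amount)
rows = [
    (3, 1, 10.0),
    (1, 2, 25.0),
    (2, 1, 5.0),
    (1, 1, 40.0),
    (3, 2, 15.0),
]

def build_projection(rows, sort_col):
    """Store a full copy of the rows, sorted by one column and split into columns."""
    ordered = sorted(rows, key=lambda r: r[sort_col])
    return [list(col) for col in zip(*ordered)]

def sum_amount_where(projection, sort_col, value):
    """Sum the amount column where the sort column equals value.

    Because the projection is sorted on sort_col, a binary search finds the
    matching range without scanning the whole column.
    """
    keys = projection[sort_col]
    lo, hi = bisect_left(keys, value), bisect_right(keys, value)
    return sum(projection[2][lo:hi]), hi - lo  # (total, rows touched)

# Two query "grooves" need two physical copies of the same data.
by_customer = build_projection(rows, 0)
by_day = build_projection(rows, 1)

total_c1, touched_c1 = sum_amount_where(by_customer, 0, 1)
total_d2, touched_d2 = sum_amount_where(by_day, 1, 2)

# Every projection repeats every value it covers.
stored_values = sum(len(col) for p in (by_customer, by_day) for col in p)
base_values = len(rows) * 3
```

Each additional access pattern in this sketch costs another full copy of the covered columns, which is why a very broad query set becomes expensive to serve this way.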
Paraccel loses on application-specific marts and other data marts when the problems fit either Vertica’s projections or Netezza’s zone maps. They lose data warehouse deals to both Teradata and Greenplum when the query set is very broad. They will lose in Redshift when performance is the key… maybe. I have always thought that shared-nothing vendors made it too hard and too expensive to scale out. It should always have been easier to add hardware to improve performance than to apply people to tuning… but this has not been the case… maybe now it is (see here)?
In the Market
Since the HP acquisition the number of times Vertica shows up as a competitor has actually dropped. I cannot explain this but HP has had a difficult time becoming a player in the data warehouse space and had several false starts (Neoview, Exadata, …). The product is sound and I hope that HP figures this out… but HP is primarily a server vendor and it will be difficult for them to sell Vertica and stay agnostic enough to also sell HANA, Oracle, Greenplum, and others.
The Amazon Redshift deal breathes life into Paraccel. They have to hope that the exposure provided by Amazon will turn into on-premise business for them. They are still a small venture-funded company that has to compete against bigger players with larger sales forces. It will be tough.
My Guess at the Future
I worry about Vertica in the long run.
Until the Amazon deal I would have guessed that Paraccel was done… again, not because their technology was bad… it is not… but because it was not good enough to create a company that could go public and there was no apparent buyer… no exit. The Amazon Redshift deal may provide an exit. We will see. Maybe Amazon can take this solid technology into the cloud and make it a winner?
8 thoughts on “My 4 Cents: Vertica and Paraccel 1Q2013”
Good post. I would like to clarify that ParAccel has features which counter NZ Zone Maps and Vertica projections very effectively. ParAccel won’t lose technologically to any of the vendors mentioned in the article. Greenplum is actually the farthest from a good data warehouse database and in many head-to-head bake-offs has been shown to be the slowest of all the technologies mentioned here.
I believe that ParAccel has features to counter Netezza and Vertica… and to beat Greenplum now and again… but in the market they do not seem to make ParAccel an out and out winner. ParAccel does not win every time they are up against these products… far from it… if they did they would have many more customers (see here) and be growing at a faster rate. Both Vertica and Greenplum grew faster than this before they were acquired.
So I will stand by my suggestion that ParAccel is solid… but that is all, K…
I would like to clarify a misconception about projections in Vertica. Specifically, all data in Vertica is stored in projections — there is no such thing as a query which “does not hit projections”. A more complete technical description can be found in our VLDB paper from last year: http://vldb.org/pvldb/vol5/p1790_andrewlamb_vldb2012.pdf
Our competitors often spew FUD that “Vertica needs one projection for each query to go fast”. This is simply untrue, but it is semi-believable because it plays on classic DBA horror shows with row store systems and catastrophic full table scans.
There is a fundamental technical difference between a native Column Store (e.g. Vertica) and a Row Store. In a Row Store, if you don’t have an appropriate index, the system must fall back to a full *table scan* to retrieve the data, which is often disastrous for large tables. In a Column Store, even if you don’t have the optimal physical structure (Projection in Vertica), at worst you end up with *column scans* for only those columns referred to in the query.
Furthermore, due to how we have our storage format set up a “full column scan” may very well not actually read the data off disk, but that is a topic left for a different forum.
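The table scan versus column scan difference described above can be sketched with a toy model in Python. This is an illustration only, not how Vertica or any row store is implemented; the table, its column names, and the two helper functions are assumptions invented for the example, and "values read" is a crude stand-in for I/O.

```python
NUM_ROWS = 1000
COLUMNS = ["id", "region", "amount", "notes"]

# Row layout: each record keeps all of its columns together.
row_store = [
    {"id": i, "region": i % 4, "amount": float(i), "notes": "x" * 20}
    for i in range(NUM_ROWS)
]

# Column layout: one contiguous list per column.
column_store = {name: [row[name] for row in row_store] for name in COLUMNS}

def row_store_sum(rows, filter_col, filter_val, sum_col):
    """Unindexed row store: every row is fetched whole (full table scan)."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # all columns of the row come along
        if row[filter_col] == filter_val:
            total += row[sum_col]
    return total, values_read

def column_store_sum(cols, filter_col, filter_val, sum_col):
    """Column store worst case: scan only the columns the query refers to."""
    filter_values = cols[filter_col]
    sum_values = cols[sum_col]
    values_read = len(filter_values) + len(sum_values)
    total = sum(v for k, v in zip(filter_values, sum_values) if k == filter_val)
    return total, values_read

# Equivalent of: SELECT SUM(amount) FROM t WHERE region = 0
row_total, row_read = row_store_sum(row_store, "region", 0, "amount")
col_total, col_read = column_store_sum(column_store, "region", 0, "amount")
```

Both layouts return the same answer, but in this model the column layout reads only two of the four columns. The gap widens as tables get wider and queries stay narrow.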
Thanks, Andrew… I knew that you did not need a projection for each query… But I did not state it very clearly…
But the row store databases that have columnar table types, like Teradata and Greenplum, store columnar table data physically in columns… So they derive the benefits you claim for “native” columnar databases.
I have a different opinion of how effective retrofitting a columnar storage format into a row store is in practice.
Often, row stores with a column option have to disable significant functionality (like UPDATING) because it is so very different in the columnar storage format, and all the components like the Optimizer and Executor were written assuming a row store architecture.
Such retrofitting isn’t impossible, I just haven’t personally seen it done well. Sadly for customers, the retrofitting story does make for great marketing.
I do not believe that either Greenplum or Teradata disable updating for columnar tables. And I do not believe that you could argue that these implementations of columnar have not improved their competitiveness versus Vertica.
It is much much more than marketing, Andrew… See Daniel Abadi’s view here…
Greenplum disables update on column option.