More on The Future of Hadoop and of Big Data DBMSs


First, you should look at Google’s Spanner paper here… this is the next generation from Google, and once the open source community embraces it, it will put even more pressure on the big data DBMSs. Also have a look at YARN, the next Map/Reduce… more pressure still…

Next… you can imagine that the conventional database folks will quibble a little with my analysis. Let’s try to anticipate the push-back:

  • Hadoop will never be as fast as a commercial DBMS

Maybe not… but if it is close, then a little more hardware will make up the difference… and “free” is hard to beat in price/performance.
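To make the price/performance point concrete, here is a rough sketch of the arithmetic. Every number in it is hypothetical and chosen only for illustration (the per-node throughput, hardware cost, and license cost are assumptions, not measurements), and it assumes throughput scales linearly with node count, which real clusters only approximate.

```python
import math


def nodes_needed(target_throughput, per_node_throughput):
    """Nodes required to reach a target throughput, assuming linear scaling."""
    return math.ceil(target_throughput / per_node_throughput)


def cluster_cost(nodes, hardware_per_node, license_per_node):
    """Total cost of a cluster: hardware plus per-node software license."""
    return nodes * (hardware_per_node + license_per_node)


# Hypothetical figures, for illustration only.
TARGET = 1000              # queries per hour the business needs
HARDWARE_PER_NODE = 10_000  # assumed identical hardware for both systems

# Assume the commercial DBMS is faster per node but carries a license fee.
commercial_nodes = nodes_needed(TARGET, per_node_throughput=100)
commercial_cost = cluster_cost(commercial_nodes, HARDWARE_PER_NODE, 20_000)

# Assume the open source stack is 20% slower per node but has no license fee.
hadoop_nodes = nodes_needed(TARGET, per_node_throughput=80)
hadoop_cost = cluster_cost(hadoop_nodes, HARDWARE_PER_NODE, 0)

print(f"commercial: {commercial_nodes} nodes, ${commercial_cost:,}")
print(f"open source: {hadoop_nodes} nodes, ${hadoop_cost:,}")
```

Under these made-up inputs the slower system needs a few more nodes yet still costs less in total. Change the assumed license fee or the performance gap and the comparison moves accordingly.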

  • SSD devices will make a conventional DBMS as fast as in-memory

I do not think so… disk controllers, the overhead of non-memory I/O, and the inability to fully optimize processing for in-memory data will make a big difference. I said 50X to be conservative… but it could be 200X… and a 200X performance improvement reduces the memory required to process a query by 200X… so it adds up.

  • The price of IMDB will always be prohibitive

Nope. The same memory that is in SSDs will become available as primary memory soon, and the price points for SSD-based systems and IMDB will converge.

  • IMDB won’t scale to 100TB

HANA is already there… others will follow.

  • Commercial customers will never give up their databases for open source

Economics means that you pay me now or you pay me later… companies will do what makes economic sense.

The original post on this is here.