More Database Supercomputing Technology

Last year, two associates from Greenplum suggested that I read a very smart academic paper titled “Efficiently Compiling Efficient Query Plans for Modern Hardware” by Thomas Neumann. Having reiterated the idea of database supercomputing in my last blog post (here)… I can now pass this paper along to you.

In short, the paper argues that the classic approach to evaluating query plans with iterators, an approach that assumed I/O was the bottleneck, misses a significant opportunity for optimization at the hardware level. Neumann proposes instead an approach that keeps data in the hardware registers as long as possible and pushes instructions to the data, and he argues that this provides a significant performance boost. Further, he suggests that this approach extends the advantages of vector processing. The paper is here, and a set of slides on the topic is here.
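To make the contrast concrete, here is a minimal sketch (my own illustration, not code from the paper or from any product) of a simple "sum of prices above a threshold" query run two ways. The first uses iterator-style operators that pull one row at a time through `next()` calls; the second generates a single fused loop at runtime and compiles it, in the spirit of the paper's data-centric pipelines. The paper generates code with LLVM; Python's built-in `compile()`/`exec()` is only a stand-in here, and Python cannot show the register-level effects the paper is really about. All class and function names are hypothetical.

```python
# Illustrative sketch only: iterator ("pull") plan vs. a generated, fused
# pipeline. Names are hypothetical; compile()/exec() stands in for LLVM.
from typing import Callable, Iterable, Optional

Row = tuple  # (id, price)


class Scan:
    """Leaf operator: hands back one row per next() call."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._it = iter(rows)

    def next(self) -> Optional[Row]:
        return next(self._it, None)


class Filter:
    """Pulls rows from its child and returns only those matching pred."""

    def __init__(self, child: Scan, pred: Callable[[Row], bool]) -> None:
        self.child = child
        self.pred = pred

    def next(self) -> Optional[Row]:
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None


def iterator_sum(rows: Iterable[Row], threshold: int) -> int:
    """Iterator model: every row travels upward through next() calls."""
    plan = Filter(Scan(rows), lambda r: r[1] > threshold)
    total = 0
    while (row := plan.next()) is not None:
        total += row[1]
    return total


def compile_pipeline(threshold: int) -> Callable[[Iterable[Row]], int]:
    """Data-centric model: fuse scan, filter and sum into one tight loop.

    The query's constant is baked into generated source, which is then
    compiled once and reused for every input.
    """
    src = (
        "def pipeline(rows):\n"
        "    total = 0\n"
        "    for _id, price in rows:\n"
        f"        if price > {int(threshold)}:\n"
        "            total += price\n"
        "    return total\n"
    )
    namespace: dict = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["pipeline"]


if __name__ == "__main__":
    data = [(1, 5), (2, 12), (3, 30), (4, 8)]
    print(iterator_sum(data, 10))       # 42
    print(compile_pipeline(10)(data))   # 42
```

Both versions return the same answer; the difference is that the fused loop has no per-row operator calls, so in a real compiled engine the working values can stay in registers for the whole pipeline.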

My friends subsequently started a little company named Vitesse Data (vitessedata.com) and implemented the technique on top of Postgres. Check out their site… the benchmark numbers are pretty cool, and they certainly bear out the paper's claims. My guess is that this might be a next step in the database architecture race.

FYI… here is a link to more information on the LLVM compiler framework, which the paper uses to generate machine code for query plans… another very cool bit of software.

One last note… some folks see bare-metal optimization as an odd approach in a cloud-based world, where even the database is abstracted away from the hardware by virtualization. But this thinking misses the point: at some point the database program executes on real hardware, and these optimizations matter. What really happens is that bare-metal optimization exposes more of the inefficiency of virtualization.

We are already starting to see the emergence of clouds that deploy on bare metal. I expect that we will shortly reach the point where programs like databases are deployed on bare-metal cloud IaaS to squeeze out every drop of performance… while other programs are deployed on virtualized IaaS…

4 thoughts on “More Database Supercomputing Technology”

  1. Rob, I assume that in your comparison of bare-metal cloud IaaS with virtualized IaaS, you mean containers in the first case and virtual machines in the second. If so, I completely agree that container technology is the only choice for an I/O-bound compute engine such as a database. Whether a database should be defined as an application is another discussion…


  2. Interesting last few posts, Rob; I've enjoyed reading them. Given your past with SAP HANA, I wasn't sure how familiar you are with DSSD. When it comes to high-performance database computing, that's their sweet spot. According to a recent article, the Texas Advanced Computing Center (TACC) has leveraged this technology to set some pretty impressive benchmarks for high-performance database computing. See the article below; I'd love to hear your thoughts.

    http://insidehpc.com/2015/04/taccs-wrangler-uses-dssd-technology-for-data-intensive-computing/

    Sam

