I would like to recommend to you John Appleby’s post here on the HANA blog site. While the title suggests the article is about HANA, in fact it is about trends in computing and processors… and very relevant to posts here past, present, and upcoming…
I would also recommend Curt Monash’s site. His notes on Teradata here mirror my observation that a 30%-50% performance boost per release cycle is the target for most commercial databases… and what wins in the general market. This is why the in-memory capabilities offered by HANA and maybe DB2 BLU are so disruptive. These products should offer way more than that… not 1.5X but 100X in some instances.
Finally I recommend “What Every Programmer Should Know About Memory” by Ulrich Drepper here. This paper provides a great foundation for the deep hardware topics to come.
Database computing is becoming a special case, a commercial case, of supercomputing… high-performance computing (HPC) to those less inclined to superlatives. Over the next few years the differentiation between products will increasingly be due to the use of high-performance computing techniques: in-memory techniques, vector processing, massive parallelism, and use of HPC instruction sets.
This may help you to get ready…
6 thoughts on “Database Computing is Supercomputing… Some external reading: May 2013”
Maybe I am missing something here. Let’s say HANA offers 100X performance by storing everything in memory. Do you think 100% of the data needs to be stored in memory? What is your comment on HOT/COLD data?
How cost effective is HANA compared with some solution based on hybrid storage?
I would use the guidelines Teradata published in their Intelligent Memory announcements: 90% of the queries touch only 20% of the data… and I would put 20% of the data in-memory. HANA has always supported what you call a hybrid approach… data can flush to disk and be swapped in on-demand.
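As a rough illustration of that 90/20 guideline, here is a minimal Python sketch that picks which partitions to keep in memory from an access log. The function names, the partition labels, and the greedy "most-accessed first" policy are my own assumptions for the example… this is not how Teradata Intelligent Memory or HANA actually decides placement.

```python
from collections import Counter


def pick_hot_partitions(access_log, partition_sizes, memory_budget_pct=20):
    """Greedy sketch: keep the most-accessed partitions in memory
    until the budget (a percentage of total data size) is used up.
    Hypothetical helper, not a vendor algorithm."""
    total_size = sum(partition_sizes.values())
    budget = total_size * memory_budget_pct // 100
    hot, used = [], 0
    # most_common() orders partitions from most to least accessed
    for partition, _count in Counter(access_log).most_common():
        size = partition_sizes[partition]
        if used + size <= budget:
            hot.append(partition)
            used += size
    return hot


def hit_rate(access_log, hot):
    """Fraction of accesses served from the in-memory partitions."""
    hot_set = set(hot)
    return sum(1 for p in access_log if p in hot_set) / len(access_log)


# Hypothetical workload: ten equal-sized partitions, with 90 of 100
# accesses landing on just two of them.
sizes = {f"p{i}": 100 for i in range(10)}
log = ["p0"] * 45 + ["p1"] * 45 + [
    "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9", "p2", "p3"
]
hot = pick_hot_partitions(log, sizes)
```

With this made-up workload the two busiest partitions fit the 20% budget and serve 90% of the accesses; everything else stays on disk and is swapped in on demand.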
But the point of my post is that to be a database supercomputer you need to support massive parallelism within each node… and run on the most powerful nodes available… you need to support vector processing of columnar data and use HPC features like the SIMD instruction set. You need to get data in-memory or at least support a very high performance I/O subsystem. SAN is a commercial I/O subsystem and inadequate to compete (Teradata optionally supports SAN… but offers a very high performance I/O subsystem in the usual case).
Cost… that is a question for a sales person. But price/performance… I like in-memory databases here…
Does Teradata really support SAN?
Yep: See here…
I believe that there are several customers… but I do not believe that it was ever a popular option…
Why would you want to run a SAN with Teradata? Why not take advantage of the BYNET and parallel processing and expand the disk (SSD and HDD) within their systems?
Rob, I have one more question/comment: with a lot of disk moving to SSD, is that what you mean by a very high performance I/O subsystem?
I’m with you, Tim… I do not know why anyone would use a SAN device with any shared-nothing DBMS.
I’m not sure which post you are referring to with regard to a high performance I/O subsystem.
SSDs are good… but if you put them behind a dual disk controller subsystem then you end up with around 2.4GB/sec of I/O bandwidth and no improvement over disk. If you put the SSDs on the PCIe bus then things get interesting.
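To put that bandwidth ceiling in perspective, here is a back-of-the-envelope sketch in Python. The 2.4GB/sec figure is the one quoted above; the 1TB table size and the function name are assumptions made up for the arithmetic.

```python
def scan_seconds(data_gb: float, bandwidth_gb_per_sec: float) -> float:
    """Seconds needed to read data_gb once at a sustained bandwidth."""
    if bandwidth_gb_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    return data_gb / bandwidth_gb_per_sec


CONTROLLER_GB_PER_SEC = 2.4  # ceiling quoted in the comment above
TABLE_GB = 1000.0            # hypothetical 1TB table, for illustration only

# Time for one full scan when the controller is the bottleneck,
# regardless of whether SSD or spinning disk sits behind it.
full_scan = scan_seconds(TABLE_GB, CONTROLLER_GB_PER_SEC)
```

At that ceiling a full scan of the hypothetical table takes roughly seven minutes per node, which is why the medium behind the controller stops mattering. Plug in whatever bandwidth your PCIe-attached devices actually sustain to see how the picture changes.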