My 2 Cents: Teradata 1Q2013

Since my blog posts tend to be written in response to some stimulus, they may not reflect a holistic view of any particular product. The “My 2 Cents” series will try to provide a broader view…

Teradata Storage Rack (Photo credit: pchow98)

Summary

Despite my criticisms of some of their market positions (here, here, here, and here) Teradata provides the single best data warehouse platform in the market, hands-down. As an EDW or a data mart, it is the best. It will be very competitive as an analytics mart and/or as an operational data store. It has a very complete eco-system of utilities and offers a robust set of Reliability, Availability, Serviceability, and Recoverability (RASR) features to make the eco-system solid. Performance is very good… Teradata should win more POCs than they lose… and they have become more competitive on price… so their price/performance is good if not great.

I recommend a POC for most customers in most cases… you can often save 20%-30% in a competitive situation… but if you don’t have any special requirements… if you are building a standard BI/DW eco-system then Teradata would be the only vendor I would trust without a POC.

Where They Win

Now that they support columnar tables and columnar projection Teradata should win way more POCs than they lose (before columnar support they could lose to the column stores or to hybrids like Greenplum). The Teradata optimizer is very robust. It efficiently solves for a broad array of queries, and for a mixed workload that cuts across the data in many ways. This makes Teradata well-suited as the platform for an EDW.
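
To see why columnar projection wins scan-heavy POCs, it helps to do the I/O arithmetic: a row store reads whole rows while a columnar table reads only the referenced columns. A minimal sketch in Python, with made-up table dimensions:

```python
# Rough I/O arithmetic behind columnar projection: a row store must read
# every column of every row it scans, while a columnar table reads only
# the columns the query references. All dimensions are made up for
# illustration.
rows = 1_000_000_000           # a 1B-row fact table
total_columns = 40
avg_column_bytes = 8
query_columns = 3              # the query touches only 3 of 40 columns

row_store_bytes = rows * total_columns * avg_column_bytes
column_store_bytes = rows * query_columns * avg_column_bytes

print(f"row store scans: {row_store_bytes / 1e9:,.0f} GB")      # 320 GB
print(f"columnar scans:  {column_store_bytes / 1e9:,.0f} GB")   # 24 GB
print(f"I/O reduction:   {row_store_bytes / column_store_bytes:.1f}x")
```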

Every RDBMS has a sweet spot where it wins… so Teradata will not win every POC. But if you run a POC for an EDW and prove it with a full complement of data, with queries that cut across the data in several ways, with a fair emulation of data loading, querying, loading, and querying… with a full workload… Teradata is tough to beat.

Where They Lose

The shared-nothing architecture is an imperfect fit on a single node… so other players can win smaller data warehouses that can fit on 1-2 nodes. In addition, they can be beat for very large configurations (1PB and above…) by Hadoop.

Teradata can be beat when the workload consists of very complex queries and/or where the problem to be solved requires fantastic response on a small number of CPU-intensive queries… this is a side-effect of spooling the intermediate results to a block device.
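
The spool side-effect is easiest to see in miniature: materializing every step’s intermediate result (spool) versus streaming rows step to step (data flow). A toy Python contrast, not a model of Teradata’s actual executor:

```python
# Contrast a spool-style plan (materialize each step's full output)
# with a data-flow plan (stream rows between steps). Toy data; the
# point is the footprint, not the answer.
rows = range(1_000_000)

# Spool-style: each step completes and stores its result before the
# next step starts, as if written to a spool file.
step1 = [r * 2 for r in rows]              # materialized intermediate
step2 = [r for r in step1 if r % 3 == 0]   # another materialized step
spooled = sum(step2)

# Data-flow style: generators pipeline rows; no step ever holds the
# full intermediate result.
piped = sum(r for r in (x * 2 for x in rows) if r % 3 == 0)

assert spooled == piped   # same answer, very different footprint
```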

Teradata can be beat when data is trickled in at a high, continuous, rate.

Teradata can be beat when a query set goes through the data in a narrow way, using a single index or the equivalent, as might be the case for a data mart.

Teradata can be beat on price.

In the Market

For the reasons above, Teradata is the leader in the DW platform market. Recent competition from Exadata, Netezza, Greenplum, Vertica… and now HANA… has cut margins but not impacted business growth too much. Competitors have projected Teradata’s demise for 20 years now… but the product continues to set the standard.

As noted here, I believe that Hadoop will squeeze Teradata at the 1PB level and above…

My Guess at the Future

Teradata has three architectural challenges to address… and I suspect they will manage all three more-or-less.

First, the old architecture, which was designed for very small DRAM configurations, forces unnecessary I/O in violation of Gray and Putzolu’s Five Minute Rule (see here). This will be mitigated in the short term by writing spool to SSD devices… and in the medium term by writing spool to NVRAM. If these mitigations are not sufficient then Teradata may have to consider re-engineering around a data-flow scheme… but this will be tough.
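
The Five Minute Rule reduces to a break-even computation: a page belongs in DRAM if it will be re-referenced within (pages per MB of DRAM ÷ accesses per second per device) × (device price ÷ DRAM price per MB). A sketch with assumed, illustrative 2013-era prices shows why spooling to a block device is economically suspect, and why SSDs shrink rather than close the gap:

```python
# Back-of-the-envelope Five Minute Rule (Gray and Putzolu). Keep a page
# in DRAM if it will be re-referenced within the break-even interval.
# All prices and IOPS figures are illustrative assumptions.

def break_even_seconds(page_kb, device_price, device_iops, dram_price_per_mb):
    """Re-reference interval below which caching a page in DRAM is
    cheaper than re-reading it from the device."""
    pages_per_mb = 1024.0 / page_kb                 # pages held by 1 MB of DRAM
    price_per_access_per_sec = device_price / device_iops
    return pages_per_mb * price_per_access_per_sec / dram_price_per_mb

disk = break_even_seconds(64, 300, 150, 0.01)      # assumed $300 disk, 150 IOPS
ssd = break_even_seconds(64, 600, 50_000, 0.01)    # assumed $600 SSD, 50K IOPS

print(f"disk break-even: ~{disk / 60:.0f} minutes")   # ~53 minutes
print(f"SSD break-even:  ~{ssd:.0f} seconds")         # ~19 seconds
```

By this arithmetic, spool re-read within the hour should never have been written to disk at all… SSDs narrow the window, but the rule still argues for memory.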

Next, there are several advances in network technology coming in the next 2-3 years… and software-defined networks will impact the space as well. ByNet may have served its purpose… providing Teradata with a significant edge for 20+ years… but Teradata may consider moving to an off-the-shelf network (see here).

Finally, a truly active data warehouse requires support for simultaneous OLTP and BI workloads… I would expect Teradata to build in the sort of hybrid OLTP/BI table capability now supported by both Vertica and HANA… and quasi-supported by GemFire/Greenplum.
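
The hybrid table pattern referenced above generally pairs a write-optimized delta structure (absorbing OLTP inserts) with a read-optimized main structure (serving BI scans), merged in the background. A toy sketch of the general pattern, not any vendor’s implementation:

```python
from collections import defaultdict

class HybridTable:
    """Toy delta/main hybrid: inserts land in a row-wise delta list
    (cheap writes); a background merge folds them into column-wise
    main storage (cheap scans). Queries read both."""

    def __init__(self, columns):
        self.columns = columns
        self.delta = []                    # row-wise, write-optimized
        self.main = defaultdict(list)      # column-wise, read-optimized

    def insert(self, row):                 # OLTP path: O(1) append
        self.delta.append(row)

    def merge(self):                       # periodic delta-to-main merge
        for row in self.delta:
            for col in self.columns:
                self.main[col].append(row[col])
        self.delta.clear()

    def scan(self, col):                   # BI path: column scan plus delta
        return self.main[col] + [row[col] for row in self.delta]

t = HybridTable(["id", "amount"])
t.insert({"id": 1, "amount": 100})
t.merge()
t.insert({"id": 2, "amount": 50})          # not yet merged
print(sum(t.scan("amount")))               # 150: sees both main and delta
```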

Teradata has some interesting business challenges as their margins shrink… and one of those challenges is that their expensive 3-person relationship/technical/industry sales team approach will face some pressure. But it is these sales teams that also provide Teradata an edge. They are the only database vendor who can field team after team of veterans who understand both the technology and the vertical space.

If I were King of Teradata I might try to push downstream and build a configuration optimized for the low end. This would not be a high-margin hardware business but it would sell services and increase market share.

14 thoughts on “My 2 Cents: Teradata 1Q2013”

  1. Rob,
    Good article. Smart thinking as I always expect from you.

    You are a little out of date on some Teradata topic areas. Memory size and SSD are humming right along — we aren’t falling behind there. Look at our website and stay tuned for more announcements 1H13.

    As for active data warehouses, we have over 200 systems that support both OLTP and DSS simultaneously. We’ve been at it for almost 10 years and really got it right in the last 5. I field 1-2 RFP inquiries a week and numerous technical questions to sales teams on this. Take a look at Travelocity, Wells Fargo My Spending Report, Discover, AT&T, and others who are doing it on a Teradata system. We only back pedal from OLTP if there is no analytic value. We don’t want to compete for OLTP applications.

    Regarding Hadoop, I saw that coming back in 2009. We adopted the “snuggle, don’t struggle” attitude, so now we have partnerships with Cloudera and Hortonworks. I posted a white paper on “Hadoop and the Data Warehouse — When to Use Which” and present on that topic frequently. No data warehouses have been displaced. Right now, Hadoop does not pose a threat — it is growing to become part of an analytic ecosystem.

    As for small single-node instances, you are largely correct. We have them, and they work really well given all the parallelism. But sales people tend not to sell small if they can avoid it. We do have a thriving midmarket business.

    If you are ever in Silicon Valley, let’s meet up for a beer and old times.
    Cheers

    1. Hi Dan… I’ll respond to each of your points in a separate reply so that threads can continue independently…

      The issue I raised is not memory size… it is the use of spool to store intermediate results for each step of every query. The Five Minute Rule suggests that these results should always be stored in memory… based on pure hardware economics.

      SSDs mitigate the problem/cost somewhat… but the spool-based design is just not sound architecturally.

      NVRAM will mitigate the problem further… but if you believe that Gray/Putzolu got it right, then the design is still not quite right.

    2. I believe that in the near future there will be an important class of applications where first-class OLTP performance and first-class BI performance co-exist against the same table instance.

      The definition of first-class OLTP performance is that OLTP performance is good enough that you would host an OLTP-only application on the platform.

      Teradata provides, at best, adequate OLTP performance when analytics are a critical component of the application. That is not the same thing.

    3. I would not disagree that Hadoop alongside an EDW is a likely outcome. My point is that the workload that will run on Hadoop is workload that otherwise would have run on Teradata… and therefore Hadoop will significantly impact Teradata’s business.

      Further, where in 2009 customers turned to Teradata for multi-petabyte platforms to support specific application requirements… not for data warehouses… that business now and in the future will go to Hadoop.

      You have the right snuggle strategy… but it will have an impact.

    4. For small nodes… the market is ripe for a cost-effective single-node data warehouse box that supports up to 5TB… maybe up to 10TB.

      Taking software architected for a shared-nothing cluster of single-core servers and configuring it on an inexpensive server is not really what I meant.

      The pipeline starts small and Teradata, like almost everyone else, has become entranced by big sales. Thousands of small warehouses become thousands of big warehouses over time.

      (Fortunately for you I am not King, I guess?)

      1. ‘I would expect Teradata to build in the sort of hybrid OLTP/BI table capability now supported by both Vertica and HANA… and quasi-supported by GemFire/Greenplum’… no mention of Exadata…? I enquire as we’re just about to implement it… :-)

  2. There are plenty of single node Teradata SMP platforms happily running production workloads here in the UK…they often wish they could afford a bigger MPP system but that’s another story…

    The fact that Teradata supports intra-node as well as inter-node parallelism through processor and disk virtualisation is something I have always considered an architectural strength (well, since V2 anyway). The number of PE and AMP vprocs that can be configured on a single SMP node delivers plenty of parallelism, before scaling out to MPP.
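
    The hash distribution behind that parallelism can be pictured in a few lines: rows hash to AMP vprocs by primary-index value, and each AMP works its own shard whether the vprocs share one SMP node or span an MPP system. A toy stand-in, not Teradata’s actual row-hash:

    ```python
    import hashlib

    NUM_AMPS = 8   # vprocs configured on a single SMP node (illustrative)

    def owning_amp(pi_value, num_amps=NUM_AMPS):
        """Toy stand-in for hashing a primary-index value to an AMP vproc."""
        digest = hashlib.md5(str(pi_value).encode()).hexdigest()
        return int(digest, 16) % num_amps

    # Each AMP owns and scans its rows in parallel.
    shards = {amp: [] for amp in range(NUM_AMPS)}
    for cust_id in range(20):
        shards[owning_amp(cust_id)].append(cust_id)

    for amp, owned in shards.items():
        print(f"AMP {amp}: {len(owned)} rows")
    ```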

    We have built demo Teradata SMP systems running SLES on random (i.e. non-certified) node/storage hardware and it worked a treat. There is no obvious technical impediment to Teradata SMP being a ‘low-end’ market offering. Teradata will sell SMP ‘software only’ for production use if you are persistent enough. The head of EMEA confirmed this to me about 2 years ago at the Teradata Partners conference. The sales channel would rather go after much higher margin, higher ticket opportunities…naturally.

    As you point out, until relatively recently Teradata hummed along on 32-bit MP-RAS with a measly 2-4GB RAM per node, and single-core SMP nodes. The move to 64-bit SUSE Linux, and the advent of multi-core nodes, has also heralded a move to far more RAM per node, as you might expect. The Teradata appliance offerings are especially stuffed full of RAM these days, so the move to the sunny uplands of lots of RAM is well underway.

    I’m not convinced Teradata will ever ditch the ByNet completely and move to COTS network gear. There are too many patents and ‘smarts’ embedded in this part of the system for it to be ditched easily: http://lakshmikishore.blogspot.co.uk/2010/09/bynet.html

    However, this may have to happen to support a Teradata MPP cloud offering…so you never know.

    If I were ‘King of Teradata’ (great phrase) I’d introduce native incremental backup 🙂

  3. Well, whilst Teradata is now quite a mature product, its documentation still lacks basic information and accessibility, so that in case of an error one needs to refer to an external source like http://teradataerror.com

    1. Hi Jon…

      I think that Hadoop will become a more capable DBMS over the next 18-24 months… You can already see Hive and Impala making headway. The amount of cumulative R&D applied will be huge. The ability to support production workloads will advance as well… And in 2015 Hadoop will be close to Teradata and way beyond Aster Data in capability. The price for Hadoop will be nearly nothing. The result will be that for Big Data… Over 1PB, or maybe over 500TB, there will be no market for any commercial databases… Even at $5K/TB a 1PB system will cost $5M… So price will make up for any feature shortcomings.
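
      For concreteness, the price arithmetic above… the commercial $/TB figure is from this comment; the Hadoop figure is purely an illustrative assumption:

      ```python
      # $/TB arithmetic; the Hadoop price is an assumed illustration.
      capacity_tb = 1000            # 1PB
      commercial_per_tb = 5_000     # "$5K/TB" from the comment
      hadoop_per_tb = 500           # assumption: roughly 10x cheaper

      print(f"commercial 1PB: ${capacity_tb * commercial_per_tb / 1e6:.1f}M")  # $5.0M
      print(f"Hadoop 1PB:     ${capacity_tb * hadoop_per_tb / 1e6:.1f}M")      # $0.5M
      ```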

      Bottom line… Economics will make application workloads go to Hadoop at the high end…

      -Rob
