Sybase iAnywhere SQL Anywhere Mobile and Embedded Database

I'd rather play golf


Thoughts on data management, autonomic computing, and self-managing database systems.


The state of TPC-E

October 3rd, 2008 · 5 Comments

Last year, Brian Moran of SQL Server Magazine wrote this post, and he followed it on 25 September of this year with another post, wondering why, after an entire year, Microsoft SQL Server remains the only database platform with any published TPC-E results. In his more recent article, Mr. Moran postulates:

I haven’t been able to get a formal response from IBM or Oracle about why they haven’t posted any TPC-E scores to date. I’ve sent email messages to their public relations departments and will let you know if I can track down an official answer. However, I’d be grateful to any readers who can pass along a link to Oracle or IBM’s official position on TPC-E. Although I’m lacking a formal answer, the most rational answer is that Oracle and IBM have tried to top Microsoft’s numbers and simply can’t.

If you look at the list of TPC-E results you can see that it is indeed true that only Microsoft SQL Server results have been published. However, while it may also be true that the other vendors cannot compete with Microsoft’s results, I frankly doubt it. I believe there are other reasons why IBM DB2 and Oracle have yet to post a TPC-E result. Below I list a few of these, but first some introduction:

TPC-E is an OLTP-oriented performance benchmark published by the Transaction Processing Performance Council (TPC), an industry consortium made up of a variety of companies including the major database management system vendors (IBM, Oracle, Microsoft, Sybase, Teradata, Netezza, Ingres, Kickfire, and so on). As this IBM whitepaper explains well, TPC-E was designed as a more realistic OLTP benchmark to supersede the older TPC-C benchmark. The design goals of TPC-E were to:

  • make the benchmark more realistic in comparison with OLTP workloads in “real” applications, through:
    • introducing some elements of data skew into the database instance;
    • providing a more realistic transaction/SQL query mix;
    • requiring RAID hardware for reliability; and
    • implementing realistic referential integrity and CHECK constraints into the schema.
  • reduce the overall system cost below that required by top-of-the-line TPC-C results.

Lack of any significant data skew in TPC-C is a problem: uniform distributions do not reflect typical customer data, and they make query optimization considerably simpler, because sophisticated techniques to estimate skew and use those estimates to choose better access plans aren’t required, something I have pointed out previously. While the introduction of skew is a significant benefit of TPC-E, its query complexity is only marginally greater than that of TPC-C, and TPC-C sets an unrealistically low bar: there are virtually no join queries in the entire TPC-C benchmark. TPC-E does contain some joins, but taken as a whole not significantly more. Here is a graph illustrating the complexity of the suite of 156 DML statements in the TPC-E benchmark specification (which are advertised as “pseudocode”, but bear a striking resemblance to Microsoft Transact-SQL syntax):
[Figure: TPC-E statement characteristics]
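
As a rough illustration of why skew matters to an optimizer, here is a minimal Python sketch. It is not taken from TPC-E or from any DBMS: the column contents, the `uniform_estimate` helper, and the 910/10 row counts are all assumptions chosen for illustration. It compares the row count an optimizer would predict for an equality predicate under a uniform-distribution assumption with the actual counts in a skewed column.

```python
from collections import Counter

def uniform_estimate(total_rows: int, distinct_values: int) -> float:
    """Rows expected to match `col = value` if every value were equally common."""
    return total_rows / distinct_values

# Hypothetical skewed column: one "hot" value holds 910 of 1000 rows,
# and nine "cold" values hold 10 rows each.
column = ["hot"] * 910 + [f"cold{i}" for i in range(9) for _ in range(10)]
counts = Counter(column)

# Under the uniform assumption every value gets the same estimate.
estimate = uniform_estimate(len(column), len(counts))
for value in ("hot", "cold0"):
    actual = counts[value]
    print(f"{value}: estimated {estimate:.0f} rows, actual {actual} rows")
```

With uniform data the single estimate would be right for every value, which is why a benchmark without skew never penalizes an optimizer that lacks histograms or similar statistics. With the skewed column above, the same estimate is far too low for the hot value and far too high for the cold ones.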

There is a grand total of 2 queries (of 156 in the entire benchmark) that use GROUP BY, 28 that contain ORDER BY, and only 5 statements that use a subquery: 4 of these are existential IN subqueries that can be trivially rewritten as joins. The fifth is a NOT IN subquery, over a join of two tables, contained within an UPDATE statement. Only one statement contains an OUTER JOIN. In terms of join degree – the number of joins in a query (one less than the number of tables) – the mix within TPC-E is as follows:
[Figure: TPC-E SELECT statement join degree]

In TPC-E, there is only one query at each join degree from 3 through 6 (that is, one query each referencing 4, 5, 6, and 7 tables). The only DML statements that contain subqueries reference at most one table in the outer block, and there are no queries with more than one level of nesting. In other words, TPC-E fails utterly to offer a decent query optimizer any significant challenge, and fails to reflect the complexity that I see repeatedly in typical customer workloads.
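
To show what rewriting an existential IN subquery as a join looks like, here is a small sketch using Python's built-in `sqlite3` module. The `holder` and `purchase` tables, their columns, and the sample rows are hypothetical and are not taken from the TPC-E schema; the point is only that the two query forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical two-table schema, invented for this example.
    CREATE TABLE holder (h_id INTEGER PRIMARY KEY, h_name TEXT);
    CREATE TABLE purchase (p_id INTEGER PRIMARY KEY,
                           p_h_id INTEGER REFERENCES holder(h_id),
                           p_qty INTEGER);
    INSERT INTO holder VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchase VALUES (10, 1, 100), (11, 1, 200), (12, 3, 50);
""")

# Existential IN subquery: holders with at least one purchase.
in_form = """
    SELECT h_id, h_name FROM holder
    WHERE h_id IN (SELECT p_h_id FROM purchase)
    ORDER BY h_id
"""

# Equivalent join. DISTINCT is needed because a holder with several
# purchases would otherwise appear once per matching purchase row.
join_form = """
    SELECT DISTINCT h.h_id, h.h_name
    FROM holder h JOIN purchase p ON p.p_h_id = h.h_id
    ORDER BY h.h_id
"""

in_rows = conn.execute(in_form).fetchall()
join_rows = conn.execute(join_form).fetchall()
print(in_rows)
print(join_rows)
print("same result:", in_rows == join_rows)
```

The join form has a join degree of 1 (two tables, one join). The rewrite is safe here because the outer block selects a primary key, so DISTINCT removes only the duplicates introduced by the join.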

Consequently, I think it can still be strongly argued that TPC-E is not representative of most customer applications. Indeed, the TPC-E specification urges that:

Benchmark Results are highly dependent upon workload, specific application requirements, and systems design and implementation. Relative system performance will vary because of these and other factors. Therefore, TPC-E should not be used as a substitute for specific customer application benchmarking when critical capacity planning and/or product evaluation decisions are contemplated.

Now to answer the question at hand: why are DBMS vendors other than Microsoft seemingly ignoring TPC-E? Here are a few reasons, in no particular order:

  • TPC-E is a moving target. Since its publication as Version 1.1 in April 2007, TPC-E has gone through five significant revisions (“E” is now at 1.6.0) encompassing 114 changes – a rate of over 7 per month on average. True, many of these are editorial in nature, but approximately one-half of these are substantive changes, including changes to the actual makeup of individual transactions.
  • Both DBMS vendors and hardware suppliers have a substantial investment in TPC-C expertise. TPC-C has been around for a long time (since September 1992), and while it is acknowledged to be a simplistic benchmark, a considerable body of expertise with it exists across all of the companies in the TPC consortium. We at iAnywhere relied, to some degree, on Sybase’s TPC expertise and experience when we published our own TPC-C benchmark with SQL Anywhere earlier this year. In contrast, TPC-E is new: a classic chicken-and-egg problem.
  • TPC-E isn’t that cheap. Of all the published TPC-E results, the cheapest total system cost isn’t trivial: it is approximately US$200,000.
  • Customers continue to desire and reference TPC-C results. Because of its long history, many customers continue to treat TPC-C as the “gold standard” of benchmarks; TPC-E, at least in part due to its lack of history and lack of participation by the other vendors, rarely appears on the radar screen.

Clearly Microsoft is an early adopter of TPC-E, at least for SQL Server 2008. However, hardware vendors like IBM continue to publish TPC-C results on older SQL Server releases, as recently as September 15 of this year.

In conclusion: in my view, an equally rational explanation for the absence of TPC-E results from other vendors is a combination of two factors: first, TPC-E’s lack of visibility with customers relative to TPC-C; and second, a preference to let Microsoft work out any remaining kinks in the benchmark before jumping on the TPC-E bandwagon.

Tags: Microsoft SQL Server · Performance measurement · Query optimization · SQL Anywhere · Sybase ASE

5 responses so far ↓

  • 1 Vasil // Oct 9, 2008 at 4:02 am

    While db-X being slower than MS SQL Server in TPC-E is a reason for corp-X not to publish db-X TPC-E results, this is not a reason for someone else to publish db-X results.

    Why did TPC publish only MS SQL results?

  • 2 Glenn Paulley // Oct 9, 2008 at 5:24 pm

    Keep in mind two things:

    a) In practice, all published TPC benchmarks are actually run by hardware companies with the cooperation of the DBMS vendor. The lone exception to this rule, at least that I am aware of, was iAnywhere’s recent TPC-C result.

    b) No-one can publicly make available performance test results for virtually any commercial product without prior approval, in writing, from the DBMS vendor. That is true, as far as I am aware, of most DBMS vendors including Sybase.

    What this means is, in a nutshell, that if Oracle (or Sybase, or any other vendor) wants to refrain from publishing a TPC-E benchmark, they can do so, and resort to legal action to enforce their wishes if they desire.

  • 3 Charles Levine // Oct 23, 2008 at 10:45 pm

    I’ve posted a detailed response to your post here.

    Summarizing a few key points:

    1. Your analysis of TPC-E query complexity is an interesting technical exercise, but TPC-E is not and was never intended to be a query optimizer test. The pseudo-SQL code in TPC-E is an example, not a requirement. Unlike TPC-H, which strictly limits changing the specified SQL, in TPC-E test sponsors are free to rewrite the SQL any way they like as long as it is functionally equivalent. One vendor might rewrite it to remove all joins while another might rewrite it to include more joins or more complex joins.

    2. The vast majority of changes to the TPC-E spec have been editorial. The rest are minor changes which have not affected the comparability of results. A better gauge of the high quality of the TPC-E spec is that to-date 18 results have been published by six vendors spanning 15 months, but there have been no compliance challenges.

    3. TPC-C is outdated, over-optimized, and of questionable relevance. Customers hold onto TPC-C because it is familiar and available, not because it is better. Database vendors need to exercise leadership by embracing a superior benchmark that will drive customer-relevant engineering innovation.

  • 4 Saw // Jun 25, 2009 at 9:42 pm

    How about now? As of 26th June 2009, there is still no TPC-E result for Oracle. I wonder whether it is too difficult for the Oracle DBMS to adapt from the TPC-C environment to TPC-E. Or does Oracle really perform too badly?

    Now, as each day passes I am more and more confident that SQL Server 2008 is much better than Oracle.

    I do hope that DB2 and Oracle can publish their TPC-E results.

  • 5 Glenn Paulley // Jun 30, 2009 at 10:13 am

    Saw –

    I don’t think anything has really changed with respect to TPC-E in the last year, nor do I expect other DBMS vendors to jump on the TPC-E bandwagon, certainly in the short term. With the exception of Microsoft, all of the other relational database vendors have ignored TPC-E. I believe this is due to the reasons that I’ve outlined above, not because their systems perform poorly. Colin White has recently posted the following comment on ParAccel’s TPC-H benchmark, which I think also has relevance to TPC-E:

    TPC benchmarks have always been controversial. People often argue that they do not represent real life workloads. What this really means is that your mileage may vary. These benchmarks are expensive to run and vendors throw every piece of technology at the benchmark in order to get good results. Some vendors are rumored to have even added special features to their products to improve the results. The upside of the benchmarks is that they are audited and reasonably well documented.

    The use of TPC benchmarks has slowed over recent years. This is not only because they are expensive to run, but also because they have less marketing impact than in the past. In general, they have been of more use to hardware vendors because they demonstrate hardware scalability and provide hardware price/performance numbers. Oracle was perhaps an exception here because they liked to run full-page advertisements saying they were the fastest database system in existence.