Last year, Brian Moran of SQL Server Magazine wrote this post, and more recently followed it with another post from 25 September of this year, wondering why, after an entire year, Microsoft SQL Server remains the only database platform with any published TPC-E results. In his more recent article, Mr. Moran writes:
I haven’t been able to get a formal response from IBM or Oracle about why they haven’t posted any TPC-E scores to date. I’ve sent email messages to their public relations departments and will let you know if I can track down an official answer. However, I’d be grateful to any readers who can pass along a link to Oracle or IBM’s official position on TPC-E. Although I’m lacking a formal answer, the most rational answer is that Oracle and IBM have tried to top Microsoft’s numbers and simply can’t.
If you look at the list of TPC-E results you can see that it is indeed true that only Microsoft SQL Server results have been published. However, while it may also be true that the other vendors cannot compete with Microsoft’s results, I frankly doubt it. I believe there are other reasons why IBM DB2 and Oracle have yet to post a TPC-E result. Below I list a few of these, but first, some background:
TPC-E is an OLTP-oriented performance benchmark published by the Transaction Processing Performance Council (TPC), an industry consortium made up of a variety of companies including the major database management system vendors (IBM, Oracle, Microsoft, Sybase, Teradata, Netezza, Ingres, Kickfire, and so on). As explained well in this IBM whitepaper, TPC-E was designed as a more realistic OLTP benchmark to supersede the older TPC-C benchmark. The design goals of TPC-E were to:
- make the benchmark more realistic in comparison with OLTP workloads in “real” applications, through:
  - introducing some elements of data skew into the database instance;
  - providing a more realistic transaction/SQL query mix;
  - requiring RAID hardware for reliability; and
  - implementing realistic referential integrity and CHECK constraints in the schema.
- reduce the overall system cost relative to that required by top-of-the-line TPC-C results.
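To make the data-skew goal concrete, here is a minimal Python sketch of why skew matters to a query optimizer. It is an illustration only: the column, its size, and the Zipf-like weighting are assumptions of mine and are not taken from the TPC-E specification. It compares the row count an optimizer would predict for an equality predicate under a uniform-distribution assumption with the actual count for the most frequent value in a skewed column.

```python
import random
from collections import Counter

def uniform_estimate(num_rows, num_distinct):
    """Rows expected to match `col = v` if every value were equally frequent."""
    return num_rows / num_distinct

def make_skewed_column(num_rows, num_distinct, seed=0):
    """Hypothetical column: value k is drawn with weight 1/k (Zipf-like skew)."""
    rng = random.Random(seed)
    values = list(range(1, num_distinct + 1))
    weights = [1.0 / k for k in values]
    return rng.choices(values, weights=weights, k=num_rows)

# Sizes are arbitrary, chosen only for illustration.
num_rows, num_distinct = 100_000, 1_000
column = make_skewed_column(num_rows, num_distinct)
counts = Counter(column)

estimate = uniform_estimate(num_rows, num_distinct)
hot_value, hot_count = counts.most_common(1)[0]

print(f"uniform estimate per value: {estimate:.0f} rows")
print(f"actual rows for most frequent value {hot_value}: {hot_count}")
print(f"underestimate factor: {hot_count / estimate:.1f}x")
```

With uniformly distributed data the estimate is accurate for every value, so an optimizer needs no histograms or other skew statistics to pick a reasonable plan. With skewed data the same estimate can be off by orders of magnitude for the frequent values.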
Lack of any significant data skew in TPC-C is a problem, something I have pointed out previously: uniform distributions do not reflect typical customer data, and they make query optimization considerably simpler, because the sophisticated techniques needed to estimate skew and to use those estimates to choose better access plans aren’t required. While the introduction of skew is a significant benefit of TPC-E, its query complexity is only marginally better than that of TPC-C, and TPC-C sets an unrealistically low bar: there are virtually no join queries in the entire TPC-C benchmark. TPC-E does contain some joins, but on the whole not significantly more. Here is a graph illustrating the complexity of the suite of 156 DML statements in the TPC-E benchmark specification (which are advertised as “pseudocode”, but bear a striking resemblance to Microsoft Transact-SQL syntax):
There are a grand total of 2 queries (of 156 in the entire benchmark) that use GROUP BY, 28 that contain ORDER BY, and only 5 statements that utilize a subquery: 4 of these are (trivial) existential IN subqueries that can be trivially rewritten as joins. The fifth is a NOT IN subquery, over a join of two tables, contained within an UPDATE statement. Only one statement contains an OUTER JOIN. In terms of join degree – the number of joins in a query (one less than the number of tables) – the mix within TPC-E is as follows:
In TPC-E, there is only one query at each join degree from 3 to 6 (that is, one query each referencing 4, 5, 6, and 7 tables). The only DML statements that contain subqueries reference at most one table in the outer block, and there are no queries with more than one level of nesting. In other words, TPC-E fails utterly to offer a decent query optimizer any significant challenge, and fails to reflect the complexity that I see repeatedly in typical customer workloads.
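As an illustration of how little work such subqueries demand of an optimizer, the following Python sketch uses the standard-library sqlite3 module to run an existential IN subquery and its join rewrite side by side. The `customer` and `orders` tables are hypothetical and are not part of the TPC-E schema. The rewrite is valid here because the subquery selects a primary key, so the join cannot produce duplicate rows; without that uniqueness a DISTINCT or a semijoin would be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical two-table schema, for illustration only.
    CREATE TABLE customer (
        c_id INTEGER PRIMARY KEY,
        tier INTEGER NOT NULL
    );
    CREATE TABLE orders (
        o_id   INTEGER PRIMARY KEY,
        c_id   INTEGER NOT NULL REFERENCES customer(c_id),
        amount REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 1), (2, 2), (3, 1);
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 2, 7.5),
                              (12, 3, 1.0), (13, 1, 2.5);
""")

# Existential IN subquery: one table in the outer block, one level of nesting.
in_form = """
    SELECT o_id FROM orders
    WHERE c_id IN (SELECT c_id FROM customer WHERE tier = 1)
    ORDER BY o_id
"""

# Equivalent join: two tables, so the join degree is 2 - 1 = 1.
join_form = """
    SELECT o.o_id FROM orders AS o
    JOIN customer AS c ON c.c_id = o.c_id
    WHERE c.tier = 1
    ORDER BY o.o_id
"""

in_rows = conn.execute(in_form).fetchall()
join_rows = conn.execute(join_form).fetchall()
print(in_rows == join_rows)
```

Both forms return the same rows, which is why a statement of this shape gives an optimizer essentially one join to plan.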
Consequently, I think it can still be strongly argued that TPC-E is not representative of most customer applications. Indeed, the TPC-E specification itself cautions:
Benchmark Results are highly dependent upon workload, specific application requirements, and systems design and implementation. Relative system performance will vary because of these and other factors. Therefore, TPC-E should not be used as a substitute for specific customer application benchmarking when critical capacity planning and/or product evaluation decisions are contemplated.
Now to answer the question at hand: why are DBMS vendors other than Microsoft seemingly ignoring TPC-E? Here are several reasons, in no particular order:
- TPC-E is a moving target. Since its publication as Version 1.1 in April 2007, TPC-E has gone through five significant revisions (“E” is now at 1.6.0) encompassing 114 changes – a rate of over 7 per month on average. True, many of these are editorial in nature, but approximately one-half of these are substantive changes, including changes to the actual makeup of individual transactions.
- Both DBMS vendors and hardware suppliers have a substantial investment in TPC-C expertise. TPC-C has been around a long time (September 1992) and while it is acknowledged as a simplistic benchmark, there exists a considerable body of expertise across all of the companies in the TPC consortium. We at iAnywhere relied, to some degree, on Sybase’s TPC expertise and experience when we published our own TPC-C benchmark with SQL Anywhere earlier this year. In contrast, TPC-E is new: a regular chicken-and-egg problem.
- TPC-E isn’t that cheap. Of all the published TPC-E results, the cheapest total system cost isn’t trivial: it is approximately US$200,000.
- Customers continue to desire and reference TPC-C results. Because of its long history, many customers continue to treat TPC-C as the “gold standard” of benchmarks; TPC-E, at least in part due to its lack of history and lack of participation by the other vendors, rarely appears on the radar screen.
Clearly Microsoft is an early adopter of TPC-E, at least for SQL Server 2008. However, hardware vendors like IBM continue to publish TPC-C results on older SQL Server releases, as recently as September 15 of this year.
In conclusion: in my view, an equally rational explanation for the absence of TPC-E results from other vendors is a combination of two factors: one, its lack of visibility to customers relative to TPC-C, and two, a preference to let Microsoft work out any remaining kinks in the benchmark before jumping on the TPC-E bandwagon.