A Few Things About TPC

In recent days, OceanBase's PR push around taking the world's No. 1 spot on TPC-H has been widely covered by Chinese self-media. Beyond the excitement, having dealt with the TPC organization and its auditors over the years, I have some understanding of the TPC benchmarks and their audit and publication process, and I have heard a few anecdotes from the market. So while we congratulate OceanBase on its championship, let me also talk about the TPC benchmarks from a different angle.

1. Who is still running TPC-H tests?

Let's start with TPC-H. Mainstream traditional database vendors have largely stopped publishing audited TPC-H results in recent years; instead, TPC-H has increasingly become a way for hardware manufacturers to showcase the processing power of their own servers. That produces a rather "interesting" scene on the TPC website today [1]: this is nominally a database performance benchmark, yet the results list is, at a glance, almost entirely hardware manufacturers. The System column is almost all server models, and the test sponsors (Company/Sponsor) are almost all equipment makers, with OceanBase and Alibaba Cloud AnalyticDB as the exceptions. Different hardware vendors on the list also often benchmark against the same database product; in the TPC-H results of recent years, Microsoft SQL Server Enterprise Edition is the one used most. Seeing this, we may start to read the role TPC-H plays in the industry a little differently.

So why is that? Let's first take a closer look at the TPC-H benchmark itself. TPC-H contains a very strict ACID test plus a query performance test, but strictly speaking it is not an HTAP scenario where TP and AP coexist under high concurrency; it is a performance test of relatively simple OLAP queries (simple compared to TPC-DS) on top of ACID verification. On the AP side, the number of concurrent query streams depends on the test data volume (the scale factor) and is generally only 2 to 11 (each concurrent session is called a stream in TPC-H), with the 22 queries executed sequentially within each stream. On the TP side, the entire test has only a single refresh stream, and that stream merely inserts and deletes about one-thousandth of the data in the Orders and Lineitem tables, so its cost is negligible next to each AP stream. For example, according to the audit results OceanBase published this time, AP stream 1 of Performance Run 1 took 1,382 seconds, while the refresh stream finished in under 7 seconds; it has almost no impact on the query portion.

Moreover, because the TPC-H data model is very simple and its data distribution uniform, it poses little challenge to traditional database systems. Database vendors and academia have also studied TPC-H thoroughly; which optimization each query is designed to exercise was analyzed in a dedicated paper many years ago [2]. So overall, TPC-H is no real challenge for traditional database systems, and its publicity value for them is limited. For the "new forces" represented by OceanBase, however, TPC-H is moderately difficult and comprehensively evaluates a database system's capabilities, ACID included, so publishing audited results does bring a certain degree of credibility (later I will explain why it is "a certain degree" rather than "absolute") and can serve as one element of commercial promotion.
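To make the numbers above concrete, here is a minimal sketch of how a TPC-H score is derived, based on my reading of the specification: Power@Size is 3600 × SF divided by the geometric mean of the 24 timing intervals of the power run (the 22 queries plus the two refresh functions RF1/RF2), Throughput@Size is the query count of the throughput run converted to queries per hour and scaled by SF, and the published QphH@Size is the geometric mean of the two. All timing numbers below are invented for illustration; they are not OceanBase's audited figures.

```python
import math

# --- Illustrative inputs only; NOT real audit numbers ---
SF = 30000                    # scale factor (raw data size in GB)
S = 10                        # number of query streams at this scale factor
query_secs = [40.0 + i for i in range(22)]  # Q1..Q22 timings in the power run
refresh_secs = [3.5, 3.2]                   # RF1, RF2 timings in the power run
throughput_elapsed = 1382.0   # wall-clock seconds of the whole throughput run

# Power@Size: 3600*SF divided by the geometric mean of all 24 intervals
intervals = query_secs + refresh_secs
geo_mean = math.exp(sum(math.log(t) for t in intervals) / len(intervals))
power = 3600.0 * SF / geo_mean

# Throughput@Size: queries completed per hour across all streams, scaled by SF
throughput = (S * 22 * 3600.0) / throughput_elapsed * SF

# QphH@Size: geometric mean of the two component metrics
qphh = math.sqrt(power * throughput)
print(f"Power@Size      = {power:,.0f}")
print(f"Throughput@Size = {throughput:,.0f}")
print(f"QphH@Size       = {qphh:,.0f}")
```

Note how the two refresh timings barely move the 24-interval geometric mean; that is the quantitative version of the point above about the refresh stream's negligible cost.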

2. How much reference value do these TPC results have?

Now let me talk about the credibility of the benchmark results. First of all, TPC has a strict third-party audit system. The audits are performed by an organization that has worked with TPC for many years, whose auditors are all veteran experts with decades of experience in the database field; some were directly involved in drafting and revising the benchmarks themselves. The entire audit process is very strict; it is said that even the format of the log output to be provided for the audit is strictly specified. By that logic, results that pass such a strict audit process should be entirely credible. Well, yes and no.

The rigor of the audit process itself is not in question. The catch is that TPC requires most benchmark results, including TPC-H, TPC-DS, and TPC-C, to be publicly reproducible and to be obtained on publicly released product versions. TPC also has rules about which special optimizations are and are not allowed. Traditional database vendors have made many targeted optimizations for these benchmarks, but some of them cannot be enabled by default and need a switch to turn on; TPC does not accept such special optimizations, and it is said that some traditional vendors have had to withdraw and republish TPC results because of this.

Coming back to large-scale distributed database systems: auditors nowadays generally do not log in to test and verify the product in person. They review the test procedures and scripts and have each vendor provide its own evidence, including audit logs of the test runs and results, proof that the tested version is an official release, proof of the price of the hardware used, proof that the product is publicly for sale or offered as a service (for cloud products), and so on. These proofs cannot always be fully verified, so after each result is published there is a three-month review period during which the public (including competitors) can challenge it. But few people have the resources to verify results obtained on very large clusters, and for cloud products vendors also have ways to work around the restrictions. So on the whole, the credibility of these results is usually not a problem from a technical point of view; judged against TPC's requirement of publicly reproducible tests on publicly available products, however, it cannot always be fully guaranteed.

Now let's look at TPC-DS. The first vendor to "make the list" for this benchmark was China's Transwarp, which also ran the first official audited test after the TPC-DS benchmark was launched. It was therefore Transwarp that helped TPC iron out many detailed questions in the audit process, which is also why Transwarp's audit took so long. Later, Alibaba Cloud's AnalyticDB and E-MapReduce products benefited from the test scripts and documents published with Transwarp's audited results, so they took fewer detours in their own tests. A bit of gossip here: I heard that these two Alibaba Cloud products compete internally, hence the PK on TPC-DS. So many audited tests within two years have made TPC-DS's third-party audit firm quite prosperous, and the enthusiasm of Chinese companies for chasing the list must genuinely have moved TPC and its audit organization. Each audit is said to be expensive, and with the audit firm's business this busy over the past two years, one now has to book an auditor's schedule long in advance.

3. In the cloud era, do we still need to chase the TPC list?

So far, the TPC-DS results list is basically occupied by Chinese companies, with few entries from foreign vendors, especially American ones. There are several reasons for this. First, a product like AWS Redshift already delivers very good out-of-the-box TPC-DS performance, and its market recognition and position mean it has no need to promote itself with TPC-DS, much as traditional database vendors have long since stopped running audited TPC-H tests. Emerging cloud data warehouse vendors like Snowflake have always been very sensitive to being benchmarked publicly (especially by competitors), so their TPC-DS performance has remained "mysterious"; I look forward to seeing their audited results someday. One domestic vendor is said to have contacted TPC for an audit in 2019, but because it had been placed on the US entity list, TPC had to refund its audit fee and drop the engagement. So besides the two Alibaba Cloud products, Transwarp and H3C have each also run audited tests, with results visible on the TPC website. Among them, Transwarp's second submission is still very strong, exceeding the results of the two Alibaba Cloud products that had "crushed" its first one, while those two Alibaba Cloud products have not published new results for a year. In a Chinese self-media interview with the head of OceanBase's products, the author of that piece also said he looked forward to OceanBase publishing audited TPC-DS results. Whether that will set off another round of "list-chasing" fever among Alibaba Cloud, OceanBase, and the other domestic database "new forces", let us wait and see.

Finally, let's look at TPC-C. In my personal opinion, the difficulty and "gold content" of this benchmark are much higher than TPC-H's: it is a true measure of the comprehensive capabilities of a TP system. But as with TPC-H, most traditional database vendors have, for various reasons, almost stopped auditing and publishing results for this benchmark, and the sponsors are now more often equipment vendors or system integrators. Another reason traditional vendors no longer "sweep the list" is that the scalability of their architectures has hit a relative ceiling, and that is precisely what let OceanBase use its horizontal scalability to "dominate" the top of the list.
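The reason scale-out wins TPC-C is largely mechanical. The tpmC score counts New-Order transactions per minute, and the specification's mandated keying and think times cap each warehouse at roughly 12.86 tpmC (the commonly cited ceiling), so a huge score requires a correspondingly huge number of warehouses and the data that comes with them. A rough back-of-the-envelope sketch under that assumption:

```python
import math

# Commonly cited TPC-C ceiling: the spec's keying/think times limit each
# warehouse to roughly 12.86 New-Order transactions per minute (tpmC).
TPMC_PER_WAREHOUSE = 12.86

def min_warehouses(target_tpmc: float) -> int:
    """Minimum number of warehouses a result must configure to reach a score."""
    return math.ceil(target_tpmc / TPMC_PER_WAREHOUSE)

# Illustrative targets: a score in the hundreds of millions of tpmC implies
# tens of millions of warehouses -- far beyond one scale-up box, which is why
# horizontally scalable shared-nothing systems now own the top of the list.
for target in (1_000_000, 60_000_000, 700_000_000):
    print(f"{target:>13,} tpmC -> at least {min_warehouses(target):>12,} warehouses")
```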

Well, that is my casual ramble about TPC for today. Personally, I think TPC-H is meaningful but of limited significance from a technical point of view, while TPC-C and TPC-DS pose comparatively higher technical challenges. On the other hand, more and more emerging database technologies and products are appearing in China. For foundational system software developed at home, this is a very good thing for the development of Chinese database technology. But we should also recognize that database technology needs long-term accumulation and innovation. We should be careful not to chase the list just to make a name, dragging domestic vendors into an "involution" of ranking games, and should instead genuinely develop new technologies and products that bring practical value and technological breakthroughs to the country and people's livelihood.