Let’s set open source aside: I have two questions about "OceanBase climbs to the top of TPC-H"

OceanBase topped TPC-C last time, and this time it has topped TPC-H

[Global Financial Observation | News Express] Recently, an article entitled "15.26 million QphH! OceanBase, Ant's self-developed database, tops the authoritative TPC-H list" has been widely circulated on the Internet.

As the article notes, on May 20 the official website of the Transaction Processing Performance Council (TPC) released the latest list for its data analysis benchmark (TPC-H). On it, OceanBase, the distributed database independently developed by Ant Group, ranked first at the 30,000 GB scale with an overall performance score of 15.26 million QphH.

Previously, Ant's OceanBase took part in the transaction processing benchmark (TPC-C) in 2019 and 2020, and topped the list both times.

OLTP + OLAP: can you really have both the fish and the bear's paw?

I still have something to say about OceanBase topping TPC-H.

Anyone familiar with databases knows that online transaction processing (OLTP) and online analytical processing (OLAP) are the two most widely used kinds of database workload. OLTP serves real-time transaction scenarios: it is characterized by high concurrency and simple SQL, and it places strict requirements on the latency and jitter of queries and updates. OLAP serves real-time analysis scenarios: it is characterized by a large share of complex SQL, mostly multi-table joins, with relatively long execution times.

For this reason, the TPC organization maintains separate benchmarks, TPC-C and TPC-H, to assess a database's OLTP and OLAP capabilities respectively. Reflecting the nature of each workload, TPC-C uses the number of transactions processed per minute as its metric, while TPC-H uses the number of queries executed per hour. In recent years, many vendors have introduced HTAP (Hybrid Transactional and Analytical Processing), which handles OLTP and OLAP workloads in the same database. OceanBase is one of them.

OceanBase ranks first in both the TPC-C and TPC-H tests. Does that mean it can conquer everything and handle every database application scenario? Let us analyze OceanBase's actual capabilities in the OLAP field using the detailed data in the test report on the TPC official website.

According to the test report on the TPC official website, OceanBase used 64 cloud servers for this test. Each server has 80 CPU cores, 768 GB of memory, and 40 GB of storage, so the 64 servers together provide 5,120 CPU cores, 49,152 GB of memory, and 2,560 GB of server storage. In addition, 38,000 GB of OFS (OceanBase File System) storage was used, for a total of 40,560 GB of storage. Figure 1 shows the hardware scale of the OceanBase TPC-H test.
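The totals quoted above can be rechecked with a few lines of arithmetic. This is only a sanity check of the figures as reported in this article; the variable names are illustrative and are not taken from the TPC report.

```python
# Per-server figures as quoted in the article (attributed to the TPC-H report).
SERVERS = 64
CORES_PER_SERVER = 80
MEMORY_GB_PER_SERVER = 768
LOCAL_STORAGE_GB_PER_SERVER = 40
OFS_STORAGE_GB = 38_000  # shared OceanBase File System storage

# Cluster-wide totals.
total_cores = SERVERS * CORES_PER_SERVER
total_memory_gb = SERVERS * MEMORY_GB_PER_SERVER
total_local_storage_gb = SERVERS * LOCAL_STORAGE_GB_PER_SERVER
total_storage_gb = total_local_storage_gb + OFS_STORAGE_GB

print(total_cores, total_memory_gb, total_local_storage_gb, total_storage_gb)
```

The four values agree with the 5,120 cores, 49,152 GB of memory, 2,560 GB of server storage, and 40,560 GB of total storage stated in the text.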

The test report on the TPC official website also shows the software and hardware costs of this test: the three-year total cost of ownership is RMB 69,336,912. Figure 2 shows the software and hardware costs of the OceanBase TPC-H test.

With these hardware and software resources, OceanBase achieved 15,265,305.7 QphH at the 30,000 GB data scale, at a hardware and software cost of RMB 4,542.13 per kQphH.
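The price/performance figure follows from dividing the three-year cost by the score in thousands of QphH. The sketch below uses only the two numbers quoted in this article; the variable names are illustrative.

```python
# Figures as quoted in the article.
TOTAL_COST_RMB = 69_336_912   # three-year total cost of ownership
QPHH = 15_265_305.7           # composite score at the 30,000 GB scale

# Cost per thousand QphH (kQphH).
cost_per_kqphh = TOTAL_COST_RMB / (QPHH / 1000)

print(f"{cost_per_kqphh:.2f} RMB per kQphH")
```

The result is about RMB 4,542 per kQphH, in line with the reported 4,542.13; the last digit may differ slightly depending on how the report rounds.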

Based on the above data, we can do some analysis on this OceanBase test.

First, the data volume in this TPC-H test is 30,000 GB, while the memory used is 49,152 GB. Given OceanBase's quasi-in-memory architecture, the entire test data set can be loaded into memory, so queries and computation run in memory and disk I/O is avoided, which greatly improves the test score. This approach works for small and medium-sized data warehouses, but large data warehouses often reach petabytes or even hundreds of petabytes. At that scale, all-in-memory computing greatly increases the system's total cost of ownership, to a level users cannot afford. Large and medium-sized OLAP scenarios need more economical solutions.
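To give a rough sense of the scaling concern, the sketch below computes the memory-to-data ratio of the test and extrapolates it to a petabyte. The extrapolation is purely an assumption for illustration: it supposes the same ratio and the same 768 GB servers would be needed, and takes 1 PB as 1,000,000 GB. None of this comes from the TPC report.

```python
# Figures as quoted in the article.
DATA_GB = 30_000
CLUSTER_MEMORY_GB = 49_152
MEMORY_GB_PER_SERVER = 768

# Ratio of cluster memory to test data volume.
memory_to_data = CLUSTER_MEMORY_GB / DATA_GB


def servers_for_in_memory(data_gb: int) -> int:
    """Servers needed if memory had to scale with data at the tested ratio.

    Hypothetical helper: assumes the same memory-to-data ratio and the same
    per-server memory as in the test. Uses integer ceiling division.
    """
    needed = data_gb * CLUSTER_MEMORY_GB
    per_server = DATA_GB * MEMORY_GB_PER_SERVER
    return -(-needed // per_server)


PETABYTE_GB = 1_000_000  # decimal petabyte, an assumption of this sketch

print(round(memory_to_data, 2))
print(servers_for_in_memory(DATA_GB))
print(servers_for_in_memory(PETABYTE_GB))
```

Under these assumptions, a single petabyte would already call for more than two thousand such servers, which illustrates why a fully in-memory approach becomes costly at data-warehouse scale.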

Second, the TPC-H test that OceanBase took part in contains 8 tables and 22 queries and complies with the SQL-92 standard; it is one of the TPC organization's relatively demanding tests in the OLAP field. The TPC-DS test, also organized by TPC, contains 7 fact tables and 17 dimension tables, with an average of 18 columns per table. Its workload contains 99 SQL queries covering the core parts of SQL:1999 and SQL:2003 as well as OLAP, and it is much more difficult than the TPC-H test. OceanBase chose TPC-H rather than TPC-DS this time. Does this suggest that its support for the 99 complex SQL queries in TPC-DS may be incomplete, and that it can handle only relatively simple OLAP workloads rather than every OLAP scenario?

Finally, because the TPC organization also examines the cost per kQphH, OceanBase calculated this figure using three-year prepaid cloud service rental. Based on the three-year total cost of ownership of RMB 69,336,912 reported by OceanBase, the cost per kQphH is RMB 4,542.13. Compared with the second-ranked result's cost of US$744.13 per kQphH (equivalent to RMB 4,769.65), this appears to be a certain price advantage. In practice, however, purchased hardware is depreciated over far more than three years. So does OceanBase's solution really have a price advantage? That remains an open question.
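The size of the apparent price advantage can be worked out from the figures quoted above. The sketch below also derives the exchange rate implied by the article's own conversion; it does not use any external rate, and the variable names are illustrative.

```python
# Figures as quoted in the article.
OCEANBASE_RMB_PER_KQPHH = 4_542.13
RUNNER_UP_USD_PER_KQPHH = 744.13
RUNNER_UP_RMB_PER_KQPHH = 4_769.65  # the article's own conversion

# Exchange rate implied by the article's USD-to-RMB conversion.
implied_rate = RUNNER_UP_RMB_PER_KQPHH / RUNNER_UP_USD_PER_KQPHH

# Absolute and relative gap between the two results.
gap_rmb = RUNNER_UP_RMB_PER_KQPHH - OCEANBASE_RMB_PER_KQPHH
gap_pct = gap_rmb / RUNNER_UP_RMB_PER_KQPHH * 100

print(f"implied rate: {implied_rate:.4f} RMB per USD")
print(f"gap: {gap_rmb:.2f} RMB per kQphH ({gap_pct:.1f}%)")
```

On these numbers the gap is a little under 5 percent, so a modest change in the exchange rate or in the pricing period could plausibly narrow or reverse it.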

To sum up, first: OceanBase's ranking this time shows that it has a certain capability in simple OLAP scenarios, but as for its ability to support complex, large-scale data warehouse OLAP applications, might topping TPC-H alone fall short of settling the question?

Second: in terms of the solution's total cost of ownership, does it need further consideration and optimization?

These are my two questions.

Large and medium-scale data warehouse applications therefore call for more specialized OLAP databases.

(Viewpoint analysis: insiders, edited by Amin)


TPC-H metrics: QphH (Query-per-Hour H) describes the system's capacity for processing complex queries. The H indicates that the result was measured according to the TPC-H standard, and $/QphH is the price/performance ratio.

Query: a request sent to a search engine or database in order to find a particular file, website, record, or series of records.

[Global Financial Observer] This article and the author's reply only represent personal opinions and do not constitute any investment advice.