What is MySQL partition table

Introduction

Partitioning means that the database divides a table into multiple smaller, easier-to-manage parts according to certain rules.

As far as the application of accessing the database is concerned, there is only one table or one index logically, but in fact this table may consist of 10 physical partition objects. Each partition is an independent object. From a partial perspective, each partition The difference can be handled independently, and as a whole, each partition table can be handled as a part of the total table. Partitioning is completely transparent to the application and does not affect the business logic of the application.

The underlying storage structure of the partition table

From the above description, we can see that the partition table is an independent logical table, but the bottom layer is composed of multiple physical sub-tables. The code for partitioning is actually an encapsulation of a set of underlying tables. Requests for the partition table will be transformed into interface calls to the storage engine. So in order to support the concept of partitioning, the SQL server has done a lot of processing, encapsulating the partitioning operation internally, which is transparent to the user (that is, the user can write SQL as usual), but if we start from the underlying file system Looking at it, you will find that every partition table has a table file named with # separated.

The underlying operation logic and principle of the partition table

logic

select

When querying a partitioned table, the partitioning layer first opens and locks all the underlying tables. The optimizer first judges whether it can filter some partitions, and then calls the corresponding storage engine interface to access the data of each partition.

insert

When writing a record, the partition layer first opens and locks all underlying tables, then determines the record of the partition receiver, and then writes the record to the corresponding underlying table.

delete

When deleting a record, the partition layer first opens and locks all the underlying tables, then determines the partition corresponding to the data, and finally deletes the response to the underlying table.

update (first add and then delete)

When updating a record, the partition layer first opens and locks all the underlying tables. MySQL first determines which partition the record needs to be updated is in, then retrieves the data and updates it, then determines which partition the updated data should be placed in, and finally the bottom layer The table is written and the underlying table where the original data is located is deleted.

principle

As mentioned earlier, the partition table is implemented by multiple related low-level tables. These low-level tables are also represented by Handler objects, so we can also directly access each partition. The storage engine manages the various underlying tables of the partitions the same as managing ordinary tables (all the underlying tables must use the same storage engine), and the index of the partition table just adds an identical index to each underlying table.

From the perspective of the storage engine, the underlying table is no different from a normal table (a partitioned table is nothing more than a secondary split of a table by the partitioner), and the storage engine does not need to know whether this is a normal table or a partitioned table. a part of. The operations on the partition table are performed in accordance with the following operational logic: Although each operation "opens and locks all underlying tables first", this does not mean that the partition table locks the entire table during processing.

If the storage engine can implement row-level locks by itself, such as InnoDB, the corresponding table locks will be released at the partition layer. This lock and unlock process is similar to the query on ordinary InnoDB.

Benefits of partition table

MySQL uses the PARTITION BY clause to define the data stored in each partition when creating a table. When executing a query, the optimizer will filter the partitions that do not have the data we need according to the partition definition, so that the query does not need to scan all the partitions, just find the partition that contains the required data.

Quick delete

One of the main purposes of partitioning is to divide data into different tables at a coarser granularity. In this way, related data can be stored together. In addition, it will be very convenient if you want to delete the data of the entire partition at once. In the following scenario, the partition can play a very big role.

Comparison with ConcurrentHashMap

It is not difficult to find that, similar to the design idea of ​​ConcurrentHashMap, it is implemented to reduce the lock granularity.

Usage scenarios of partition table

  1. The table is so large that it cannot be all stored in memory, or there is hot data only in the last part of the table, and the rest are historical data.
  2. The data of the partition table is easier to maintain. For example, if you want to delete large amounts of data in batches, you can use the method of clearing the entire partition. In addition, you can optimize, check, and repair an independent partition.
  3. The data of the partition table can be distributed on different physical devices, thereby efficiently using multiple hardware devices. Partition tables can be used to avoid some special bottlenecks, such as the mutually exclusive access of a single index of InnoDB, the inode lock competition of the ext3 file system, and so on.

Note: The hardware devices referred to here are not different servers, but different hard disks of the same server.

image.png

Here you can specify the storage disk path. So it can be put on different disks.

  1. If necessary, you can also back up and restore independent partitions, which works very well in scenarios with very large data sets.

Partial limitations of partition tables

  1. A table can only have 1024 partitions at most.
  2. If the primary key and unique index are a joint index composed of multiple columns, then the partition field must contain all the columns of the joint index. such as:

create table rcx (a int,b int,c char(3))engine=innodbpartition by range columns(a,b,c) (partition p0 values );

  1. Foreign key constraints cannot be used in partitioned tables.

Type of partition table

RANGE (Range Discrete Value) partition

Create partition

Row data is put into partitions based on column values ​​belonging to a given continuous interval.

create table t(id int)engine=innodbpartition by range (id) (partition p0 values ​​less than (10),partition p1 values ​​less than (20));

Check the physical files of the table on the disk. After partitioning is enabled, the table is no longer composed of an ibd file, but is composed of the ibd files of each partition when the partition is created, such as t#p#p0.ibd, t#p#p1. ibd.

Insert data

insert into t values(1);insert into t values(9);insert into t values(10);insert into t values(15);

You can view the specific information of each partition by querying the partitions table in the information_schema database:

select * from information_schema.partitionswhere table_schema='test_partition'and table_name='t'\G;

image.png

For table t, because the partition is defined, the inserted value should strictly comply with the definition of the partition. When a value not defined in the partition is inserted, the MySQL database will throw an exception.

To solve the above problems, you can add a partition with a MAXVALUE value to the partition. MAXVALUE can be understood as positive infinity, making our partition table more robust:

alter table t add partition(partition p2 values ​​less than maxvalue);

Query data

Use explain to view the execution plan of the SQL statement:

explainselect * from twhere id<10 and id>0\G;

image

It can be seen that after filtering by the partitioner, only the p0 partition will be queried. Narrow the scope of the query.

List (specify discrete value) partition

The LIST partition is very similar to the RANGE partition, except that the values ​​of the partition columns are discrete.

Create partition

create table r (a int,b int)engine=innodbpartition by list (b) (partition p0 values ​​in (1,3,5,7,9),partition p1 values ​​in (0,2,4,6,8));

Insert data

insert into r select 1,1;insert into r select 1,2;insert into r select 1,3;insert into r select 1,4;

Query data

select table_name,partition_name,table_rows from information_schema.partitions where table_name='r' and table_schema='test_partition'\G;

image

HASH (hash) partition

The purpose of HASH partition is to evenly distribute data in each pre-defined partition to ensure that the amount of data in each partition is roughly the same.

In RANGE partition and LIST partition, you must clearly specify in which partition a given column value or set of column values ​​should be stored; and in HASH partition, MySQL automatically completes these tasks, and all users have to do is based on what will be done. It is hoped that the column value of partition specifies a column value or expression, and specifies the number of partitions that the partitioned table will be divided into.

Create HASH partition

create table t_hash(a int,b datetime)engine=innodbpartition by hash (year(b))partitions 4;//Four partitions

Insert data

insert into t_hash values(1,'2010-04-01');

The partitioning algorithm is as follows

MOD(YEAR('2010-04-01'),4)=MOD(2010,4)=2

Check the partition situation

select table_name,partition_name,table_rows from information_schema.partitions where table_name='t_hash' and table_schema='test_partition'\G;

image

Create LINEAR HASH partition

MySQL also supports a partition called LINEAR HASH, which uses a more complex algorithm to determine where new rows are inserted into a partitioned table. Its syntax is similar to that of the HASH partition, except that HASH is changed to LINEAR HASH.

create table t_l_hash(a int,b datetime)engine=innodbpartition by linear hash (year(b))partitions 4;

Insert a record

insert into t_l_hash values(1,'2010-04-01');

Partition judgment

  • Take the next power of 2 greater than the number of partitions V, V=POWER(2,CEILING(LOG(2,num)))=4
  • Partition N=YEAR('2010-04-01')&(V-1)=2

Analysis of the advantages and disadvantages of LINEAR HASH

The advantage of LINEAR HASH partition is that adding, deleting, merging and splitting partitions will become faster, which is conducive to processing tables with large amounts of data. Its disadvantage is that compared with the data distribution obtained by using HASH partitions, the data among the partitions may not be balanced.

KEY (default algorithm) partition

KEY partitioning is similar to HASH partitioning. The difference is that HASH partitioning uses user-defined functions for partitioning, and KEY partitioning uses functions provided by MySQL for partitioning.

create table t_key (a int,b datetime)engine=innodbpartition by key(b)partitions 4;

COLUMNS partition

Among the four partitions of RANGE, LIST, HASH and KEY introduced earlier, the partition conditions are: the data must be an integer, if it is not an integer, it should be converted to an integer by a function, such as year() and other functions . COLUMNS partitions can be partitioned directly using non-integer data. The partitions are directly compared based on the type and do not need to be converted to integer. In addition, the RANGE COLUMNS partition can partition the values ​​of multiple columns.

The COLUMNS partition supports the following data types:

range columns time partition

create table t_columns(a int,b datetime)engine=innodbpartition by range columns (b) (partition p0 values ​​less than ('2009-01-01'),partition p1 values ​​less than ('2010-01-01'));

list columns specified field partition

create table t_columns2(a int,b datetime,city ​​varchar(15))engine=innodbpartition by list columns(city) (partition p1 values ​​in ('a','b','c'),partition p2 values ​​in ('d','e','f'),partition p3 values ​​in ('g','h','k'),);

range columns multi-column partition

create table rcx (a int,b int,c char(3))engine=innodbpartition by range columns(a,b,c) (partition p0 values );

Subpartition

Sub-partition is to partition on the basis of partition, also called compound partition. MySQL allows HASH or KEY sub-partitions on the RANGE and LIST partitions.

In the above example, we see that both range and List are integer partitions. In fact, range and List also support non-integer partitions, but they must be combined with COLUMN partitions to support integers, dates, and strings.

Specify the number of sub-partitions

create table ts (a int,b date)engine=innodbpartition by range(year(b))subpartition by hash(to_days(b))subpartitions 2 (partition p0 values ​​less than (1990),partition p1 values ​​less than (2000),partition p2 values ​​less than MAXVALUE)

As you can see, the above partitioning rule is equivalent to dividing the range into two regions first, and then dividing each range into three regions by hash. Therefore, there should be 2*3=6 partition areas.

image

Specify the sub-partition name partition

We can also use subpartition syntax to explicitly indicate the name of each subpartition

create table ts(a int,b date)engine=innodbpartition by range(year(b))subpartition by hash(to_days(b)) (partition p0 values ​​less than (1990) (subpartition s0,subpartition s1),partition p1 values ​​less than (2000) (subpartition s2,subpartition s3),partition p0 values ​​less than MAXVALUE (subpartition s4,subpartition s5));

Subpartition considerations

  • The number of each sub-partition must be the same
  • To use subpartition on any partition of a partition table to clearly define any subpartition, you must define all subpartitions.
  • Each subpartition clause must include a name for the subpartition.
  • The name of the sub-partition must be unique.

It is not recommended to use mysql partition table

After talking about so many partition tables, in the end you told me that it is not recommended to use partition tables? ? ?

Because in actual development, we often use sub-databases and sub-tables. Sub-database sub-tables not only support horizontal split (split rows) of MySQL partitioned tables, but also support vertical split (split columns). Divide the data of a large database (table) into several databases (tables). The structure of each library (table) is the same. But they may be distributed on different mysql instances or even different physical machines to achieve the purpose of reducing the amount of data in a single database (table) and improving access performance.

Compare the two

  1. Partition table, partition key design is not very flexible, if you do not follow the partition key, it is easy to appear full table lock
  2. Once the amount of data concurrency comes up, if the association is implemented in the partition table, it will be a disaster
  3. You can sub-databases and tables yourself, and you can control business scenarios and access modes yourself. Partition table, research and development wrote a SQL, I am not sure what the underlying partition logic of mysql is, it is not very controllable.
  4. No matter how the partition table is divided, it is on the same machine, which naturally has an upper limit of performance