Partitioning means that the database divides a table into multiple smaller, easier-to-manage parts according to certain rules.
As far as the application of accessing the database is concerned, there is only one table or one index logically, but in fact this table may consist of 10 physical partition objects. Each partition is an independent object. From a partial perspective, each partition The difference can be handled independently, and as a whole, each partition table can be handled as a part of the total table. Partitioning is completely transparent to the application and does not affect the business logic of the application.
The underlying storage structure of the partition table
From the above description, we can see that the partition table is an independent logical table, but the bottom layer is composed of multiple physical sub-tables. The code for partitioning is actually an encapsulation of a set of underlying tables. Requests for the partition table will be transformed into interface calls to the storage engine. So in order to support the concept of partitioning, the SQL server has done a lot of processing, encapsulating the partitioning operation internally, which is transparent to the user (that is, the user can write SQL as usual), but if we start from the underlying file system Looking at it, you will find that every partition table has a table file named with # separated.
The underlying operation logic and principle of the partition table
When querying a partitioned table, the partitioning layer first opens and locks all the underlying tables. The optimizer first judges whether it can filter some partitions, and then calls the corresponding storage engine interface to access the data of each partition.
When writing a record, the partition layer first opens and locks all underlying tables, then determines the record of the partition receiver, and then writes the record to the corresponding underlying table.
When deleting a record, the partition layer first opens and locks all the underlying tables, then determines the partition corresponding to the data, and finally deletes the response to the underlying table.
update (first add and then delete)
When updating a record, the partition layer first opens and locks all the underlying tables. MySQL first determines which partition the record needs to be updated is in, then retrieves the data and updates it, then determines which partition the updated data should be placed in, and finally the bottom layer The table is written and the underlying table where the original data is located is deleted.
As mentioned earlier, the partition table is implemented by multiple related low-level tables. These low-level tables are also represented by Handler objects, so we can also directly access each partition. The storage engine manages the various underlying tables of the partitions the same as managing ordinary tables (all the underlying tables must use the same storage engine), and the index of the partition table just adds an identical index to each underlying table.
From the perspective of the storage engine, the underlying table is no different from a normal table (a partitioned table is nothing more than a secondary split of a table by the partitioner), and the storage engine does not need to know whether this is a normal table or a partitioned table. a part of. The operations on the partition table are performed in accordance with the following operational logic: Although each operation "opens and locks all underlying tables first", this does not mean that the partition table locks the entire table during processing.
If the storage engine can implement row-level locks by itself, such as InnoDB, the corresponding table locks will be released at the partition layer. This lock and unlock process is similar to the query on ordinary InnoDB.
Benefits of partition table
MySQL uses the PARTITION BY clause to define the data stored in each partition when creating a table. When executing a query, the optimizer will filter the partitions that do not have the data we need according to the partition definition, so that the query does not need to scan all the partitions, just find the partition that contains the required data.
One of the main purposes of partitioning is to divide data into different tables at a coarser granularity. In this way, related data can be stored together. In addition, it will be very convenient if you want to delete the data of the entire partition at once. In the following scenario, the partition can play a very big role.
Comparison with ConcurrentHashMap
It is not difficult to find that, similar to the design idea of ConcurrentHashMap, it is implemented to reduce the lock granularity.
Usage scenarios of partition table
- The table is so large that it cannot be all stored in memory, or there is hot data only in the last part of the table, and the rest are historical data.
- The data of the partition table is easier to maintain. For example, if you want to delete large amounts of data in batches, you can use the method of clearing the entire partition. In addition, you can optimize, check, and repair an independent partition.
- The data of the partition table can be distributed on different physical devices, thereby efficiently using multiple hardware devices. Partition tables can be used to avoid some special bottlenecks, such as the mutually exclusive access of a single index of InnoDB, the inode lock competition of the ext3 file system, and so on.
Note: The hardware devices referred to here are not different servers, but different hard disks of the same server.
Here you can specify the storage disk path. So it can be put on different disks.
- If necessary, you can also back up and restore independent partitions, which works very well in scenarios with very large data sets.
Partial limitations of partition tables
- A table can only have 1024 partitions at most.
- If the primary key and unique index are a joint index composed of multiple columns, then the partition field must contain all the columns of the joint index. such as:
create table rcx (a int,b int,c char(3))engine=innodbpartition by range columns(a,b,c) (partition p0 values );
- Foreign key constraints cannot be used in partitioned tables.
Type of partition table
RANGE (Range Discrete Value) partition
Row data is put into partitions based on column values belonging to a given continuous interval.
create table t(id int)engine=innodbpartition by range (id) (partition p0 values less than (10),partition p1 values less than (20));
Check the physical files of the table on the disk. After partitioning is enabled, the table is no longer composed of an ibd file, but is composed of the ibd files of each partition when the partition is created, such as t#p#p0.ibd, t#p#p1. ibd.
insert into t values(1);insert into t values(9);insert into t values(10);insert into t values(15);
You can view the specific information of each partition by querying the partitions table in the information_schema database:
select * from information_schema.partitionswhere table_schema='test_partition'and table_name='t'\G;
For table t, because the partition is defined, the inserted value should strictly comply with the definition of the partition. When a value not defined in the partition is inserted, the MySQL database will throw an exception.
To solve the above problems, you can add a partition with a MAXVALUE value to the partition. MAXVALUE can be understood as positive infinity, making our partition table more robust:
alter table t add partition(partition p2 values less than maxvalue);
Use explain to view the execution plan of the SQL statement:
explainselect * from twhere id<10 and id>0\G;
It can be seen that after filtering by the partitioner, only the p0 partition will be queried. Narrow the scope of the query.
List (specify discrete value) partition
The LIST partition is very similar to the RANGE partition, except that the values of the partition columns are discrete.
create table r (a int,b int)engine=innodbpartition by list (b) (partition p0 values in (1,3,5,7,9),partition p1 values in (0,2,4,6,8));
insert into r select 1,1;insert into r select 1,2;insert into r select 1,3;insert into r select 1,4;
select table_name,partition_name,table_rows from information_schema.partitions where table_name='r' and table_schema='test_partition'\G;
HASH (hash) partition
The purpose of HASH partition is to evenly distribute data in each pre-defined partition to ensure that the amount of data in each partition is roughly the same.
In RANGE partition and LIST partition, you must clearly specify in which partition a given column value or set of column values should be stored; and in HASH partition, MySQL automatically completes these tasks, and all users have to do is based on what will be done. It is hoped that the column value of partition specifies a column value or expression, and specifies the number of partitions that the partitioned table will be divided into.
Create HASH partition
create table t_hash(a int,b datetime)engine=innodbpartition by hash (year(b))partitions 4;//Four partitions
insert into t_hash values(1,'2010-04-01');
The partitioning algorithm is as follows
Check the partition situation
select table_name,partition_name,table_rows from information_schema.partitions where table_name='t_hash' and table_schema='test_partition'\G;
Create LINEAR HASH partition
MySQL also supports a partition called LINEAR HASH, which uses a more complex algorithm to determine where new rows are inserted into a partitioned table. Its syntax is similar to that of the HASH partition, except that HASH is changed to LINEAR HASH.
create table t_l_hash(a int,b datetime)engine=innodbpartition by linear hash (year(b))partitions 4;
Insert a record
insert into t_l_hash values(1,'2010-04-01');
- Take the next power of 2 greater than the number of partitions V, V=POWER(2,CEILING(LOG(2,num)))=4
- Partition N=YEAR('2010-04-01')&(V-1)=2
Analysis of the advantages and disadvantages of LINEAR HASH
The advantage of LINEAR HASH partition is that adding, deleting, merging and splitting partitions will become faster, which is conducive to processing tables with large amounts of data. Its disadvantage is that compared with the data distribution obtained by using HASH partitions, the data among the partitions may not be balanced.
KEY (default algorithm) partition
KEY partitioning is similar to HASH partitioning. The difference is that HASH partitioning uses user-defined functions for partitioning, and KEY partitioning uses functions provided by MySQL for partitioning.
create table t_key (a int,b datetime)engine=innodbpartition by key(b)partitions 4;
Among the four partitions of RANGE, LIST, HASH and KEY introduced earlier, the partition conditions are: the data must be an integer, if it is not an integer, it should be converted to an integer by a function, such as year() and other functions . COLUMNS partitions can be partitioned directly using non-integer data. The partitions are directly compared based on the type and do not need to be converted to integer. In addition, the RANGE COLUMNS partition can partition the values of multiple columns.
The COLUMNS partition supports the following data types:
range columns time partition
create table t_columns(a int,b datetime)engine=innodbpartition by range columns (b) (partition p0 values less than ('2009-01-01'),partition p1 values less than ('2010-01-01'));
list columns specified field partition
create table t_columns2(a int,b datetime,city varchar(15))engine=innodbpartition by list columns(city) (partition p1 values in ('a','b','c'),partition p2 values in ('d','e','f'),partition p3 values in ('g','h','k'),);
range columns multi-column partition
create table rcx (a int,b int,c char(3))engine=innodbpartition by range columns(a,b,c) (partition p0 values );
Sub-partition is to partition on the basis of partition, also called compound partition. MySQL allows HASH or KEY sub-partitions on the RANGE and LIST partitions.
In the above example, we see that both range and List are integer partitions. In fact, range and List also support non-integer partitions, but they must be combined with COLUMN partitions to support integers, dates, and strings.
Specify the number of sub-partitions
create table ts (a int,b date)engine=innodbpartition by range(year(b))subpartition by hash(to_days(b))subpartitions 2 (partition p0 values less than (1990),partition p1 values less than (2000),partition p2 values less than MAXVALUE)
As you can see, the above partitioning rule is equivalent to dividing the range into two regions first, and then dividing each range into three regions by hash. Therefore, there should be 2*3=6 partition areas.
Specify the sub-partition name partition
We can also use subpartition syntax to explicitly indicate the name of each subpartition
create table ts(a int,b date)engine=innodbpartition by range(year(b))subpartition by hash(to_days(b)) (partition p0 values less than (1990) (subpartition s0,subpartition s1),partition p1 values less than (2000) (subpartition s2,subpartition s3),partition p0 values less than MAXVALUE (subpartition s4,subpartition s5));
- The number of each sub-partition must be the same
- To use subpartition on any partition of a partition table to clearly define any subpartition, you must define all subpartitions.
- Each subpartition clause must include a name for the subpartition.
- The name of the sub-partition must be unique.
It is not recommended to use mysql partition table
After talking about so many partition tables, in the end you told me that it is not recommended to use partition tables? ? ?
Because in actual development, we often use sub-databases and sub-tables. Sub-database sub-tables not only support horizontal split (split rows) of MySQL partitioned tables, but also support vertical split (split columns). Divide the data of a large database (table) into several databases (tables). The structure of each library (table) is the same. But they may be distributed on different mysql instances or even different physical machines to achieve the purpose of reducing the amount of data in a single database (table) and improving access performance.
Compare the two
- Partition table, partition key design is not very flexible, if you do not follow the partition key, it is easy to appear full table lock
- Once the amount of data concurrency comes up, if the association is implemented in the partition table, it will be a disaster
- You can sub-databases and tables yourself, and you can control business scenarios and access modes yourself. Partition table, research and development wrote a SQL, I am not sure what the underlying partition logic of mysql is, it is not very controllable.
- No matter how the partition table is divided, it is on the same machine, which naturally has an upper limit of performance