MySQL optimization (three) table partition

1. Introduction:

1. Concept: Table partitioning refers to decomposing a table in the database into multiple smaller, easy-to-manage parts according to certain rules. Logically, there is only one table, but the bottom layer is composed of multiple physical partitions. The data in the mysql database is stored on the disk in the form of a file, and it is placed under /mysql/data by default, and can be viewed through show variables like'%datadir%'; (you can also view it through datadir in my.cnf), A table mainly corresponds to three files: frm stores the table structure, myd stores the table data, and myi stores the table index. If the amount of data in a table is too large, myd and myi will be large, and the data will be very slow to find. Using the partitioning function of mysql, physically divide the three files corresponding to this table into many small pieces, so that when searching for data, you don’t need to search all of them, as long as you know which piece of data is in, then place it there. Just find one piece. If the data in the table is too large, one disk may not fit. At this time, the data can be allocated to different disks.

2. The difference between table partitioning and sub-table: The difference between sub-table and partition is that the partition logically has only one table, and the sub-table is to decompose a table into multiple tables.

3. The advantages of table partitioning:

(1) Compared with a single disk or file system partition, more data can be stored.

(2) For data that has lost its meaning for preservation, it is usually easy to delete those data by deleting the partitions related to those data. Conversely, in some cases, the process of adding new data can be easily implemented by adding a new partition specifically for those new data.

(3) Some queries can be greatly optimized. This is mainly due to the fact that data that satisfies a given WHERE statement can only be stored in one or more partitions, so that there is no need to find other remaining partitions when searching. Because the partition can be modified after the partition table is created, you can reorganize the data to improve the efficiency of commonly used queries when you have not done so when you configure the partition scheme for the first time.

(4) Queries involving aggregate functions such as SUM() and COUNT() can be easily processed in parallel. A simple example of this kind of query is "SELECT sale_id, COUNT (orders) as order_total FROM sales GROUP BY sale_id;". By "parallel", this means that the query can be performed on each partition at the same time, and the final result only needs to be obtained by totaling the results of all partitions.

(5) Obtain greater query throughput by spreading data queries across multiple disks.

4. The limiting factors of the partition table:

(1) A table can only have 1024 partitions at most.

(2) In MySQL 5.1, the partition expression must be an integer or an expression that returns an integer. MySQL5.5 provides support for non-integer expression partitioning.

(3) If there are primary key or unique index columns in the partition field, then many primary key columns and unique index columns must be included. That is: the partition field either does not contain the primary key or index column, or contains all the primary key and index column.

(4) Foreign key constraints cannot be used in partitioned tables.

(5) MySQL partitioning is applicable to all data and indexes of a table. It cannot only partition the table data but not the index, nor can it only partition the index but not the table, nor can it partition only a part of the data of the table.

5. Determine whether the current MySQL supports partitioning: show variables like'%partition%';

The value of have_partintioning is YES, indicating that partitioning is supported.

6. Partition types supported by MySQL:

(1) RANGE partition: Assign multiple rows to partitions based on column values ​​belonging to a given continuous interval.

(2) LIST partition: Similar to partition by RANGE, the difference is that LIST partition is selected based on the column value matching a value in a discrete value set.

(3) HASH partition: partition selected based on the return value of a user-defined expression, which is calculated using the column values ​​of these rows to be inserted into the table. This function can contain any expression that is valid in MySQL and produces a non-negative integer value.

(4) KEY partition: Similar to partition by HASH, the difference is that KEY partition only supports calculation of one or more columns, and MySQL server provides its own hash function. One or more columns must contain integer values.

Note: In MySQL 5.1 version, RANGE, LIST, HASH partition requires that the partition key must be of type INT, or the type of INT must be returned by an expression. But when KEY partitioning, you can use other types of columns (except BLOB, TEXT type) as the partition key.

Two, RANGE partition: The range partitioning, but do not overlap the continuous range should be used PARTITION BY RANGE,  VALUES LESS THANkeyword. When the COLUMNSkeyword is not used , the RANGEbrackets must be an integer field name or a function that returns a certain integer.

1. Grammar:

(1) According to the numerical range:

drop table if exists test;CREATE TABLE `test` (  `id` int(11) NOT NULL AUTO_INCREMENT,  `user_name` varchar(255) DEFAULT NULL COMMENT '名称',  `user_age` int(11) DEFAULT NULL COMMENT '年龄',  `birthday` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,  PRIMARY KEY (`id`)) ENGINE=myisam   DEFAULT CHARSET=utf8mb4partition by range(id)(    partition p0 values less than (10),    partition p1 values less than (20),    partition p2 values less than (30),    partition p3 values less than (40),    partition p4 values less than MAXVALUE ); 

Note that if there is no last sentence, inserting data with id>40 will result in an error. Take a look at the generated file:

Test: Insert a few pieces of data, you can see that p0 and p4 partitions are inserted respectively.

insert into test(user_name) VALUES('测试1');insert into test(id,user_name) VALUES(50,'测试50');

(2) According to the TIMESTAMPscope:

drop table if exists test;CREATE TABLE `test` (  `id` int(11) NOT NULL ,  `user_name` varchar(255) DEFAULT NULL COMMENT '名称',  `user_age` int(11) DEFAULT NULL COMMENT '年龄',  `birthday` timestamp not null default current_timestamp on update current_timestamp) ENGINE=myisam   DEFAULT CHARSET=utf8mb4partition by range(unix_timestamp(birthday))(  partition p0 values less than (unix_timestamp('2000-01-01 00:00:00')),  partition p1 values less than (unix_timestamp('2010-04-01 00:00:00')),  partition p2 values less than maxvalue); 

(3) Basis DATEand DATETIMErange: Adding COLUMNSkeywords can define non-integer ranges and multi-column ranges, but you need to note COLUMNSthat only column names can be used in parentheses, and functions are not supported; for multi-column ranges, the multi-column ranges must show an increasing trend:

drop table if exists test;CREATE TABLE `test` (  `id` int(11) NOT NULL ,  `user_name` varchar(255) DEFAULT NULL COMMENT '名称',  `user_age` int(11) DEFAULT NULL COMMENT '年龄',  `birthday` date not null) ENGINE=myisam   DEFAULT CHARSET=utf8mb4partition by range columns(birthday)(  partition p0 values less than ('2000-01-01'),  partition p1 values less than ('2020-01-01'),  partition p2 values less than maxvalue); 

(4) According to the range of multiple columns:

drop table if exists test;CREATE TABLE `test` (  `id` int(11) NOT NULL ,  `user_name` varchar(255) DEFAULT NULL COMMENT '名称',  `user_age` int(11) DEFAULT NULL COMMENT '年龄',  `birthday` date not null) ENGINE=myisam   DEFAULT CHARSET=utf8mb4partition by range columns(id,user_age)(  partition p0 values less than (0,10),  partition p1 values less than (10,20),  partition p2 values less than (20,30),  partition p3 values less than (30,40),  partition p4 values less than (40,50),  partition p5 values less than (maxvalue,maxvalue)); 

2. RANGE partition is particularly useful in the following situations:

(1) When you need to delete data on a partition, just delete the partition. For example, using "alter table test drop partition p0;" is much more effective than running "delete from test where...";

(2) Want to use a column that contains a date or time value, or a value that starts to grow from some other series;

(3) Frequently run queries that directly depend on the columns used to split the table. If you execute "select count(*) from test where year(birthday) = 2020", MySQL can quickly determine that only partition p1 needs to be scanned. This is because the remaining partitions cannot contain any records that conform to the WHERE clause.

3. LIST partition: partition according to specific values, each partition value does not overlap, use PARTITION BY LISTand VALUES INkeywords. With Rangepartitions similar, do not use COLUMNSkeyword Listmust be an integer field names or determine function returns an integer in parentheses. Similar to partitioning by RANGE, the difference is that LIST partitioning is selected based on the column value matching a value in a discrete value set. If you try to insert a column value (or the return value of a partition expression) that is not in a row in the partition value list, the "INSERT" query will fail and an error will be reported. As with Rangepartitioning, adding COLUMNSkeywords can support non-integer and multiple columns.

1. Grammar:

drop table if exists test;CREATE TABLE `test` (  `id` int(11) NOT NULL ,  `user_name` varchar(255) DEFAULT NULL COMMENT '名称',  `user_age` int(11) DEFAULT NULL COMMENT '年龄',  `birthday` date not null) ENGINE=Innodb   DEFAULT CHARSET=utf8mb4partition by list(user_age)(  partition p0 values in (2,4,6,8,10),  partition p1 values in (1,3,5,7,9));  

4. HashPartition: It is mainly used to ensure that the data is evenly distributed in a predetermined number of partitions. The Hashbrackets can only be integer columns or functions that return a certain integer. In fact, the returned integer is used to modulate the number of partitions. To partition a table using HASH partitioning, add a "PARTITION BY HASH (expr)" clause to the CREATE TABLE statement, where "expr" is an expression that returns an integer. It can just be the name of a column whose column type is MySQL integer. In addition, you may need to add a "PARTITIONS num" clause at the end, where num is a non-negative integer that represents the number of partitions that the table will be divided into. If a PARTITIONS clause is not included, the number of partitions will default to 1.

1. Grammar:

drop table if exists test;CREATE TABLE `test` (  `id` int(11) NOT NULL ,  `user_name` varchar(255) DEFAULT NULL COMMENT '名称',  `user_age` int(11) DEFAULT NULL COMMENT '年龄',  `birthday` date not null) ENGINE=Innodb   DEFAULT CHARSET=utf8mb4partition by hash(id)partitions 4;  
drop table if exists test;CREATE TABLE `test` (  `id` int(11) NOT NULL ,  `user_name` varchar(255) DEFAULT NULL COMMENT '名称',  `user_age` int(11) DEFAULT NULL COMMENT '年龄',  `birthday` date not null) ENGINE=Innodb   DEFAULT CHARSET=utf8mb4partition by hash(year(birthday))partitions 4;  

HashThe partition also has the Hashsame problem as the traditional partition table, and its scalability is poor. MySQLIt also provides a similar Hashpartitioning method-linear Hashpartitioning, only need to add LINEARkeywords when defining the partition . Linear hash function, the difference between it and conventional hash is that the linear hash function uses a linear powers-of-two algorithm, while the conventional hash uses the modulus of the hash function value. number.

drop table if exists test;CREATE TABLE `test` (  `id` int(11) NOT NULL ,  `user_name` varchar(255) DEFAULT NULL COMMENT '名称',  `user_age` int(11) DEFAULT NULL COMMENT '年龄',  `birthday` date not null) ENGINE=Innodb   DEFAULT CHARSET=utf8mb4partition by linear hash(year(birthday))partitions 4;  

5. KEY partitioning: KeyPartitioning Hashis very similar to partitioning, except that the Hashfunction is different . Just Hashreplace the keyword with when defining it Key. The same Keypartitioning also has Hasha linear Keypartitioning method corresponding to linearity . The use of the keyword LINEAR in the KEY partition has the same effect as the use in the HASH partition. The partition number is obtained by the powers-of-two algorithm, not by the modulus algorithm. In addition, when the table has a primary key or a unique index, Keythe column names in parentheses can be omitted , and Mysqlthe selection will be in the order of the primary key-the unique index, and an error will be reported when the unique index is not found.

drop table if exists test;CREATE TABLE `test` (  `id` int(11) NOT NULL ,  `user_name` varchar(255) DEFAULT NULL COMMENT '名称',  `user_age` int(11) DEFAULT NULL COMMENT '年龄',  `birthday` date not null) ENGINE=Innodb   DEFAULT CHARSET=utf8mb4partition by key(id)partitions 4;