[MYSQL] Detailed index

Index is a sorted data structure to help MySQL improve query efficiency

View index

SHOW INDEX FROM `table_name`;

Create index

-- create index 必须声明索引名,而alter table未声明索引名则会以第一个索引字段命名
-- create index 无法创建主键索引,并且一次只能创建一个索引
CREATE INDEX `index_name` ON `table_name` (`column`);
-- 创建主键索引
ALTER TABLE `table_name` ADD PRIMARY KEY (`column`);
-- 创建唯一索引 KEY|INDEX 没有本质区别,只不过如果是创建主键索引只能用PRIMARY KEY
ALTER TABLE `table_name` ADD UNIQUE KEY (`column`);
ALTER TABLE `table_name` ADD UNIQUE INDEX (`column`);
-- 创建组合索引
ALTER TABLE `table_name` ADD INDEX `index_name` (`column`, `column2`);
-- 一次创建多个索引
ALTER TABLE `table_name` ADD INDEX `index_name` (`column`), ADD INDEX `index_name` (`column2`);

Delete index

-- 根据索引名删除
ALTER TABLE `table_name` DROP INDEX `index_name`;
-- 删除主键
ALTER TABLE `table_name` DROP PRIMARY KEY;

Index type

Primary key index

Normal index (Normal): only contains a single column index

Unique index (Unique): The index column value must be unique, and only one null can exist

Composite index : index composed of multiple columns

Full text index (Full Text): Act on CHAR, VARCHAR, TEXT type fields, InnoDB before version 5.7 does not support

Index method

B+TREE index

To talk about B+Tree, you first need to understand B-Tree (multi-fork balanced search tree)

B-Tree

Insert picture description here

feature:

Each page can store multiple index keys

Index keys are distributed in the entire tree, any key value appears and only appears in one node

A large amount of information (key value, pointer, data data) is stored in each node

The search may end in non-leaf nodes

B+Tree

B+Tree is an optimization based on B-Tree

Insert picture description here

The main difference from B-Tree:

  • The number of subtree pointers of non-leaf nodes is the same as the number of keywords
  • Non-leaf nodes do not store data, only indexes and pointers (more keys can be stored per page)
  • The leaf node contains all index fields and data data (the leaf node needs to be queried each time, relatively stable)
  • The leaf nodes are connected by two-way pointers to improve the performance of interval access

Assuming that the index tree is 3 levels high and a row of records occupies 512b, how many records can be stored?

The default page size for InnoDB is 16KB, non-leaf nodes only store index keys and pointers

Assuming that the primary key index is bigint type occupies 8b, a pointer occupies 6b, then a node is to store 16KB * 1024 / (8b + 6b) = 1170 key values

The tree height is 3, then the second layer will have 1170 * 1170 = 136w key values, that is to say, there will be 136w leaf nodes

As mentioned earlier, a page is 16KB, a row of records occupies 512b, then a page of leaf nodes can store 16KB * 1024/512 = 32 records

That is to say, the total number of records that can be recorded by the three-level tree is about 136w * 32 = 4380w records

For a three-level tree, two I/O operations are usually required, because the top page is resident in memory

HASH index

Hash index is realized by hash table, the hash code is calculated through the columns contained in the index, and if there is a hash conflict, it is resolved through the linked list

Insert picture description here

The Hash index used by Mermory by default, InnoDB does not support HASH index, but for frequently accessed tables, InnoDB will build an adaptive HASH index

The hash index has the following problems

  • Since the hash index compares the calculated hash value, it can only be used for equivalent search ( =、in、<=>), and does not support range search
  • It is impossible to avoid sorting operations again. The hash index is unordered and cannot be sorted directly by the index. You need to find the records and sort them again.
  • For composite indexes, all index column queries must be used, because when calculating the hash, it is calculated through all index columns, and the principle of the leftmost prefix cannot be used
  • Cannot use a covering index because the index only contains hash values ​​and row pointers
  • Table scan cannot be avoided, because different index keys have the same hash value, and the actual data in the table needs to be compared. When the hash conflict is serious, the efficiency may not be high

Clustered index

Clustered index : Put data storage and index together, and the leaf nodes of the index structure store row data

Generally speaking, the primary key index is a clustered index, and the leaf nodes store all row data

Non-clustered index : separate data and index storage, the leaf node of the index structure points to the corresponding position of the data

Generally speaking, in addition to the primary key index other indexes are non-clustered indexes, non-clustered indexes store the primary key value, non-clustered index access data usually requires a second search

First find the corresponding primary key through the non-clustered index, and then return to the clustered index to query the record through the primary key (back to the table)

Why does InnoDB suggest that a primary key must be created?

If we define the primary key, InnoDB will choose the primary key for clustered index. If not, it will choose the first unique index that does not contain null values ​​as the clustered index. If it still does not, MySQL will build a ROWID as the clustered index. Relatively high

Why is an integer auto-incrementing primary key recommended?

If the primary key is self-incrementing, then each inserted record will be added to the subsequent position of the current index node in order, and a new page will be opened if the storage is not enough.

If the primary key is not self-increasing, then the probability of each insertion is in the middle position, which will cause the data to move, and even cause the B+ number structure to change and self-balance. Affect the performance of data writing

Integer data occupies a relatively small space, which allows more keys to be stored on each page, and is faster when comparing.

The leaf node of MyISAM's B+Tree clustered index stores the primary key, and the non-clustered index stores the secondary key. The table data is stored separately. The two trees point to the real table data through an address, so for non-clustered Indexes can access data without returning to the table

Composite index

DDL

# 该ddl用于测试组合索引/索引失效/索引覆盖用例
CREATE TABLE `t_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` varchar(32) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  `e` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_ABC` (`a`,`b`,`c`),
  KEY `IDX_AE` (`a`,`e`)
) ENGINE=InnoDB;

The leftmost prefix principle and query optimization

# 走索引 a_b_c
EXPLAIN SELECT * FROM t_test WHERE a = 'test' AND b = 123 AND c = 0;
# mysql查询优化器会纠正这条查询语句以什么样的顺序执行效率最高,然后才生成真正的执行计划
# 走索引 a_b_c
EXPLAIN SELECT * FROM t_test WHERE a = 'test' AND c = 0 AND b = 123;
# 走索引,a_b_c 
EXPLAIN SELECT * FROM t_test WHERE c = 0 AND b = 123 AND a = 'test';
# 走索引,a_b
EXPLAIN SELECT * FROM t_test WHERE a = 'test' AND b = 123;
# 走索引,a
EXPLAIN SELECT * FROM t_test WHERE a = 'test' AND c = 0;
# 不走索引,违背了最左前缀原则
EXPLAIN SELECT * FROM t_test WHERE b = 123 AND c = 0; 
# 走索引, a_b 遇到范围查询,后面的索引会失效, 这里也就是c失效
EXPLAIN SELECT * FROM t_test WHERE a = 'test' AND b > 123 AND c = 0;
# 走索引, a_b_c 可以看成先排序在判断范围查询失效的情况
EXPLAIN SELECT * FROM t_test WHERE c > 0 AND b = 123 AND a = 'test';
# 走索引, a_b
EXPLAIN SELECT * FROM t_test WHERE c = 0 AND b > 123 AND a = 'test';

Joint indexabc

The principle of the leftmost prefix, from left to right, if it is broken, the following index fields will not take effect

If there ais only after where, then atake the index

If the where is followed by abthe abindex

If the “where” is followed by acthe aindex, bit will cnot take effect if it is broken.

If there is bcno index after where , ait will bcnot take effect if it is broken

The order of the joint index after where can be disrupted

For example:, abc acb cab cba bac bcamysql will be optimized toabc

Encountered range judgment, such as < >will cause the subsequent index to become invalid

This priority is lower, mysql will optimize the order before judging, for example:, the c > 0 and b = 123 and a = 'cc'index abcis still valid

After testing, if the composite index does not meet the leftmost prefix principle, but meets the covering index, it can still go to the index index (the principle is to be studied)

Index failure

By orconnection causes failure index (or as long as the left and right sides of the presence of a non-indexed field)

EXPLAIN SELECT * FROM t_test WHERE a = 'test' OR b = 123 or c = 0;

The first percent sign %will cause the index to become invalid

EXPLAIN SELECT * FROM t_test WHERE a LIKE '%test';

Type conversion will cause index failure

When there are both numeric and character types, both sides will be converted to floating-point numbers for comparison, and the conversion of strings (left field) to floating-point numbers is uncertain

For details, please refer to: MySQL official document-type-conversion

EXPLAIN SELECT * FROM t_test WHERE a = 1; // a (varchar)   索引失效
EXPLAIN SELECT * FROM t_test WHERE a = '1'; // a (int)   索引生效
EXPLAIN SELECT * FROM t_test WHERE a IN (1, '2'); // a (varchar)   索引失效
EXPLAIN SELECT * FROM t_test WHERE a IN (1, '2'); // a (int)   索引生效

example:

a is a varchar type, and implicit type conversion occurs when the types on both sides do not match. Both strings 100a and 100b are converted to floating-point numbers, both are 100.

Insert picture description here

The index column in the where condition participates in the operation, b+1resulting in bfailure

EXPLAIN SELECT * FROM t_test WHERE a = 'test' AND b+1 > 0; 

The index column in the where condition uses a function

EXPLAIN SELECT * FROM t_test WHERE LOWER(a) = 'test'

If MySQL feels that using a full table scan is faster, it will cause the index not

in 、not in、!=、<> 、is null、is not null The index is used. The reason why there is no use of the index is because there is too much data to return to the table. MySQL believes that the cost of index scan is greater than that of full table scan, which leads to index failure.

Use a covering index, change *to awill not cause back to the table, the index must be taken

EXPLAIN SELECT * FROM t_test WHERE a IS NULL;

Tested in 、not in、!=、<> 、is null、is not nullwill always take the index, the index typerange

Index coverage

The column data to be queried can be obtained only by querying the auxiliary index tree, that is, the selectcolumn data can be obtained only from the index

# count非索引列会全表扫描,可以count索引使用索引覆盖,count(0)如果该表存在索引mysql会自动找一列索引
EXPLAIN SELECT COUNT(d) FROM t_test
# 查询列做组合索引,避免回表
EXPLAIN SELECT a,e FROM t_test WHERE a = 'test'

Index Push Down (ICP)

Here is a synchronization concept. For index range queries, does MySQL take out all eligible non-clustered indexes at one time and then perform a unified return to the table, or does it perform a return to the table once a match is found?

  1. In fact, it returns to the table immediately after matching a record, and then returns the complete record to the service layer
  2. After the server layer receives the record returned by the engine layer, it then determines whether the remaining WHER conditions are established, and if it is established, it will be sent to the client
  3. Then the server layer requests the storage engine to read the next record
  4. It goes over and over again, until the engine layer encounters a non-compliant record, it returns the information that has been read to the server layer, and the server query is completed
  5. The client will display all the data after receiving it

After understanding how the engine layer and the service layer interact, let's look at the index push down

# ALTER TABLE t_user ADD INDEX IDX_NAME_AGE (name, age);
EXPLAIN SELECT * FROM t_user WHERE name LIKE '陈%' AND age = 18;

The index usage in the where condition is divided into three categories:

index key Use the range index to determine the range of the scan (range)

index filterAfter using the index keydetermined range, if there are still conditions that are not met, if these conditions can be filtered through the index

table filter The conditions in where cannot be filtered by the index, only the table can be accessed

Before MySQL5.6, it did not distinguish between index filterand table filter, unified the range index back to the table to find the complete record and then returned to the MySQL Server layer for filtering. After MySQL5.6, index filterand table filterseparated, it index filterdropped to the index layer for filtering, reducing the return to the table and the MySQL Server layer. The record interaction overhead, which is called index push down (ICP)

Index pushdown only occurs in non-clustered indexes, and the number of index fields used does not affect the optimization of index pushdown

For details, see: MySQL official document-index-condition-pushdown-optimization

Pushdown without index:

Insert picture description here

Push down using index:

Insert picture description here

command

# 开启索引下推  默认开启
set optimizer_switch='index_condition_pushdown=on';
# 关闭索引下推
set optimizer_switch='index_condition_pushdown=off';

Mandatory index

Can be used FORCE INDEXto USE INDEXforce the specified index or use to IGNORE INDEXignore the specified index

EXPLAIN SELECT * FROM t_test FORCE INDEX (IDX_ABC) WHERE a = 'test';
EXPLAIN SELECT * FROM t_test USE INDEX (IDX_ABC) WHERE a = 'test';
EXPLAIN SELECT * FROM t_test IGNORE INDEX (IDX_ABC) WHERE a = 'test';