MySQL Clustered and Non-Clustered Indexes: A Summary

Table of Contents

I. Overview

II. Clustered index

III. Non-clustered index

IV. Implementation of clustered and non-clustered indexes in MySQL

V. Differences between clustered and non-clustered indexes

VI. Applicable scenarios

VII. Summary


I. Overview

Let's start with a few questions:

  1. What is a clustered index? What is a non-clustered index?
  2. What is the difference between a clustered index and a non-clustered index?
  3. Why does InnoDB store full row data only in the leaf nodes of the primary key index tree, while the other index trees store the primary key value instead?
  4. Why does an index improve retrieval speed?
  5. Why is it not recommended to index columns with low selectivity, such as gender?
  6. Why is it often recommended to avoid MySQL built-in functions in queries and handle that logic in business code instead?
  7. Why can a SQL statement still be very slow even though an index has clearly been created?
  8. Why is an auto-increment ID recommended as the primary key in MySQL, rather than a UUID, a snowflake-algorithm ID, and so on?

With these questions in mind, let's turn to today's topic: an introduction to clustered and non-clustered indexes in MySQL.

II. Clustered index

(1) Concept

A clustered index (also called a clustering index) is one in which the physical order of the table's rows matches the logical order of the index key values. Because rows can be physically ordered in only one way, each table can have only one clustered index.

A clustered index is like the Xinhua Dictionary: the dictionary is ordered alphabetically from a to z, and the entries themselves are laid out in that same order. The leaf nodes of a clustered index store the actual data rows, so the data can be read directly from the clustered index.
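
As a rough illustration of the idea above, the sketch below models a clustered index as a list of rows kept sorted by primary key, so locating the key is the same as locating the row. The sample rows and the helper name `clustered_lookup` are made up for illustration, and a real InnoDB index is a B+ tree of pages rather than a Python list.

```python
import bisect

# Rows are stored in primary-key order; the "index" and the "data" are one structure.
# (Hypothetical sample data; a real engine uses a B+ tree of pages.)
rows = [
    (3, "Alice"),
    (8, "Carol"),
    (15, "Bob"),
    (21, "Dave"),
]
keys = [pk for pk, _ in rows]  # the key column, already sorted


def clustered_lookup(pk):
    """Binary-search the key; the matching position already holds the full row."""
    pos = bisect.bisect_left(keys, pk)
    if pos < len(keys) and keys[pos] == pk:
        return rows[pos]
    return None


print(clustered_lookup(15))  # (15, 'Bob')
print(clustered_lookup(4))   # None
```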

(2) Schematic diagram

The following figure is an example of the storage of a clustered index:

As the figure shows, the leaf nodes of a clustered index are the data nodes themselves, so all columns of a row can be read directly. The non-clustered index introduced later may need a second lookup per query, so lookups through a clustered index tend to be faster than lookups through a non-clustered index.

(3) How to generate a clustered index

Take MySQL as an example. The clustered index is usually the table's primary key. InnoDB always organizes a table around a clustered index, so every InnoDB table has one even if no primary key is declared. InnoDB chooses it according to the following rules:

1. If a primary key is defined, the primary key is used as the clustered index;

2. If no primary key is defined, the first unique index whose columns are all NOT NULL is used as the clustered index;

3. If there is neither a primary key nor a suitable unique index, InnoDB generates a hidden primary key to serve as the clustered index. The hidden key is a 6-byte row ID whose value increases as rows are inserted.
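
The three rules above can be read as a simple decision procedure. The sketch below is only a model of that procedure, not InnoDB code: the function name `choose_clustered_key` and the way a table definition is passed in (a primary key list plus `(columns, all_not_null)` tuples) are assumptions made for this example.

```python
def choose_clustered_key(primary_key, unique_indexes):
    """Pick the clustered key following the three rules listed above.

    primary_key: list of column names, or None if no primary key is declared.
    unique_indexes: list of (columns, all_not_null) tuples in definition order.
    """
    if primary_key:
        return ("primary key", primary_key)            # rule 1
    for columns, all_not_null in unique_indexes:
        if all_not_null:
            return ("unique not-null index", columns)  # rule 2
    return ("hidden row id", None)                     # rule 3


print(choose_clustered_key(["id"], []))
print(choose_clustered_key(None, [(["email"], False), (["phone"], True)]))
print(choose_clustered_key(None, [(["email"], False)]))
```

In the second call the index on `email` is skipped because it allows NULL, so the first fully NOT NULL unique index (`phone`) is chosen.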

(4) Points to note

Define the clustered index (the primary key) when you create the table. Because the clustered index determines how rows are physically organized, adding or changing it later means reorganizing every row according to the new key, which costs a great deal of time and performance.
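
To get a feel for why reorganizing rows is costly, the toy model below takes rows in arrival order, rewrites them in key order, and counts how many end up in a different position. The data and the helper `rebuild_in_key_order` are hypothetical, and a real engine rebuilds pages of a B+ tree rather than sorting a list, so the count is only indicative.

```python
def rebuild_in_key_order(heap_rows):
    """Return the rows reordered by key and how many rows ended up in a new position."""
    ordered = sorted(heap_rows, key=lambda row: row[0])
    moved = sum(1 for old, new in zip(heap_rows, ordered) if old != new)
    return ordered, moved


# Hypothetical rows in arrival order: (key, payload).
heap_rows = [(42, "d"), (7, "a"), (19, "c"), (11, "b")]
ordered, moved = rebuild_in_key_order(heap_rows)
print(ordered)
print(moved, "of", len(heap_rows), "rows changed position")
```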

III. Non-clustered index

(1) Concept

A non-clustered index is one in which the logical order of the index entries differs from the physical storage order of the rows on disk. For this reason a table can have multiple non-clustered indexes.

Unlike a clustered index, the leaf nodes of a non-clustered index do not store the data rows. They contain the indexed column values and logical pointers to the rows. A non-clustered index therefore cannot return the row data directly; the data has to be fetched in a second lookup that follows the logical pointer.

(2) Schematic diagram

The following is a storage diagram of a non-clustered index:

As the figure shows, the leaf nodes of a non-clustered index do not store the actual data, only an address that points to it. The first lookup yields only the address of the row; a second lookup in the data file is needed to fetch the row itself (this is known as going back to the table).

(3) How to avoid the second lookup of a non-clustered index

A covering index avoids the second lookup. If you create a composite index on two or more columns, queries that read only columns contained in that index can be answered from the index alone, without going back to the table. For example, suppose table t1 has a composite index index(col1, col2), and we run:

select col1, col2 from t1 where col1 = '213';

This query avoids the second lookup, because all the requested columns can be read from the index itself.
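
The effect of a covering index can be sketched with a small in-memory model. Here the composite index holds `(col1, col2, primary key)` entries, and the function counts how often it has to go back to the table. The sample rows, the extra column `col3`, and the helper `query` are assumptions made for this example; this is not how MySQL is implemented internally.

```python
import bisect

# Clustered storage: primary key -> full row (hypothetical sample data for table t1).
table = {
    1: {"id": 1, "col1": "213", "col2": "x", "col3": "red"},
    2: {"id": 2, "col1": "100", "col2": "y", "col3": "green"},
    3: {"id": 3, "col1": "213", "col2": "z", "col3": "blue"},
}

# Composite index(col1, col2): sorted entries of (col1, col2, primary key).
index = sorted((r["col1"], r["col2"], pk) for pk, r in table.items())


def query(col1_value, wanted):
    """Return (result tuples, number of lookups back to the table)."""
    lookups = 0
    results = []
    start = bisect.bisect_left(index, (col1_value,))
    for col1, col2, pk in index[start:]:
        if col1 != col1_value:
            break
        entry = {"col1": col1, "col2": col2, "id": pk}
        if all(c in entry for c in wanted):
            # Every requested column is in the index entry: no table access.
            results.append(tuple(entry[c] for c in wanted))
        else:
            # A requested column is missing from the index: go back to the table.
            lookups += 1
            row = table[pk]
            results.append(tuple(row[c] for c in wanted))
    return results, lookups


print(query("213", ["col1", "col2"]))  # covered: 0 table lookups
print(query("213", ["col1", "col3"]))  # not covered: one table lookup per match
```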

Note: a composite index follows the leftmost-prefix rule. If the WHERE condition does not include the leftmost column (or leading columns) of the index, the index generally cannot be used to locate the matching rows.
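
The reason behind the leftmost-prefix rule is the sort order of the index entries. In the sketch below, entries of a hypothetical index(col1, col2) are sorted by col1 first, so a condition on col1 can jump to one contiguous run, while a condition on col2 alone finds its matches scattered across the whole index. The data and function names are illustrative only.

```python
import bisect

# Entries of a composite index(col1, col2), sorted by col1 first, then col2.
index = sorted([("a", 3), ("a", 7), ("b", 1), ("b", 7), ("c", 2), ("c", 7)])


def find_by_col1(value):
    """Leftmost column given: binary search jumps to the matching run."""
    start = bisect.bisect_left(index, (value,))
    examined = 0
    matches = []
    for entry in index[start:]:
        examined += 1
        if entry[0] != value:
            break  # the run of matching entries has ended
        matches.append(entry)
    return matches, examined


def find_by_col2(value):
    """Leftmost column missing: matches are scattered, so every entry is checked."""
    matches = [entry for entry in index if entry[1] == value]
    return matches, len(index)


print(find_by_col1("b"))  # examines only the "b" run plus one entry past it
print(find_by_col2(7))    # examines all six entries
```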

IV. Implementation of clustered and non-clustered indexes in MySQL

Let's look at how clustered and non-clustered indexes are implemented in two MySQL storage engines, InnoDB and MyISAM.

(1) InnoDB engine

InnoDB uses a clustered index: the data and the index are stored in the same file.

[Primary Index Search Process]

InnoDB builds a B+ tree keyed on the primary key ID, as shown in the figure above, and the leaf nodes of this B+ tree store the row data for each primary key ID. For example, when the statement select * from user_info where id=15 is executed, InnoDB searches the primary key B+ tree and finds the corresponding row, with user_name='Bob'.

[Auxiliary Index Search Process]

If we add an index on the user_name field, InnoDB creates a separate B+ tree for user_name. Its nodes are keyed on user_name, and its leaf nodes store the primary key value.

Note that the leaf nodes store the primary key. After finding the primary key in the user_name index tree, InnoDB searches the primary key index tree with that value to fetch the corresponding row. [Two lookups: going back to the table]
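
The two search paths described above can be sketched as two dictionaries: one standing in for the primary key tree and one for the user_name tree. Only id=15 with user_name='Bob' comes from the example in the text; the other rows, the `age` column, and the assumption that user_name values are unique are made up to keep the model small.

```python
# Clustered index: primary key id -> full row.
clustered = {
    7: {"id": 7, "user_name": "Alice", "age": 30},
    15: {"id": 15, "user_name": "Bob", "age": 25},
    22: {"id": 22, "user_name": "Carol", "age": 41},
}

# Secondary index on user_name: key -> primary key (not the row itself).
user_name_index = {row["user_name"]: pk for pk, row in clustered.items()}


def find_by_id(pk):
    """One step: the clustered index leaf already holds the row."""
    return clustered.get(pk)


def find_by_user_name(name):
    """Two steps: the secondary index yields the primary key, then the clustered index yields the row."""
    pk = user_name_index.get(name)  # step 1: search the user_name index tree
    if pk is None:
        return None
    return clustered[pk]            # step 2: go back to the primary key index tree


print(find_by_id(15))
print(find_by_user_name("Bob"))
```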

Why does InnoDB store full row data only in the leaf nodes of the primary key index tree, while the other index trees store the primary key, so that a lookup must first find the primary key and then find the row in the primary key index tree?

The reason is storage space. A table may have many indexes, and InnoDB builds an index tree for each one. If every index tree stored the full row data, the table's index data would become huge and highly redundant.
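
A back-of-the-envelope calculation makes the space argument concrete. All the numbers below (row count, row size, key size, primary key size, number of secondary indexes) are hypothetical, and the estimate ignores page overhead and non-leaf nodes.

```python
def secondary_index_bytes(rows, row_bytes, key_bytes, pk_bytes, index_count, store_full_rows):
    """Rough leaf-level size of all secondary indexes, ignoring page overhead."""
    payload = row_bytes if store_full_rows else pk_bytes
    return rows * (key_bytes + payload) * index_count


# Hypothetical table: 1,000,000 rows of 200 bytes, 4 secondary indexes
# with 20-byte keys, and an 8-byte primary key.
with_rows = secondary_index_bytes(1_000_000, 200, 20, 8, 4, store_full_rows=True)
with_pk = secondary_index_bytes(1_000_000, 200, 20, 8, 4, store_full_rows=False)

print(with_rows)  # 880000000
print(with_pk)    # 112000000
```

Under these assumptions, storing only the primary key in each secondary index takes roughly an eighth of the space that storing full rows would.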

(2) MyISAM engine

MyISAM uses non-clustered indexes: the data and the indexes are stored in two separate files (a .MYD data file and a .MYI index file).

As shown in the figure above, MyISAM builds the primary index B+ tree keyed on the primary key when the table is created. The leaf nodes of the tree store the physical address of the corresponding row. With this physical address, the record can be located directly in the MyISAM data file.

When we add an index on another field, an index tree is built for that field as well. Its leaf nodes also record the physical address of the corresponding row, and that address is then used to locate the record in the data file.
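
The MyISAM layout described above can be sketched as a byte string standing in for the data file, plus two indexes that both store byte offsets into it. The fixed-length record layout, the sample rows, and the helper names are assumptions made for this example and do not reflect the real MyISAM file format.

```python
import struct

# Fixed-length record: 4-byte int id followed by a 16-byte name field.
RECORD = struct.Struct("<i16s")


def build(rows):
    """Write rows into a 'data file' (bytes) and build two indexes of byte offsets."""
    data = bytearray()
    primary_index = {}  # id -> offset in the data file
    name_index = {}     # user_name -> offset in the data file
    for row_id, name in rows:
        offset = len(data)
        data += RECORD.pack(row_id, name.encode("utf-8"))
        primary_index[row_id] = offset
        name_index[name] = offset
    return bytes(data), primary_index, name_index


def read_at(data, offset):
    """Jump straight to the record stored at the given offset."""
    row_id, raw_name = RECORD.unpack_from(data, offset)
    return row_id, raw_name.rstrip(b"\x00").decode("utf-8")


# Rows are appended in arrival order, not in key order.
data, primary_index, name_index = build([(15, "Bob"), (7, "Alice"), (22, "Carol")])

print(read_at(data, primary_index[7]))     # lookup through the primary index
print(read_at(data, name_index["Carol"]))  # lookup through the secondary index
```

Both indexes lead to the record in the same way, which is why in this model the primary index has no special status compared with the other indexes.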

V. Differences between clustered and non-clustered indexes

  1. A table can have only one clustered index, but it can have multiple non-clustered indexes;
  2. Records in a clustered index are physically stored in key order; a non-clustered index is ordered logically, but the rows it points to are not stored in that order;
  3. A clustered index finds the required data in a single lookup. A non-clustered index first finds only the primary key value of the record, and a second lookup through the clustered index (going back to the table) is needed to fetch the data. For this reason, retrieval through a clustered index is generally faster than through a non-clustered index.

VI. Applicable scenarios

A clustered index is a good fit for:

  1. Columns with a large number of distinct values;
  2. Columns queried for a range of values with BETWEEN, >, >=, <, or <=;
  3. Columns that are accessed sequentially;
  4. Queries that return large result sets;
  5. Columns that are frequently accessed by queries involving joins or GROUP BY clauses.
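
Range conditions suit a clustered index because the matching rows sit next to each other. The sketch below shows this with rows kept in key order: a BETWEEN-style condition turns into one contiguous slice. The data and the helper `range_scan` are hypothetical.

```python
import bisect

# Rows kept in key order, as in a clustered index (hypothetical data).
rows = [(k, f"row-{k}") for k in (2, 5, 9, 12, 17, 23, 31)]
keys = [k for k, _ in rows]


def range_scan(low, high):
    """Rows with low <= key <= high form one contiguous slice of the storage."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return rows[start:end]


print(range_scan(5, 17))
```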

The following table summarizes when to use a clustered index or a non-clustered index:

Action description                         | Use a clustered index | Use a non-clustered index
Columns that are often grouped or sorted   | Should                | Should
Returning data within a range              | Should                | Should not
One or very few distinct values            | Should not            | Should not
Small number of distinct values            | Should                | Should not
Large number of distinct values            | Should not            | Should
Frequently updated columns                 | Should not            | Should
Foreign key columns                        | Should                | Should
Primary key columns                        | Should                | Should
Index columns that are frequently modified | Should not            | Should
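
The "distinct values" rows in the table can be made measurable with a simple selectivity ratio: the number of distinct values divided by the number of rows. The sketch below computes it for two hypothetical columns. The function name and the sample data are illustrative, and the article does not prescribe any particular threshold.

```python
def selectivity(values):
    """Distinct values divided by total rows; a value near 1.0 means highly selective."""
    return len(set(values)) / len(values) if values else 0.0


# Hypothetical columns of a 1,000-row table.
gender = ["M", "F"] * 500     # only two distinct values
user_id = list(range(1000))   # every value is distinct

print(selectivity(gender))   # 0.002
print(selectivity(user_id))  # 1.0
```

With a selectivity this low, a lookup on the gender column still matches about half the table, so the index removes very little work.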

VII. Summary

This article summarized clustered and non-clustered indexes in MySQL: their concepts, how they store data, how queries use them, the differences between them, where each applies, and how the InnoDB and MyISAM storage engines implement them. Indexes improve retrieval performance, but too many or poorly chosen indexes can also hurt the system: indexes take up disk space, must be maintained on every write, and can become fragmented. Not every table benefits from indexing. Indexes pay off mainly on tables with a large amount of data, and performance improves most when they are built on columns with high selectivity. The author's knowledge is limited, so corrections are welcome if anything in this article is wrong. I hope this article is helpful.
