Create high-performance indexes | Types of indexes


An index is a data structure built on one or more columns of a table to make queries more efficient. Simply put, an index is like the table of contents of a book: it helps the reader quickly locate the target content.

A good index design can greatly improve query performance, and the larger a table grows, the more indexes matter to performance. As data keeps growing, index optimization is often the most effective way to improve query performance, and a well-chosen index can speed up a query by orders of magnitude.

It may be tempting to create an index on every column a query might use. However, unnecessary indexes waste space and make MySQL spend more time deciding which index to use. Indexes also increase the cost of inserts, updates, and deletes, because every index must be updated. You need to find the right balance, the set of indexes that makes your queries fast. That is why indexing knowledge is an essential skill for developers.

Index basics: types of indexes

There are many types of indexes, and each performs better in different scenarios. In MySQL, indexes are implemented in the storage engine layer rather than the server layer. As a result, there is no unified indexing standard. Different storage engines implement indexes differently, and the same index type may even work differently from one engine to another.

The following describes the index types supported by MySQL, and their advantages and disadvantages.

B-Tree index

Here "B-Tree" is used in the general sense. It covers both the B-tree and the B+tree (B plus tree), along with the other variants derived from the B-tree.

A B-Tree index stores data in a B-Tree data structure, and most MySQL storage engines support it. A B-Tree generally means that all values are stored in order and that every leaf page is the same distance from the root.

A B-Tree index speeds up data access because the storage engine no longer needs to scan the whole table to find the data it needs. Instead, it starts at the root node of the index and follows pointers down toward the leaf pages.
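To make the root-to-leaf search concrete, here is a toy, read-only B+Tree in Python. It is a minimal sketch that assumes the tree is bulk-loaded once from (key, row pointer) pairs already sorted by key. The names Node, build and search are hypothetical. Page splits, page sizes and the on-disk layout, which a real engine such as InnoDB has to handle, are not modeled.

```python
from bisect import bisect_right


class Node:
    """A page of the toy tree: internal pages route, leaf pages hold row pointers."""

    def __init__(self, keys, children=None, rows=None):
        self.keys = keys          # separator keys (internal) or stored keys (leaf)
        self.children = children  # child pages; None marks a leaf page
        self.rows = rows          # row pointers, only on leaf pages


def first_key(node):
    """Smallest key stored under this node (follow the leftmost pointers)."""
    while node.children is not None:
        node = node.children[0]
    return node.keys[0]


def build(items, fanout=4):
    """Bulk-load from (key, row_pointer) pairs that are already sorted by key."""
    level = [Node([k for k, _ in items[i:i + fanout]],
                  rows=[r for _, r in items[i:i + fanout]])
             for i in range(0, len(items), fanout)]
    while len(level) > 1:  # each pass adds one level, so all leaves share one depth
        level = [Node([first_key(c) for c in level[i + 1:i + fanout]],
                      children=level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


def search(root, key):
    """Descend from the root to a single leaf, then look for the key in that leaf.
    Returns (row_pointer or None, number of levels descended)."""
    node, depth = root, 0
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
        depth += 1
    for k, r in zip(node.keys, node.rows):
        if k == key:
            return r, depth
    return None, depth


items = [(k, f"row{k}") for k in range(0, 100, 3)]  # 34 sorted keys
root = build(items)
print(search(root, 42))  # ('row42', 2)
print(search(root, 43))  # (None, 2)
```

Every lookup touches one page per level instead of scanning all 34 entries. That is the saving the paragraph above describes, and it grows with table size.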

Types of queries that can use a B-Tree index: a B-Tree index works for lookups by the full key value, by a key range, or by a key prefix, and a key-prefix lookup only works on the leftmost prefix. Assuming a composite index on (last_name, first_name, dob), the index is effective for the following types of queries:

  • Full-value matching: the query matches every column in the index. For example, the index can find a person named Trump who was born on 1900-01-01.
  • Matching the leftmost prefix: find all people whose last name is Allen. Only the first column of the index is used.
  • Matching a column prefix: match only the beginning of a column value, for example all people whose last name starts with J. Only the first column of the index is used.
  • Matching a range of values: find people whose last name is between Allen and Jack. Only the first column of the index is used.
  • Matching one column exactly and another by range: find people whose last name is Allen and whose first name starts with K. The first column, last_name, is matched exactly and the second column, first_name, is matched by range.
  • Index-only queries: a B-Tree index can usually serve "index-only" queries, which need to read only the index and never the data rows themselves.
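All of these access patterns follow from one fact: a B-Tree keeps its entries sorted by the whole key, column by column. The Python sketch below models the leaf level of a hypothetical composite index on (last_name, first_name, dob) as a sorted list of tuples and answers each kind of query with two binary searches. The sample rows and the helper names range_scan, equal and starts_with are illustrative assumptions, not MySQL APIs.

```python
from bisect import bisect_left

# Hypothetical composite index on (last_name, first_name, dob). Like the leaf level
# of a B-Tree, the entries are kept sorted by the whole key, column by column.
index = sorted([
    ("Allen", "Cuba", "1960-01-01"),
    ("Allen", "Kim", "1930-07-12"),
    ("Allen", "Meryl", "1980-12-12"),
    ("Barrymore", "Julia", "2000-05-16"),
    ("Basinger", "Viven", "1976-12-08"),
    ("Jack", "Kevin", "1965-03-02"),
    ("Johnson", "Kate", "1971-09-22"),
])


def range_scan(lo, hi):
    """Entries with lo <= entry < hi, found with two binary searches (no full scan)."""
    return index[bisect_left(index, lo):bisect_left(index, hi)]


def equal(*key):
    """Bounds for 'leading columns == key'. key[-1] + '\\x00' is the smallest string
    greater than key[-1], so the half-open range holds exactly the equal entries."""
    return key, key[:-1] + (key[-1] + "\x00",)


def starts_with(*key):
    """Bounds for 'leading columns == key[:-1] and next column starts with key[-1]'.
    Bumping the last character gives an exclusive upper bound for all such values."""
    last = key[-1]
    return key, key[:-1] + (last[:-1] + chr(ord(last[-1]) + 1),)


full = range_scan(*equal("Allen", "Cuba", "1960-01-01"))  # full-value match
allens = range_scan(*equal("Allen"))                       # leftmost prefix
j_names = range_scan(*starts_with("J"))                    # column prefix
allen_to_jack = range_scan(("Allen",), equal("Jack")[1])   # range of values
allen_k = range_scan(*starts_with("Allen", "K"))           # exact + range
print(allen_k)  # [('Allen', 'Kim', '1930-07-12')]

# A condition on first_name alone yields no bounds in this ordering,
# so every entry has to be checked: the index cannot narrow the search.
kates = [entry for entry in index if entry[1] == "Kate"]
```

The last query shows why the leftmost-prefix rule exists. Entries with first_name 'Kate' are scattered through the sorted order, so there is no contiguous range to search.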

For more details on B-Tree and B+Tree, see the article "The interviewer asks you about B-Tree and B+Tree, then throw this article to him".

Hash index

A hash index is built on a hash table and is effective only for queries that exactly match all of its columns. For each row, the storage engine computes a hash code over all the indexed columns. The hash code is a small value, and rows with different key values usually get different hash codes, although two different keys can collide. The index stores all the hash codes, and the hash table keeps a pointer to each data row alongside its hash code.

Among MySQL's common storage engines, only the Memory engine supports explicit hash indexes, and they are the default index type for Memory tables.


CREATE TABLE `testhash` (
  `fname` varchar(50) DEFAULT NULL,
  `lname` varchar(50) DEFAULT NULL,
  KEY `fname` (`fname`) USING HASH
) ENGINE=MEMORY;

INSERT INTO `testhash` VALUES ('Trump', '川建国');
INSERT INTO `testhash` VALUES ('jack', '杰克');
INSERT INTO `testhash` VALUES ('lucy', '露西');
INSERT INTO `testhash` VALUES ('lili', '丽丽');

Suppose the index uses the hash function f() to generate the hash code:

f('Trump') = 2323
f('jack')  = 7437
f('lucy')  = 8784
f('lili')  = 2458

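The data structure of the hash index then looks like this. Each slot holds a hash code and a pointer to the matching row. The slots are ordered by hash code, but the rows are not.

Slot    Value
2323    Pointer to row 1 ('Trump')
2458    Pointer to row 4 ('lili')
7437    Pointer to row 2 ('jack')
8784    Pointer to row 3 ('lucy')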

Look at the following query:

SELECT lname FROM testhash WHERE fname = 'lucy';

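MySQL first computes the hash of 'lucy', which is 8784. It then looks up the corresponding row pointer in the hash index and follows that pointer to the data row. Finally, it compares the value in the row with 'lucy' to make sure it is the right row, because different values can share a hash code.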

The index only stores the hash code and row pointer, so the data structure of the index is very compact, which also makes the hash index lookup very fast, but the hash index also has its limitations.
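The lookup just described can be sketched in a few lines of Python. This is a toy model, not the Memory engine's implementation. It assumes zlib.crc32 as the hash function f(), uses list positions as row pointers, and keeps colliding pointers in a per-slot list to stand in for the linked list. HashIndex and lookup are hypothetical names.

```python
import zlib
from collections import defaultdict


def crc32(value):
    """Assumed hash function f(): CRC32 of the UTF-8 bytes."""
    return zlib.crc32(value.encode("utf-8"))


class HashIndex:
    """Toy hash index on one column: hash code -> list of row pointers."""

    def __init__(self, table, column, hash_fn=crc32):
        self.table, self.column, self.hash_fn = table, column, hash_fn
        self.slots = defaultdict(list)  # only hash codes and row pointers are stored
        for row_id, row in enumerate(table):
            self.slots[hash_fn(row[column])].append(row_id)

    def lookup(self, value):
        """Equality lookup: hash the value, follow the pointers in that slot,
        and re-check each row because different values can share a hash code."""
        return [row_id for row_id in self.slots.get(self.hash_fn(value), [])
                if self.table[row_id][self.column] == value]


# Same rows as the testhash table; list positions act as row pointers.
testhash = [
    {"fname": "Trump", "lname": "川建国"},
    {"fname": "jack", "lname": "杰克"},
    {"fname": "lucy", "lname": "露西"},
    {"fname": "lili", "lname": "丽丽"},
]
index = HashIndex(testhash, "fname")
print(index.lookup("lucy"))  # [2]

# A deliberately poor hash function (string length) makes jack, lucy and lili
# collide; the answer is still correct, but the lookup has to compare three rows.
weak = HashIndex(testhash, "fname", hash_fn=len)
print(weak.lookup("lucy"))   # [2]
```

The re-check inside lookup is what keeps results correct despite collisions. It is also why many collisions make a hash index slow.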

Limitations of hash indexes

  • A hash index contains only hash codes and row pointers, not the column values themselves, so MySQL cannot use values in the index to avoid reading the rows. However, reading rows in memory is very fast, so in most cases this has little noticeable impact on performance.
  • Hash index entries are not stored in index-value order, so the index cannot be used for sorting.
  • A hash index does not support matching on only some of its columns, because the hash is always computed over the full contents of all indexed columns. For example, with a hash index on columns (A, B), a query that filters only on column A cannot use the index.
  • A hash index supports only equality comparisons: =, IN(), and <=>. It does not support range queries such as WHERE price > 100.
  • Accessing data through a hash index is very fast unless there are many hash collisions. When collisions occur, the storage engine must walk every row pointer in the collision chain (a linked list), comparing rows one by one to find all matching rows.
  • Many hash collisions also make some index maintenance operations expensive. For example, suppose a hash index is built on a column with very low selectivity and a row is deleted from the table. The storage engine must walk the chain for that hash value to find and remove the reference to the deleted row. The more collisions, the higher the cost.
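The cost described in the last two points can be seen with a toy chained hash index in Python. This is a sketch under simplifying assumptions. Python's built-in hash() plays the role of the hash function and row pointers are row numbers. A low-selectivity column is modeled as two distinct values over 10,000 rows, so identical values pile up in one chain. The names build and delete_pointer are hypothetical.

```python
from collections import defaultdict


def build(values, hash_fn=hash):
    """Chained hash index: hash code -> list of row numbers (the 'linked list')."""
    slots = defaultdict(list)
    for row_id, value in enumerate(values):
        slots[hash_fn(value)].append(row_id)
    return slots


def delete_pointer(slots, value, row_id, hash_fn=hash):
    """Remove one row's pointer; returns how many chain entries had to be examined."""
    chain = slots[hash_fn(value)]
    for examined, candidate in enumerate(chain, start=1):
        if candidate == row_id:
            del chain[examined - 1]
            return examined
    raise KeyError(row_id)


n = 10_000
gender = ["F" if i % 2 else "M" for i in range(n)]  # low selectivity: 2 distinct values
user = [f"user{i}" for i in range(n)]               # high selectivity: all distinct

print(delete_pointer(build(gender), "F", n - 1))           # 5000: walks the whole chain
print(delete_pointer(build(user), f"user{n - 1}", n - 1))  # 1
```

The same structure also shows the other limits. A range condition such as price > 100 has no single hash code to probe, and the slots say nothing about the order of the values.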

Application scenarios of hash index

I haven't used hash indexes myself, but given their characteristics they suit scenarios that only need exact, whole-value comparisons, such as a website's IP blacklist. Readers with hands-on experience are welcome to share other use cases in the comments.

InnoDB has a special feature called the "adaptive hash index". When InnoDB notices that some index values are accessed very frequently, it builds a hash index in memory on top of the B-Tree index. This gives the B-Tree index some of the advantages of a hash index, such as very fast hash lookups. The process is completely automatic and internal, and users cannot control it, but they can turn the feature off if necessary (with the innodb_adaptive_hash_index system variable).

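If your storage engine does not support hash indexes, you can emulate one yourself, in a way similar to what InnoDB does, and get some of the same benefits. Add a column that stores a hash of the value, for example computed with CRC32(), and index that column with an ordinary B-Tree index. At query time, compute the hash yourself and put both the hash and the original value in the WHERE clause, so that rows which only share the hash are filtered out.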
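Here is a sketch of that technique. It uses Python's built-in sqlite3 module as a stand-in for MySQL so the example runs anywhere. SQLite has no CRC32() function, so the sketch registers zlib.crc32 under that name. The table pseudohash, its columns and the find_id helper are hypothetical. In MySQL you would use the built-in CRC32() and keep url_crc up to date on every insert and update, for example with a trigger.

```python
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
# SQLite has no CRC32(); register zlib.crc32 under that name so the SQL reads like
# the MySQL version. It is deterministic: the same input always gives the same hash.
conn.create_function("CRC32", 1, lambda s: zlib.crc32(s.encode("utf-8")),
                     deterministic=True)

conn.executescript("""
CREATE TABLE pseudohash (
    id      INTEGER PRIMARY KEY,
    url     TEXT NOT NULL,
    url_crc INTEGER NOT NULL               -- hash of url, kept in sync on write
);
CREATE INDEX idx_url_crc ON pseudohash (url_crc);  -- ordinary B-Tree index
""")

urls = ["http://www.mysql.com", "http://www.example.com/a", "http://www.example.com/b"]
conn.executemany("INSERT INTO pseudohash (url, url_crc) VALUES (?, CRC32(?))",
                 [(u, u) for u in urls])


def find_id(url):
    """Probe the small indexed hash first, then compare the full url so that rows
    whose url merely has the same CRC32 (a collision) are filtered out."""
    row = conn.execute(
        "SELECT id FROM pseudohash WHERE url_crc = CRC32(?) AND url = ?",
        (url, url)).fetchone()
    return row[0] if row else None


print(find_id("http://www.mysql.com"))  # 1
```

The design choice is that the B-Tree index only has to store small integers instead of long URLs. The extra url = ? condition costs one comparison per candidate row and guarantees correctness.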

Spatial data index

MyISAM tables support spatial indexes, which can be used to store geographic data (InnoDB has also supported spatial indexes since MySQL 5.7). Unlike a B-Tree index, a spatial index does not require queries to use a leftmost prefix. It indexes the data along all dimensions at once, so a query can efficiently combine conditions on any dimension. You must use MySQL's GIS functions, such as MBRCONTAINS(), for the index to be used. MySQL's GIS support is not complete, and most people do not use this feature.
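The idea behind a spatial index can be sketched in Python. MySQL's SPATIAL indexes are R-trees, in which each node records the minimum bounding rectangle (MBR) of everything beneath it. The toy below keeps only a single level of such nodes. Rect, build_nodes and window_query are hypothetical names, boundary handling is simplified, and this is not how MBRContains() is implemented.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Rect(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, p: Point) -> bool:
        return self.min_x <= p[0] <= self.max_x and self.min_y <= p[1] <= self.max_y

    def intersects(self, other: "Rect") -> bool:
        return not (other.min_x > self.max_x or other.max_x < self.min_x
                    or other.min_y > self.max_y or other.max_y < self.min_y)


def mbr(points: List[Point]) -> Rect:
    """Minimum bounding rectangle of a group of points."""
    xs, ys = zip(*points)
    return Rect(min(xs), min(ys), max(xs), max(ys))


def build_nodes(groups: List[List[Point]]):
    """One index level: each node stores its MBR next to its points."""
    return [(mbr(group), group) for group in groups]


def window_query(nodes, window: Rect):
    """Filter on x and y together; nodes whose MBR misses the window are skipped."""
    hits, nodes_read = [], 0
    for box, points in nodes:
        if not box.intersects(window):
            continue  # pruned: none of its points are examined
        nodes_read += 1
        hits.extend(p for p in points if window.contains(p))
    return hits, nodes_read


nodes = build_nodes([[(1, 1), (2, 3)], [(10, 10), (12, 11)], [(5, 5), (6, 7)]])
print(window_query(nodes, Rect(0, 0, 6, 6)))  # ([(1, 1), (2, 3), (5, 5)], 2)
```

Neither coordinate is a "first column" here. The query window constrains both at once, which is the property that distinguishes a spatial index from a B-Tree.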

Full-text index

A full-text index is a special type of index that finds keywords in text, instead of comparing values directly against the values stored in the index. Full-text search matches in a completely different way from other index types, and it has many details to keep in mind, such as stop words, stemming and plurals, and Boolean search. Full-text indexing is closer to what a search engine does than to simple WHERE condition matching.

This is good to know. At a previous e-commerce company, I used Solr: configuring rules, maintaining data, and tuning indexes. After a long time working on it, the search results were still not good. The number of users never grew either, so Solr was overkill. Had I known about full-text indexes, I might have used them for a while instead. It seems I need to keep studying to avoid such detours.

Basic usage looks like this:

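-- Create the table
CREATE TABLE fulltext_test (
    id int(11) NOT NULL AUTO_INCREMENT,
    content text NOT NULL,
    tag varchar(255),
    PRIMARY KEY (id),
    FULLTEXT KEY content_tag_fulltext (content, tag)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
-- Note: the default parser splits words on whitespace and punctuation, so Chinese text
-- generally needs the ngram parser (FULLTEXT ... WITH PARSER ngram, MySQL 5.7.6+).

-- Insert your own test data

-- Query: unlike the usual fuzzy matching with LIKE + '%', full-text search has its own
-- syntax based on the MATCH and AGAINST keywords. The MATCH column list must match
-- the columns of a FULLTEXT index. Natural language mode is the default, so these two
-- queries are equivalent.
SELECT * FROM fulltext_test WHERE MATCH (content, tag) AGAINST ('清华' IN NATURAL LANGUAGE MODE);
SELECT * FROM fulltext_test WHERE MATCH (content, tag) AGAINST ('清华');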

Full-text indexes also let you configure stop words, word segmentation (tokenization), and so on.
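Under the hood, a full-text index is an inverted index: a map from each word to the documents that contain it. The Python sketch below shows that idea, with stop-word filtering and a crude relevance score, namely the number of distinct query words a document contains. The stop-word list, the tokenizer and the scoring are simplifying assumptions. MySQL's parsers and relevance ranking are more sophisticated, and a regex tokenizer like this one does not segment Chinese text into words.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "and", "is", "of", "the"}  # hypothetical stop-word list


def tokenize(text):
    """Lower-case word tokens (runs of letters, digits, underscore) minus stop words."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of the documents containing it

    def add(self, doc_id, text):
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return matching doc ids, best first; ties are broken by doc id."""
        scores = Counter()
        for word in set(tokenize(query)):
            for doc_id in self.postings.get(word, ()):
                scores[doc_id] += 1
        return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


idx = InvertedIndex()
idx.add(1, "MySQL full-text search")
idx.add(2, "The B-Tree index of MySQL")
idx.add(3, "Hash index basics")
print(idx.search("MySQL index"))  # [2, 1, 3]
print(idx.search("the"))          # [] because 'the' is a stop word
```

Unlike LIKE '%word%', which has to examine every row's text, a search here only touches the posting lists of the query words.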

Other types of indexes

There are also many third-party storage engines that use different data structures to store their indexes. For example, TokuDB uses fractal tree indexes, a relatively new data structure that keeps many of the advantages of B-Trees while avoiding some of their drawbacks. ScaleDB uses Patricia tries, and other storage engine technologies such as InfiniDB and Infobright use special data structures to optimize certain kinds of queries.

That covers the index types commonly used in MySQL. I hope this helps you in your work.

References

  • "High Performance MySQL"
  • "The interviewer asks you about B-Tree and B+Tree, then throw this article to him"
  • "Using full-text indexes (FULLTEXT) in MySQL"
  • "MySQL full-text index"