The principle and use of MySQL index, MySQL index data structure, binary tree, BST, AVL, red-black tree, B tree, B+ tree data structure!

Preface

Recently, I have summarized most of the interview questions and answers involved in the interview of Java programmers in response to the knowledge points asked in the interview of Internet companies. I hope to help you review before the interview and find a good job, and save money. You learn in the time you search for information on the Internet.

The full version of Java interview questions address: JAVA back-end interview questions integration

1. What is an index?

The index is just like the directory of a dictionary. We usually go to the directory to find the key radicals or letters and then look it up. It is much faster than looking up the dictionary directly.

2. Why should there be an index?

However, when we use the mysql database, we also query like a dictionary with an index. It must be much faster.

2.1 Question:

1.Where is mysql data stored?

Disk

2. Querying data is slow, where is it usually stuck?

IO

3. Go to the disk to read the data, how much is used to read the data?

Disk read ahead

The principle of locality: data and programs have a tendency to cluster together, and the data that has been visited before is likely to be queried again, spatial locality, temporal locality

Disk read-ahead: When there is data interaction between memory and disk, generally there is a smallest logical unit, page. The size of the page is generally considered by the operating system, 4k or 8k, and when we are interacting with data, we can take an integer multiple of the page to read.
The innodb storage engine reads data every time, reading 16k

4. Where is the index stored?

Disk, when querying data, the index will be loaded into the memory first

5. What information is needed when storing the index? What field values ​​need to be stored?

key: the value stored in the actual data row

File address

offset: offset

6. What kind of data structure should be used to store data in this format?

key-values

Hash table, tree (binary tree, red-black tree, AVL tree, B tree, B+ tree)

7. The mysql index system is not stored in the format just mentioned, why?

OLAP: Online Analytical Processing----Analyze massive historical data and generate decision-making strategies----Data Warehouse—Hive

OLTP: online transaction processing-requires a very short time to return the corresponding results-database-relational database (mysql, oracle)

Three, mysql index data structure

3.1 Hash table:

The structure of HashMap array and linked list is not suitable for indexing reasons:

1. Hash conflicts will cause uneven data hashing and will generate a large number of linear queries, which is a waste of time

2. Range query is not supported. When performing range query, it must be traversed one by one

3. The requirements for memory space are relatively high


Advantages: if it is equivalent query, very fast


Is there a hash index in mysql?

1. The memory storage engine uses a hash index

2.innodb supports adaptive hash

create table test(id int primary key,name varchar(30))
engine='innodb/memory/myisam'
-- 5.1之后默认innodb

复制代码

3.2 Tree:

There are many data structures such as trees, and our common ones are: binary tree, BST, AVL, red-black tree, B tree, B+ tree
. The full version of the Java interview question address: JAVA back-end interview question integration

① Binary tree: disorderly insertion

This is the structure diagram of our tree, but the data insertion of the binary tree is unordered, which means that when you need to find it, you still have to traverse and find one by one.

②BST (Binary Search Tree): The inserted data is in order, the left subtree must be smaller than the root node, and the right subtree must be larger than the root node -------- use binary search to improve efficiency

If so, to query, can search by dichotomy, quickly narrow range to minimize the time complexity but if the inserted sequence is ascending or descending, then the shape of the tree becomes as follows:

At this time, the binary search tree will degenerate into a linked list, and the time complexity will become O(n)

③AVL: Balanced Binary Tree In order to solve the above problems, the shortest subtree and the longest subtree of the tree can be balanced by left or right rotation, and the difference in height cannot exceed 1

As we can see from the figure, when inserting sequentially, it will automatically rotate to achieve a balance but will compensate for the increase in query performance through the loss of insert performance. When we insert a lot of data and there are few queries, Since inserting data will rotate, it will also consume a lot of time. ④ The red-black tree (resolving as many read and write requests as possible) is also the behavior of rotating left and right to balance the tree and changing colors as long as the longest subtree does not exceed twice the shortest subtree Can

Query performance and insertion performance are approximately balanced, but with the insertion of data, the depth of the discovery tree will become deeper, and the depth of the tree will become deeper and deeper, which means that the more IO times, the efficiency of data reading will be affected.

⑤ B-tree In order to solve the problem of too much data insertion and deepening of the tree depth, we use B-tree to turn the original ordered binary tree into an ordered multi-tree

Example: If you want to query select * from table where id=14?

  1. The first step is to load the disk one into the memory and find that 14<16, look for the address disk 2
  2. The second step is to load disk two into the memory, find 14>11, look for the address disk 7
  3. The third step is to load disk seven into the memory, and find that 14=14, read data, take out data, and finish thinking: Is the B-tree perfect? Question 1: B-tree does not support fast search of range query. If we query a range of data and find a boundary of the range, we need to go back to the root node and traverse the search again. We need to traverse multiple times from the root node, even if we find the range Another boundary, query efficiency will be reduced. Question 2: If data stores row records, the size of the row will increase as the number of columns increases. At this time, the amount of data that can be stored in a page will be less, the tree will be correspondingly taller, and the number of disk IOs will be greater. Thinking 2: How many records can be stored in a three-level B-tree? Answer: Assuming that a data is 1k, the innodb storage engine reads 16k data at a time, and the three layers are 16 16 16 = 4096; but often in development, the data of a table is much larger than 4096, should we continue to add layers? Wouldn't it increase the IO

4. Why use B+ tree?

When actually storing table data, how to store it? key complete line of data transformation B + tree

The B+ tree has improved the B tree, putting all the data in the leaf nodes, and the leaf nodes are connected by two-way pointers, and the bottom leaf nodes form a two-way ordered linked list. For example: query range select * from table where id between 11 and 35?

  1. The first step is to load the disk one into the memory and find that 11<28, look for the address disk 2
  2. The second step is to load disk two into the memory, find 10>11>17, look for the address disk 5
  3. The third step is to load disk five into the memory, find that 11=11, read data
  4. The fourth step, continue to query to the right, read disk 5, and find that 35=35, read the data between 11-35, and it can be seen that this range query is much faster than the B-tree

Compare B-tree and B+-tree?

Only put data in leaf nodes

No data is stored in non-leaf nodes

Each node of the B+ tree contains more nodes. The advantage of this is that the height of the tree can be reduced, and the data range can be changed into multiple intervals. The more intervals, the faster the query

Question: Use int or varchar when creating an index?

Answer: It depends on the situation, but remember to make the key as small as possible

Five, the creation of the index

Before creating the index, Let me talk about the storage engine storage engine: represent different data in different forms disks everyone to observe a disk file mysql will find InnoDB: InnoDB data and indexes are stored in a file .idb MyISAM: The index of myisam is stored in the .MYI file, and the data is stored in the .MYD

5.1 clustered index and non-clustered index

Concept: Judging whether it is a clustered index depends on whether the data and index are in a file innodb:

  1. There can be only one clustered index, but there are many non-clustered indexes
  2. When inserting data into innodb, it must contain the key value of an index
  3. The key value of this index can be the primary key. If there is no primary key, then it is a unique key. If there is no unique key, then it is a self-generated 6-byte rowid.

myisam: non-clustered index

MySQL—innodb—B+tree The index and data are stored together, and the corresponding data can be read when the index is found

MySQL—myisam----B+tree The index and the address of the stored data are together, find the index to get the address value, and then find the corresponding data through the address

5.2 Back to the table

Next, I will create a case table to show you

CREATE TABLE user_test(
id INT PRIMARY KEY AUTO_INCREMENT,-- id为主键
uname VARCHAR(20) ,
age INT,
gender VARCHAR(10),
 KEY `idx_uname` (`uname`) -- 索引选择为名字
)ENGINE = INNODB;

INSERT INTO user_test VALUES(1,'张三',18,'男');
INSERT INTO user_test VALUES(NULL,'马冬梅',19,'女');
INSERT INTO user_test VALUES(NULL,'赵四',18,'男');
INSERT INTO user_test VALUES(NULL,'王老七',22,'男');
INSERT INTO user_test VALUES(NULL,'刘燕',16,'女');
INSERT INTO user_test VALUES(NULL,'万宝',26,'男');

复制代码
select * from user_test where uname = '张三';
-- 当我们表中有主键索引的时候,我们再去设置一个uname为索引,那么此时这条sql语句的查询过程应该如下:

复制代码

First query the id based on uname, and then query the row information based on the id. Two B+ trees are used to return to the table. After the key value of the clustered index is queried based on the common index, the cluster will be clustered based on the key value. Obtaining data from the index, we can find that such an operation is a waste of time, so in our daily operations, try to reduce the number of times back to the table

5.3 Covering Index

select id,uname from table where uname = '张三';
-- 根据uname 可以直接查询到id,uname两个列的值,直接返回即可
-- 不需要从聚簇索引查询任何数据,此时叫做索引覆盖

复制代码

5.4 Leftmost match

Before talking about the leftmost match, let's talk about a few noun primary keys (usually one column) --------> joint primary key (multiple columns) index --------> joint index ( (May contain multiple index columns)

-- 假设有一张表,有id,name,age,gender四个字段,id是主键,name,age是组合索引列
-- 组合索引使用的时候必须先匹配name,然后匹配age

select * from table where name = ? and age = ? ;-- 生效
select * from table where name = ?;-- 生效
select * from table where age = ? ;-- 不生效
select * from table where age = ? and name = ? ;-- 生效

--在mysql内部有优化器会调整对应的顺序

复制代码

5.5 Index Push Down

After mysql5.7, a feature supported by default is an example:

select * from table where name = ? and age = ? ;
-- mysql里的三层架构:
-- 客户端:JDBC
-- 服务端:server
-- 存储引擎:数据存储
在没有索引下推之前,根据name从存储引擎中获取符合规则的数据,在server层对age进行过滤
有索引下推之后,根据name、age两个条件从存储引擎中获取对应的数据

复制代码

Analysis: There is the benefit of index push-down. If we have 50 data, we will get 10 data through filtering. If there is no index push-down, we will first get 50 and then exclude to get 10. And with push-down, We will filter them into 10 directly in the storage engine
. The full version of the Java interview questions address: JAVA back-end interview questions integration