Common Interview Questions in the "Workplace Interview" Database (1)-Index

The frequency of the database appearing on the jd of various positions in the Internet company is quite high. For this reason, Mr. Sloth spent some time summarizing some of the more popular database interview questions, hoping to help everyone get an offer~

1. Why use data indexing can improve work efficiency?

The index is arranged in a pre-arranged order, so that efficient algorithms such as binary search can be applied when searching. Normal order search, complexity (O(n), binary search complexity (log2n)). When n is very large, the efficiency difference between the two is extremely different.

for example:

The table contains one million data, and you need to find data for a specific id. In the case of continuous search, an average of 500,000 data is required. And using the dichotomy, you can find it up to twenty times. The efficiency difference between the two is 25,000 times!

When one or some fields need to be frequently used as query conditions, when the table data is large, creating an index can significantly improve the query speed, because the full table scan can be changed to an index scan.

When there is no index, the full table scan will scan all the records one by one, and the index scan can not directly locate until a qualified one is found.

Regardless of whether the data table has an index, first find the required data in the SGA data buffer, when there is no data in the data buffer, the server process will read the disk.

When there is no index, directly read the disk block storing the table data, read the data buffer, and then look for the required data.

When you have an index, first read the index table, find the physical address of the desired data directly through the index table, and then read the data into the data buffer.

2. The difference between B+ tree index and hash index

The tree is a balanced multi-forked tree. The height difference from the root node to each leaf node is within 1, and there are pointers between nodes at the same level, which are ordered, as shown in the following figure:

Common database interview questions, data index, B+ tree index and hash index, advantages and disadvantages of Hash index, usage scenarios of My ISAM and In no DB

The hash index uses a certain hash algorithm to convert the key value into a new hash value. It does not need to be retrieved from the root node to the leaf node one by one like a B+ tree. Only one hash algorithm is required, which is disordered, as shown in the following figure. :

Common database interview questions, data index, B+ tree index and hash index, advantages and disadvantages of Hash index, usage scenarios of My ISAM and In no DB

3. Advantages of Hash Index

Since the index itself only stores the corresponding hash value, the structure of the index is very compact, which also makes the hash value lookup very fast.

4. Hash index restrictions

Hash index is only available under certain circumstances:

  • Be sure to read the line.

The hash index only contains the hash value and the row pointer, and does not store the field value, so the value in the index cannot be used to avoid reading the row. However, the speed of accessing memory rows is very fast, so in most cases, the impact on performance is not obvious.

  • Cannot be used for sorting operations.

Hash index data is not stored in the order of index values, so it cannot be used for sorting.

  • Cannot match a lookup with partially indexed columns.

Hash index also does not support partial index column matching search, because Hash index always uses the content of the entire index column to calculate the Hash value. For example, to create a hash index on a data column (A, B), if a query has only one data column A, then it cannot be used.

  • Only equivalent search is supported.

Hash index only supports equivalent comparison queries, including =, IN(), <=> (note: different operations <> and <=>).

Any range query is also not supported, such as WHEREprice>100.

  • There is a hash conflict.

Unless there are a lot of hash collisions, access to hash index data is very fast (different index column values ​​have the same hash value). When a hash conflict occurs, the storage engine must traverse all row pointers in the linked list and compare row by row until all qualified rows are found.

At the same time, in the case of frequent hash collisions, the cost of certain index maintenance operations is high. For example, if a hash index is established for a column with very low selectivity (hash conflicts are high), when deleting a row from the table, the storage engine needs to traverse each row in the linked list, find and delete the reference to the corresponding row , The more conflicts, the greater the cost.

5. In Mysql, what is the difference between MyISAM and InnoDB?

  • InnoDB supports transactions, but MyISAM does not. This is an important reason why MySQL changed the default storage engine of MyISAM to InnoDB.
  • InnoDB supports foreign keys, but MyISAM does not. The conversion of InnoDB tables to MYISAM containing foreign keys failed.
  • InnoDB is a clustered index, MyISAM is a non-clustered index. The cluster index files are stored on the leaf nodes of the primary key index, so InnoDB must have a primary key, so that the primary key index is very efficient. However, the secondary index requires two queries, first querying the primary key, and then querying the data through the primary key. Therefore, the primary key cannot be too large, because it is too large, and other indexes will be too large.
  • But MyISAM is a group index for the index, the data file is separated, and the index saves the pointer of the data file.
  • The primary keyword index and the secondary index are independent.
  • InnoDB does not save a specific number of rows for the table, execute selectcount (*); a full table scan is required.
  • Moreover, MyISAM has only one variable to store the number of rows in the entire table, and only needs to read this variable when executing the above statement, which is very fast.
  • InnoDB's smallest locking granularity is row locking, and MyISAM's smallest locking granularity is table locking. The update statement locks the entire table, causing other queries and updates to be blocked, thereby limiting concurrent access.
  • This is an important reason why MySQL changed the default storage engine of MyISAM to InnoDB.

6. How to choose (MyISAM or InnoDB)?

  • Whether to support transactions, if you want to choose InnoDB, consider it without MyISAM.
  • If the vast majority of the tables are only read queries, consider MyISAM. If there are read and write functions, then it is best to use InnoDB.
  • After the system crashes, MyISAM is difficult to recover, can it be accepted, if not, choose InnoDB.
  • Innodb has become Mysql's default engine (formerly MyISAM) when MySQL 5.5 started, and its advantages are obvious to all.
  • If you don't know what storage engine to use, use InnoDB, at least not bad.

To learn more about database knowledge, click the full text link:

In addition, scan the QR code to follow the "Slazy School" public account, which will regularly publish some database dry goods.