MySQL basic interview questions (interview record in June 2021)


Storage engine

The difference between MyISAM and InnoDB

MyISAM was MySQL's default storage engine before version 5.5. Although its performance is excellent and it provides many features, including full-text indexing, compression, and spatial functions, MyISAM does not support transactions or row-level locks, and its biggest flaw is that it cannot recover safely after a crash. In version 5.5, MySQL introduced InnoDB, a transactional storage engine, and made it the default. Most of the time we use InnoDB, but in some cases, such as read-intensive workloads, MyISAM is still appropriate (if you can tolerate its crash-recovery issues).

Comparison of the two:

InnoDB supports transactions, but MyISAM does not.

InnoDB supports foreign keys, but MyISAM does not.

InnoDB uses a clustered index: the data file and the index are bundled together, a primary key must exist, and lookups through the primary key index are highly efficient. MyISAM uses a non-clustered index: the data file is separate, the index stores pointers into the data file, and the primary key index and secondary indexes are independent of each other.

InnoDB did not support full-text indexing (until MySQL 5.6), while MyISAM supports it; MyISAM generally has higher query efficiency for read-heavy workloads.

InnoDB does not store the exact number of rows in a table (a `SELECT COUNT(*)` must scan), while MyISAM keeps the row count of the whole table in a variable.

InnoDB supports both row-level and table-level locking (row-level by default), while MyISAM only supports table-level locking.
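As a quick illustration, the storage engine can be chosen per table at creation time and inspected afterwards (the table and column names below are made up for the example):

```sql
-- Create one table per engine (names are illustrative)
CREATE TABLE orders_innodb (
    id INT PRIMARY KEY AUTO_INCREMENT,
    amount DECIMAL(10, 2)
) ENGINE = InnoDB;

CREATE TABLE logs_myisam (
    id INT PRIMARY KEY AUTO_INCREMENT,
    message TEXT
) ENGINE = MyISAM;

-- Check which engine each table uses
SHOW TABLE STATUS WHERE Name IN ('orders_innodb', 'logs_myisam');

-- List all engines this server supports and the current default
SHOW ENGINES;
```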


The index data structures mainly used by MySQL are the BTree index and the hash index. For a hash index, the underlying data structure is a hash table; when most queries look up single records by exact key, a hash index gives the fastest lookups. For most other scenarios, a BTree index is the better choice.

MySQL's BTree index actually uses the B+Tree, a variant of the B-Tree, but the two main storage engines implement it differently.

  1. MyISAM: the data field of a B+Tree leaf node stores the address of the data record. During a lookup, the index is first searched with the B+Tree search algorithm; if the specified key exists, the value of its data field is read out, and that address is then used to read the corresponding data record. This is a non-clustered index.
  2. InnoDB: the data file itself is the index file. Unlike MyISAM, the index and the data are not separated: the table data file is itself an index structure organized as a B+Tree, and the data field of each leaf node holds the complete data record. The key of this index is the table's primary key, so the InnoDB table data file itself is the primary index. This is a clustered index.

How transactions are implemented

Transactions are implemented on top of the redo log and the undo (rollback) log.

Every time a transaction commits, all of its log records must first be written to the redo log file for persistence. Through the redo log file, the database can guarantee the atomicity and durability of the transaction.

Whenever a transaction modifies data, a corresponding undo log record is also generated. If a rollback is needed, the logical inverse of each statement recorded in the undo log is executed; for example, an insert is undone by a delete. The undo log mainly implements the atomicity of the transaction, and it also underpins MVCC.
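A minimal transaction sketch (the account table and its rows are hypothetical):

```sql
-- Hypothetical accounts table
CREATE TABLE account (
    id INT PRIMARY KEY,
    balance DECIMAL(10, 2) NOT NULL
) ENGINE = InnoDB;

INSERT INTO account VALUES (1, 100.00), (2, 50.00);

-- Transfer 30 from account 1 to account 2 atomically
START TRANSACTION;
UPDATE account SET balance = balance - 30 WHERE id = 1;
UPDATE account SET balance = balance + 30 WHERE id = 2;
COMMIT;   -- redo log records are flushed at commit for durability

-- If anything goes wrong before COMMIT, the undo log lets us revert:
START TRANSACTION;
UPDATE account SET balance = balance - 999 WHERE id = 1;
ROLLBACK; -- the change is undone by replaying the undo log in reverse
```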

What problems do concurrent transactions bring?

In a typical application, multiple transactions run concurrently and often operate on the same data to complete their respective tasks, which can cause the following problems:

  1. Dirty read: a transaction accesses and modifies data, but the modification has not yet been committed to the database; another transaction then reads and uses this data. Because the data has not been committed, what the second transaction reads is dirty data, and operations based on it may be incorrect.
  2. Lost modification (lost update): while one transaction reads a piece of data, another transaction also accesses it; after the first transaction modifies the data, the second transaction modifies it as well, so the first transaction's modification is lost. For example, transaction 1 reads A=20 and transaction 2 also reads A=20; transaction 1 sets A to 1, transaction 2 sets A to 100, and the final result is A=100, so the modification made by transaction 1 is lost.
  3. Non-repeatable read: a transaction reads the same data multiple times; before it finishes, another transaction modifies that data. Because of the second transaction's modification, the values read at the two points in the first transaction may differ, so a single transaction reads different data for the same row.
  4. Phantom read: a transaction reads a few rows of data, then another concurrent transaction inserts some rows; in subsequent queries, the first transaction finds extra records that did not exist before.

The difference between non-repeatable reading and phantom reading

The focus of non-repeatable read is modification: for example, reading the same record multiple times and finding that the values of some columns have changed. The focus of phantom read is insertion or deletion: for example, running the same query multiple times and finding that the number of records has increased or decreased.

Transaction isolation level

The SQL standard defines four isolation levels:

  1. READ-UNCOMMITTED (read uncommitted): The lowest isolation level, allowing reading of uncommitted data, which may cause dirty reads, phantom reads, and non-repeatable reads.
  2. READ-COMMITTED (read committed): allows reading data that concurrent transactions have already committed, which prevents dirty reads, but non-repeatable reads and phantom reads may still occur.
  3. REPEATABLE-READ (repeatable read): multiple reads of the same field within a transaction return consistent results, unless the data is modified by the transaction itself; this prevents dirty reads and non-repeatable reads, but phantom reads may still occur. It is MySQL's default isolation level.
  4. SERIALIZABLE (serializable): the highest isolation level, fully compliant with ACID; all transactions execute one after another so they cannot interfere with each other, which prevents dirty reads, non-repeatable reads, and phantom reads.
| Isolation level  | Dirty read | Non-repeatable read | Phantom read |
| ---------------- | ---------- | ------------------- | ------------ |
| READ-UNCOMMITTED | possible   | possible            | possible     |
| READ-COMMITTED   | prevented  | possible            | possible     |
| REPEATABLE-READ  | prevented  | prevented           | possible     |
| SERIALIZABLE     | prevented  | prevented           | prevented    |
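The isolation level can be inspected and changed per session; a small sketch:

```sql
-- MySQL 8.0 variable name; older versions use @@tx_isolation
SELECT @@transaction_isolation;

-- Change the level for the current session only
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Or for just the next transaction
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
-- ... statements here run under SERIALIZABLE ...
COMMIT;
```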

Locks used by MyISAM and InnoDB storage engines

  • MyISAM uses table-level locking.
  • InnoDB supports row-level locking and table-level locking. The default is row-level locking.

Comparison of table-level locks and row-level locks

  • Table-level lock: the lock with the largest granularity in MySQL; it locks the entire table being operated on. It is simple to implement, consumes few resources, locks quickly, and cannot deadlock. Because its granularity is the largest, the probability of lock conflicts is the highest and the degree of concurrency is the lowest.
  • Row-level lock: the lock with the smallest granularity in MySQL; it locks only the rows involved in the current operation. Row-level locking can greatly reduce conflicts in database operations. Its granularity is the smallest and its concurrency the highest, but its overhead is also the largest: locking is slower, and deadlocks can occur.

InnoDB storage engine lock algorithm

  1. Record lock: a lock on a single row record.
  2. Gap lock: locks a range between records, excluding the records themselves.
  3. Next-key lock: record lock + gap lock; locks a range, including the record itself.
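A sketch of how these locks arise in practice (the table and values are made up; which lock InnoDB actually takes also depends on the isolation level and the available indexes):

```sql
-- Hypothetical table with a secondary index
CREATE TABLE t (
    id INT PRIMARY KEY,
    age INT,
    KEY idx_age (age)
) ENGINE = InnoDB;

INSERT INTO t VALUES (1, 10), (5, 20), (10, 30);

START TRANSACTION;

-- Record lock: an exact match on a unique index locks just that row
SELECT * FROM t WHERE id = 5 FOR UPDATE;

-- Next-key / gap locks: a range query on idx_age locks the matching
-- records plus the gaps around them, so another transaction cannot
-- insert a row with an age in that range until we COMMIT
SELECT * FROM t WHERE age BETWEEN 10 AND 30 FOR UPDATE;

COMMIT;
```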

What is the difference between varchar and char in MySQL

Char is a fixed-length type: if you declare char(10), the field always occupies 10 characters no matter how much is actually stored. Varchar is variable-length: the declared length is only a maximum, and the space occupied is the actual content plus 1 or 2 extra bytes that record how long the stored value is.

In terms of retrieval efficiency, char > varchar. So if the values of a field have a fixed length, you can use char; otherwise prefer varchar. For example, an MD5-hashed password always has the same length, so char is the right choice for storing it.
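A small sketch (the table and column names are illustrative):

```sql
CREATE TABLE user (
    id INT PRIMARY KEY AUTO_INCREMENT,
    -- MD5 always produces 32 hex characters, so a fixed-length type fits
    password_md5 CHAR(32) NOT NULL,
    -- Nicknames vary in length, so a variable-length type fits
    nickname VARCHAR(50) NOT NULL
);

INSERT INTO user (password_md5, nickname)
VALUES (MD5('secret'), 'bob');

-- CHAR pads to its declared width on storage (trailing spaces are
-- stripped on retrieval); LENGTH() shows the stored byte lengths
SELECT LENGTH(password_md5), LENGTH(nickname) FROM user;
```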

The difference between B+ tree index, hash index, and B tree

  • B-tree: both leaf and non-leaf nodes store data; the structure is a balanced multi-way tree whose keys within each node are kept in sorted order.
  • B+ tree: data is stored only in the leaf nodes, which are additionally linked together in sorted order. Lookup cost in a B+ tree is relatively uniform, because every search descends to a leaf, so it fluctuates less than in a B-tree. By traversing just the linked leaf nodes, the whole tree's keys can be visited in order, which is why range queries, which are very frequent in databases, are efficient; a plain B-tree does not support such traversal (or only inefficiently).
  • Hash index: can only serve = and <=> equality queries and cannot be used for range queries. A hash index applies a hash function to the key to compute a bucket location; a lookup does not descend level by level from root to leaf like a B+ tree but jumps straight to the bucket in one hash computation, which is very fast. However, when there are many duplicate key values, hash collisions make it extremely inefficient, and it cannot help with sorting or range scans, so a full scan cannot always be avoided.

The difference between drop, truncate, and delete

Speed: drop > truncate > delete

  • drop: directly removes everything about the table (data, structure, constraints) without logging individual rows; it is a DDL operation.
  • truncate: deletes all data in the table (the auto-increment id restarts from 1 on the next insert) and does not log individual rows, so it is faster; it is also a DDL operation and can only be applied to tables.
  • delete: removes rows from the table one at a time, writing a log record for each deleted row, which makes it relatively slow; it is a DML operation and can be rolled back.
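Side by side (the table name is illustrative):

```sql
-- delete: DML, row by row, can filter with WHERE and can be rolled back
DELETE FROM demo WHERE created_at < '2021-01-01';

-- truncate: DDL, removes all rows and resets AUTO_INCREMENT
TRUNCATE TABLE demo;

-- drop: DDL, removes the table definition itself
DROP TABLE demo;
```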

MySQL executes the query process

  1. The client sends a request to the MySQL connector through a TCP connection, and the connector will perform permission verification and connection resource allocation for the request.
  2. Check the query cache. When deciding whether the cache is hit, MySQL does not parse the statement; it compares the raw SQL text sent by the client directly, so any character difference, such as spaces or comments, causes a cache miss. (The query cache was removed entirely in MySQL 8.0.)
  3. Parsing and preprocessing: check whether the SQL contains syntax errors, whether the referenced tables and columns exist, and whether any aliases are ambiguous.
  4. Optimize: decide which indexes to use and generate an execution plan.
  5. Hand it over to the executor, which produces the result set, optionally stores it in the query cache, and finally returns the result set to the client.

What situations need to create an index

  1. The primary key automatically creates a unique index.
  2. A field frequently used as a query condition.
  3. Fields used to join the table with other tables; index the columns involved in foreign-key relationships.
  4. For the choice between single-column and composite indexes, high-concurrency workloads tend to favor composite indexes.
  5. Fields used for sorting in queries; accessing sorted fields through an index greatly improves sorting speed.
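For instance (the table, column, and index names are made up):

```sql
-- 2: a field frequently used as a query condition
CREATE INDEX idx_user_email ON user (email);

-- 3: a join / foreign-key column
CREATE INDEX idx_order_user_id ON orders (user_id);

-- 4 and 5: a composite index covering filter + sort in one query,
-- e.g. SELECT ... WHERE status = ? ORDER BY created_at
CREATE INDEX idx_order_status_created ON orders (status, created_at);
```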

What situations do not need to create an index

  1. There are too few records in the table.
  2. Frequently add, delete and modify tables.
  3. Table fields with repeated and evenly distributed data should only be indexed for the most frequently queried and most frequently sorted data (if a data type contains too much repeated data, indexing does not make much sense).
  4. Frequently updated fields are not suitable for index creation (will increase the burden of IO).
  5. Fields that are not used in the where condition are not indexed.

What kinds of locks are there in MySQL

By locking granularity, locks can be divided into row-level locks, table-level locks, page-level locks, and gap locks.

Row-level lock

Row-level locks are the finest type of locks in MySQL, which means that only the rows of the current operation are locked. Row-level locks can greatly reduce conflicts in database operations, with the smallest locking granularity, but the largest locking overhead. Row-level locks are divided into shared locks and exclusive locks.

InnoDB has three row-level (all exclusive locks) locking algorithms:

  1. Record Lock (record lock): a lock on a single index record; this is what we usually think of as a row lock.
  2. Gap Lock (gap lock): locks a range between index records but does not include the records themselves. Its granularity is larger than a record lock on a single row: it covers every value in the gap, including values that do not exist at all. The purpose of the gap lock is to prevent phantom reads between two current reads in the same transaction, and to stop other transactions from inserting data into the gap.
  3. Next-Key Lock: a combination of record lock and gap lock; it locks a range and the record itself. It is used for row queries and mainly solves the phantom read problem. Next-Key Lock is InnoDB's default locking algorithm.

Table-level lock

Table-level lock is the lock with the largest locking granularity in MySQL, which means to lock the entire table of the current operation. It is simple to implement, consumes less resources, and is supported by most MySQL engines. The most commonly used MyISAM and InnoDB both support table-level locks. Table-level locks are divided into table shared read locks (shared locks) and table exclusive locks (exclusive locks).

Page-level lock

Page-level locks are a kind of lock in MySQL whose granularity sits between row-level and table-level locks. Table-level locks are fast but conflict often; row-level locks conflict rarely but are slower. Page-level locks are a compromise: they lock an adjacent group of records at a time.

According to the lock sharing strategy, it is divided into: shared lock (read lock), exclusive lock (write lock), intention shared lock, intention exclusive lock.

  1. Read lock (shared lock, Shared Lock, S lock): multiple read operations on the same piece of data can proceed at the same time without affecting each other.
  2. Write lock (exclusive lock, Exclusive Locks, X lock): Before the current write operation is completed, it will block other write locks and read locks.
  3. Intention Shared Lock (IS lock, Intention Shared Lock): When the transaction is ready to add an S lock on a certain record, it needs to add an IS lock at the table level first.
  4. Intention Exclusive Lock (IX lock, Intention Exclusive Lock): When a transaction prepares to add an X lock on a record, it needs to add an IX lock at the table level first.
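In SQL, S and X row locks can be taken explicitly (the `account` table is hypothetical; `LOCK IN SHARE MODE` was renamed `FOR SHARE` in MySQL 8.0):

```sql
START TRANSACTION;

-- S lock: other transactions may still read and take S locks on this
-- row, but cannot modify it until we commit
SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;

-- X lock: blocks both other X locks and S locks on the row
SELECT * FROM account WHERE id = 2 FOR UPDATE;

COMMIT;
```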

From the locking strategy, it is divided into optimistic locking and pessimistic locking.

Optimistic lock: assumes the same data will not be modified by concurrent operations, so it does not lock; it is implemented in application code, generally using a version number or a timestamp.

Pessimistic lock: the opposite of an optimistic lock; it pessimistically assumes that concurrent operations on the same data will conflict, treating the data as modified even when it is not. It therefore takes a lock for the duration of the operation. Pessimistic locks are divided into table-level and row-level locks.
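The version-number scheme mentioned above can be sketched as follows (the table and columns are hypothetical):

```sql
-- Each row carries a version counter
CREATE TABLE product (
    id INT PRIMARY KEY,
    stock INT NOT NULL,
    version INT NOT NULL DEFAULT 0
);

-- Read the row, remembering its version (say we got version = 3)
SELECT stock, version FROM product WHERE id = 1;

-- Write back only if nobody changed the row in the meantime
UPDATE product
SET stock = stock - 1, version = version + 1
WHERE id = 1 AND version = 3;
-- If the affected-row count is 0, another writer got there first:
-- re-read the row and retry
```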

Leftmost matching principle

The leftmost matching principle means that when MySQL uses a joint (composite) index, matching proceeds by prefix from the leftmost column first: when retrieving data, the match starts at the leftmost column of the joint index. The index is a B+ tree underneath, so a joint index is also a B+ tree, except that each node of the joint index stores the combination of several column values, compared column by column from the left.

For example, suppose a student table has a joint index index_major_class built on college number + class, i.e. composed of the two fields major (college number) and class. The index sorts first by major and then, within equal majors, by class; if the index had more columns after these, the same logic would continue.


If the where condition of our query only supplies the class column, the query cannot use the joint index; but if only the college number is supplied, the joint index can be used. Note that even then the optimizer may choose not to: when the table holds little data, a full scan can be faster than going through the index, so the actual execution plan does not always match expectations.
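This can be checked with EXPLAIN (the table and index names follow the example above):

```sql
CREATE TABLE student (
    id INT PRIMARY KEY,
    major INT,
    class INT,
    name VARCHAR(50),
    KEY index_major_class (major, class)
);

-- Can use the index: the leftmost column (major) is constrained
EXPLAIN SELECT * FROM student WHERE major = 3;

-- Can use the index fully: both columns are constrained
EXPLAIN SELECT * FROM student WHERE major = 3 AND class = 2;

-- Cannot use the index for the lookup: the leftmost column is missing
EXPLAIN SELECT * FROM student WHERE class = 2;
```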