24 must-know database interview questions

1. Why use an auto-increment column as the primary key?
1. If a primary key (PRIMARY KEY) is defined, InnoDB uses it as the clustered index.
If no primary key is explicitly defined, InnoDB uses the first unique index whose columns are all NOT NULL as the clustered index.
If there is no such unique index either, InnoDB uses a built-in 6-byte ROWID as a hidden clustered index. This ROWID increases monotonically as rows are written; unlike Oracle's ROWID, it is implicit and cannot be referenced in queries.
2. The data records themselves are stored in the leaf nodes of the primary index (a B+ tree). Within a leaf node (the size of one memory page or disk page), records must be stored in primary key order.
So whenever a new record is inserted, MySQL places it in the appropriate node and position according to its primary key. When a page reaches the load factor (15/16 by default in InnoDB), a new page (node) is opened.
3. If the table uses an auto-increment primary key, each new record is appended right after the last record of the current index node, and when a page fills up a new page is opened automatically.
4. If the primary key is not auto-incrementing (for example an ID card number or a student number), the key values arrive in roughly random order, so each new record has to be inserted somewhere in the middle of an existing index page.
MySQL then has to move existing data to make room for the new record in the right position. The target page may even have been flushed to disk and evicted from the cache, in which case it must be read back from disk first, which adds considerable overhead.
Frequent record moves and page splits also cause heavy fragmentation and leave the index structure less compact, so that eventually OPTIMIZE TABLE is needed to rebuild the table and repack its pages.
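The contrast between sequential and random primary keys can be sketched with a toy model. The Python snippet below is only an illustration under simplifying assumptions, not InnoDB's actual algorithm: pages hold a hypothetical PAGE_CAPACITY records, the 15/16 load factor is ignored, and simulate_inserts is a name made up for this example.

```python
import bisect
import random

PAGE_CAPACITY = 16  # hypothetical number of records per page


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys into sorted fixed-size pages and report page statistics."""
    pages = [[]]   # each page is a sorted list of keys
    splits = 0     # page splits that had to move existing records
    for key in keys:
        # Locate the page whose key range covers the new key.
        first_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        page = pages[idx]
        is_tail = idx == len(pages) - 1 and (not page or key > page[-1])
        if is_tail and len(page) >= capacity:
            # Sequential case: the last page is full, so open a new page.
            pages.append([key])
            continue
        bisect.insort(page, key)
        if len(page) > capacity:
            # Random case: the page overflows and is split in half.
            mid = len(page) // 2
            pages[idx:idx + 1] = [page[:mid], page[mid:]]
            splits += 1
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return {"pages": len(pages), "splits": splits, "fill": fill}


if __name__ == "__main__":
    sequential = list(range(1, 1001))
    shuffled = sequential[:]
    random.Random(42).shuffle(shuffled)
    print("auto-increment keys:", simulate_inserts(sequential))
    print("random keys:        ", simulate_inserts(shuffled))
```

In this model the sequential keys never trigger a split and leave almost every page full, while the shuffled keys cause splits and end up spread over more, emptier pages.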
2. Why does using an index improve query efficiency?
Index entries are stored in sorted order.
Because they are sorted, looking up a row through the index does not require traversing every index record.
In the best case, an index lookup is as efficient as a binary search, approaching log2(N) comparisons.
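To make the log2(N) figure concrete, here is a small Python sketch that counts how many records are examined with and without sorted order. The function names are made up for this example, and a plain sorted list stands in for the index.

```python
def linear_scan_steps(sorted_keys, target):
    """Count the records examined by a scan that does not exploit ordering."""
    for steps, key in enumerate(sorted_keys, start=1):
        if key == target:
            return steps
    return len(sorted_keys)


def binary_search_steps(sorted_keys, target):
    """Count the probes made by a binary search over sorted keys."""
    low, high, steps = 0, len(sorted_keys) - 1, 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_keys[mid] == target:
            return steps
        if sorted_keys[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps


if __name__ == "__main__":
    keys = list(range(1_000_000))
    print("linear scan steps:  ", linear_scan_steps(keys, 999_999))
    print("binary search steps:", binary_search_steps(keys, 999_999))
```

For one million keys the binary search needs at most 20 probes, since 2^20 is just above one million.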
3. The difference between a B+ tree index and a hash index
A B+ tree is a balanced multi-way tree: all leaf nodes sit at the same depth, nodes on the same level are linked to one another by pointers, and the keys are kept in order.

A hash index applies a hash algorithm to convert the key into a hash value. A lookup does not need to descend level by level from the root to a leaf as in a B+ tree; a single hash computation locates the entry. The entries are stored unordered.

4. Advantages of a hash index
For equality queries a hash index has a clear advantage, provided there are few duplicate key values. With many duplicate keys its efficiency drops sharply because of hash collisions.
5. Scenarios where a hash index is not applicable
Range queries are not supported.
The index cannot be used for sorting.
The leftmost-prefix matching rule of a composite index is not supported.
In general the B+ tree index structure suits most scenarios. A hash index is the better choice in a case like the following:
in a HEAP table where the stored values rarely repeat (that is, the cardinality is high), the column is queried mainly by equality, and there are no range queries and no sorting, a hash index is particularly suitable. For example:

-- equality lookup only (user_table is a placeholder table name)
select id, name from user_table where name = '李明';

InnoDB, the commonly used engine, uses B+ tree indexes by default and monitors in real time how the indexes on a table are used.
If it judges that a hash index would improve query efficiency, it automatically builds one in an in-memory area called the adaptive hash index buffer (the adaptive hash index is enabled by default in InnoDB).
Based on the observed search pattern, MySQL builds the hash index from a prefix of the index key. If most of a table is in the buffer pool, the hash index can speed up equality queries.
Note: under some workloads, the speedup from hash index lookups far outweighs the extra cost of monitoring index searches and maintaining the hash table structure.
Under heavy load, however, the read/write lock that protects the adaptive hash index can itself become a source of contention, for example with highly concurrent joins. LIKE operations and % wildcard searches do not benefit from the adaptive hash index either. In such cases you may need to turn it off (the innodb_adaptive_hash_index system variable).
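The contrast between hash and ordered (B+ tree style) indexes described above can be sketched in a few lines of Python. This is only an analogy: a dict stands in for the hash index, a sorted list for the ordered leaf level of a B+ tree, and the rows and function names are invented for the example.

```python
import bisect

# Hypothetical rows: (id, name)
rows = [(3, "Chen"), (1, "Li Ming"), (7, "Wang"), (5, "Zhao"), (9, "Liu")]

hash_index = {row_id: name for row_id, name in rows}   # unordered lookup table
ordered_keys = sorted(row_id for row_id, _ in rows)    # stands in for B+ tree leaf order


def hash_lookup(key):
    """Equality lookup: one hash computation, no ordering needed."""
    return hash_index.get(key)


def ordered_range(low, high):
    """Range lookup on sorted keys: two binary searches, then a contiguous slice."""
    start = bisect.bisect_left(ordered_keys, low)
    end = bisect.bisect_right(ordered_keys, high)
    return ordered_keys[start:end]


def hash_range(low, high):
    """Range lookup on a hash index: every key must be examined, then sorted."""
    return sorted(key for key in hash_index if low <= key <= high)
```

Both structures answer the range query, but only the ordered one can do so without touching every key, which is why range scans and sorting are left to the B+ tree.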

6. The difference between a B-tree and a B+ tree
1. B-tree: every node stores both keys and data, and all the nodes together form the tree. The child pointers at the leaf level are null; in the textbook definition the leaves are empty external nodes that carry no key information.

2. B+ tree: the leaf nodes contain all the keys together with pointers to the records holding those keys, and the leaf nodes are linked to one another in key order.
All non-leaf nodes can be regarded as the index part: each contains only the largest (or smallest) key of each of its subtrees. (In a B-tree, by contrast, the non-leaf nodes also hold data that a lookup may need.)

7. Why is a B+ tree more suitable than a B-tree for operating system file indexes and database indexes in practice?
1. A B+ tree has lower disk read/write cost.
The internal nodes of a B+ tree hold no pointers to the actual record data, so they are smaller than B-tree nodes.
If all the keys of one internal node are stored in the same disk block, that block can hold more keys, so more of the keys to be searched are brought into memory by a single read. The number of I/O operations drops accordingly.
2. B+ tree query efficiency is more stable.
Non-leaf nodes never point to the record content itself; they only index the keys stored in the leaf nodes. Every lookup therefore follows a path from the root to a leaf. All lookups have the same path length, so every record is found with the same efficiency.
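The I/O argument can be made concrete with some back-of-the-envelope arithmetic. The Python sketch below is a rough model, not a description of any real engine: the page size, key size, pointer size, and per-key row size are all assumed numbers, and every level is treated as having the same fan-out.

```python
PAGE_SIZE = 16 * 1024   # assumed node (page) size in bytes
KEY_SIZE = 8            # assumed key size in bytes
POINTER_SIZE = 6        # assumed child-pointer size in bytes
ROW_SIZE = 200          # assumed size of the row data a B-tree node stores per key


def fanout(entry_size, page_size=PAGE_SIZE):
    """Number of entries that fit into one node."""
    return page_size // entry_size


def levels_needed(num_keys, node_fanout):
    """Smallest number of levels whose capacity reaches num_keys."""
    levels, capacity = 1, node_fanout
    while capacity < num_keys:
        capacity *= node_fanout
        levels += 1
    return levels


bplus_fanout = fanout(KEY_SIZE + POINTER_SIZE)              # keys and pointers only
btree_fanout = fanout(KEY_SIZE + POINTER_SIZE + ROW_SIZE)   # keys, pointers and row data

if __name__ == "__main__":
    for name, f in (("B+ tree", bplus_fanout), ("B-tree", btree_fanout)):
        print(name, "fan-out:", f,
              "levels for 10 million keys:", levels_needed(10_000_000, f))
```

Under these assumptions the node without row data fits far more keys, so the tree is shallower and a lookup touches fewer pages.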
8. MySQL composite (joint) indexes
1. A composite index is an index on two or more columns.
MySQL uses the columns of a composite index from left to right. A query can use just part of the index, but only a leftmost part.
For example, with the index key index (a,b,c), lookups on a, on a,b, and on a,b,c can use the index, but a lookup on b,c cannot. The index is most effective when the leftmost column is compared with a constant.
2. Extra columns in the index narrow the search, but one index on two columns is not the same as two separate single-column indexes.
A composite index is structured like a phone book, where a person's name consists of a last name and a first name. The phone book is sorted by last name first, and people with the same last name are then sorted by first name.
If you know the last name, the phone book is very useful; if you know both the last and first name, it is even more useful; but if you know only the first name, it is useless.
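The leftmost-prefix rule can be written down as a tiny Python function. This is a simplified model that only considers equality predicates and ignores range conditions and any optimizer tricks; usable_prefix is a name invented for the example.

```python
INDEX = ("a", "b", "c")   # the composite index (a,b,c) from the text


def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that the equality predicates can use."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break   # a gap ends the usable prefix
        used.append(column)
    return used


if __name__ == "__main__":
    for predicate_columns in ({"a"}, {"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "c"}):
        print(sorted(predicate_columns), "->", usable_prefix(INDEX, predicate_columns))
```

A query on b and c gets an empty prefix, which matches the phone book picture: without the last name there is nowhere to start.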
9. When should you build no indexes, or fewer indexes?
1. The table has too few records.
2. The table is frequently inserted into, deleted from, or updated.
3. The column holds heavily repeated, evenly distributed values. If a table has 100,000 records and column A has only the two values T and F, each with a probability of about 50%, then indexing column A will generally not speed up queries.
4. The column is always queried together with a main column whose index is already highly selective.
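One rough way to judge a column like the T/F example is its selectivity, the ratio of distinct values to rows. The Python sketch below uses made-up sample columns; the function name and the data are assumptions for illustration only.

```python
def selectivity(values):
    """Ratio of distinct values to total rows (1.0 means every value is unique)."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0


flag_column = ["T", "F"] * 50_000     # 100,000 rows, only two distinct values
id_column = list(range(100_000))      # 100,000 rows, all distinct

if __name__ == "__main__":
    print("flag column selectivity:", selectivity(flag_column))
    print("id column selectivity:  ", selectivity(id_column))
```

A value close to 0 means an index entry still matches a huge share of the table, so the index saves little work.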
10. What is a table partition?
Table partitioning means splitting one table into multiple smaller, more manageable parts according to certain rules. Logically there is still a single table, but underneath it consists of multiple physical partitions.
11. The difference between table partitioning and table splitting
Table splitting: decomposing one table into several separate tables according to a set of rules, for example splitting order records into multiple tables by time.
The difference is that a partitioned table is still logically one table, whereas splitting turns one table into several tables.
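With table splitting, the routing rule lives in the application rather than in the database. A minimal Python sketch of splitting order records by month might look like this; the orders_YYYY_MM naming scheme and the function name are hypothetical.

```python
from datetime import date


def order_table_for(order_date):
    """Application-level routing: pick the physical table for an order date."""
    return f"orders_{order_date.year}_{order_date.month:02d}"


if __name__ == "__main__":
    print(order_table_for(date(2023, 1, 15)))
    print(order_table_for(date(2023, 12, 1)))
```

With partitioning, by contrast, the application keeps querying a single table name and the database picks the partition itself.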

12. What are the benefits of table partitioning?
1. Store more data. The partitions of a table can be placed on different physical devices, making efficient use of multiple hardware devices. Compared with a single disk or file system, more data can be stored.
2. Optimize queries. When the WHERE clause contains the partitioning condition, only the relevant partition or partitions need to be scanned, which improves query efficiency. Queries involving sum and count can also be processed on multiple partitions in parallel, with the results combined at the end.
3. Easier maintenance. For example, to delete a large amount of data in bulk you can clear an entire partition.
4. Avoid certain bottlenecks, such as exclusive access to a single InnoDB index or inode lock contention in the ext3 file system.
13. Limitations of partitioned tables
1. A table can have at most 1024 partitions (the limit was raised to 8192 in MySQL 5.6.7).
2. In MySQL 5.1 the partitioning expression must be an integer or an expression that returns an integer. MySQL 5.5 added support for partitioning on non-integer expressions.
3. If the table has a primary key or unique indexes, every column used in the partitioning expression must be part of the primary key and of each unique index. In other words, the partitioning columns must be contained in every primary or unique key the table has.
4. Foreign key constraints cannot be used in partitioned tables.
5. MySQL partitioning applies to all the data and all the indexes of a table. You cannot partition the data but not the indexes, partition the indexes but not the data, or partition only part of the table's data.
14. How to check whether MySQL currently supports partitioning?
Command: show variables like '%partition%'
If the value of have_partitioning in the result is YES, partitioning is supported.
In MySQL 5.6 and 5.7 this variable no longer exists; check instead that the partition plugin is listed as ACTIVE in the output of SHOW PLUGINS.
15. What partition types does MySQL support?
RANGE partitioning: rows are assigned to partitions by ranges of values. For example, a table can be divided into several partitions by year.
LIST partitioning: rows are assigned to partitions by matching against predefined lists of values. The difference from RANGE is that RANGE partitions cover continuous intervals, while LIST partitions enumerate discrete values.
HASH partitioning: a hash value is computed from one or more columns of the table, and rows are assigned to partitions according to that value. For example, you can create a table partitioned on its primary key.
KEY partitioning: an extension of HASH partitioning in which the hash function is supplied by MySQL itself.
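How each partition type decides where a row goes can be sketched in Python. This is a toy model of the routing idea only: the partition names, bounds, and region values are invented, and crc32 merely stands in for whatever internal hash function the server uses for KEY partitioning.

```python
import bisect
import zlib

# RANGE: partition i holds values below RANGE_UPPER_BOUNDS[i] (like VALUES LESS THAN).
RANGE_UPPER_BOUNDS = [2019, 2020, 2021]
RANGE_PARTITIONS = ["p2018", "p2019", "p2020"]

# LIST: each partition enumerates the discrete values it accepts.
LIST_PARTITIONS = {"north": "p_north", "south": "p_south", "east": "p_east"}

NUM_HASH_PARTITIONS = 4


def range_partition(year):
    """Pick the first partition whose upper bound is greater than the value."""
    idx = bisect.bisect_right(RANGE_UPPER_BOUNDS, year)
    if idx == len(RANGE_PARTITIONS):
        raise ValueError(f"no partition for value {year}")
    return RANGE_PARTITIONS[idx]


def list_partition(region):
    """Pick the partition whose value list contains the value."""
    if region not in LIST_PARTITIONS:
        raise ValueError(f"no partition for value {region!r}")
    return LIST_PARTITIONS[region]


def hash_partition(row_id):
    """HASH style: the user supplies an integer expression, here simply the id."""
    return f"p{row_id % NUM_HASH_PARTITIONS}"


def key_partition(key_text):
    """KEY style: the system chooses the hash function (crc32 is a stand-in)."""
    return f"p{zlib.crc32(key_text.encode('utf-8')) % NUM_HASH_PARTITIONS}"
```

RANGE and LIST reject values that no partition covers, while HASH and KEY always find a home for a row because the modulo spreads every value over the fixed number of partitions.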
16. The four isolation levels
Serializable: prevents dirty reads, non-repeatable reads, and phantom reads.
Repeatable read: prevents dirty reads and non-repeatable reads.
Read committed: prevents dirty reads.
Read uncommitted: the lowest level; none of the above is prevented.
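The four levels can also be read as a lookup table. The Python sketch below simply encodes the list above and finds the weakest level that prevents a given set of anomalies; the function and constant names are made up for the example.

```python
ANOMALIES = {"dirty read", "non-repeatable read", "phantom read"}

# Anomalies each level prevents, ordered from weakest to strongest level.
PREVENTED = {
    "read uncommitted": set(),
    "read committed": {"dirty read"},
    "repeatable read": {"dirty read", "non-repeatable read"},
    "serializable": {"dirty read", "non-repeatable read", "phantom read"},
}


def weakest_level_preventing(required):
    """Return the lowest isolation level that prevents all required anomalies."""
    required = set(required)
    unknown = required - ANOMALIES
    if unknown:
        raise ValueError(f"unknown anomalies: {sorted(unknown)}")
    for level, prevented in PREVENTED.items():   # dicts keep insertion order
        if required <= prevented:
            return level
    raise ValueError("no level prevents the requested anomalies")
```

Each step up the table buys protection against one more anomaly, usually at the cost of lower concurrency.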
17. About MVCC
The MySQL InnoDB storage engine implements a multi-version concurrency control protocol, MVCC (Multi-Version Concurrency Control).
Note: the counterpart of MVCC is lock-based concurrency control, LBCC (Lock-Based Concurrency Control).
The biggest advantage of MVCC is that reads take no locks, so reads and writes do not conflict. In OLTP applications with many reads and few writes this matters a great deal and greatly increases the concurrency of the system. Almost all current RDBMSs support MVCC.
A purely lock-based mechanism offers low concurrency. MVCC improves on lock-based concurrency control mainly by raising the concurrency of read operations.
18. The two kinds of reads under MVCC
Snapshot read: reads a visible version of the record (possibly a historical version) without locking. No shared read lock is taken, so writes by other transactions are not blocked.
Current read: reads the latest version of the record, and the records returned are locked so that other transactions cannot modify them concurrently.
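The difference between the two kinds of reads can be illustrated with a toy version chain in Python. This is a heavily simplified sketch, not how InnoDB implements MVCC: versions are tagged with a made-up commit timestamp assumed to grow with every write, and locking is left out entirely.

```python
class VersionedRecord:
    """One record with a chain of committed versions, oldest first."""

    def __init__(self):
        self.versions = []   # list of (commit_ts, value), commit_ts ascending

    def commit_write(self, commit_ts, value):
        """Append a new committed version instead of overwriting the old one."""
        self.versions.append((commit_ts, value))

    def snapshot_read(self, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        for commit_ts, value in reversed(self.versions):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def current_read(self):
        """Return the latest committed version (a real engine would also lock it)."""
        return self.versions[-1][1] if self.versions else None


if __name__ == "__main__":
    record = VersionedRecord()
    record.commit_write(10, "balance=100")
    snapshot_ts = 15                         # a reader takes its snapshot here
    record.commit_write(20, "balance=80")    # a later writer is not blocked
    print("snapshot read:", record.snapshot_read(snapshot_ts))
    print("current read: ", record.current_read())
```

The reader with the older snapshot keeps seeing the historical version while the writer proceeds, which is the "reads and writes do not conflict" property.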
19. Advantages and disadvantages of row-level locking
Advantages:
1. Fewer lock conflicts when many threads access different rows.
2. Fewer changes to undo on rollback.
3. A single row can be locked for a long time.
Disadvantages:
1. It takes more memory than page-level or table-level locking.
2. When used on a large part of a table, it is slower than page-level or table-level locking because many more locks must be acquired.
3. If you frequently run GROUP BY over a large part of the data or must often scan the entire table, it is noticeably slower than other locking schemes.
4. With higher-level locks you can more easily tune an application by supporting different lock types, because the lock overhead is lower than for row-level locks.
20. MySQL optimization
Enable the query cache and write queries that can use it (note that the query cache was removed in MySQL 8.0).
Run EXPLAIN on your SELECT queries. This helps you analyze performance bottlenecks in the query or the table structure. The EXPLAIN output also shows how your indexes and primary keys are used and how the tables are searched and sorted.
Use LIMIT 1 when only one row is needed. The MySQL engine then stops searching as soon as it finds one matching row instead of continuing to look for further matches.
Index the columns you search on.
Use ENUM instead of VARCHAR. If a column such as "gender", "country", "ethnicity", "status", or "department" can take only a limited, fixed set of values, use ENUM rather than VARCHAR.
Use prepared statements. Prepared statements are much like stored procedures: a collection of SQL statements that run in the background. They bring benefits for both performance and security.
Prepared statements check the variables you bind, which protects your program against SQL injection attacks.
Split tables vertically.
Choose the right storage engine.
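The prepared-statement and LIMIT 1 items above can be tried out with bound parameters. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the users table and the function names are invented, and MySQL drivers use their own placeholder syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("Li Ming",), ("Wang Fang",)])


def find_unsafe(name):
    """Builds the SQL by string formatting: the input can rewrite the query."""
    sql = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(sql).fetchall()


def find_safe(name):
    """Binds the input as a parameter: it is only ever treated as a value."""
    sql = "SELECT id, name FROM users WHERE name = ? LIMIT 1"
    return conn.execute(sql, (name,)).fetchall()


if __name__ == "__main__":
    malicious = "x' OR '1'='1"
    print("unsafe query returns:", find_unsafe(malicious))
    print("safe query returns:  ", find_safe(malicious))
```

The string-built query returns every row for the malicious input, while the parameterized one returns nothing because no user has that literal name.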
21. The difference between a key and an index
A key is part of the physical structure of the database and has two roles: it is a constraint (it constrains and standardizes the structural integrity of the database) and it is an index (it assists queries). Keys include the primary key, unique keys, foreign keys, and so on.
An index is also part of the physical structure of the database, but it only assists queries. When created, it is stored in a directory-like structure in a separate tablespace (the InnoDB tablespace in MySQL). Indexes can be classified into prefix indexes, full-text indexes, and so on.
22. What are the differences between MyISAM and InnoDB in MySQL?
Differences:
1. InnoDB supports transactions; MyISAM does not.
In InnoDB every SQL statement is by default wrapped in its own transaction and committed automatically, which hurts speed, so it is best to place multiple SQL statements between begin and commit to form a single transaction.
2. InnoDB supports foreign keys; MyISAM does not.
Converting an InnoDB table that contains foreign keys to MyISAM fails.
3. InnoDB uses a clustered index: the data file is bound to the index, so the table must have a primary key, and lookups by primary key are very efficient.
A secondary index lookup, however, takes two steps: first the secondary index yields the primary key, then the primary key yields the data. The primary key should therefore not be too large, because a large primary key makes every other index large as well.
MyISAM uses non-clustered indexes: the data file is separate from the index, and the index stores pointers into the data file. The primary key index and the secondary indexes are independent of each other.
4. InnoDB does not store the row count of a table, so select count(*) from table requires a full scan. MyISAM keeps the row count of the whole table in a variable, so the same statement only has to read that variable, which is very fast.
5. InnoDB did not support full-text indexes before MySQL 5.6, whereas MyISAM does, so MyISAM was the more efficient choice for full-text queries.
How to choose:
Do you need transactions? If so, choose InnoDB; if not, you can consider MyISAM.
If the vast majority of operations on the table are reads, you can consider MyISAM; if reads and writes are both frequent, use InnoDB.
After a system crash MyISAM is harder to recover. Is that acceptable?
InnoDB has been the default engine since MySQL 5.5 (previously it was MyISAM), which shows that its advantages are widely recognized. If you do not know which to use, use InnoDB; at the very least it will not be a bad choice.
23. Points to note when creating database tables
1. Reasonable field names and field definitions
Remove fields that are only loosely related to the table;
name fields according to rules and give them meaningful names (do not mix English words with pinyin, and avoid names with unclear meaning such as abc);
avoid abbreviations in field names where possible (most abbreviations do not make the meaning of the field clear);
do not mix upper and lower case in field names (for readability, join multiple English words with underscores);
do not use reserved words or keywords as field names;
keep field names and types consistent;
choose numeric types carefully;
leave headroom in the length of text fields.
2. Handling of special system fields, and recommendations after table creation
Add deletion markers (for example operator and deletion time);
establish a versioning mechanism.
3. Reasonable table structure
Handling of multi-type fields: check whether the table has fields that can be decomposed into smaller independent parts (for example, a person can be classified as male or female);
handling of multi-valued fields: split the table into three tables, which makes retrieval and sorting more orderly and guarantees data integrity.
4. Other recommendations
Store large data fields (for example a description field) in a separate table so they do not affect performance;
use varchar instead of char, because varchar allocates its length dynamically while char has a fixed, specified length;
give every table a primary key, since a table without one suffers in both queries and index definition;
avoid nullable fields and set default values instead (for example, default an int column to 0), which noticeably helps index queries;
build indexes preferably on unique, non-null fields, and remember that too many indexes slow down later inserts and updates (create them according to actual need).