MySQL summary


Database overview

Why use a database/advantages of a database

Data is stored in memory

  • Advantages: fast access speed
  • Disadvantages: Data cannot be saved permanently

Data is saved in file

  • Advantages: permanent storage of data
  • Disadvantages: 1) slower than in-memory access, because every read or write needs file I/O; 2) querying the data is inconvenient

Data is saved in the database

  • Data is saved permanently
  • Using SQL statements, queries are convenient and efficient.
  • Convenient data management

What are the three normal forms of database design?

  • First normal form (1NF): every column holds an atomic value and cannot be split further.
  • Second normal form (2NF): builds on 1NF; every non-key column depends on the whole primary key, not on only part of it.
  • Third normal form (3NF): builds on 2NF; non-key columns depend only on the primary key and not on other non-key columns (no transitive dependencies).

When designing a database schema, try to follow the three normal forms; departing from them needs a good reason, such as performance. In practice, schemas are often denormalized for performance.
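The third normal form is easiest to see as a functional dependency problem. The Python sketch below checks whether some columns determine others in a small sample; the students table, its columns and the helper name determines are hypothetical, and sample data can only disprove a dependency, never prove one (real dependencies come from business rules).

```python
def determines(rows, lhs, rhs):
    """True if, in these sample rows, the columns in `lhs` functionally determine `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # the same lhs value mapping to two different rhs values breaks the dependency
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical table whose primary key is student_id.
students = [
    {"student_id": 1, "name": "Ann", "dept": "CS",   "dept_head": "Dr. Lee"},
    {"student_id": 2, "name": "Bo",  "dept": "CS",   "dept_head": "Dr. Lee"},
    {"student_id": 3, "name": "Cy",  "dept": "Math", "dept_head": "Dr. Kim"},
]

# student_id -> dept -> dept_head: dept_head depends on a non-key column,
# a transitive dependency that violates the third normal form.
transitive = determines(students, ["dept"], ["dept_head"])
print(transitive)
```

Here dept_head depends on dept rather than on student_id, so a 3NF design would move it into a separate departments(dept, dept_head) table.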

Data types

What data types does MySQL have

  • Integer types: TINYINT, SMALLINT, MEDIUMINT, INT and BIGINT, which take 1, 2, 3, 4 and 8 bytes respectively. Any integer type can be declared UNSIGNED, meaning it stores only non-negative integers.
  • INT(11) is an INT with a display width of 11. The width only affects how values are displayed (for example with ZEROFILL); it does not change the storage size or the value range, and it is deprecated as of MySQL 8.0.17.
  • Floating-point and fixed-point types: FLOAT and DOUBLE (approximate values) and DECIMAL (exact, fixed-point values).
  • String types: VARCHAR, CHAR, TEXT and BLOB (BLOB stores binary data).
  • VARCHAR stores variable-length strings and uses less space than a fixed-length type: VARCHAR(M) holds at most M characters.
  • CHAR is fixed-length: CHAR(M) always reserves space for M characters, padding shorter values with spaces.
  • Date and time types: YEAR, DATE, TIME, DATETIME and TIMESTAMP. TIMESTAMP values are converted from the session time zone to UTC when stored and back when read; DATETIME values are stored as given, with no time zone conversion.
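The value range of each integer type follows directly from its byte width. A short Python sketch (the dictionary and function names are illustrative, not part of MySQL) computes the signed and UNSIGNED ranges; a display width such as INT(11) plays no part in it.

```python
# Storage size in bytes of each MySQL integer type.
INTEGER_TYPES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

def int_range(type_name, unsigned=False):
    """(min, max) storable in the given integer type."""
    bits = 8 * INTEGER_TYPES[type_name]
    if unsigned:
        return 0, 2 ** bits - 1
    # two's complement: one bit is spent on the sign
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for name in INTEGER_TYPES:
    print(f"{name:<9} signed {int_range(name)}  unsigned {int_range(name, True)}")
```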

What are super keys, candidate keys, primary keys, and foreign keys?

  • Super key: a set of attributes that uniquely identifies each tuple in a relation. A single attribute or a combination of attributes can be a super key. Candidate keys and primary keys are both super keys.
  • Candidate key: a minimal super key, that is, a super key with no redundant attributes; removing any attribute means it no longer identifies tuples uniquely.
  • Primary key: the column or combination of columns chosen to uniquely and completely identify the rows of a table. A table can have only one primary key, and primary key values cannot be missing, that is, they cannot be NULL.
  • Foreign key: a column in one table that refers to the primary key of another table is a foreign key of the first table.
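Since a super key is just a set of columns with no duplicate value combinations, candidate keys can be enumerated by brute force on a small sample. The Python sketch and its sample table are hypothetical; in a real schema keys are declared by constraints, and sample data can only show that a column set is not a key.

```python
from itertools import combinations

def is_superkey(rows, cols):
    """True if no two rows share the same values for `cols`."""
    projected = [tuple(row[c] for c in cols) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows, columns):
    """Minimal super keys: super keys that contain no smaller super key already found."""
    found = []
    for size in range(1, len(columns) + 1):
        for cols in combinations(columns, size):
            if is_superkey(rows, cols) and not any(set(k) <= set(cols) for k in found):
                found.append(cols)
    return found

rows = [
    {"student_id": 1, "id_card": "A1", "name": "Ann", "class": "1"},
    {"student_id": 2, "id_card": "B2", "name": "Ann", "class": "2"},
    {"student_id": 3, "id_card": "C3", "name": "Bo",  "class": "1"},
]
print(candidate_keys(rows, ["student_id", "id_card", "name", "class"]))
```

Any of the candidate keys could be picked as the primary key; (student_id, name) is still a super key, but not a candidate key, because it contains student_id.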


What is the difference between the MySQL storage engines MyISAM and InnoDB?

Since MySQL 5.5, the default engine is InnoDB.

Differences between MyISAM and InnoDB:

  • Storage structure: MyISAM stores each table in three files: .frm (table definition), .MYD (MYData, data file) and .MYI (MYIndex, index file). In InnoDB, indexes are part of the tablespace; tables are stored in a shared data file (possibly several files) or in per-table tablespace files, and a table's size is limited only by the operating system's maximum file size (2 GB on some file systems).
  • Storage space: MyISAM tables can be compressed and take little space. InnoDB needs more memory and storage; it keeps a dedicated buffer pool in main memory to cache data and indexes.
  • Portability, backup and recovery: MyISAM data lives in plain files, which makes cross-platform transfer convenient, and a single table can be backed up or restored on its own. For InnoDB, the free options are copying the data files, backing up the binlog, or using mysqldump, which becomes painful once the data reaches tens of gigabytes.
  • Record storage order: MyISAM stores records in insertion order; InnoDB stores them in primary key order.
  • Foreign keys: MyISAM no; InnoDB yes.
  • Transactions: MyISAM no; InnoDB yes.
  • Locking: MyISAM supports table-level locks only; InnoDB supports row-level and table-level locks, with finer lock granularity and higher concurrency.
  • SELECT: MyISAM is usually faster.
  • SELECT COUNT(*): MyISAM is faster because it keeps an exact row count that can be returned directly (this only applies when there is no WHERE clause).
  • Index implementation: both use B+ tree indexes; a MyISAM table is a heap table, while an InnoDB table is index-organized.
  • Clustered index: MyISAM no; InnoDB yes.
  • Hash index: MyISAM no; InnoDB yes (the adaptive hash index, built automatically).
  • Full-text index: MyISAM yes; InnoDB since MySQL 5.6 (InnoDB 1.2.x).

What are the characteristics of the InnoDB engine?

  • Insert buffer
  • Double write
  • Adaptive hash index (ahi)
  • Read ahead


What is the index?

In InnoDB, indexes are an integral part of the tablespace; in MyISAM they are kept in a separate file. In both cases an index holds references to the records in the table.

An index is a data structure: a sorted structure that the database management system maintains to speed up queries on, and updates to, the data in a table. Indexes are usually implemented with a B-tree or its variant, the B+ tree. Put simply, an index works like a book's table of contents, built so that content can be found quickly.

Indexes can greatly speed up retrieval in MySQL. Looking up a character in a dictionary works the same way: you use the pinyin, radical or stroke index to find the page number, then open the dictionary at that page to read the entry.

What are the advantages and disadvantages of indexes?

Advantages of indexing

  • Can greatly speed up the data retrieval speed, which is also the main reason for creating an index.
  • Indexes let the query optimizer choose better execution plans, improving system performance.

Disadvantages of indexing

  • In terms of time: creating and maintaining indexes takes time. Specifically, when adding, deleting, and modifying data in the table, the index must also be dynamically maintained, which will reduce the execution efficiency of addition/change/deletion;
  • Space: The index needs to take up physical space.

What types of indexes does MySQL have?

1. Divided from the data structure:

  • BTree index (B-Tree or B+Tree index)
  • Hash index

2. From the application level:

  • Primary key index: The index generated on the primary key column (data columns are not allowed to be duplicated, and are not allowed to be NULL. A table can only have one primary key.)
  • Unique index: the values of the indexed column must be unique, but NULL values are allowed
  • Ordinary index: a basic index on a single column; a table can have several single-column indexes
  • Composite index: an index built from several columns, used for queries that filter on those columns together; it is more efficient than merging several single-column indexes

3. By whether the physical order of the rows matches the logical (index) order of the key values:

  • Clustered index: InnoDB's clustered index stores the key together with the entire row in the index
  • Nonclustered index: the index stores the key and a reference to the row (in InnoDB, the primary key), not the row itself

Talk about the underlying implementation of the index?

Hash index
A hash index is implemented on top of a hash table and only helps queries that match all of the index's columns exactly. For each row, the storage engine computes a hash code over all the indexed columns; the index stores these hash codes together with a pointer to each data row.
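That layout can be modelled in Python: each bucket maps the hash of all indexed columns to row positions, and a lookup re-checks the actual row to handle collisions. The HashIndex class and the sample rows are hypothetical; it is a toy, not how MySQL's MEMORY engine or InnoDB's adaptive hash index are built.

```python
from collections import defaultdict

class HashIndex:
    """Toy hash index: hash(values of ALL indexed columns) -> row positions."""

    def __init__(self, rows, columns):
        self.rows, self.columns = rows, columns
        self.buckets = defaultdict(list)
        for pos, row in enumerate(rows):
            self.buckets[hash(self._key(row))].append(pos)

    def _key(self, row):
        return tuple(row[c] for c in self.columns)

    def lookup(self, *values):
        """Equality on every indexed column; collisions are filtered by re-reading the row."""
        candidates = self.buckets.get(hash(tuple(values)), [])
        # the index holds only hash codes and pointers, so each match reads the row itself
        return [self.rows[p] for p in candidates if self._key(self.rows[p]) == tuple(values)]

rows = [{"name": "Ann", "age": 20}, {"name": "Bo", "age": 31}, {"name": "Ann", "age": 25}]
idx = HashIndex(rows, ["name", "age"])
print(idx.lookup("Ann", 25))
```

A lookup on name alone, or a range such as age > 20, has no single key to hash, which is why a hash index cannot serve prefix or range queries.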


B-Tree index
In a B-Tree, keys and their data are distributed across all nodes, internal and leaf alike.


B+Tree index
The B+Tree is an improved version of the B-Tree and is the structure most database indexes use.

  • Together, the leaf nodes contain every key along with pointers to the records holding those keys, and the leaves are linked to each other in ascending key order.
  • All non-leaf nodes serve only as an index; each holds just the largest (or smallest) key of its subtrees.
  • In a B+ tree, data objects are inserted and deleted only at the leaf nodes.
  • A B+ tree has two head pointers: one to the root node and one to the leaf node with the smallest key.
  • For a range search, a B+Tree only needs to locate the first leaf and then walk the linked leaves, whereas a B-Tree has to traverse the tree in order across several levels, so the B+Tree is more efficient.
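To show how the linked leaves make range scans cheap, here is a minimal read-only B+ tree in Python, bulk-loaded from sorted pairs. The class and function names and the fanout of 4 are illustrative assumptions; a real index uses pages of hundreds or thousands of entries and supports inserts and deletes with splits and merges, which this sketch omits.

```python
import bisect

class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values, self.next = keys, values, None

class Internal:
    def __init__(self, keys, children):
        # keys[i] is the smallest key under children[i + 1]; no row data here
        self.keys, self.children = keys, children

def bulk_load(pairs, fanout=4):
    """Build a tree from a non-empty list of (key, value) pairs sorted by key."""
    leaves = [Leaf([k for k, _ in pairs[i:i + fanout]], [v for _, v in pairs[i:i + fanout]])
              for i in range(0, len(pairs), fanout)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right                       # leaves form a linked list in key order
    level, mins = leaves, [leaf.keys[0] for leaf in leaves]
    while len(level) > 1:                       # build internal levels bottom-up
        groups = [(level[i:i + fanout], mins[i:i + fanout]) for i in range(0, len(level), fanout)]
        level = [Internal(m[1:], nodes) for nodes, m in groups]
        mins = [m[0] for _, m in groups]
    return level[0], leaves[0]                  # the two head pointers: root and first leaf

def find_leaf(root, key):
    node = root
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node

def range_scan(root, lo, hi):
    """Values with lo <= key <= hi: one root-to-leaf descent, then follow leaf links."""
    leaf, out = find_leaf(root, lo), []
    while leaf:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append(v)
        leaf = leaf.next
    return out

root, first_leaf = bulk_load([(k, f"row{k}") for k in range(1, 51)])
print(range_scan(root, 10, 14))
```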

Why does the index structure use B+Tree by default instead of B-Tree, Hash, Binary Tree, Red-Black Tree?

B-Tree:
  • A B-tree is only suited to random (point) lookups, while a B+ tree supports both random and sequential retrieval;
  • A B+ tree uses space more efficiently, which reduces I/O and lowers disk read/write cost. Indexes are usually too large to fit entirely in memory, so they are stored on disk as index files, and every index lookup costs disk I/O. The internal nodes of a B+ tree hold no pointers to row data and serve only as an index, so they are smaller than B-tree nodes: a disk block holds more keys, each read brings more keys into memory, and fewer I/Os are needed. The number of I/Os is the biggest factor in index lookup efficiency;
  • Query performance of a B+ tree is more stable. A B-tree search may end at a non-leaf node: the closer a key is to the root, the shorter the search, and a record is found as soon as its key is found, so performance is roughly that of a binary search over all keys. In a B+ tree every lookup follows a path from the root to a leaf, so all search paths have the same length and every key costs about the same to find;
  • Although the B-tree improves disk I/O performance, it does not make traversing elements efficient. The leaves of a B+ tree are linked in key order, so traversing the leaves traverses the whole tree. Range queries are very frequent in databases, and a B-tree handles them much less efficiently;
  • Inserting and deleting nodes is more efficient in a B+ tree, because the leaves hold all keys in an ordered linked list;
  • With a clustered index or a covering index, a B+ tree lookup does not need to go back to the table, because the leaf already holds the required data.

Hash:
  • A hash index suits equality queries but cannot do range queries.
  • A hash index cannot be used for sorting.
  • A hash index supports neither fuzzy (LIKE) queries nor leftmost-prefix matching on a multi-column index, because hash values are unrelated to key order: the hashes of AAAA and AAAAB have nothing in common.
  • A hash index always has to go back to the table to fetch the data.
  • With many duplicate key values, hash collisions make a hash index very inefficient.

Binary tree:

  • The tree is not height-balanced and cannot rebalance itself; search cost depends on the data (the tree height), and I/O cost is high.

Red-black tree:

  • The height of the tree increases as the amount of data increases, and the IO cost is high.
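To make the "fewer I/Os" argument concrete, here is a back-of-the-envelope estimate in Python. Only the 16 KB page is InnoDB's actual default page size; the 8-byte BIGINT key, the 6-byte child pointer and the 1 KB average row are assumptions commonly used for this estimate, and page headers and fill factor are ignored.

```python
import math

PAGE_SIZE = 16 * 1024   # InnoDB default page size (innodb_page_size)
KEY_BYTES = 8           # assumption: BIGINT primary key
POINTER_BYTES = 6       # assumption: size of a child-page pointer
ROW_BYTES = 1024        # assumption: average row size stored in a leaf

fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)   # key/pointer pairs per internal page
rows_per_leaf = PAGE_SIZE // ROW_BYTES              # full rows per leaf page

def capacity(height):
    """Rows reachable by a B+ tree with `height` levels, the last level being leaves."""
    return fanout ** (height - 1) * rows_per_leaf

rows = capacity(3)
binary_tree_height = math.ceil(math.log2(rows))     # levels a balanced binary tree would need

print(fanout, rows_per_leaf, rows, binary_tree_height)   # 1170 16 21902400 25
```

Under these assumptions about 22 million rows fit in a three-level B+ tree, so a lookup reads at most three pages, while a balanced binary tree (red-black or otherwise) over the same rows would be about 25 levels deep.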

Tell me about clustered index and non-clustered index?

In InnoDB, the leaf node of the primary key index B+ Tree stores the entire row of data, which is also called a clustered index, that is, the data storage and the index are put together, and the data is found when the index is found.

The leaf nodes of a non-clustered (secondary, non-primary-key) B+ Tree index store only the primary key value, so a query needs an extra lookup back to the table.

The difference between clustered index and non-clustered index:

  • The leaf nodes of a non-clustered index do not store the row data; they store the primary key (or, in MyISAM, a row pointer) corresponding to the indexed value
  • In InnoDB, after finding the primary key in a secondary index, the engine must look it up again in the clustered index to get the row. This is called going back to the table. Scanning the secondary index is usually sequential I/O, while going back to the table is random I/O; the more back-to-table lookups a query needs, the more random I/O it costs, and the more likely the optimizer is to choose a full table scan instead.
  • Normally a primary key (clustered index) query needs one lookup, while a non-primary-key (non-clustered index) query needs two: one in the secondary index and one in the clustered index. With a covering index, a single lookup is enough.
Note: in MyISAM both the primary key index and secondary indexes are non-clustered. In InnoDB the primary key index is clustered and secondary indexes are non-clustered. The indexes we create ourselves are almost always non-clustered.
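The two-step lookup can be modelled with two Python structures: a dict standing in for the clustered index (primary key to full row) and a sorted list standing in for a secondary index on name (name plus primary key). The table contents and the function name are hypothetical.

```python
# Clustered index: primary key -> entire row (InnoDB keeps the row in the leaf).
clustered = {
    1: {"id": 1, "name": "Ann", "age": 20, "city": "Oslo"},
    2: {"id": 2, "name": "Bo",  "age": 31, "city": "Rome"},
    3: {"id": 3, "name": "Ann", "age": 25, "city": "Lima"},
}
# Secondary index on name: leaf entries hold only (name, primary key).
secondary_name = sorted((row["name"], pk) for pk, row in clustered.items())

def select_star_by_name(name):
    """SELECT * ... WHERE name = ?: scan the secondary index, then go back to the table."""
    rows, back_to_table = [], 0
    for key, pk in secondary_name:        # a real engine would binary-search to the first match
        if key == name:
            rows.append(clustered[pk])    # second lookup, this time in the clustered index
            back_to_table += 1
    return rows, back_to_table

rows, lookups = select_star_by_name("Ann")
print([r["id"] for r in rows], lookups)
```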

Will non-clustered indexes always return to table queries?

Not necessarily. It depends on whether every field the query needs is in the index. If so, no back-to-table lookup is needed. An index that contains (covers) all the fields a query needs is called a "covering index".

For example, suppose the student table has an index on score. When the query select score from student where score > 90 runs, the leaf nodes of that index already contain the score values, so no back-to-table lookup is performed.
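One way to see a covering index at work without a MySQL server is SQLite, which ships with Python. This is a stand-in, not MySQL: SQLite's planner reports "USING COVERING INDEX", while MySQL's EXPLAIN shows "Using index" in the Extra column for the same situation. The student table mirrors the example above; idx_score is an assumed index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
conn.execute("CREATE INDEX idx_score ON student (score)")

def plan(sql):
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the readable detail.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

covered = plan("SELECT score FROM student WHERE score > 90")   # needs only score
not_covered = plan("SELECT * FROM student WHERE score > 90")   # also needs name
print(covered)
print(not_covered)
```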

What is a joint index? Why do we need to pay attention to the order in the joint index?

MySQL can build one index over several fields; this is called a joint (composite) index. To use a joint index, a query must use its fields in the order they were defined, starting from the first; otherwise the index cannot be used.

The specific reasons are:

An index must be ordered to be usable. Suppose a joint index on (name, age, school) exists. The index is sorted by name first; entries with the same name are sorted by age; entries with the same name and age are sorted by school.

So the index is strictly ordered only by name. A query must first match name with an equality condition; within the matching entries the data is strictly ordered by age, so age can then be used for the index search, and so on.

Therefore, you should pay attention to the order of the index columns when creating a joint index. Generally, put the columns with frequent query requirements or high field selectivity first. In addition, individual adjustments can be made according to special cases of query or table structure.

Tell me about MySQL's leftmost prefix principle?

The leftmost prefix principle means matching starts from the leftmost column. When creating a multi-column index, put the column used most often in WHERE clauses on the left, according to business needs. MySQL keeps matching index columns to the right until it reaches a range condition (>, <, BETWEEN, LIKE) and then stops. For example, for a = 1 and b = 2 and c > 3 and d = 4, an index on (a, b, c, d) cannot use d, while an index on (a, b, d, c) can use all four columns, and the order of a, b, d within it can be changed freely.

Conditions using = and IN can appear in any order. For a = 1 and b = 2 and c = 3, the conditions can be written in any order and the (a, b, c) index still applies: the MySQL query optimizer rearranges them into a form the index can use.
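The matching rule can be written as a small Python function that walks the index columns left to right. It is a simplified model: it ignores MySQL 8.0's index skip scan and index condition pushdown, and treats every operator other than = and IN as a range. The function name and the predicate format are hypothetical.

```python
def usable_index_columns(index_cols, predicates):
    """Leading index columns that can narrow the B+Tree search.

    predicates maps column -> operator, e.g. {"a": "=", "c": ">"};
    the order of conditions in the WHERE clause does not matter.
    """
    used = []
    for col in index_cols:
        op = predicates.get(col)
        if op is None:
            break                    # gap in the prefix: later columns are not ordered
        used.append(col)
        if op not in ("=", "IN"):
            break                    # a range (>, <, BETWEEN, LIKE 'x%') ends the match
    return used

where = {"a": "=", "b": "=", "c": ">", "d": "="}
print(usable_index_columns(["a", "b", "c", "d"], where))   # ['a', 'b', 'c']
print(usable_index_columns(["a", "b", "d", "c"], where))   # ['a', 'b', 'd', 'c']
```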

Tell me about the prefix index?

Indexed fields may be very long, which takes up memory and makes the index harder to maintain. Indexing only the leading part of a long field often gives most of the benefit at a fraction of the cost. Note, however, that ORDER BY and GROUP BY cannot use a prefix index.

The process is:

  • First calculate the selectivity of the complete column: select count(distinct col_1)/count(1) from table_1
  • Then calculate the selectivity of different prefix lengths: select count(distinct left(col_1,4))/count(1) from table_1
  • After finding the optimal length, create a prefix index: create index idx_front on table_1 (col_1(4))
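The same selectivity calculation can be run in Python over sample values, which is handy for trying several prefix lengths at once. The email list and the function names are made up for illustration; on real data you would usually accept a prefix whose selectivity is close to, not exactly equal to, the full column's.

```python
def selectivity(values, prefix_len=None):
    """count(distinct left(col, n)) / count(*) over a list of strings."""
    keys = [v if prefix_len is None else v[:prefix_len] for v in values]
    return len(set(keys)) / len(keys)

def shortest_prefix(values, target_ratio=1.0):
    """Smallest n whose prefix selectivity reaches target_ratio of the full column's."""
    full = selectivity(values)
    for n in range(1, max(map(len, values)) + 1):
        if selectivity(values, n) >= target_ratio * full:
            return n

emails = ["alice@example.com", "alan@example.com", "bob@example.com",
          "bella@example.com", "carl@example.com", "cara@example.com"]

for n in range(1, 5):
    print(n, round(selectivity(emails, n), 2))
print(shortest_prefix(emails))   # 4
```

With a prefix length of 4 this sample would be indexed as create index idx_front on table_1 (col_1(4)).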

Do you understand index pushdowns?

MySQL 5.6 introduced index condition pushdown (ICP). It is on by default and can be turned off with SET optimizer_switch = 'index_condition_pushdown=off';
Index condition pushdown reduces the number of back-to-table lookups.

For InnoDB tables, ICP is used only with secondary indexes.

The official documentation gives this example: a people table has a secondary index on (zipcode, lastname, firstname), and the query is SELECT * FROM people WHERE zipcode='95054' AND lastname LIKE '%etrunia%' AND address LIKE '%Main Street%';

  • Without ICP, the storage engine uses the index to find the rows with zipcode='95054' and returns those full rows to the MySQL server, which then checks lastname LIKE '%etrunia%' and address LIKE '%Main Street%'.
  • With ICP, the storage engine also checks lastname LIKE '%etrunia%' against each index entry matching zipcode='95054', since lastname is part of the index. Only entries that pass are used to read the full row; the rest are rejected without going back to the table. address LIKE '%Main Street%' still has to be checked on the full row.
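A Python simulation of the documentation's example counts how many full rows each strategy reads. The four rows are invented, and LIKE '%...%' is modelled as a substring test.

```python
# Hypothetical people table; secondary index on (zipcode, lastname, firstname) + primary key.
people = {
    1: {"zipcode": "95054", "lastname": "Petrunia", "firstname": "Sergey", "address": "1 Main Street"},
    2: {"zipcode": "95054", "lastname": "Smith",    "firstname": "Ann",    "address": "2 Main Street"},
    3: {"zipcode": "95054", "lastname": "Jones",    "firstname": "Bo",     "address": "3 Oak Avenue"},
    4: {"zipcode": "10001", "lastname": "Petrunia", "firstname": "Ivan",   "address": "4 Main Street"},
}
index = sorted((r["zipcode"], r["lastname"], r["firstname"], pk) for pk, r in people.items())

def query(pushdown):
    result, rows_read = [], 0
    for zipcode, lastname, _, pk in index:
        if zipcode != "95054":
            continue
        if pushdown and "etrunia" not in lastname:
            continue                       # rejected using the index entry alone
        row = people[pk]                   # back to the table for the full row
        rows_read += 1
        if "etrunia" in row["lastname"] and "Main Street" in row["address"]:
            result.append(pk)
    return result, rows_read

print(query(pushdown=False))   # ([1], 3)
print(query(pushdown=True))    # ([1], 1)
```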

How to check whether the MySQL statement uses an index?

Use the execution plan: prefix the statement with EXPLAIN to see whether it uses an index.

Why is the official recommendation to use self-growing primary keys as indexes?

B+Tree leaves are kept in key order. Because an auto-incrementing primary key is sequential, each new row is appended at the end of the last leaf, which keeps page splits to a minimum; when a page does fill up, only a small part of the tree is affected. It also avoids moving existing data, since every insert goes to the end. In short, it reduces how often pages are split and data is moved.
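A toy Python model of leaf pages shows the difference. Pages hold at most four keys (a made-up capacity); appending past the largest key opens a new page, while inserting into a full page anywhere else splits it in half. InnoDB's real split heuristics are more involved, so treat the counts as illustrative only.

```python
import bisect
import random

def load(keys, capacity=4):
    """Insert keys one by one into leaf pages of at most `capacity` keys."""
    pages, mid_splits = [[keys[0]]], 0
    for key in keys[1:]:
        firsts = [page[0] for page in pages]
        i = max(bisect.bisect_right(firsts, key) - 1, 0)   # page whose range holds key
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])          # append past the largest key: start a new page
        else:
            bisect.insort(page, key)     # full page and key lands inside it: split in half
            half = len(page) // 2
            pages[i:i + 1] = [page[:half], page[half:]]
            mid_splits += 1
    return pages, mid_splits

sequential = list(range(1, 101))
shuffled = sequential[:]
random.Random(7).shuffle(shuffled)

seq_pages, seq_splits = load(sequential)
rnd_pages, rnd_splits = load(shuffled)
print(len(seq_pages), seq_splits)   # 25 0
print(len(rnd_pages), rnd_splits)
```

Sequential keys leave every page full with no splits; random keys (think UUID primary keys) cause splits and leave half-empty pages behind.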

How to create an index?

There are three ways to create an index.

1. Create an index when executing CREATE TABLE

CREATE TABLE user_index2 (
 id INT auto_increment PRIMARY KEY,
 first_name VARCHAR (16),
 last_name VARCHAR (16),
 id_card VARCHAR (18),
 information text,
 KEY name (first_name, last_name),
 FULLTEXT KEY (information),
 UNIQUE KEY (id_card)
);

2. Use the ALTER TABLE command to add an index.

ALTER TABLE table_name ADD INDEX index_name (column_list);

ALTER TABLE is used to create ordinary indexes, UNIQUE indexes or PRIMARY KEY indexes.

Among them, table_name is the name of the table whose index is to be added, and column_list indicates which columns to index. When there are multiple columns, each column is separated by a comma.

The index name index_name is up to you; if you omit it, MySQL assigns a name based on the first indexed column. In addition, ALTER TABLE accepts several alterations in one statement, so several indexes can be added to a table at once.

3. Use the CREATE INDEX command to create.

CREATE INDEX index_name ON table_name (column_list);

What should I pay attention to when creating an index?

  • Declare columns NOT NULL unless you actually need to store NULL. In MySQL, nullable columns are harder for the optimizer to handle, because they make indexes, index statistics and comparisons more complex. Consider using 0, a special value or an empty string instead of NULL;
  • Put columns with high discreteness (many distinct values) first in a joint index. You can check this with count(distinct col): the larger the result, the more unique values the column has and the higher its discreteness;
  • The smaller the indexed field, the better: the database stores data in pages, so the more entries fit in one page, the more data a single I/O fetches and the more efficient the index is.

What are the principles of indexing?

  • Leftmost prefix matching principle
  • Try to choose a column with high discrimination as the index. The formula for discrimination is count(distinct col)/count(*)
  • Index columns cannot participate in calculations, keep the columns "clean"
  • Prefer extending an existing index over creating a new one

Can index query definitely improve query performance?

Generally, querying data through indexes is faster than a full table scan. But we must also pay attention to its cost.

An index needs storage space and regular maintenance. Whenever a record is added or deleted, or an indexed column is modified, the index must be updated as well. This means every INSERT, DELETE or UPDATE costs 4 or 5 extra disk I/Os.
Because indexes need extra storage and processing, unnecessary indexes slow down response time. Using an index does not always improve query performance; an index range scan (INDEX RANGE SCAN) suits two situations:

  • Based on a range of retrieval, the general query returns a result set that is less than 30% of the number of records in the table.
  • Retrieval based on non-unique index.

Under what circumstances do not take the index (index failure)?

  1. Using != or <> can cause index failure
  2. A type mismatch causes index failure, for example comparing a VARCHAR column with a number forces an implicit conversion
  3. Applying a function to the column causes index failure,
    for example: SELECT * FROM user WHERE DATE(create_time) = '2020-09-03';
  4. Arithmetic on the column causes index failure: SELECT * FROM user WHERE age - 1 = 20;
    If you apply an operator (+, -, *, /, !) to the column, the index is not used.
  5. OR can cause index failure: SELECT * FROM user WHERE name = 'Zhang San' OR height = '175';
    Not every OR invalidates the index. If the OR conditions are on the same indexed field, the index still works; if any side of the OR has no usable index, MySQL falls back to a full table scan.
  6. Fuzzy search can cause index failure: SELECT * FROM user WHERE name LIKE '%Bing';
    When % is placed before the search term the index is not used; when it is placed only at the end, the index is used.
  7. NOT IN and NOT EXISTS can cause index failure
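The usual fix for the function case is to keep the column bare and move the work to the constant side, for example WHERE create_time >= '2020-09-03' AND create_time < '2020-09-04' instead of DATE(create_time) = '2020-09-03' (and age = 21 instead of age - 1 = 20). The Python sketch below uses a sorted list as a stand-in for the index on create_time (hypothetical data) and shows that the rewritten range can be answered with two binary searches.

```python
import bisect
from datetime import date, datetime, timedelta

# Hypothetical index on create_time: a sorted list of datetimes.
index = sorted(datetime(2020, 9, d, h) for d in (2, 3, 4) for h in (0, 9, 23))

def by_function(day):
    """WHERE DATE(create_time) = day: the function hides the key, so every entry is checked."""
    return [t for t in index if t.date() == day]

def by_range(day):
    """WHERE create_time >= day AND create_time < day + 1: two binary searches."""
    start = datetime.combine(day, datetime.min.time())
    lo = bisect.bisect_left(index, start)
    hi = bisect.bisect_left(index, start + timedelta(days=1))
    return index[lo:hi]

print(by_range(date(2020, 9, 3)) == by_function(date(2020, 9, 3)))   # True
```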


What is a database transaction?

A transaction is an indivisible sequence of database operations and the basic unit of database concurrency control. The result of its execution must change the database from a consistent state to another consistent state.

A transaction is a logical set of operations, either all of them are executed or none of them are executed.

What are the four properties of transactions (ACID)?

  • Atomicity: a transaction is the smallest unit of execution and cannot be divided. Atomicity guarantees that its actions either all complete or have no effect at all;
  • Consistency: a transaction moves the database from one consistent state to another, preserving all integrity rules; for example, in a transfer the combined balance of the two accounts is the same before and after;
  • Isolation: when the database is accessed concurrently, one user's transaction is not interfered with by other transactions, and concurrent transactions are independent of one another;
  • Durability: once a transaction commits, its changes to the database are permanent, and even a later database failure must not lose them.
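Atomicity is easy to demonstrate with Python's built-in sqlite3 module, used here as a stand-in for MySQL: the with conn: block commits if both updates succeed and rolls back if either fails. The account table, its CHECK constraint and the transfer function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 100)])

def transfer(src, dst, amount):
    """Both updates commit together, or neither does."""
    with conn:   # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer("alice", "bob", 30)
try:
    transfer("bob", "alice", 1000)   # the second UPDATE violates the CHECK constraint
except sqlite3.IntegrityError:
    pass
print(balances())   # alice 70, bob 130: the credit to alice was rolled back too
```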

What are the concurrency problems of transactions? How to solve

  1. Dirty read: a transaction reads changes another transaction has not committed yet; if that transaction rolls back, the data read never really existed
  2. Non-repeatable read: a transaction reads the same row twice and gets different values, because another transaction updated and committed the row in between
  3. Phantom read: a transaction runs the same query twice and gets a different set of rows, because another transaction inserted (or deleted) and committed matching rows in between

Solve concurrency problems by setting the isolation level

What is the isolation level of a transaction? What is the default isolation level of MySQL?

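Isolation levels and the concurrency problems each one allows:

  • READ UNCOMMITTED: a transaction can read other transactions' uncommitted changes. Dirty reads, non-repeatable reads and phantom reads are all possible.
  • READ COMMITTED: a transaction only reads committed data. Prevents dirty reads; non-repeatable reads and phantom reads are possible.
  • REPEATABLE READ: repeated reads of the same data within a transaction return the same result. Prevents dirty and non-repeatable reads; the SQL standard allows phantom reads at this level, though InnoDB largely prevents them with MVCC and next-key locks.
  • SERIALIZABLE: transactions behave as if executed one after another. Prevents all three problems.

MySQL (InnoDB) uses REPEATABLE READ by default.

Transaction isolation is implemented with locking and concurrent scheduling. Concurrent scheduling uses MVCC (Multi-Version Concurrency Control), which keeps old versions of modified data to support consistent non-locking reads and rollback.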
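The difference between snapshot reads under READ COMMITTED and REPEATABLE READ can be sketched in Python with a list of committed versions per row. Timestamps are a simplification: InnoDB actually builds read views from transaction IDs and the list of active transactions, and walks the undo log to reconstruct older versions. The class and function names are illustrative.

```python
class VersionedRow:
    """A row with its committed versions, oldest first: a stand-in for the undo-log chain."""

    def __init__(self):
        self.versions = []             # list of (commit_ts, value)

    def commit(self, commit_ts, value):
        self.versions.append((commit_ts, value))

    def read(self, snapshot_ts):
        """Newest version committed at or before snapshot_ts."""
        visible = [value for ts, value in self.versions if ts <= snapshot_ts]
        return visible[-1] if visible else None


def consistent_read(row, level, first_read_ts, now_ts):
    # REPEATABLE READ reuses the snapshot taken at the transaction's first read;
    # READ COMMITTED takes a fresh snapshot for every statement.
    snapshot = first_read_ts if level == "REPEATABLE READ" else now_ts
    return row.read(snapshot)


row = VersionedRow()
row.commit(1, 100)                                       # initial committed value
first = consistent_read(row, "REPEATABLE READ", 2, 2)    # reader's first SELECT at ts=2
row.commit(3, 200)                                       # another transaction commits at ts=3
rr = consistent_read(row, "REPEATABLE READ", 2, 4)       # same SELECT again at ts=4
rc = consistent_read(row, "READ COMMITTED", 2, 4)
print(first, rr, rc)   # 100 100 200
```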

For distributed (XA) transactions, InnoDB is generally used with the SERIALIZABLE isolation level.


What are the locks of the database?

According to the granularity of the lock

Innodb supports row-level locks and table-level locks
MyISAM supports table-level locks

  • Row-level lock Row-level lock is the finest type of lock in Mysql, which means that only the row of the current operation is locked. Row-level locks can greatly reduce conflicts in database operations. The locking granularity is the smallest, but the locking overhead is also the largest. Row-level locks are divided into shared locks and exclusive locks. Innodb supports row-level locks
  • Features: high overhead and slow locking; deadlock will occur; the smallest locking granularity, the lowest probability of lock conflicts, and the highest degree of concurrency.
  • Table-level locks Table-level locks are the lock with the largest locking granularity in MySQL, which means to lock the entire table of the current operation. It is simple to implement, consumes less resources, and is supported by most MySQL engines. The most commonly used MYISAM and INNODB both support table-level locking. Table-level locks are divided into table shared read locks (shared locks) and table exclusive write locks (exclusive locks).
  • Features: low overhead and fast locking; no deadlocks; coarse locking granularity, the highest probability of lock conflicts and the lowest concurrency.

According to the type of lock

  • Shared lock: also known as read lock. When the user wants to read data, add a shared lock to the data. Multiple shared locks can be added at the same time.
  • Exclusive lock: Also called write lock. When the user wants to write data, an exclusive lock is added to the data. Only one exclusive lock can be added, and it is mutually exclusive with other exclusive locks and shared locks.
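The compatibility rules boil down to "S is compatible with S; X is compatible with nothing held by another transaction". A minimal non-blocking Python model of one row's lock follows; the class and transaction names are hypothetical, and a real lock manager queues waiters instead of returning False.

```python
class RowLock:
    """Toy shared/exclusive lock for one row; try_acquire never blocks."""

    def __init__(self):
        self.shared = set()      # transactions holding S
        self.exclusive = None    # transaction holding X, if any

    def try_acquire(self, txn, mode):
        if mode == "S":
            # S conflicts only with an X held by someone else
            if self.exclusive not in (None, txn):
                return False
            self.shared.add(txn)
            return True
        # mode == "X": no other transaction may hold S or X
        if self.exclusive not in (None, txn) or self.shared - {txn}:
            return False
        self.exclusive = txn
        return True

lock = RowLock()
print(lock.try_acquire("T1", "S"), lock.try_acquire("T2", "S"), lock.try_acquire("T3", "X"))
# True True False
```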

How is the row lock of the InnoDB engine implemented in MySQL?

select * from tab_with_index where id = 1 for update;

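SELECT ... FOR UPDATE locks only the matching rows when the condition column (id here) is indexed. If the column has no index, InnoDB has to scan and lock every row it examines, which in effect locks the whole table.

In other words, InnoDB implements row locks by locking index records.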

Tell me about the lock algorithm of the InnoDB storage engine?

  • Record lock: the lock on a single row record
  • Gap lock: gap lock, lock a range, excluding the record itself
  • Next-key lock: record+gap locks a range, including the record itself

For specific applications

  • For locking reads and index scans under REPEATABLE READ, InnoDB uses next-key locks
  • Next-key locking is designed to solve the phantom problem
  • When a query uses a unique index to look up a single row, the next-key lock is downgraded to a record lock
  • The purpose of Gap lock design is to prevent multiple transactions from inserting records into the same range, which will lead to phantom reading problems.
  • There are two ways to explicitly disable gap locks, after which gap locks are used only for foreign key constraint checks and duplicate-key checks: A. set the transaction isolation level to READ COMMITTED; B. set innodb_locks_unsafe_for_binlog to 1 (deprecated, and removed in MySQL 8.0)
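A next-key lock on a record covers the half-open interval (previous key, key]. The Python sketch models one such lock in isolation to show why it stops phantoms; which locks InnoDB actually takes depends on the index and the condition (an equality search on a non-unique index, for instance, also gap-locks the gap after the last match). The key values and function names are hypothetical.

```python
import bisect

def next_key_lock(index_keys, key):
    """Interval locked by a next-key lock on `key`: the gap before it plus the record."""
    i = bisect.bisect_left(index_keys, key)
    prev = index_keys[i - 1] if i > 0 else float("-inf")
    return (prev, key)                     # read as (prev, key]

def blocks_insert(locked, value):
    """True if a new index entry `value` would fall inside a locked (low, high] interval."""
    return any(low < value <= high for low, high in locked)

keys = [10, 20, 30]                        # hypothetical existing index values
locked = [next_key_lock(keys, 20)]         # (10, 20]
print(locked, blocks_insert(locked, 15), blocks_insert(locked, 25))   # [(10, 20)] True False
```

Because the gap (10, 20) is locked along with record 20, another transaction cannot insert 15, so rereading the range cannot turn up a phantom row there.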

What is a deadlock in the database? How to deal with it?

A deadlock is when two or more transactions each hold a resource and wait to lock a resource held by another, so none of them can proceed.

Common ways to solve deadlock

  • If different programs will access multiple tables concurrently, try to agree to access the tables in the same order, which can greatly reduce the chance of deadlock.
  • In the same transaction, try to lock all the required resources at once to reduce the probability of deadlock;
  • For business parts that are very prone to deadlocks, you can try to upgrade the lock granularity and reduce the probability of deadlocks through table-level locking;

If the business logic cannot avoid deadlocks this way, consider a distributed lock or optimistic locking.
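InnoDB also detects deadlocks automatically (controlled by innodb_deadlock_detect) and rolls back one of the transactions involved. Conceptually, detection means finding a cycle in a wait-for graph, where an edge T1 -> T2 means T1 waits for a lock that T2 holds. A small Python sketch of that search with hypothetical transaction names follows; it is not InnoDB's actual implementation.

```python
def find_cycle(wait_for):
    """Return transactions forming a cycle in the wait-for graph, or None.

    wait_for maps each transaction to the transactions it is waiting on.
    """
    visiting, done, path = set(), set(), []

    def dfs(txn):
        visiting.add(txn)
        path.append(txn)
        for other in wait_for.get(txn, ()):
            if other in visiting:                 # back edge: we came round to a waiter
                return path[path.index(other):]
            if other not in done:
                cycle = dfs(other)
                if cycle:
                    return cycle
        visiting.discard(txn)
        done.add(txn)
        path.pop()
        return None

    for txn in list(wait_for):
        if txn not in done:
            cycle = dfs(txn)
            if cycle:
                return cycle
    return None

print(find_cycle({"T1": ["T2"], "T2": ["T1"]}))   # ['T1', 'T2']: roll one of them back
print(find_cycle({"T1": ["T2"], "T2": ["T3"]}))   # None: T1 and T2 just wait
```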

What are optimistic and pessimistic locking in a database? How are they implemented?

Optimistic concurrency control (optimistic locking) and pessimistic concurrency control (pessimistic locking) are the main technical methods used for concurrency control.

  • Pessimistic locking: assumes concurrency conflicts will happen and blocks every operation that might violate data integrity. Data is locked when it is read, and the lock is held until the transaction commits. Implementation: the database's own locking mechanism (for example SELECT ... FOR UPDATE).
  • Optimistic locking: assumes conflicts are rare and checks for them only when the change is submitted. No lock is held while the data is read; at update time, the write succeeds only if the row's version has not changed since it was read. Implementation: usually a version number column or a CAS (compare-and-swap) algorithm.

Use scenarios of two locks

  • Optimistic locking suits read-heavy scenarios with few writes, where conflicts really are rare; it saves the overhead of locks and increases overall system throughput
  • Pessimistic locking suits write-heavy scenarios
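A version-number optimistic lock needs nothing more than a conditional UPDATE whose affected-row count says whether you won. The Python sketch uses the standard library's sqlite3 as a stand-in for MySQL; the product table and its version column are hypothetical, though the same UPDATE ... WHERE version = ? pattern works in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)")
with conn:
    conn.execute("INSERT INTO product VALUES (1, 10, 0)")

def read(product_id):
    return conn.execute("SELECT stock, version FROM product WHERE id = ?",
                        (product_id,)).fetchone()

def try_update(product_id, new_stock, expected_version):
    """Write only if the row still has the version we read; no lock is held in between."""
    with conn:
        cur = conn.execute(
            "UPDATE product SET stock = ?, version = version + 1 WHERE id = ? AND version = ?",
            (new_stock, product_id, expected_version),
        )
    return cur.rowcount == 1   # 0 rows updated: someone else changed it first, so retry

# Two clients read the same row before either of them writes.
stock_a, ver_a = read(1)
stock_b, ver_b = read(1)
ok_a = try_update(1, stock_a - 1, ver_a)   # succeeds, version becomes 1
ok_b = try_update(1, stock_b - 1, ver_b)   # fails: the version is no longer 0
print(ok_a, ok_b, read(1))                 # True False (9, 1)
```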

SQL optimization

How to locate and optimize the performance of SQL statements? Has the created index been used? Or how can we know the reason why this statement is running slowly?

MySQL provides the EXPLAIN command to view the execution plan of a statement.

The execution plan is to display the details of the execution of the SQL statement by the database engine, including whether to use an index, what index to use, and related information about the index used, etc.

For query statements, the most important optimization method is to use indexes


type (very important: it shows whether an index is used) is the access type. A query should reach at least range, preferably ref, and const is best.

  • const: at most one row matches (primary key or unique index lookup), and it can be read during the optimization phase.
  • ref: lookup through a normal (non-unique) index
  • range: index range scan
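When checking many plans (for example during a slow-query review), the rule of thumb above can be encoded as an ordering. This Python sketch uses a simplified subset of MySQL's access types, best first; the full list in the MySQL EXPLAIN documentation also includes values such as ref_or_null and index_merge. The function name is illustrative.

```python
# Simplified subset of EXPLAIN "type" values, best to worst.
# "index" is a full index scan and "ALL" a full table scan.
ACCESS_TYPES = ["system", "const", "eq_ref", "ref", "range", "index", "ALL"]

def meets(plan_type, target="range"):
    """True if plan_type is at least as good as the target access type."""
    return ACCESS_TYPES.index(plan_type) <= ACCESS_TYPES.index(target)

for t in ("const", "ref", "range", "index", "ALL"):
    print(t, meets(t))
```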

possible_keys The index that may be used. Note that it may not be used. If there is an index on the field involved in the query, the index will be listed.

key shows the index actually used by MySQL in the query. If the index is not used, it is displayed as NULL.

What is the life cycle of a SQL statement?

  1. The application server establishes a connection to the database server.
  2. The database process receives the SQL request.
  3. The statement is parsed, an execution plan is generated, and the plan is executed.
  4. Data is read into memory and processed.
  5. The result is sent to the client over the connection established in step 1.
  6. The connection is closed and its resources are released.
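Seen from the client, the life cycle maps onto a few driver calls. The sketch below uses Python's sqlite3 module only because it ships with Python; SQLite is embedded, so there is no network connection or separate server process. A MySQL DB-API driver would follow the same connect, execute, fetch, close shape.

```python
import sqlite3
from contextlib import closing

def run_query(db_path, sql, params=()):
    # Step 1: the application opens a connection to the database.
    with closing(sqlite3.connect(db_path)) as conn:
        # Steps 2-4 happen inside the engine: it receives the SQL, parses and
        # plans it, executes the plan, and processes the data in memory.
        cur = conn.execute(sql, params)
        # Step 5: the result comes back over the same connection.
        rows = cur.fetchall()
    # Step 6: leaving the `with` block closes the connection and frees its resources.
    return rows
```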

How do you optimize queries on a large table?

  1. Optimize the schema, the SQL statements, and the indexes;
  2. Add a cache such as memcached or Redis;
  3. Use master-slave replication with read/write separation;
  4. Split vertically: based on how tightly your modules are coupled, divide a large system into several smaller ones, i.e. a distributed system;
  5. Split horizontally. For tables with a very large data volume, this step is the most troublesome and the real test of technical skill. You must choose a sensible sharding key; for good query efficiency the table structure must change and some redundancy must be introduced. The application must change as well: include the sharding key in SQL wherever possible so the query is routed to a limited set of tables instead of scanning all of them.

How do you handle very deep pagination?

Use a deferred join or a subquery to optimize deep-pagination queries.

Note: MySQL does not skip the first offset rows directly; it reads offset+N rows, discards the first offset rows, and returns the last N. When the offset is very large this is very inefficient, so either cap the total number of pages that can be returned, or rewrite the SQL for pages beyond a certain threshold.

Recommended approach: first quickly locate the ids for the requested page, then join back to fetch the full rows. (MySQL does not support LIMIT inside an IN subquery, so the id lookup goes in a derived table.)

-- Slow: reads 1,000,010 matching rows and discards the first 1,000,000
SELECT * FROM t WHERE age > 20 LIMIT 1000000, 10;
-- Faster: page over ids only, then fetch the 10 full rows by primary key
SELECT a.* FROM t a
JOIN (SELECT id FROM t WHERE age > 20 LIMIT 1000000, 10) b ON a.id = b.id;
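The rewrite can be checked for equivalence with a small Python sketch that uses the standard sqlite3 module in place of MySQL (the table t and its data are made up). SQLite will not reproduce MySQL's timing, so this only shows that the deferred join returns the same page as the plain OFFSET query; ORDER BY id is added to both so the page is deterministic.

```python
import sqlite3

def make_db(n=1000):
    """Build a throwaway table t with an index on age (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, age INTEGER, name TEXT)")
    conn.execute("CREATE INDEX idx_age ON t (age)")
    conn.executemany(
        "INSERT INTO t (id, age, name) VALUES (?, ?, ?)",
        ((i, i % 60, f"user{i}") for i in range(1, n + 1)),
    )
    return conn

def page_offset(conn, offset, n):
    # Naive form: the engine walks offset + n matching rows and throws most away.
    return conn.execute(
        "SELECT * FROM t WHERE age > 20 ORDER BY id LIMIT ? OFFSET ?", (n, offset)
    ).fetchall()

def page_deferred_join(conn, offset, n):
    # Deferred join: page over ids only, then fetch full rows for those ids.
    return conn.execute(
        """SELECT a.* FROM t a
           JOIN (SELECT id FROM t WHERE age > 20 ORDER BY id LIMIT ? OFFSET ?) b
             ON a.id = b.id
           ORDER BY a.id""",
        (n, offset),
    ).fetchall()

conn = make_db()
page = page_deferred_join(conn, 100, 10)
```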

How do you optimize data access in a query?

Accessing too much data lowers query performance. Check:

  • Whether the application retrieves more data than it needs, such as too many rows or columns
  • Whether the MySQL server examines a large number of unnecessary rows

Avoid the following SQL mistakes:

  • Querying unnecessary rows. Solution: use LIMIT.
  • Returning all columns from a multi-table join. Solution: specify the column names.
  • Always returning all columns. Solution: avoid SELECT *.
  • Querying the same data repeatedly. Solution: cache the data and read it from the cache next time.
  • Scanning extra rows. Solution: analyze with EXPLAIN. If the query scans a large amount of data but returns only a few rows, try the following:
    - Use a covering index: put all the queried columns in the index so the storage engine can return the result without going back to the table for each row.
    - Change the database and table structure, for example by adjusting the degree of normalization.
    - Rewrite the SQL statement so the optimizer can execute the query in a better way.
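Whether an index covers a query can be seen in the plan. This sketch uses SQLite via Python's sqlite3 module, whose EXPLAIN QUERY PLAN says "COVERING INDEX" when no table lookup is needed; in MySQL the corresponding signal is "Using index" in the Extra column of EXPLAIN. The users table and idx_age_name index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER, name TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_age_name ON users (age, name)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Every selected column lives in idx_age_name, so the index alone answers the query.
covered = plan("SELECT age, name FROM users WHERE age > 20")
# email is not in the index, so each match needs a lookup back into the table.
not_covered = plan("SELECT age, name, email FROM users WHERE age > 20")
print(covered)
print(not_covered)
```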

Database optimization

Why optimize?

  • The system's throughput bottleneck is often the speed of database access
  • As the application runs, the database accumulates more and more data, and processing slows down accordingly
  • Data is stored on disk, whose read/write speed is far below that of memory

Optimization principles: remove system bottlenecks, reduce resource usage, and improve system response speed.

How do you optimize the database structure?

  • Split a table with many columns into several tables: if some columns are rarely used, move them into a separate table
  • Add intermediate tables: for tables that are frequently queried together, an intermediate table can improve query efficiency
  • Add redundant columns: adding redundant columns judiciously can speed up queries by avoiding joins

What would you do if MySQL CPU usage soars to 500%?

First use the operating system's top command to check whether mysqld is responsible. If it is not, find the process consuming the CPU and deal with it.

If mysqld is the cause, run SHOW PROCESSLIST to inspect the running sessions for resource-heavy SQL. For the expensive statements, check whether the execution plan is reasonable, whether an index is missing, or whether the data volume is simply too large.

Generally, kill these threads (watching whether CPU usage drops), make the corresponding adjustments (such as adding indexes, rewriting the SQL, or changing memory parameters), and then rerun the SQL.

It is also possible that no single statement consumes much, but a sudden flood of sessions drives the CPU up. In that case, work with the application side to find out why the number of connections spiked and adjust accordingly, for example by limiting the number of connections.
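A hypothetical Python helper for the triage step: given SHOW PROCESSLIST rows already fetched as dictionaries (Id, Command, and Time are columns of that statement's output; Time is in seconds), it lists KILL statements for sessions that have been running a query longer than a threshold, longest first. The 60-second default and the sample rows are assumptions; review the statements before running them rather than killing sessions automatically.

```python
def pick_threads_to_kill(processlist, min_seconds=60):
    """Return KILL statements for sessions that have been executing a query
    for at least min_seconds, longest-running first."""
    long_running = [
        row for row in processlist
        if row["Command"] == "Query" and row["Time"] >= min_seconds
    ]
    long_running.sort(key=lambda row: row["Time"], reverse=True)
    return [f"KILL {row['Id']}" for row in long_running]

# Invented sample rows; idle (Sleep) sessions are never selected.
sample_rows = [
    {"Id": 5, "Command": "Sleep", "Time": 300, "Info": None},
    {"Id": 7, "Command": "Query", "Time": 90, "Info": "SELECT ..."},
    {"Id": 9, "Command": "Query", "Time": 0, "Info": "SHOW PROCESSLIST"},
    {"Id": 12, "Command": "Query", "Time": 400, "Info": "SELECT ..."},
]
print(pick_threads_to_kill(sample_rows))
```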

How do you optimize a large table? A table holds tens of millions of rows and CRUD operations are slow; what can you do?

When a single MySQL table holds too many records, CRUD performance drops significantly. Common optimization measures include:

Upgrading hardware and the network, read/write separation, and index optimization.

Master-slave replication with read/write separation: the classic database splitting scheme, in which the master handles writes and the slaves handle reads;

Cache: use MySQL's cache (note that the query cache was deprecated in MySQL 5.7.20 and removed in 8.0). For expensive-to-fetch, rarely updated data, consider an application-level cache such as Redis;

Sub-database and sub-table: split one database into several databases and a large table into several tables, reducing the data volume of each database and each table and thereby improving database performance.
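For the application-level cache, a common pattern is cache-aside. The sketch below is a hypothetical in-process version: a dict stands in for Redis, and load_user is an invented loader standing in for a MySQL query. Entries expire after a TTL, and writers call invalidate after updating the database so the next read reloads fresh data.

```python
import time

class CacheAside:
    """Read from the cache first, fall back to the database on a miss,
    and drop the cached entry when the underlying row is updated."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self._load = load_from_db          # hypothetical loader: key -> value
        self._ttl = ttl_seconds
        self._store = {}                   # key -> (value, expiry); stands in for Redis

    def get(self, key):
        hit = self._store.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]
        value = self._load(key)            # miss or expired: go to the database
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value

    def invalidate(self, key):
        # Call after the database row changes so stale data is not served.
        self._store.pop(key, None)

db_reads = []

def load_user(user_id):                    # hypothetical stand-in for a MySQL query
    db_reads.append(user_id)
    return {"id": user_id}

cache = CacheAside(load_user, ttl_seconds=30)
cache.get(1)
cache.get(1)                               # second call is served from the cache
```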

What are the principle and process of MySQL master-slave replication and read/write separation?

Master-slave replication: DDL and DML operations on the master are shipped to the slave through the binary log (binlog), and the slave re-executes (replays) them, so the slave's data stays consistent with the master's.

What master-slave replication is for

  • If the master has a problem, you can switch over to a slave.
  • It enables read/write separation at the database level.
  • Daily backups can be taken on a slave.
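Read/write separation is usually implemented in the application or in middleware by routing each statement. A minimal Python sketch with placeholder connection objects: plain SELECTs rotate across the slaves, while writes, DDL, and SELECT ... FOR UPDATE go to the master. Because MySQL replication is asynchronous by default, reads that must see a just-committed write should also go to the master; this sketch does not handle that case.

```python
import itertools

class ReadWriteRouter:
    """Send plain reads to slaves round-robin and everything else to the master.
    The connection objects are placeholders; any objects will do."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        words = sql.strip().upper().split()
        is_plain_read = (
            bool(words) and words[0] == "SELECT" and "FOR UPDATE" not in " ".join(words)
        )
        if is_plain_read and self._slaves is not None:
            return next(self._slaves)
        return self.master  # writes, DDL, locking reads, unknown statements

router = ReadWriteRouter("master-conn", ["slave-1", "slave-2"])
```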

Problems solved by MySQL master-slave replication

  • Data distribution: replication can be started or stopped at will, and data copies can be distributed across different geographic locations
  • Load balancing: reduces the pressure on a single server
  • High availability and failover: helps applications avoid a single point of failure
  • Upgrade testing: a newer MySQL version can be run as a slave

Basic process

  • Master, binlog thread: records every statement that changes data in the master's binlog;
  • Slave, I/O thread: after START SLAVE is issued, pulls binlog content from the master and writes it to the slave's own relay log;
  • Slave, SQL thread: executes the statements in the relay log.

Binary log (binlog): the master's binary log.

Relay log: the slave's relay log.

  • Step 1: before each transaction that updates data completes, the master writes the change records serially to the binlog.
  • Step 2: the slave starts an I/O thread, which opens an ordinary client connection to the master, where a binlog dump process serves it. If the slave has caught up with the master, the I/O thread sleeps until the master produces new events. Its goal is to write these events to the relay log.
  • Step 3: the SQL thread reads the relay log and executes its events in order, so that the slave's data matches the master's.
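The three steps can be modeled with a toy in-memory Python simulation. Master, Slave, and the ("SET", key, value) event format are invented for illustration; they mirror the flow of binlog, I/O thread, relay log, and SQL thread, not MySQL's real formats. The model also shows where replication lag comes from: events wait in the relay log until the SQL thread applies them.

```python
from dataclasses import dataclass, field

@dataclass
class Master:
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)    # ordered change events

    def write(self, key, value):
        # Step 1: record the change in the binlog as part of the write.
        self.binlog.append(("SET", key, value))
        self.data[key] = value

@dataclass
class Slave:
    data: dict = field(default_factory=dict)
    relay_log: list = field(default_factory=list)
    read_pos: int = 0      # how far the I/O thread has read the master's binlog
    applied_pos: int = 0   # how far the SQL thread has applied the relay log

    def io_thread_step(self, master):
        # Step 2: pull new binlog events and append them to the relay log.
        new_events = master.binlog[self.read_pos:]
        self.relay_log.extend(new_events)
        self.read_pos += len(new_events)
        return len(new_events)   # 0 means caught up; a real I/O thread would wait

    def sql_thread_step(self):
        # Step 3: replay relay-log events in order.
        while self.applied_pos < len(self.relay_log):
            op, key, value = self.relay_log[self.applied_pos]
            if op == "SET":
                self.data[key] = value
            self.applied_pos += 1
```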

How are sub-database and sub-table (splitting databases and tables) done? What problems do they cause? Have you used middleware, and do you know how it works?

When a single database is too large, first determine whether the volume comes from having many tables or from a single table holding too much data.

  • If there are many tables with a lot of data, split vertically: move tables into different databases according to business.
  • If a single table holds too much data, split horizontally: distribute the table's rows across multiple tables according to some rule, possibly across multiple databases.

Split vertically:

  • Vertical table split: divide a table by columns into several tables, each storing a subset of the columns.
  • Vertical database split: split a system's data by business, for example one database for users, one for products, and one for orders, and place them on multiple servers instead of one.

Horizontal split:

  • Horizontal table split: divide a single table with a huge data volume (such as an orders table) into multiple tables according to some rule (range, hash modulo, etc.). These tables are still in the same database, so database-level operations still face an I/O bottleneck.
  • Horizontal database split: spread a single table's data across multiple servers. Each server has the corresponding database and table, but each holds a different subset of rows. Splitting horizontally across databases effectively relieves the performance bottleneck and load of a single machine and single database, and breaks through limits on I/O, number of connections, hardware resources, and so on.
  • Sharding rules:
    - Range: for example by ID, with IDs 0 to 10,000 in one table and 10,001 to 20,000 in another
    - Hash: for example, hash the ID and take it modulo the number of databases to assign rows
    - Geographic region
    - Time: for example, move historical data into a separate database
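The range and hash rules can be sketched as routing functions in Python. The bounds and shard count below are hypothetical values modeled on the example above. Range routing keeps neighboring ids together, which suits range queries but concentrates new writes on the last range; hash modulo spreads rows evenly, but changing the shard count remaps most keys, which is one reason expansion is painful.

```python
import bisect

# Hypothetical layout: ids 0..10,000 -> table 0, 10,001..20,000 -> table 1,
# 20,001..30,000 -> table 2 (upper bounds are inclusive).
RANGE_UPPER_BOUNDS = [10_000, 20_000, 30_000]

def route_by_range(row_id):
    """Return the index of the table whose id range contains row_id."""
    idx = bisect.bisect_left(RANGE_UPPER_BOUNDS, row_id)
    if idx == len(RANGE_UPPER_BOUNDS):
        raise ValueError(f"id {row_id} is beyond the last configured range")
    return idx

def route_by_hash(row_id, shard_count=4):
    """Hash-modulo routing; for integer ids the id itself serves as the hash."""
    return row_id % shard_count
```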

Problems caused by sub-database and sub-table

  • Transaction support: after splitting, a transaction that spans databases becomes a distributed transaction
  • Cross-database joins: the common solution is two queries. Take the ids of the related data from the first query's result set, then issue a second query with those ids to fetch the related data
  • Cross-node count, order by, group by, and aggregate functions: get the results on each node separately, then merge them in the application
  • Data migration, capacity planning, scaling out, and similar issues
  • ID generation: you can no longer rely on the database's own primary-key (auto-increment) mechanism and need a globally unique ID scheme
  • Cross-node sorting: each shard node sorts and returns its data, then the result sets from the different shards are merged and sorted again before being returned to the user
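The cross-node sort can be sketched in Python with heapq.merge, assuming each shard has already returned its rows sorted by the same key. To serve ORDER BY ... LIMIT offset, n across shards, every shard must return its first offset + n rows, because the merge cannot know in advance which shard the rows of the requested page come from; this is why deep pagination is especially costly after sharding. The shard data below is invented.

```python
import heapq
from itertools import islice

def merge_sorted_shards(shard_results, key, offset=0, limit=10):
    """Merge per-shard result lists (each already sorted by `key`) and apply
    ORDER BY ... LIMIT offset, limit globally.

    Each shard must supply at least its first offset + limit rows."""
    merged = heapq.merge(*shard_results, key=key)
    return list(islice(merged, offset, offset + limit))

shard_1 = [{"id": 1, "amount": 5}, {"id": 4, "amount": 20}]
shard_2 = [{"id": 2, "amount": 7}, {"id": 3, "amount": 30}]
page = merge_sorted_shards([shard_1, shard_2], key=lambda r: r["amount"], offset=1, limit=2)
```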

Middleware: such as MyCat

  • MyCat intercepts the SQL, applies its rules for sharding analysis, routing analysis, read/write separation analysis, cache analysis, and so on, sends the SQL to the real backend data nodes, processes the returned results as needed, and returns them to the client