MySQL advanced query interception analysis [Explain]

The slow log analysis process is as follows:
[1] Observe (run at least one day) to see the production of slow SQL.
[2] Turn on the slow query log and set the threshold, such as capturing SQL statements that exceed 5 seconds.
[3] explain + slow SQL analysis.
[4] show profile query the execution details and life cycle of slow SQL in MySQL.
[5] The operation and maintenance manager or DBA adjusts the database parameters.

One, query optimization


[1] Always small tables drive large tables (there is a connection problem), similar to Nested Loop. For example: when the data set of table B is smaller than that of table A, In is better than Exist. Because table B is executed first, table A is executed.

SELECT * FROM A WHERE ID IN (SELECT ID FROM B)

When the data set of table A is smaller than table B, Exists is better than In. Because table A is executed first, table B is executed.

SELECT * FROM A WHERE EXISTS (SELECT 1 FROM B WHERE B.ID = A.ID)

This grammar can be understood as putting the data of the main query into the subquery for conditional verification, and the result of the verification (TURE OR FLASE) is used to resolve whether the data result of the main query is retained.

[2] Order by keyword optimization: Try to use the Index method for sorting, which is efficient because MySQL sorts by scanning the index, and avoids using fileSort (file sorting) for sorting. Prerequisites for using index:
   ■ The order by statement uses the leftmost prefix rule of the index;
   ■ The combination of the where clause and the order by clause conditional column meets the leftmost column of the index;
[ filesort has two algorithms ] : If it is not on the index column, MySQL It is necessary to start two-way sorting and single-way sorting.
   ■   Two-way sorting: Before MySQL4.1, two-way sorting was used, which literally means to scan the disk twice, and finally get the data. Read the columns contained in the pointer and order by, sort them, then scan the list that has been sorted, and read the corresponding data output from the list again according to the sorted values ​​in the list. In a word, read the sort field from the disk, sort in the buffer, and read other fields from the disk.
   ■   Single-way sorting : read the columns required by the query from the disk, sort them on the buffer according to the order by column, and then scan the columns in the sort to output. It is fast and avoids reading data twice.
[Conclusions and extended questions]:The single-channel algorithm is generally better than the two-channel algorithm. But the single-channel algorithm has a problem: when the total size of the acquired data> the capacity of sort_buffer, it needs to get the sorting (multi-channel) multiple times, which is not as efficient as the two-channel sort.
[Optimization strategy] : ①. Increase the setting of the sort_buffer_size parameter. ②. Increase max_length_for_sort_data (because single-way sorting creates a fixed buffer for each row to be sorted, the maximum length of the varchar column is the value specified by max_length_for_sort_data, not the actual size of the sorted data) parameter setting. ③. Taboo use select * query when using order by.
[ Summary ] :

Two, slow query log


[1] What slow query log: MySQL slow query log is a log record provided by MySQL. It is used to record statements whose corresponding time exceeds the threshold in MySQL. Specifically, if the running time exceeds long_query_time worth SQL, it will be recorded in Slow query log. The default value is: 10s, usually set to 2s or 5s;
[2] By default: MySQL database does not open the slow query log, you need to manually open it. Of course, if it is not required for tuning, it is generally not recommended to enable it, because enabling the slow query log will more or less bring about a certain performance impact . The slow query log supports writing log records to a file.

show VARIABLES like '%slow_query_log%';

Turn on: set global slow_query_log = 1; Only valid for the current database, if you restart MySQL, it will be invalid. If you want to take effect permanently, you must set the configuration file my.cnf. Add under [mysqlid]:

slow_query_log=1;slow_query_log_file=/var/lib/mysql/xxx-slow.log

Check the set threshold time, and judge whether the query time is greater than the threshold time in MySQL, and record it if it is greater than the threshold time.

show VARIABLES like '%long_query_time%';

Set the threshold time. After modification, you need to reconnect or reopen the session to take effect.

set global long_query_time=2

You can use select sleep(4) to test. The contents of the log file are as follows:

Query how many slow query records the current system has

show global status like '%slow_queries %';

If you want to permanently set time under [mysqlid]:

long_query_time=3;log_output=FILE;

[3] MySQL analysis tool mysqldumpslow: View the help information of sql. Commonly used commands are as follows:
   ①, get the most 10 records of the returned record set: mysqldumpslow -sr -t 10 /var/lib/mysql/xxx.log
   ②, get the 10 most visited SQL: mysqldumpslow -sc -t 10 / var/lib/mysql/xxx.log
   ③、Get the top 10 query statements containing left joins sorted by time: mysqldumpslow -st -t 10 -g'left join' /var/lib/mysql/xxx.log
   ④、 It is recommended to use these commands in combination with | and more, otherwise the screen will burst: mysqldumpslow -sr -t 10 /var/lib/mysql/xxx.log | more

Three, Explain statement analysis


The explain keyword can simulate the optimizer to execute SQL query statements, so as to know how MySQL processes SQL statements. Analyze the performance bottleneck of the query statement or table structure. Execution statement: explain + SQL statement. The header information is as follows:|

1. ID parameters


The sequence number of the select query contains a set of numbers, indicating the order in which the select clause or operation table is executed in the query. Three cases:
[1] Same id: the execution order is from top to bottom;

explain select t2.* from t1,t2,t3 where t1.id = t2.id and t1.id = t3.id and t1.other_column = '';

[2] The id is different: if it is a subquery, the id serial number will increase, the larger the id, the higher the priority, and the earlier it will be executed;

explain select t2.* from t2 where id = (select id from t1 where id = (select t3.id from t3 where t3.other_column = ''));

[3] The same and different ids exist at the same time: If the ids are the same, they can be considered as a group and executed from top to bottom; in all groups, the larger the id, the higher the priority, the earlier the execution; [3] The same and different ids exist at the same time : If the id is the same, it can be considered as a group and executed from top to bottom; in all groups, the larger the id, the higher the priority, and the first to execute;

explain select t2.* from (select t3.id from t3 where t3.other_column = '') s1,t2 where s1.id = t2.id;

2. Select_type type


Mainly used to distinguish the complexity of ordinary queries, joint queries, sub-queries, etc.
[1] SIMPLE: simple select query, the query does not contain subqueries or UNION;
[2] PRIMARY: if the query contains any complex self-query, the outermost query is PRIMARY;


[3] SUBQUERY: in SELECT or WHERE contains subqueries;
[4] dERIVED: FROM subquery included in the list are marked as dERIVED (derivative) the MySQL recursively executing the sub-queries, the results into temporary table;


[5] UNION: if the second SELECT Appears after UNION, it is marked as UNION, if UNION is included in the sub-query of the FROM clause, the outer SELECT will be marked as DERIVED;
[6] UNION RESULT: SELECT that obtains the result from the UNION table;

3. table


Show that this row of data is about that table

4. type


From best to worst: system>const>eq_ref>ref>range>index>ALL , generally reach the rang level, preferably reach the ref level.
[1] system: The table has only one row of records (system table), which does not usually appear;
[2] const: It can be found once through the index, and const is used to compare primary and unique indexes. Because it only matches one row of data, it is fast;


[3] eq_ref: unique index scan, for each index key, there is only one record in the table that matches it. Often used for primary key or unique index scans. eg: CEO department;


[4] ref: non-unique index scan, return all rows matching a single value;


[5] rang: only retrieve rows in a given range, and use an index to select rows. Generally, queries such as between, <, >, in, etc. appear in the where statement. This range scan index is better than a full table scan, because it only needs to start at one point of the index and end at another point, without scanning all indexes;


[6] index: Full Index Scan, the difference between index and ALL is that the index type only traverses the index tree , Index files are usually smaller than data files. index is read from the index, and All is read from the hard disk;


[7] ALL: read from the disk

5. Possible_keys and key


possible_keys: Shows the indexes that may be applied to this table, one or more, if there are indexes on the query field, they will be listed, but they may not be actually used by the query.
key: The index actually used. If the value is NULL, the index is not used; if the covering index is used in the query, the index only appears in the key list.
[ Covering index ] : All the fields after select have indexes, which improves query efficiency, and the premise order and number must be consistent;
[ Understanding method one ] : The data column of select can be obtained from the index without reading There is no need for redundant data rows. MySQL can use the index to return the fields in the select list without having to read the data file again according to the index. In other words, the query column needs to be covered by the built index.
[ Understanding method two ] : Indexes are a way to find rows efficiently, but general databases can also use indexes to find data in a column, so it does not have to read the entire data row. After all, the index leaf nodes store the data they index; You can get the data you want by reading the index, so there is no need to read rows. An index contains (or covers) data that meets the query result is called a covering index.
[ Note when using a covering index ] : If you use a covering index, be sure to take only the required columns in the select list, not select *, because if all fields are indexed together, the index file will be too large and the query performance will decrease.

6, key_len


Indicates the number of bytes used in the index, and the length of the index can be found through this column. Without compromising accuracy, the shorter the length, the better. The value displayed by key_len is the maximum possible length of the index field, not the actual length, that is, key_len is actually calculated according to the table definition, not checked out in the table.

7, ref


The column showing the index is used, if possible, a constant. Those columns or constants are used to find the value on the index.

8, rows


According to table statistics and index selection, roughly estimate the number of rows to find the required records. It is often used to check during optimization. Use this value to compare with the actual number of rows returned. If there is a big difference, you need to tune.

9. Extra


Contains important information that is not suitable for display in other columns.
[1] Using fileSort: It means that MySQL will use an external index to sort the data instead of reading it according to the index in the table. The sorting operation that MySQL cannot use the index to complete is called "file sorting". (Appearing means bad)
[2] Using temporary: Use a temporary table to save intermediate results, MySQL uses a temporary table when sorting query results. Reused for sorting order by and grouping query group by.
[3] Using index: Indicates that the convering index is used in the corresponding select operation to avoid accessing the data rows of the table. The efficiency is good!


[4] using where, using index: The query column is covered by the index, and the where filter condition is one of the index columns but not the leading column of the index. In Extra, it is Using where; Using index, which means that it cannot be queried directly through index lookup. To the eligible data; the


query column is covered by the index, and the where filter condition is a range of the leading column of the index column, which also means that the eligible data cannot be queried directly through the index


[5] NULL (neither the Using index nor the There is no Using where Using index, and no using where): the query column is not covered by the index, and the where filter condition is the leading column of the index, which means that the index is used, but some fields are not covered by the index, and must be "back to the table". To achieve, it is not purely using the index, nor is it completely useless. The Extra is NULL (no information)


[6] Using where: Index search is used in the query condition. The column to be queried is not covered by the index. Where is the leading column of the non-indexed filter condition. In Extra, it is Using where. order_id is also an index.


The query column is not covered by the index, the where filter condition is non-indexed column, and the Extra is Using where. It means to filter the where condition by index or table scan. On the other hand, there is no available index search. Of course, the cost of index scan + table return and table scan should also be considered here. The type here is all, indicating that MySQL considers a full table scan to be a relatively low cost.


[7] Using index condition: the query column is not all in the index, and the where condition is a range of the leading column. The


query column is not completely covered by the index, and the query condition can be used in the index (for index search).


[8] Using join buffer : The connection cache is used.
[9] Impossible where: The where clause is always false and cannot be used to get any element.
[10] select tables optimized away: In the absence of the GROUP BY clause, optimize the MIN/MAX operation based on the index.
[11] distinct: optimize distinct operation. Stop looking for the same action when the first match is found.

Four, batch data script


The difference between a function and a stored procedure: a function has one return value, and a stored procedure can have 0 to n return values.

[1] Set the parameter log_bin_trust_function_creators (when the binary log is enabled, this variable will be enabled. It controls whether the creator of the stored function can be trusted and will not create a stored function that writes to the binary log and causes unsafe events. If set to 0 (default Value), users must not create or modify stored functions unless they have SUPER privileges other than the CREATE ROUTINE or ALTER ROUTINE privileges. Setting to 0 also forces the use of the DETERMINISTIC feature or the READS SQL DATA or NO SQL feature to declare the limits of the function. If the variable is variable Set to 1, MySQL will not enforce these restrictions on the creation of stored functions. This variable also applies to the creation of triggers) If an error is reported when creating a function: this function has none of DETERMINISTIC......

Since enabling slow log query also enables bin-log, you must specify a parameter for function. Check whether it is turned on: If it is not turned on, turn it on as follows. If you want to set up for a long time, configure it in the configuration file as the above configuration.

show variables like '%log_bin_trust_function_creators%';set global log_bin_trust_function_creators=1;

[2] The creation function ensures that each piece of data is different: ①, returns a random string:

DELIMITER $$CREATE FUNCTION rand_string(n INT) RETURNS(255)BEGIN DECLARE chars_str VARCHAR(100) DEFAULT 'QWERTYUIOPLKJHGFDSAZXCVBNMmnbvcxzasdfghjklpoiuytrewq'; DECLARE return_str DEFALUT ''; DECLARE i INT DEFALUT 0; WHILE i<n DO  SET return_str = CONCAT(return_str,SUBSTRING(chars_str,FLOOR(1+RAND()*52),1));  SET i=i+1; END WHILE;RETURN retrun_str;END $$
Delete function: drop function xxx;

②, create a stored procedure:

DELIMITER $$CREATE PROCEDURE insert_emp(IN start INT(10),IN max_num INT(10)) BEGIN DECALRE i INT DEFAULT 0; SET autocommit = 0; REPEAT SET i = i+1; INSERT INTO emp() VALUES(); UNIT i = max_num END REPEAT; COMMITEND $$

③, call the stored procedure

CALL insert_emp(100,10);

Five, Show Profile


It is MySQL that can be used to analyze the resource consumption of the statement execution in the current session. It can be used for the measurement of SQL tuning. The default is closed state, and save the last 15 run results.

[ Analysis steps ] :
   ■ Whether it is supported, check whether the current MySQL version supports it. show variables like'profiling' ;

Common query parameters of show profile : ① ALL: Display all overhead information.
② BLOCK IO: Display block IO overhead.
③ CONTEXT SWITCHES: context switching overhead.
④ CPU: Display CPU overhead information.
⑤ IPC: Display sending and receiving overhead information.
⑥ MEMORY: Display memory overhead information.
⑦ PAGE FAULTS: Display page fault cost information.
⑧ SOURCE: Display the overhead information related to Source_function, Source_file, and Source_line.
⑨ SWAPS: Display the overhead information of exchange times.
[ Conclusion ] : ① The query result of converting HEAP to MYISAM is too large, and the memory is insufficient and moved to the disk. ②, Creating tmp table Create a temporary table. ③ Copying to tmp table on disk Copy the temporary table in the memory to the disk. ④, locked.

6. Global query log : (Never enable this function in a production environment)


[1] Configuration enabled: in the my.cnf configuration file.

#开启general_log=1#记录日志文件路径general_log_file=/path/logfile#输出格式log_output=FILE

[2] Coding: Command: set global general_log=1; set global log_output='TABLE'; SQL statements written afterwards will be recorded with the general_log table in the MySQL library. View command: select * from mysql.general_log;