A Record of a SQL Optimization

Part 1: Generate test data

Here is a SQL script that creates a test table and fills it with data, which we will use to demonstrate SQL optimization.

-- Create the table structure
DROP TABLE IF EXISTS index_test;
CREATE TABLE index_test (
	id BIGINT ( 20 ) PRIMARY KEY NOT NULL AUTO_INCREMENT,
	USER VARCHAR ( 16 ) DEFAULT NULL,
	psd VARCHAR ( 64 ) DEFAULT NULL 
) ENGINE = MyISAM DEFAULT CHARSET = utf8;

-- Create a stored procedure that generates the test rows
DELIMITER $$
DROP PROCEDURE IF EXISTS `insert_data`$$
CREATE PROCEDURE `insert_data`(IN tableName VARCHAR(500), IN num INT)
BEGIN
    DECLARE n INT DEFAULT 1; /* counter holding the current iteration */
    WHILE n <= num DO
        SET @sqlStr = CONCAT("INSERT INTO ", tableName, "(USER,psd) VALUES(CONCAT(", n, ",'用户'), password(", n, "))");
        PREPARE stmt FROM @sqlStr;
        EXECUTE stmt;
        SET n = n + 1;
    END WHILE;
END $$
-- Restore the default delimiter so the CALL below runs normally
DELIMITER ;

-- Insert the data (two million rows)
CALL insert_data('index_test', 2000000);

The test environment is now ready: a table containing two million rows of data. Let's start analyzing and optimizing SQL performance issues using a common SQL statement.
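A quick sanity check on the row count (my own addition; the original only shows a screenshot of the procedure run):

SELECT COUNT(*) FROM index_test;  -- should return 2000000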


After generating the data, remember to drop the primary key on id, because a primary key is indexed by default and we will be using id for the tests below.
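The exact statement is not shown in the original, but one way to drop it is the following (note that the AUTO_INCREMENT attribute has to be removed first, since an AUTO_INCREMENT column must be part of a key):

-- Remove AUTO_INCREMENT, then drop the primary key
ALTER TABLE index_test MODIFY id BIGINT ( 20 ) NOT NULL;
ALTER TABLE index_test DROP PRIMARY KEY;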


Part 2: Identify performance issues

2.1 Looking for slow-running SQL statements

Here the performance analysis is based on a SQL statement I wrote myself, so this statement is treated as the one to be optimized.

SELECT
	t.id,
	t.USER,
	t.psd 
FROM
	index_test t 
WHERE
	t.id = 457365;

Run it first to see how long the query takes in the current database.


In my environment, executing this SQL took about 0.256 seconds.
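If your client does not display query timing, MySQL's query profiling is one general way to measure it (my addition, not part of the original walkthrough):

SET profiling = 1;
SELECT t.id, t.USER, t.psd FROM index_test t WHERE t.id = 457365;
SHOW PROFILES;  -- lists recently executed statements together with their durations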

2.2 Confirm inefficient queries

  • After discovering a potentially inefficient query, you should first confirm whether it is slow every time it is executed. This rules out the possibility that the so-called inefficient query is a one-off caused by other factors, such as locking or a system bottleneck.

2.2.1 Run the SQL statement and record the execution time

  • Use the MySQL command line, or a client tool such as Navicat Premium, to run the SQL repeatedly and verify the timing.
  • I ran it locally several times, and the time was consistently around 0.25 seconds (250 milliseconds). For queries that take longer than about 10 milliseconds, the timing reported by the MySQL command line client is precise enough.
  • PS: Repeated execution is only safe for SELECT statements, because it does not modify existing data. If the inefficient statement is an UPDATE or DELETE, you can simply rewrite it as a SELECT to complete the verification. For example, for delete from inventory where item_id = 1, you can rewrite it as select * from inventory where item_id = 1 to simulate the performance of the original statement.

2.2.2 Generate a query execution plan (Query Execution Plan, QEP)

When MySQL executes a SQL query, it first checks the syntax of the statement and then constructs a QEP, which determines how MySQL will retrieve the data from the underlying storage engine. To view the QEP that the MySQL query optimizer builds for a statement, simply prefix the SELECT statement with the EXPLAIN keyword.
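For the query in this article, generating the QEP looks like this:

EXPLAIN SELECT
	t.id,
	t.USER,
	t.psd 
FROM
	index_test t 
WHERE
	t.id = 457365;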


Note: In most cases, EXPLAIN does not run the actual SQL statement. There are exceptions, however, where the optimizer needs to execute part of the statement to determine how to construct the QEP, for example when the FROM clause uses a derived table; in that case the select_type column shows DERIVED.

If you are not very familiar with QEPs, focus mainly on two things: the index used and the number of rows examined.

  • The key column in the EXPLAIN output shows the index used by the query. Any query that does not use an index can be considered insufficiently tuned.
  • The rows column shows the number of rows examined. This value estimates how much data the query has to read, which is directly related to its execution time.
  • An ALL value in the type column is another sign of a potential performance problem. Depending on the storage engine, the row count may be an estimate or an exact value. Even when it is an estimate (as with the InnoDB storage engine), it is normally good enough to serve as a basis for judgment.

In the EXPLAIN output above, the key column shows no index. This SQL is a single-table SELECT, so MySQL scans the entire table looking for rows that match the WHERE clause. The value in the rows column should be treated as an approximation of the work involved.

Part 3: Optimize the query

  • Identifying the slow-running SQL query is a prerequisite for any optimization process.

3.1 Things that should not be done

When you have no other ideas for how to optimize, the first thought is usually to add an index on the column in the WHERE clause, for example:

# Syntax for adding an index to an existing table
ALTER TABLE table_name ADD INDEX index_name(column_name)

ALTER TABLE index_test ADD INDEX idIndex(id);
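To confirm that the index was created (the original only shows a screenshot), you can check with:

SHOW INDEX FROM index_test;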
  • Warning: do not run this directly in production without verification!! There are many factors to consider when deciding whether to add a new index and deploy it. (Here I did execute the statement to add the index.) This data definition language (DDL) statement took 6.8 seconds to complete. During that time, all other requests to insert or modify data in the table were blocked, because the ALTER statement is a blocking operation. Depending on the execution order of other data manipulation language (DML) statements, SELECT statements may also be blocked and unable to complete. On a larger table, an ALTER statement can take hours or even days!! Another factor to consider is the performance overhead that every DML statement pays when a table has multiple indexes.
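If such a DDL does have to run against a busy table, the blocked sessions can be observed while it runs (my addition, a standard MySQL command):

SHOW PROCESSLIST;  -- blocked statements appear with a waiting state while the ALTER runs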

3.2 Confirm optimization

After adding the index above, re-execute the query to see the effect.


The query now takes only 19 milliseconds, roughly a 13x speedup.

Then use EXPLAIN to check the SQL execution


Let's go back and explain why the primary key on id was removed after loading the test data. First, we wanted a table with no index at all, so that the effect of adding one could be compared cleanly against the unindexed baseline. Second, if the primary key index were kept and the WHERE clause were changed to query on other columns, with indexes then added on those columns for comparison, the execution time would not necessarily go down; it might even go up. This is the point made earlier: a new index cannot be added casually, and should only be applied to production after verification. (I tested this: keeping the id primary key and adding an index on psd, the query actually became slower.)
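For reference, the psd experiment mentioned in the parentheses above would presumably have looked something like this (a reconstruction; the index name and exact query are my assumptions):

ALTER TABLE index_test ADD INDEX psdIndex(psd);
SELECT t.id, t.USER, t.psd FROM index_test t WHERE t.psd = PASSWORD(457365);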

Comparing the EXPLAIN output before and after adding the id index: the MySQL optimizer now selects the index shown in the key column, and the estimated number of rows examined drops from 2 million to 1, a reduction by a factor of two million.

3.3 The right way

Adding indexes to a table has many benefits, performance among them. Before deciding to add an index, there are generally two things to check: first verify the existing table structure, then confirm the size of the table.

Shortcut: execute the following SQL to obtain this information.
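The commands are only shown as screenshots in the original; judging from the Data_length and Rows fields mentioned below, they were presumably the following:

SHOW CREATE TABLE index_test;         -- verify the existing table structure and its indexes
SHOW TABLE STATUS LIKE 'index_test';  -- Rows and Data_length give an approximate table size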


The output of the first command shows that the current table structure includes an index on the id column. The Data_length and Rows values from the second command give an approximate idea of the table size.

3.4 Alternatives

  • The correct approach to optimizing SQL includes understanding and verifying the purpose of the statement and of the other SQL statements related to the same table. By performing this analysis, you may find that the application code issuing the statement already contains additional information that could be used to improve the query.
