SQL optimization (for example, Oracle can refer to Mysql)

本篇博客主要针对Mysql5.X版本
If the content of this blog is incomplete, please see the official websitehttps://dev.mysql.com/doc/refman/8.0/en/optimization.html

1. Reasons for SQL optimization

Low performance, too long execution time, too long waiting time, poor SQL statement (connection query), index failure, unreasonable server parameter settings (buffer, number of threads)

Second, the analysis of SQL statements

Usually we write a query SQL statement, such as the following statement:

select dinstinct  ..from  ..join ..on ..where ..group by ...having ..order by ..limit ..

The actual parsing sequence is as follows:

from .. on.. join ..where ..group by ....having ...select dinstinct ..order by limit ...

Three, SQL optimization engine

We can query which mysql engines are available through the following command:

#支持哪些引擎?
show engines ;
#查看当前使用的引擎
show variables like '%storage_engine%' ;

We mainly analyze two engines: InnoDB (default), MyISAM

InnoDB (default): transaction priority (suitable for high concurrent operations; row lock)
MyISAM: performance priority (table lock)

According to business needs, we can choose different engines. The following is the method of specifying the engine of the database object, which is specified when creating the table:

	create table tb(
		id int(4) auto_increment ,
		name varchar(5),
		dept varchar(5) ,
		primary key(id)		
	)ENGINE=MyISAM AUTO_INCREMENT=1
	 DEFAULT CHARSET=utf8   ;

Four, SQL optimized index

主要就是在优化索引

1. Description of the index

Index: The index
equivalent to the book's directory
is a data structure that helps MYSQL to efficiently obtain data. Indexes are data structures (trees: B-trees (default), Hash trees...)
. The disadvantages of indexes:
1. The index itself is very large and can be stored in memory/hard disk (usually hard disk)
2. Indexes are not applicable in all situations: a. Small amount of data b. Frequently updated fields c. Rarely used fields
3. Indexes will reduce the efficiency of additions, deletions, and modifications (additions, deletions, and modifications).
Index advantages:
1 Improve query efficiency (reduce IO usage)
2. Reduce CPU usage ( …Order by age desc, because the B-tree index itself is a good sorting structure, so it can be used directly when sorting)

2. Index classification

Primary key index: cannot be repeated. id cannot be null.
Unique index: cannot be repeated. id can be a null
single-valued index: single column, age; a table can have multiple single-valued indexes, name.
Composite index: an index composed of multiple columns (equivalent to a secondary directory: z: zhao) (name,age) (a,b,c,d,...,n)

3. The way to create an index

注意:如果一个字段是primary key,则改字段默认就是 主键索引

Method 1: create index type index name on table (field)

#单值:
create index   dept_index on  tb(dept);
#唯一:
create unique index  name_index on tb(name) ;
#复合索引
create index dept_name_index on tb(dept,name);

Method 2: alter table table name index type index name (field)

#单值:
alter table tb add index dept_index(dept) ;
#唯一:
alter table tb add unique index name_index(name);
#复合索引
alter table tb add index dept_name_index(dept,name);

4. Delete the index

drop index 索引名 on 表名 ;
drop index name_index on tb ;

5. Query the index

#查询索引:
show index from 表名 ;
show index from 表名 \G

Five, SQL statement performance analysis

Analyze the execution plan of SQL: explain, which can simulate the execution of SQL statements by the SQL optimizer, so that developers can know the status of the SQL they have written

 explain +SQL查询语句
Insert picture description here


Property description:

id: number
select_type: query type
table: table
type: type
possible_keys: index used for prediction
key: index actually used
key_len: length of index actually used
ref: reference between tables
rows: amount of data queried by index
Extra : Extra information

1. id, table and the query order of the table when querying multiple tables

For example, we create the following three tables

create table course
(
cid int(3),
cname varchar(20),
tid int(3)
);
create table teacher
(
tid int(3),
tname varchar(20),
tcid int(3)
);

create table teacherCard
(
tcid int(3),
tcdesc varchar(200)
);

insert into course values(1,'java',1);
insert into course values(2,'html',1);
insert into course values(3,'sql',2);
insert into course values(4,'web',3);

insert into teacher values(1,'tz',1);
insert into teacher values(2,'tw',2);
insert into teacher values(3,'tl',3);

insert into teacherCard values(2,'twdesc') ;
insert into teacherCard values(3,'tldesc') ;

At this time, there are 4 data in the course table, 3 data in the teacher table, and 2 data in the teachercard table.

(1) id: the id value is the same (the smaller the amount of data, the first to query)
#多表关联
explain select t.* from teacher t,course c,teacherCard tc where t.tid=c.tid and t.tcid=tc.tcid and (c.cid=2 or tc.tcid=3 );
Insert picture description here


In fact, this situation is to look at the table in the order from top to bottom!

(2) The id value is different (the larger the id value, the more priority the query (essential: in the nested subquery, first check the inner layer and then the outer layer))
#子查询
explain select tc.tcdesc from teacherCard tc where tc.tcid = 
(select t.tcid from teacher t where  t.tid =  
	(select c.tid from course c where c.cname = 'sql')
);
Insert picture description here


According to the above information, we know that the id of the course table is the largest, so it is executed first.

(3) The id values ​​are the same, but there are differences (the larger the id value, the more priority; the id value is the same, executed in order from top to bottom)
#子查询+多表
explain select t.tname ,tc.tcdesc from teacher t,teacherCard tc where t.tcid= tc.tcid and t.tid = (select c.tid from course c where cname = 'sql') ;
Insert picture description here
Execution order c->t->tc

2.select_type: query type

PRIMARY: contains the main query in the subquery SQL (outermost)
SUBQUERY: contains the subquery in the subquery SQL (not the outermost)
simple: simple query (does not include subqueries, unions)
derived: derived queries (used Temporary table) explain select cr.cname from (select * from course where tid in (1,2)) cr;
union: In the from subquery, if there is table1 union table2, then table1 is derived and table2 is union ##explain select cr.cname from (select * from course where tid = 1 union select * from course where tid = 2) cr;
union result: inform developers that there is a union query between those tables

3. Type level:

system> const> eq_ref> ref> (fulltext> ref_or_null> index_merge> unique_subquery> index_subquery >) range> index> ALL
(1) system (ignored): the main query with only one data in a system table or a derived table with only one data
create table test01
(
	tid int(3),
	tname varchar(20)
);

insert into test01 values(1,'a') ;
alter table test01 add constraint tid_pk primary key(tid) ;
Insert picture description here


注:有些版本的mysql运行是const

explain select tid from test01 where tid =1 ;
Insert picture description here
(3) eq_ref: unique index: for each index key query, return matching unique row data (there are only 1, not more, not 0)
alter table teacherCard add constraint pk_tcid primary key(tcid);
alter table teacher add constraint uk_tcid unique index(tcid) ;

explain select t.tcid from teacher t,teacherCard tc where t.tcid = tc.tcid ;
Insert picture description here


We found that the type is ref but not eq_ref. The reason is that the teacher table has one more data than the teachercard table.

Insert picture description here
delete from teacher where tcid=1;
Insert picture description here
(4) ref: non-unique index, for each index key query, return all matching rows (0, more)

For the case, please see the error example in (3) !

(5) range: Retrieve rows in the specified range, where is followed by a range query (between,> <>=, special: in sometimes fails, and thus turns to no index all)
Insert picture description here
(6) index: query data in all indexes
explain select tid from teacher ; --tid 是索引, 只需要扫描索引表,不需要所有表中的所有数据
Insert picture description here
(7) all: query data in all tables
explain select cid from course ;  --cid不是索引,需要全表所有,即需要所有表中的所有数据
Insert picture description here
summary:
system/const: There is only one piece of data in the
result eq_ref: multiple pieces of result; but each piece of data is unique;
ref: multiple pieces of result; but each piece of data is 0 or more pieces;

4.possible_keys: The index that may be used is a kind of prediction, which is not accurate.

5.key: the index actually used

alter table  course add index cname_index (cname);

explain select tc.tcdesc from teacherCard tc,course c,teacher t 
where c.tid = t.tid and t.tcid = tc.tcid and c.cname = 'sql' ;
Insert picture description here

6.key_len: the length of the index

Role: used to determine whether the composite index is fully used (a, b, c)

create table test_kl
(
	name char(20) not null default ''
);
alter table test_kl add index index_name(name) ;
explain select * from test_kl where name ='' ;   -- key_len :60

Because 1 character stands for 3 bytes in utf8! ! ! 20*3=60

Insert picture description here


The name field above cannot be null, so what if the field can be null?

alter table test_kl add column name1 char(20) ;  --name1可以为null
alter table test_kl add index index_name1(name1) ;
explain select * from test_kl where name1 ='' ; 

-If the index field can be Null, 1 byte will be used for identification.

Insert picture description here


So what about the joint index?

drop index index_name on test_kl ;
drop index index_name1 on test_kl ;
#增加一个复合索引 
alter table test_kl add index name_name1_index (name,name1) ; 

explain select * from test_kl where name1 = '' ; --121
explain select * from test_kl where name = '' ; --60
Insert picture description here


Insert picture description here


Discuss what will happen when changing to varchar type?

alter table test_kl add column name2 varchar(20) ; --可以为Null 
alter table test_kl add index name2_index (name2) ;

explain select * from test_kl where name2 = '' ;  --63

20*3=60 + 1(null) +2 (use 2 bytes to identify variable length) = 63

Insert picture description here

7. ref: pay attention to distinguishing from the ref value in type

Function: Indicate the field referenced by the current table.
select …where ac = bx; (where bx can be a constant, const)


alter table course  add index tid_index (tid) ;
explain select * from course c,teacher t where c.tid = t.tid  and t.tname ='tw' ;
Insert picture description here

8.rows: The number of data queried by index optimization (the number of data actually queried through the index)

Insert picture description here

9.Extra

(i).using filesort: high performance consumption; an "extra" sorting (query) is required. Commonly found in order by statements.
create table test02
(
	a1 char(3),
	a2 char(3),
	a3 char(3),
	index idx_a1(a1),
	index idx_a2(a2),
	index idx_a3(a3)
);
explain select * from test02 where a1 ='' order by a1 ;

explain select * from test02 where a1 ='' order by a2; --using filesort

Insert picture description here


explain select * from test02 where a1 ='' order by a1;

Insert picture description here
Summary: For a single index, if the sort and search are in the same field, using filesort will not appear; if the sort and search are not the same field, using filesort will appear;
avoid: where fields are ordered by those fields 2

Composite Index (Best Left Prefix)
Creating a composite index (a, b, c) is equivalent to creating three indexes (a), (a, b), (a, b, c)

drop index idx_a1 on test02;
drop index idx_a2 on test02;
drop index idx_a3 on test02;

alter table test02 add index idx_a1_a2_a3 (a1,a2,a3) ;
explain select *from test02 where a1='' order by a3 ;  --using filesort
explain select *from test02 where a2='' order by a3 ; --using filesort
explain select *from test02 where a1='' order by a2 ;
explain select *from test02 where a2='' order by a1 ; --using filesort

Summary: Try to satisfy where and order by are used in the order of the compound index, not across columns or out of order.

(ii). Using temporary: The performance loss is large, and the temporary table is used. Generally appear in the group by statement.
	explain select a1 from test02 where a1 in ('1','2','3') group by a1 ;
	explain select a1 from test02 where a1 in ('1','2','3') group by a2 ; --using temporary
Insert picture description here
(iii). using index: performance improvement; index coverage (covering index).

Reason: do not read the original file, only get the data from the index file (no need to return to the table query)
as long as all the columns used are in the index, the index is covered using index

explain select a1,a2 from test02 where a1='' or a2= '' ; --using index  
Insert picture description here
(iv) using where (requires back to the table query)

Assuming age is an index column,
but the query statement select age,name from …where age =…, this statement must return to the original table to check the Name, so it will display using where.
This can verify the best left prefix:
according to the index idx_a1_a2_a3 (a1,a2 ,a3) The nature of the best left prefix, using where will not appear in the following cases,

Insert picture description here


Insert picture description here


Insert picture description here


but once the query without the left prefix or some conditions make the index invalid ( see the following content will explain ), using Where will appear, such as The following statement, because the table does not have the a2 index, so there will be using where.

Insert picture description here
(v). impossible where: where clause is always false

explain select * from test02 where a1='x' and a1='y';

Insert picture description here

Six. Optimization case

(1) Single table optimization
create table book
(
	bid int(4) primary key,
	name varchar(20) not null,
	authorid int(4) not null,
	publicid int(4) not null,
	typeid int(4) not null 
);

insert into book values(1,'tjava',1,1,2) ;
insert into book values(2,'tc',2,1,2) ;
insert into book values(3,'wx',3,2,1) ;
insert into book values(4,'math',4,2,3) ;	

Query bids with authorid=1 and typeid of 2 or 3, and sort them in descending order:

select bid from book where typeid in(2,3) and authorid=1  order by typeid desc ;
Insert picture description here


1. Optimization: add index

alter table book add index idx_tab (typeid,authorid,bid); --虽然可以回表查询bid,但是将bid放到索引中 可以提升使用using index ;
Insert picture description here


2. Optimize again (because some versions of sql above run out of index): ideas. Because the range query in is sometimes implemented, the order of the index is exchanged, and typeid in(2,3) is put at the end.

	drop index idx_tab on book;
	alter table book add index idx_atb (authorid,typeid,bid);
	explain select bid from book where  authorid=1 and  typeid in(2,3) order by typeid desc ;
Insert picture description here
Summary: a. The best prefix is ​​to maintain the consistency of the index definition and use order b. The index needs to be optimized gradually c. The range query containing In is placed at the end of the where condition to prevent failure.
In this example, Using where (required to return to the original table); Using index (not required to return to the original table): Reason, where authorid=1 and typeid in (2, 3) authorid is in the index (authorid, typeid, bid) , So there is no need to return to the original table (can be found directly in the index table); and typeid is also in the index (authorid, typeid, bid), but the range query containing in has invalidated the typeid index, so it is equivalent to There is no typeid index, so you need to return to the original table (using where);
for example, if there is no In below, it will not appear using where
explain select bid from book where authorid=1 and typeid =3 order by typeid desc;
Insert picture description here


it can also be proved by key_len In can invalidate the index.

(2) Two-table optimization
create table teacher2
(
	tid int(4) primary key,
	cid int(4) not null
);

insert into teacher2 values(1,2);
insert into teacher2 values(2,1);
insert into teacher2 values(3,3);

create table course2
(
	cid int(4) ,
	cname varchar(20)
);

insert into course2 values(1,'java');
insert into course2 values(2,'python');
insert into course2 values(3,'kotlin');

For example, we want to optimize the following sql

select *from teacher2 t left outer join course2 c on t.cid=c.cid where c.cname='java';

To which table should the index be added?

我们遵循小表驱动大表的原则, The index is established on the frequently used field (t.cid=c.cid in this question shows that the t.cid field is frequently used, so index the field) [Generally, for the left outer join, add an index to the left table; right outer join , Add an index to the right table]
When writing...on t.cid=c.cid, put the table with a small amount of data on the left (assuming that the amount of data in the t table is small at this time)
	alter table teacher2 add index index_teacher2_cid(cid) ;
	alter table course2 add index index_course2_cname(cname);
Insert picture description here
(3) Three tables optimize ABC
a.小表驱动大表  b.索引建立在经常查询的字段上

7. Summary of Joint Index

create table test03
(
  a1 int(4) not null,
  a2 int(4) not null,
  a3 int(4) not null,
  a4 int(4) not null
);

alter table test03 add index idx_a1_a2_a3_4(a1,a2,a3,a4) ;

Result analysis:

	explain select a1,a2,a3,a4 from test03 where a1=1 and a2=2 and a3=3 and a4 =4 ; --推荐写法,因为 索引的使用顺序(where后面的顺序) 和 复合索引的顺序一致

	explain select a1,a2,a3,a4 from test03 where a4=1 and a3=2 and a2=3 and a1 =4 ; --虽然编写的顺序 和索引顺序不一致,但是 sql在真正执行前 经过了SQL优化器的调整,结果与上条SQL是一致的。
	--以上 2个SQL,使用了 全部的复合索引

	explain select a1,a2,a3,a4 from test03 where a1=1 and a2=2 and a4=4 order by a3; 
	--以上SQL用到了a1 a2两个索引,该两个字段 不需要回表查询using index ;而a4因为跨列使用,造成了该索引失效,需要回表查询 因此是using where;以上可以通过 key_len进行验证

	explain select a1,a2,a3,a4 from test03 where a1=1 and a4=4 order by a3; 
	--以上SQL出现了 using filesort(文件内排序,“多了一次额外的查找/排序”) :不要跨列使用( where和order by 拼起来,不要跨列使用)


	explain select a1,a2,a3,a4 from test03 where a1=1 and a4=4 order by a2 , a3; --不会using filesort
-Summary: i. If (a,b,c,d) the composite index and the order of use are all the same (and not used across columns), then the composite index is all used. If it is partially consistent (and not used across columns), use a partial index.

8. Some principles to avoid index failure

(1) Composite index
b.复合索引,尽量使用全索引匹配
(2) Do not perform any operations (calculations, functions, type conversions) on the index, otherwise the index will fail
select …where Ax =…; --assuming Ax is an index
Don’t: select …where Ax 3 =…;
explain select * from book where authorid = 1 and typeid = 2; – at2 indexes are used
explain select * from book where authorid = 1 and typeid 2 = 2 ;--a1 index is used
explain select * from book where authorid 2 = 1 and typeid 2 = 2 ;---- 0 index is used
explain select * from book where authorid*2 = 1 and typeid = 2 ;---- 0 indexes are used, reason: for a composite index, if the left side fails, all the right side fails. (a,b,c), for example, if b fails, then bc fails at the same time.
 		drop index idx_atb on book ; 
		alter table book add index idx_authroid (authorid) ;
		alter table book add index idx_typeid (typeid) ;
		explain select * from book where authorid*2 = 1 and typeid = 2 ;
Insert picture description here
(3) The composite index cannot use not equal to (!= <>) or is null (is not null), otherwise all itself and all on the right side will be invalid.

If there is a> in the composite index, the index on itself and the right index are all invalidated.

explain select * from book where authorid = 1 and typeid =2 ;