100-Day Java Interview Sprint for Big Tech, Three Questions a Day [Day 4]: MySQL 1

Hello everyone, I am Chen Haha, and I have been a "Beipiao" (an outsider working in Beijing) for five years. Friends who know me know that I don't have a CS degree, switched careers midway, and didn't go to a great university either! Coming to Beijing with this background, you can imagine what I have been through 🙃🙃.

But I refuse to give up. I believe everyone, like me, dreams of joining a big tech company. As a senior Java player, I know how important interviews are. So over the next 100 days, three questions a day, I will walk you through the popular high-frequency questions from Java interviews along with suitable answers. Of course, I won't go too deep, because I'm afraid it won't stick!!

So wherever this falls short, please add your questions, insights, and any strange ways you've been asked these questions in interviews in the comments. I hope these 100 days help us all level up and break into big tech together!! Let's learn (juan, i.e. grind) together!!!


This column covers high-frequency Java developer interview questions from each of the following major technology stacks: Java fundamentals, collections, concurrent programming, the JVM, the Spring family, ORM frameworks such as MyBatis, MySQL, Redis caching, RabbitMQ message queues, Linux skills, and so on.

The long-awaited MySQL series has finally arrived, so satisfying, and it's a sunny Friday~~~ COOL!

Before we start: students in the group often ask, how are databases usually covered in interviews?

We are not a big company, but we have interviewed many candidates. We usually start with SQL optimization, dig into the underlying principles based on the answer, and then move on to indexes and transactions. If a candidate has done real optimization work and has solid fundamentals, we generally stop asking after two questions~

As for why we don't ask further: it's because asking more is pointless!! You'll see what I mean after reading this article.

Interview question 1: What do you know about database optimization?

Serious answer:

In a high-concurrency environment, the database is the most sensitive component. Nginx load balancing, server clusters, MQ message queues, Redis cache clusters, and database master-slave clusters all exist to reduce the pressure on the database. But! The prerequisite is a robust database and solid underlying code; otherwise all that upstream preparation is just for show.


In terms of cost-effectiveness, our optimization priorities for the database are roughly as follows:

  • High: start with SQL and index optimization; optimizing slow SQL and making good use of indexes are the top priorities;
  • Medium: after SQL optimization, design the table structure, split tables and databases horizontally and vertically, and deal with the data volume;
  • Low: tune the database system configuration to make the most of server memory and other resources;
  • Low: if none of the above works, the server's resources are the bottleneck, so add machines.
Cost of optimization: hardware > system configuration > table structure > SQL and indexes.
Effect of optimization: hardware < system configuration < table structure < SQL and indexes.

In-depth inquiry:

Follow-up 1: What SQL optimization techniques do you know?

In short, SQL optimization comes down to three points:

  • Make the most of indexes;
  • Avoid full table scans wherever possible;
  • Avoid querying data you don't need.

First, we need to be clear about the logical execution order of a SELECT statement:

FROM <table name> # Select the table; multiple tables are combined into one virtual table via a Cartesian product
ON <filter condition> # Filter the Cartesian-product virtual table
<LEFT / RIGHT / INNER> JOIN <join table> # Specify the join; rows are added to the virtual table after ON. For example, LEFT JOIN adds the remaining rows of the left table back into the virtual table
WHERE <where condition> # Filter the virtual table above
GROUP BY <grouping condition> # Group
<SUM() and other aggregate functions> # Used by the HAVING clause, which is where such aggregate functions are written
HAVING <group filter> # Filter the aggregated, grouped results
SELECT <list of returned columns> # Any non-aggregated column returned must appear in the GROUP BY clause
ORDER BY <sort condition> # Sort
LIMIT <row limit>

SQL optimization strategies:

Disclaimer: the following SQL optimization strategies apply to scenarios with large amounts of data. If the data volume is small, there is no need to treat them as hard rules; avoid overengineering.

1. Avoid scenarios where the index is not used

  1. Avoid fuzzy queries with a leading wildcard; they cause the database engine to abandon the index and perform a full table scan, as follows:
SELECT * FROM t WHERE username LIKE '%陈%'

Optimization: put the wildcard only at the end of the pattern, as follows. (The reason lies in the leftmost-prefix principle of B+Tree indexes; see "The principle of MySQL leftmost matching, the principle that all brothers should know".)

SELECT * FROM t WHERE username LIKE '陈%'
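To make the leftmost-prefix point concrete, here is a toy Java model, an assumption for illustration rather than how InnoDB is implemented: a sorted `TreeSet` stands in for the ordered leaf level of a B+Tree index on `username`, and the class name `PrefixVsInfix` is hypothetical. A trailing-wildcard pattern maps to one contiguous range of the sorted keys, while a leading-wildcard pattern forces a check of every key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model (assumption): a TreeSet plays the role of the sorted leaf level of a
// B+Tree index on `username`. Only the ordering property matters here.
final class PrefixVsInfix {

    // LIKE 'prefix%': keys sharing a prefix are contiguous in sorted order,
    // so we can jump straight to that range instead of reading everything.
    static NavigableSet<String> prefixMatch(TreeSet<String> index, String prefix) {
        return index.subSet(prefix, true, prefix + Character.MAX_VALUE, true);
    }

    // LIKE '%infix%': sorted order gives no hint where matches are,
    // so every key has to be examined (the "full table scan").
    static List<String> infixMatch(TreeSet<String> index, String infix) {
        List<String> matches = new ArrayList<>();
        for (String name : index) {
            if (name.contains(infix)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> index = new TreeSet<>(List.of("chen_haha", "chen_yi", "li_si", "wang_chen"));
        System.out.println(prefixMatch(index, "chen"));  // [chen_haha, chen_yi]
        System.out.println(infixMatch(index, "chen"));   // [chen_haha, chen_yi, wang_chen]
    }
}
```

The range lookup only works because matching keys sit next to each other in sort order; a pattern like `'%陈%'` breaks that property, which is why the engine falls back to a scan.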

If the requirement really needs a leading wildcard:

  • Use MySQL's built-in function INSTR(str, substr), which is similar to indexOf() in Java and returns the position of the substring (note that INSTR cannot use the index either, so it does not avoid the scan). See "MySQL Fuzzy Query Usage (Regular, Wildcard, Built-in Functions, etc.)";
  • Use a FULLTEXT index and search with MATCH ... AGAINST;
  • If the data volume is large, use Elasticsearch or Solr, which can typically search hundreds of millions of records within seconds;
  • When the table is small (thousands of rows), don't overthink it; just use LIKE '%xx%'.
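Since INSTR is compared to Java's indexOf() above, this minimal sketch shows how their return values line up. `Instr.instr` is a hypothetical helper, and it assumes case-sensitive matching, whereas in MySQL case sensitivity follows the column's collation.

```java
// Sketch only: mirrors the return values of MySQL's INSTR(str, substr) with indexOf().
// Assumption: case-sensitive comparison (MySQL follows the column's collation).
final class Instr {
    // INSTR returns a 1-based position, or 0 when substr is absent;
    // indexOf returns a 0-based position, or -1 when absent. Adding 1 maps one onto the other.
    static int instr(String str, String substr) {
        return str.indexOf(substr) + 1;
    }

    public static void main(String[] args) {
        System.out.println(instr("foobarbar", "bar"));  // 4
        System.out.println(instr("xbar", "foobar"));    // 0
    }
}
```

In SQL, `WHERE INSTR(username, '陈') > 0` therefore selects the same rows as `LIKE '%陈%'`.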
  2. Avoid OR where possible; it can cause the database engine to abandon the index and perform a full table scan, particularly when not every column in the OR condition is indexed, as follows:
SELECT * FROM t WHERE id = 1 OR id = 3

Optimization: UNION can be used instead of OR, as follows:

SELECT * FROM t WHERE id = 1
UNION
SELECT * FROM t WHERE id = 3

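  3. Avoid checking for NULL where possible; it can lead the database engine to give up the index and perform a full table scan (MySQL can use an index for IS NULL, but whether it does depends on the data), as follows:

SELECT * FROM t WHERE score IS NULL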

Optimization: give the field a default value of 0 and check for 0 instead, as follows:

SELECT * FROM t WHERE score = 0
This is why it matters to declare fields NOT NULL. For details, see my earlier post "Leader Tearfully Admonishes Me: Remember to use not null for MySQL table fields, or you'll end up taking the blame."
  4. Avoid expressions and function calls on the left side of the equals sign in the WHERE condition; they cause the database engine to abandon the index and perform a full table scan.

Move expressions and function calls to the right side of the equals sign instead, as follows:

-- full table scan
SELECT * FROM T WHERE score/10 = 9
-- uses the index
SELECT * FROM T WHERE score = 10*9
  5. When the data volume is large, avoid the WHERE 1=1 condition. We often add it by default to make assembling query conditions convenient. The optimizer removes the constant 1=1 by itself, so the real danger is that when no other condition gets appended, the query has no filter at all and the database performs a full table scan, as follows:
SELECT username, age, sex FROM T WHERE 1=1

Optimization: check in code while assembling the SQL; omit WHERE entirely when there are no conditions, and join the conditions with AND when there are.
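A minimal sketch of that idea in Java, assuming JDBC-style `?` placeholders; `WhereBuilder`, `addIf`, and the column names are hypothetical and not from any library. WHERE is emitted only when at least one condition is present, and the remaining conditions are joined with AND.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not from any library): collects optional conditions and
// emits WHERE/AND only when needed, so "WHERE 1=1" is never required.
final class WhereBuilder {
    private final List<String> conditions = new ArrayList<>();
    private final List<Object> params = new ArrayList<>();

    // Adds a fixed condition string (with a ? placeholder) only when the filter is present.
    WhereBuilder addIf(boolean present, String condition, Object param) {
        if (present) {
            conditions.add(condition);
            params.add(param);
        }
        return this;
    }

    // No conditions: return the base SQL untouched. Otherwise: WHERE c1 AND c2 ...
    String build(String baseSql) {
        if (conditions.isEmpty()) {
            return baseSql;
        }
        return baseSql + " WHERE " + String.join(" AND ", conditions);
    }

    // Values to bind to the ? placeholders, in order (e.g. with a PreparedStatement).
    List<Object> params() {
        return Collections.unmodifiableList(params);
    }

    public static void main(String[] args) {
        Integer minAge = 18;
        String sex = null;  // this filter was not supplied by the caller
        WhereBuilder wb = new WhereBuilder()
                .addIf(minAge != null, "age > ?", minAge)
                .addIf(sex != null, "sex = ?", sex);
        System.out.println(wb.build("SELECT username, age, sex FROM T"));  // SELECT username, age, sex FROM T WHERE age > ?
        System.out.println(wb.params());                                   // [18]
    }
}
```

If you use MyBatis, its `<where>` element does the same job: it inserts WHERE only when the enclosed conditions produce content and strips a leading AND/OR. Only pass fixed, code-defined condition strings to `addIf`; user input belongs in the bound parameters.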

  6. Do not use <> or != in query conditions

When querying with an indexed column as the condition, avoid inequality conditions such as <> or !=. If you really need an inequality, re-evaluate the indexing: don't rely on an index on this field for that query, and filter by other indexed fields in the query conditions instead.

  7. The WHERE condition contains only the non-leading columns of a composite index

For example, a composite (joint) index contains three columns, key_part1, key_part2, and key_part3, but the SQL statement does not include the leading column key_part1. According to the leftmost-prefix principle of MySQL composite indexes, the composite index will not be used (MySQL 8.0.13 and later can sometimes still use it via a skip scan).

select col1 from t where key_part2=1 and key_part3=2
To understand the principle, see "The principle of MySQL leftmost matching, the principle that all brothers should know".
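The leftmost-prefix rule can be sketched as a small Java function. This is a simplified model under stated assumptions: only equality conditions, no range predicates, and no MySQL 8.0 skip scan; `usablePrefixLength` is a hypothetical name. It walks the index columns left to right and stops at the first column the WHERE clause does not mention.

```java
import java.util.List;
import java.util.Set;

// Simplified model (assumption): equality predicates only, no ranges, no skip scan.
// Real index selection is done by the MySQL optimizer.
final class LeftmostPrefix {
    // Count the leading index columns constrained by the WHERE clause;
    // only that prefix of the composite index can be used for the lookup.
    static int usablePrefixLength(List<String> indexColumns, Set<String> whereColumns) {
        int usable = 0;
        for (String column : indexColumns) {
            if (!whereColumns.contains(column)) {
                break;  // a gap ends the usable prefix
            }
            usable++;
        }
        return usable;
    }

    public static void main(String[] args) {
        List<String> idx = List.of("key_part1", "key_part2", "key_part3");
        System.out.println(usablePrefixLength(idx, Set.of("key_part2", "key_part3")));  // 0
        System.out.println(usablePrefixLength(idx, Set.of("key_part1", "key_part3")));  // 1
        System.out.println(usablePrefixLength(idx, Set.of("key_part1", "key_part2")));  // 2
    }
}
```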
  8. Implicit type conversion prevents the index from being used

In the SQL statement below, the indexed column is VARCHAR but the given value is numeric. This triggers implicit type conversion, so the index cannot be used.

select col1 from t where col_varchar=123;
To understand the principle, see "The Explosive MySQL Implicit Conversion: The Invisible Blade, The Most Deadly".
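Why the conversion hurts can be shown with a rough Java model. Assumptions: `toNumber` only approximates MySQL's string-to-number cast (take the leading numeric part, otherwise 0), a `TreeSet` stands in for a VARCHAR index sorted as strings, and the class name is hypothetical. Several different strings equal 123 numerically, and they are not contiguous in string order, so the sorted index cannot narrow the search and every entry must be cast and compared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough model (assumption): approximates MySQL's string-to-number cast and uses a
// TreeSet as a stand-in for a VARCHAR index sorted in string order.
final class ImplicitCast {
    private static final Pattern LEADING_NUMBER = Pattern.compile("^\\s*[+-]?\\d+(\\.\\d+)?");

    // Leading numeric part of the string, or 0 if there is none ("123abc" -> 123, "abc" -> 0).
    static double toNumber(String s) {
        Matcher m = LEADING_NUMBER.matcher(s);
        return m.find() ? Double.parseDouble(m.group().trim()) : 0.0;
    }

    // col_varchar = 123: the string order of the index says nothing about which
    // entries are numerically 123, so every entry is cast and compared (a full scan).
    static List<String> equalsNumber(TreeSet<String> index, double target) {
        List<String> matches = new ArrayList<>();
        for (String value : index) {
            if (toNumber(value) == target) {
                matches.add(value);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> index = new TreeSet<>(List.of("0123", "1", "123", "123abc", "45", "9"));
        System.out.println(index);                     // [0123, 1, 123, 123abc, 45, 9]
        System.out.println(equalsNumber(index, 123));  // [0123, 123, 123abc]
    }
}
```

Note how "1" sits between "0123" and "123" in string order: there is no single index range that holds exactly the values equal to 123.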
  9. The ORDER BY conditions must be consistent with the conditions in WHERE; otherwise ORDER BY will not use the index for sorting
-- does not use the age index
SELECT * FROM t order by age;
-- uses the age index
SELECT * FROM t where age > 0 order by age;
  • Step 1: generate an execution plan from the WHERE conditions and statistics, and fetch the data.
  • Step 2: sort the retrieved data. When processing ORDER BY, the database first checks the step-1 execution plan to see whether the ORDER BY field uses an index. If it does, the data can be read already sorted in index order; if not, a separate sort operation is performed.
  • Step 3: return the sorted data.

When the ORDER BY field appears in the WHERE condition, the index is used and no secondary sort is needed. More precisely, when the ORDER BY field uses an index in the execution plan, no sort operation is required.

This conclusion holds not only for ORDER BY but also for other operations that require sorting, such as GROUP BY, UNION, and DISTINCT.


2. Other optimizations for SELECT statements

  1. Avoid select *

First of all, select * is not a good SQL habit in any type of database.

Using select * to fetch all columns prevents the optimizer from applying optimizations such as covering index scans, affects its choice of execution plan, increases network bandwidth consumption, and brings extra I/O, memory, and CPU overhead.

It is recommended to list only the columns the business actually needs, naming them explicitly instead of using select *.

For details, see "Why Everyone Says SELECT * is Low Efficiency"
  2. Avoid functions with nondeterministic results

This applies specifically to scenarios such as master-slave replication. Since, with statement-based replication, the statements executed on the master are replayed on the slave, functions with nondeterministic results such as now(), rand(), sysdate(), and current_user() can easily lead to inconsistent data between master and slave. In addition, statements that use nondeterministic functions cannot use the query cache (which was removed entirely in MySQL 8.0).

  3. In multi-table joins, put the small table first and the large table last

In MySQL, the tables after FROM are joined from left to right (the opposite of Oracle). The first table involves a full table scan, so put the small table first: scanning it is fast and efficient, and when the large table is probed afterwards, perhaps only its first 100 rows need to be scanned before the return condition is met. (In practice, the optimizer normally chooses the join order itself; the written order is enforced only with STRAIGHT_JOIN.)

For example, if table 1 has 50 rows and table 2 has 3 billion rows, a full scan of table 2 would give you a really hard time, right?
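A toy Java model of an index nested-loop join illustrates the small-drives-large idea. Assumptions: the driven table has an index on the join key (modeled as a `HashSet`), 10,000 rows stand in for the 3 billion, and the names are hypothetical. Both join orders return the same 50 matches, but the number of index probes equals the number of rows in the driving table.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Toy index nested-loop join (assumption: the driven table's join key is indexed,
// modeled as a HashSet). Only the number of probes is compared.
final class NestedLoopJoin {
    static final class Stats {
        final int matched;
        final int probes;

        Stats(int matched, int probes) {
            this.matched = matched;
            this.probes = probes;
        }
    }

    // Each row of the driving (outer) table triggers one index probe on the driven table.
    static Stats join(List<Integer> drivingKeys, Set<Integer> drivenIndex) {
        int matched = 0;
        int probes = 0;
        for (int key : drivingKeys) {
            probes++;
            if (drivenIndex.contains(key)) {
                matched++;
            }
        }
        return new Stats(matched, probes);
    }

    // Keys 0..n-1, used as a stand-in table.
    static List<Integer> keys(int n) {
        return IntStream.range(0, n).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> small = keys(50);      // "table 1": 50 rows
        List<Integer> large = keys(10_000);  // stand-in for the huge "table 2"
        Stats smallDrives = join(small, new HashSet<>(large));
        Stats largeDrives = join(large, new HashSet<>(small));
        System.out.println(smallDrives.matched + " matches, " + smallDrives.probes + " probes");  // 50 matches, 50 probes
        System.out.println(largeDrives.matched + " matches, " + largeDrives.probes + " probes");  // 50 matches, 10000 probes
    }
}
```

In a real database each probe also costs more on a bigger index, so the gap is only wider than this count suggests.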

  4. Use table aliases

When joining multiple tables in one SQL statement, use table aliases and prefix each column name with its alias. This reduces parsing time and avoids syntax errors caused by ambiguous column names.

  5. Replace HAVING clauses with WHERE clauses

Avoid HAVING where possible: HAVING filters the result set only after all records have been retrieved, while WHERE filters records before aggregation. If WHERE can limit the number of records, this overhead is reduced. Conditions in HAVING should generally be reserved for filtering on aggregate functions; all other conditions belong in the WHERE clause.

  • The difference between WHERE and HAVING: aggregate (group) functions cannot be used in WHERE.
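As an analogy only, with Java collections standing in for the query pipeline (`WhereVsHaving` and the department data are hypothetical), the two methods below produce the same totals. The WHERE-style version discards rows before aggregating, while the HAVING-style version aggregates every row and throws away whole groups afterwards.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Analogy only (assumption): Java streams stand in for the SQL pipeline.
final class WhereVsHaving {
    static final class Row {
        final String dept;
        final int salary;

        Row(String dept, int salary) {
            this.dept = dept;
            this.salary = salary;
        }
    }

    // The aggregation step: SUM(salary) ... GROUP BY dept.
    static Map<String, Integer> sumByDept(Stream<Row> rows) {
        Map<String, Integer> sums = new HashMap<>();
        rows.forEach(r -> sums.merge(r.dept, r.salary, Integer::sum));
        return sums;
    }

    // WHERE dept <> ?: unwanted rows are dropped before aggregation.
    static Map<String, Integer> whereThenGroup(List<Row> rows, String excludedDept) {
        return sumByDept(rows.stream().filter(r -> !r.dept.equals(excludedDept)));
    }

    // HAVING dept <> ?: every row is aggregated first, then whole groups are dropped.
    static Map<String, Integer> groupThenHaving(List<Row> rows, String excludedDept) {
        Map<String, Integer> sums = sumByDept(rows.stream());
        sums.remove(excludedDept);
        return sums;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("dev", 10), new Row("dev", 20), new Row("ops", 5),
                new Row("temp", 1), new Row("temp", 2));
        // Same answer either way; the WHERE version aggregated 3 rows instead of 5.
        System.out.println(whereThenGroup(rows, "temp").equals(groupThenHaving(rows, "temp")));  // true
    }
}
```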
  6. Adjust the order of conditions in the WHERE clause

MySQL parses the WHERE clause left to right and top to bottom. By this principle, conditions that filter out more data should come first, to shrink the result set as quickly as possible. That said, I've heard that the optimizer in version 5.7 already reorders the conditions after WHERE automatically, so for queries with many conditions it isn't worth trying to hand-tune this order.

Follow-up 2: OK, so why do you advise against SELECT *?

The MySQL section of Alibaba's coding guidelines, the "Alibaba Java Development Manual (Taishan Edition)" (extraction code: hb6i), states:

4-1. [Mandatory] In table queries, never use * as the field list; the required fields must be written out explicitly.
  • It increases the parsing cost for the query analyzer.
  • Added or removed fields easily become inconsistent with the resultMap configuration.
  • Useless fields increase network consumption, especially TEXT fields.

1. Unneeded columns increase data transfer time and network overhead

With "SELECT *", the database has to parse more objects, fields, permissions, attributes, and other related content. With complex SQL statements and frequent hard parsing, this puts a heavy burden on the database.

It increases network overhead: * sometimes pulls in useless large text fields such as log and IconMD5 by mistake, multiplying the size of the transferred data. If the DB and the application are not on the same machine, this overhead is very noticeable.

Even when the MySQL server and client are on the same machine and communicate over TCP, the communication still takes extra time.

2. Useless large fields such as varchar, blob, and text add extra I/O operations

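To be precise, in InnoDB's COMPACT and REDUNDANT row formats, when a column exceeds 768 bytes, the excess data is stored elsewhere (in overflow pages), so reading the record costs an extra I/O. (With the DYNAMIC row format, the default in MySQL 5.7 and later, long columns may be stored entirely off-page.)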

3. It rules out the MySQL optimizer's "covering index" optimization

SELECT * eliminates the possibility of a covering index, and the MySQL optimizer's covering index strategy is extremely fast and efficient; it is a widely recommended query optimization in the industry.


Class break: "The Angler 1". Location: the Yuan Dadu ruins, Beijing. 😂

Interview question 2: What do you know about splitting databases and tables?

Serious answer:

  • Database splitting: split a single database instance into multiple database instances and distribute the data among them.
  • Table splitting: split a single table into multiple tables and distribute the data among them.

You know, for large Internet projects, the data volume may be beyond anything we can imagine. It is common for more than 10 million new rows to be added per day, and relying on a single MySQL server is unrealistic. No matter how mighty Xiang Yu is, he can't hold on if all four of his teammates go down!! Xiang Yu: ???


As business data volume and site QPS keep growing, so does the pressure on the database, and a single-machine database soon hits storage and concurrency bottlenecks, so database performance needs to be optimized. Splitting databases and tables is a divide-and-conquer strategy: splitting databases relieves the storage pressure on a single MySQL instance and improves scalability, while splitting tables solves the query bottleneck once a single table grows too large. Frankly, these problems are a "flaw" shared by all relational databases.

Common strategies include vertical table splitting, horizontal table splitting, vertical database splitting, and horizontal database splitting.

1. Vertical table splitting

Vertical table splitting, or "cutting the table vertically": as the name suggests, the strategy is field-based! A table's fields are split into different tables (a main table and extension tables), mainly according to how active each field is and how long it is.


  • Each table has a different structure;
  • Each table holds different data;
  • The tables share an associated field, usually a primary key or foreign key, used to link the data of sibling tables;
  • The union of all sibling tables is the table's full data.

Scenarios:

  1. Some fields are hot fields that are updated very frequently. Split them into a separate table; otherwise InnoDB row locks will make your life miserable~~ For example, a balance field in the user table? No worries, my balance is very stable: it's always 0...
  2. There are large fields such as TEXT, which put heavy pressure on storage; after all, InnoDB keeps data and indexes in the same file. Add a fondness for SELECT * on top of that, and the disk I/O cost spirals out of control; nobody can bear it.
  3. There are clear business boundaries, or fields were made redundant in the table design. Some of you, reading the first point, decided Chen Haha was a rookie: how could a user table have a balance field? That's obviously wrong! Hurry up and roast Chen Haha in the comments~~ Then you smiled when you realized it was a deliberate hook. Shameless, right? Exactly: that is why fields belonging to different businesses should be split apart, to make later business expansion easier.

2. Horizontal table splitting

Horizontal table splitting, also called "cutting the table horizontally", splits the data by rows, usually according to the characteristics of a particular column.

For example, a mobile phone number table can be split by the first two or three digits: 131, 132, 133 → phone_131, phone_132, phone_133. Mobile numbers have 11 digits (on the order of 10 billion possible values), so a huge data volume is perfectly normal, right? This split cuts one large table into dozens of small tables, which brings the per-table data volume right down. A classmate asks: how do I know which table to query for a given phone number? For example, to query 13100001111, take the first three digits and splice them dynamically into the name of the table to query.
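The routing rule can be sketched in Java as follows. The table-name pattern `phone_<first three digits>` comes from the example above; the columns `id` and `phone` and the class name `PhoneShardRouter` are hypothetical. Because JDBC `?` placeholders cannot stand in for table names, the number is validated before its prefix is spliced into the SQL, while the number itself is still bound as a parameter.

```java
// Sketch of prefix-based routing (assumption: tables are named phone_131, phone_132, ...;
// the column names id and phone are hypothetical).
final class PhoneShardRouter {
    // Derive the physical table name from the first three digits of an 11-digit number.
    static String tableFor(String phone) {
        // Validate first: the result is spliced into SQL as an identifier,
        // and ? placeholders cannot be used for table names.
        if (phone == null || !phone.matches("\\d{11}")) {
            throw new IllegalArgumentException("expected an 11-digit phone number: " + phone);
        }
        return "phone_" + phone.substring(0, 3);
    }

    // Query for one phone number; the number itself is still bound via ?.
    static String selectSql(String phone) {
        return "SELECT id, phone FROM " + tableFor(phone) + " WHERE phone = ?";
    }

    public static void main(String[] args) {
        System.out.println(selectSql("13100001111"));  // SELECT id, phone FROM phone_131 WHERE phone = ?
    }
}
```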


  • Each table has the same structure;
  • Each table holds different data, with no overlap;
  • The union of all tables is the table's full data.

Scenarios: a single table is too large or growing very fast, which already hurts (or soon will hurt) SQL query efficiency, increases CPU load, and hits bottlenecks early. Remember: the earlier you split tables horizontally, the better; don't ask me why...




Note that traditional database splitting is not the same thing as the clustering and master-slave replication we are familiar with. A multi-node cluster replicates one database into N copies and load-balances multiple MySQL services through read-write splitting; it is built around a single database, called the master. Database splitting is different: it splits the master into N parts, for example two, and then configures N slave nodes for each of the two masters (2N in total).

3. Vertical database splitting

Vertical database splitting, the classic approach, splits by table, usually by carving out new business modules or consolidated shared modules, such as the single sign-on and authentication modules we all know. Sounds familiar, right? I remember once moving some unused tables to a high-performance server that I then used exclusively for studying. Later, some snitch reported it~ I ** you, if you've got the guts, step forward 😅😅.



  • Each database has different tables;
  • Since the tables differ, the data differs even more~ no overlap;
  • Each database is relatively independent and modular.

Scenarios: when a separate business module can be carved out, when shared components (such as dictionaries, common time services, and common configuration) can be extracted, or when you want a server all to yourself?

4. Horizontal database splitting

Based on row data, the data in one database is split across multiple databases. Sounds like a supersized horizontal table split, doesn't it? Frankly, this strategy is not very practical: it is unfriendly to backend development and full of pitfalls. It is not recommended; just understand it.


  • Each database has the same structure;
  • Each database holds different data, with no overlap;
  • The union of all databases is the full data set.

Scenarios: the system's absolute concurrency keeps rising and CPU and memory pressure is high; table splitting can hardly solve the volume problem at its root; there is no obvious business boundary for vertical database splitting; and the master's disk is close to saturation.

In practice, before choosing a database or table splitting strategy, we should first consider caching, read-write splitting, SQL optimization, and so on, because these solve problems more directly and at lower cost. Remember that touching a table means touching the foundation: you never know how many legacy problems are attached to it. If it's a very large project and, after running into a few problems, you propose to your manager that the databases and tables be split, be careful you don't get beaten to death~

In-depth inquiry:

Follow-up 1: There's no point; I really don't want to ask this candidate any more MySQL questions 🙃🙃


Class break: "The Angler 2". Location: Liangma River, Beijing. 😂😂

Interview question 3: What are the ways to delete data in MySQL?

Serious answer:

There are three commonly used deletion methods: the DELETE, TRUNCATE, and DROP keywords. All three can delete data, but they suit different scenarios.

In-depth inquiry:

Follow-up 1: Tell me about the differences between delete, truncate, and drop

1. Execution speed

drop > truncate >> DELETE

2. How they work

  • delete
DELETE FROM TABLE_NAME WHERE xxx

DELETE is a DML (data manipulation language) statement. It deletes only data, not the table structure; it runs inside a transaction and fires triggers.

In InnoDB, DELETE does not physically remove the data; MySQL only marks the deleted rows as deleted. So when DELETE removes data from a table, the table file's disk usage does not shrink and the storage space is not released; the deleted rows are simply made invisible. Although the disk space is not released, it can be reused (overwritten) by later inserts.

When DELETE executes, the deleted data is first kept in the rollback segment, and the change takes effect once the transaction commits;

DELETE FROM table_name deletes all of the table's data. For MyISAM, the disk space is released immediately; InnoDB does not release it;

For a conditional DELETE FROM table_name WHERE xxx, neither InnoDB nor MyISAM releases disk space;

Running OPTIMIZE TABLE table_name after a DELETE frees disk space immediately, for both InnoDB and MyISAM (for InnoDB, only when the table has its own tablespace file, i.e. innodb_file_per_table, the default since MySQL 5.6). So to reclaim disk space, run OPTIMIZE TABLE after DELETE.

DELETE executes row by row, and each deleted row is recorded in the redo and undo logs for rollback and redo operations. The large volume of generated logs also takes up disk space.

  • truncate
TRUNCATE TABLE TABLE_NAME

TRUNCATE is a DDL (data definition language) statement. It does not run in a transaction, the original data is not placed in the rollback segment, and it does not fire triggers.


TRUNCATE TABLE table_name releases disk space immediately for both InnoDB and MyISAM. TRUNCATE TABLE is somewhat like DROP TABLE followed by CREATE, but the table-creation step is optimized (for example, the table structure file already exists), so its speed should be close to that of DROP TABLE;

TRUNCATE empties a table quickly and also resets the auto_increment value.

However, the storage engines differ in some details:
  • For MyISAM, TRUNCATE resets the auto_increment (auto-increment sequence) value to 1, whereas after DELETE the table keeps its auto_increment value.
  • For InnoDB, TRUNCATE also resets auto_increment to 1, and the table keeps its auto_increment value after DELETE. However, if you DELETE the whole table and then restart MySQL, auto_increment is reset to 1 after the restart.

In other words, before MySQL 8.0 an InnoDB table cannot persist its auto_increment value. After the rows are deleted, the counter still lives in memory, but it is lost on restart and starts again from 1. More precisely, after a restart auto_increment is re-initialized from SELECT 1+MAX(ai_col) FROM t. (Since MySQL 8.0, the counter is persisted across restarts.)
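The restart behavior can be modeled in a few lines of Java. This is a sketch of the pre-MySQL 8.0 rule quoted above (the counter is rebuilt as 1 + MAX(ai_col), or 1 for an empty table), assuming the default auto_increment_increment of 1; the class name `AutoIncRestart` is hypothetical.

```java
import java.util.List;

// Sketch of the pre-8.0 InnoDB rule (assumption: auto_increment_increment = 1):
// on restart the in-memory counter is rebuilt from the remaining rows.
final class AutoIncRestart {
    static long counterAfterRestart(List<Long> remainingIds) {
        // MAX over an empty table is NULL in SQL; here that becomes 0, so numbering restarts at 1.
        return remainingIds.stream().mapToLong(Long::longValue).max().orElse(0L) + 1;
    }

    public static void main(String[] args) {
        // Rows 1..10 were inserted (in-memory counter now 11), then rows 8..10 were deleted.
        System.out.println(counterAfterRestart(List.of(1L, 2L, 3L, 4L, 5L, 6L, 7L)));  // 8, not 11
        // The whole table was deleted: after a restart numbering starts over.
        System.out.println(counterAfterRestart(List.of()));                            // 1
    }
}
```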

Use TRUNCATE carefully, especially without a backup. If you truncate a production table by mistake, remember to contact Civil Aviation of China promptly; booking hotline: 400-806-9553
  • drop
DROP TABLE Tablename

DROP is also a DDL statement, just like TRUNCATE;


DROP TABLE table_name releases disk space immediately for both InnoDB and MyISAM. The DROP statement removes the table's structure along with the constraints, triggers, and indexes that depend on it; stored procedures and functions that reference the table are kept but no longer work.

Use DROP carefully. Anyone planning to drop a table and run for it, please do so only after your ticket is booked! Booking hotline: 400-806-9553

Think of it like a book: DELETE tears out the table of contents, TRUNCATE tears out the contents and burns them, and DROP burns the whole book.


Daily summary

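Today we reviewed three frequently asked MySQL interview questions. Do you have them down? By the way, if your friends are also preparing for interviews, send this series their way; if they take it seriously, they'll definitely thank you!! OK, that's it for today. Remember to leave a comment saying "check-in" to encourage the classmates who have fallen behind.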

Summary of MySQL series articles and "MySQL Jianghu Road | Column Directory"