The principle of MySQL auto-increment primary key auto_increment and the positioning of gap discontinuity in auto-increment primary key

1. Background:

1.1. Business description and SQL:

In order to save the information reported by the machine (the business requirement is that each machine only needs to save the latest record), the original SQL statement is as follows (where machineId is a unique index, and the id of t_report_pad is an auto-increasing primary key of type bigint):

insert into t_report_pad    (machine_id,pad_device_model,app_version,app_version_code,pad_version,pad_version_code,upgrade_status,reason,create_time,update_time) values    (#{machineId},#{padDeviceModel},#{appVersion},#{appVersionCode},#{padVersion},#{padVersionCode},#{upgradeStatus},#{reason},now(),now())

In this case, when the same machine_id sends SQL multiple times, it will cause a unique primary key conflict problem, which will cause the subsequent machine information to be unable to be added. In order to solve this situation, the business uses insert… on duplocate key update statement optimization The original statement and the modified SQL statement are as follows. Since then, the business functions have been used normally.

insert into t_report_pad     (machine_id,pad_device_model,app_version,app_version_code,pad_version,pad_version_code,upgrade_status,reason,create_time,update_time)values    (#{machineId},#{padDeviceModel},#{appVersion},#{appVersionCode},#{padVersion},#{padVersionCode},#{upgradeStatus},#{reason},now(),now())ON DUPLICATE KEY UPDATE    pad_device_model=#{padDeviceModel},    app_version=#{appVersion},    pad_version=#{padVersion},    pad_version_code=#{padVersionCode},    app_version_code=#{appVersionCode},    upgrade_status=#{upgradeStatus},    reason=#{reason},    update_time=now()

1.2. Problem-Data type conversion exception:

After running for a period of time, it is found that the value of the auto-incrementing primary key id in the t_report_pad table has exceeded the maximum value of the int type, and because the int type is used in the business code to receive, the business function is abnormal (the code here and the type of ID in MySQL The inconsistency is because the DBA has modified the data type of the ID beforehand, but the developer forgot to synchronously modify the type of the Java object in the code), so a version was urgently fixed to solve the online problem.

1.3. In-depth analysis:

(1) Phenomenon:

After in-depth analysis, it is found that the amount of data in the DB table is only 600W, but the auto-incrementing primary key ID is as high as 2.1 billion, and many auto-incrementing primary keys are not continuous. Why does this happen? Is there a bug in the code? Or is there a problem with the auto-increment primary key step size setting in MySQL? Or something else? In the spirit of breaking the casserole question to the end, after querying multiple sources, we finally found the reason for this phenomenon: for this kind of self-incrementing primary key gap, it is neither a code bug nor a DB parameter setting problem, but In mysql, in order to optimize the performance of auto_increment self-incrementing primary key under high concurrency, it is caused by the default configuration of an algorithm mode (the algorithm mode is configured by the innodb_autoinc_lock_mode parameter), especially when using insert ... on duplicate key update syntax , It is easy to have the primary key gap, which is a normal usage phenomenon.

(2) Analysis process:

① Analysis process 1-Check whether the business logic in the code is normal: Check whether the value of the ID column is manually set in the SQL statement. It can be easily seen from the above SQL statement that the value of id is not manually set. The default is to use self Increase the value generated by the primary key. Then check whether there is a corresponding row record in the delete table in the code, so as to derive that there is a primary key gap, which does not exist after checking.

② Analysis process 2-View the step length setting of the auto-incrementing primary key:


show session variables like 'auto_increment%'

The results are as follows:

The result shows that the step size of the auto-increment primary key of DB is set normally, and the default is auto-increment 1, so the primary key gap problem is not caused by this situation.

③ Analysis process 3-the algorithm mode setting of the self-incrementing primary key (the problem that is finally located):

This parameter is only affected by the innodb_autoinc_lock_mode parameter, see the second and third parts of the article for details

2. Detailed explanation of MySQL's auto_increment:

1. The basic characteristics of auto_increment:

The attributes of the auto_increment type in MySQL are mainly used to automatically generate IDs for records in a table.

(1) When inserting a record, if a value is explicitly specified for the auto_increment data column, there will be two situations:

  • Case 1: If the inserted value is the same as the existing number, an error exception will occur, because the value of the auto_increment data column must be unique;
  • Case 2: If the inserted value is greater than the existing number, it will be inserted into the data column, and the next number will be incremented from this new value. In other words, some numbers will be skipped.

(2) If the maximum value of the auto-increment sequence is deleted, the value is reused when inserting a new record.

(3) If you use the update command to update the auto-increment column, and the column value is duplicated with the existing value, an error will occur. If it is greater than the existing value, the next number is incremented from this value.

2. Regarding the lock table operation brought by MySQL's auto_increment:

(1) Before MySQL 5.1.22, MySQL's "insert-like" will use a table-level auto-increasing lock (AUTO-INC Lock) to lock the entire table until the end of the statement (and Not the end of the transaction). During this period, other insert-like, update and other statements will be blocked, so it is recommended to use the program to divide these statements into multiple statements and insert them one by one to reduce the lock table time at a single time.

insert-like statement: insert, insert… select, replace, replace… select, load data, insert... values(), values()

(2) After MySQL 5.1.22, MySQL has been improved, and the parameter innodb_autoinc_lock_mode is introduced , and the lock table logic of MySQL is controlled by this parameter.

3. Innodb_autoinc_lock_mode parameter description of MySQL:

The innodb_autoinc_lock_mode parameter can be used to configure the mode of the auto-increment locking algorithm, thereby making a trade-off between a predictable sequence of auto-increment values ​​and the maximum concurrency of insert operations.

3.1. Classification of insert statement:

The insert-like statement mentioned earlier refers to all the statements that can add rows to the table. If it is further subdivided, it can be divided into three types:

  • (1) Simple inserts: The number of rows to be inserted can be determined in advance by analyzing the insert statement, including insert, insert... values(), values() statements
  • (2) Bulk inserts: The number of inserts cannot be determined by analyzing insert statements, including insert… select, replace… select, and load data statements.
  • (3) mixed-mode inserts: I’m not sure whether I need to allocate auto_increment id, such as insert into t (id,name) values ​​(1,'a'),(null,'b'),(5,'c') and insert... on duplicate key update. For the latter, its worst case is actually an update followed by the insert statement, where the assigned value of the auto_increment column may not necessarily be used in the update phase

3.2. Innodb_autoinc_lock_mode mode description:

  • 0: This represents the tradition, that is, the traditional mode (table lock will be generated every time)
  • 1: This means consecutive, that is, continuous mode (a lightweight lock will be generated, and simple insert will obtain a batch of locks to ensure continuous insertion)
  • 2: This means interleaved, that is, interleaved mode (the table will not be locked, and one will be processed one by one, with the highest concurrency)

(1) Tradition mode (innodb_autoinc_lock_mode=0):

This method is the same as before MySQL 5.1.22, mainly to provide a backward compatibility capability. In this mode, all insert statements must acquire a table-level auto_inc lock at the beginning of execution, and the lock will not be released until the end of the statement (not the end of the transaction) to ensure that the distribution of the self-increasing value is predictable and continuous , Repeatability, and ensure that the auto-increment value assigned to any given statement is continuous. However, the auto_inc lock needs to be maintained until the end of the statement, resulting in poor concurrency.

In terms of the security of master-slave replication, in the case of statement-based replication, this means that when SQL statements are replicated on the slave server, the auto-increment column uses the same value as on the master server. Because in this mode, the execution results of multiple INSERT statements are deterministic, SLAVE can reproduce the same data as MASTER, ensuring the security of statement-based replication.

(2) Consutive mode (innodb_autoinc_lock_mode=1):

In the default mode after MySQL 5.1.22, the concurrency is relatively high. The advantage of this mode is that the auto_inc lock does not need to be held until the end of the statement under certain circumstances, and the lock can be released early as long as the statement obtains the corresponding value. In this mode:

  • Simple inserts: Since the number of inserts can be obtained in advance, the table-level auto-inc lock is avoided by obtaining the required number of auto-increment values ​​under the control of the mutex mutex, and then enough auto_increment id is allocated at one time and only locked The process of assigning id, not the operation process of the entire statement.
  • bulk inserts: Because the number of inserts cannot be determined, the same table-level locking as the previous mode is used. InnoDB assigns a new value to the auto_increment column at one time when processing each SQL
  • mixed-mode inserts: Get the number of inserts that need to be inserted in the worst case by analyzing the statement, and then allocate enough auto_increment id at one time, which also only locks the process of id allocation. However, in this way, too many IDs will be allocated, resulting in "waste".
  • For example, insert into t1 (id,c2) values ​​(1,'a'), (null,'b'), (5,'c'), (null,'d'); will allocate 4 ids at once , Regardless of whether the user specifies part of the id;
  • insert… on duplocate key update One-time allocation, regardless of whether the duplicate key will only be updated in the future insertion process.

Regarding the security of master-slave replication, whether it is statement-based or row-based, this mode can also guarantee the security of statement-based master-slave replication.

(3) Interleaved mode (innodb_autoinc_lock_mode=2):

In this mode, all insert-like statements will not use table-level auto-inc locks. This mode is to allocate one by one, only lock the process of id allocation, and can execute multiple statements at the same time, which is the best performance And the most expandable lock mode. The difference between it and innodb_autoinc_lock_mode = 1 is that it will not pre-allocate more than one.

Since self-increasing values ​​can be generated for multiple statements at the same time (that is, cross-statement cross-numbering), the auto_incremant values ​​generated by rows inserted in the same statement may be discontinuous, that is, there are gaps. For example, when executing "bulk inserts", there may be gaps in the auto-increment value assigned to any given statement. But if the statement is "simple inserts", the number of rows to be inserted can be known in advance, so there will be no gaps.

Finally, in terms of the security of master-slave replication, when binlog_format is statement-based (SBR for short, statement-based replication), there will be a problem because it is allocated one by one. When executed concurrently, "bulk inserts" When assigning, it will be assigned to other inserts at the same time, resulting in a master-slave inconsistency (the execution result of the slave library is different from the execution result of the master library), because the binlog will only record the initial insert id. But if binlog_format is row-based, there will be no problem.

3.3. Summary of master-slave security issues of innodb_autoinc_lock_mode:

(1) Master-slave security: If binlog_format uses row-based or mixed mode replication, all auto-increment lock modes are safe, because row-based replication is not sensitive to the execution order of SQL statements (mixed mode will When encountering unsafe statements, the row-based replication mode is used), so you can set innodb_autoinc_lock_mode = 2 to get better concurrency. But when the binlog_format is based on statement replication statement-base, innodb_autoinc_lock_mode = 1 can be set to ensure replication security while obtaining the maximum concurrency of simple insert statements

(2) The setting of the innodb_autoinc_lock_mode parameter is for the innoDB storage engine. In the case of the myisam engine, no matter what kind of auto-increment id lock is a table-level lock.

3. Circumstances that cause gaps in the self-increment column

Through the description of the above article, we can finally determine that the reason for the gap in the self-incrementing primary key is that the insert... on duplicate key update in mixed-mode inserts, the worst case is actually an update after the insert statement. Among them, the assigned value of the auto_increment column may not be used in the update phase, which leads to gaps in the auto-increment primary key.

1. In order to prevent the same online problems, what solutions are there?

(1) For the primary key type of the old table, when modifying the field type in the DB table, the object attribute type in the code must be modified synchronously

(2) Subsequent to the primary key type of the new table, directly set to bigint, the reason is:

  • When the amount of data in the mysql table is small, the primary key of int type and the primary key of bigint type have negligible performance and storage pressure on the DB.
  • If there is a lot of data in the table but has not reached the maximum value of int 2.1 billion, but there is DB performance or disk storage pressure, the improvement brought by using int instead of bigint is actually minimal.
  • When the amount of data in the mysql table exceeds 2.1 billion, it is necessary to upgrade the primary key of the int type to the bigint type.

(3) Reasons for not using uuid:

In general, for MySQL database, it is recommended to use auto-incrementing ID for primary keys, because in MySQL's InnoDB storage engine, the primary key index is a clustered index, and the leaf nodes of the B+ tree of the primary key index store the primary key value and data in order. If The primary key index is an auto-increment ID, which only needs to be arranged backwards in order. If it is a UUID, the ID is randomly generated, which will cause a large amount of data movement when data is inserted, a large amount of memory fragmentation, and a decrease in insertion performance. In addition, UUID occupies a larger space, and the more indexes created, the greater the impact.

2. Other situations that cause gaps in auto-increment columns:

(1) If the transaction that generates the auto-increment value is rolled back, those auto-increment values ​​will be lost. Once the values ​​are generated for the auto-increment column, regardless of whether the "insert-like" statement is completed and the containing transaction is rolled back, these values ​​cannot be rolled back, that is, the missing values ​​are not reused. Therefore, there may be gaps in the values ​​stored in the auto_increment column of the table.

(2) If the user specifies null or 0 for auto_increment in insert, InnoDB will treat the row as an unspecified value and generate a new value for it.

(3) If a negative value is manually assigned to the auto_increment column, the behavior of the auto increment mechanism will not be triggered.

(4) The step length setting of the DB self-incrementing primary key

Reference article: