Transaction ACID, isolation level, dirty read, phantom read and non-repeatable read

ACID of the transaction

Atomicity

The SQL statements executed in a transaction either all succeed or all fail, and it is impossible to partially succeed.

Consistency

The data before and after the execution of the transaction is the same, that is to say, the SQL statement in the transaction cannot be executed only partly. This kind of payment generally occurs when the transaction is abnormally interrupted, and the server is abnormally down. At this time, it is necessary to ensure that the transaction being executed is committed or rolled back after the database is restarted.

Isolation

Do not interfere with each other during the execution of multiple transactions.

Durability

After the transaction is submitted successfully, it is permanently saved on the disk. After the transaction is submitted successfully, the data is not saved. The abnormal downtime will be restored according to the log. The rollback should be rolled back and the commit should be submitted.

The isolation level of the transaction (efficiency decreases in order)

Read uncommitted

A transaction can read the uncommitted data of other transactions. Dirty read

Read submitted

In a transaction, data that has been submitted by other transactions can be read, and phantom reads will occur, which cannot be read repeatedly.

Repeatable

In a transaction, the same filter condition reads the same data every time.

Serialization

Transactions are executed in order of submission.

Dirty reads, phantom reads and non-repeatable reads

Use the following table to reproduce dirty reads, phantom reads and non-repeatable reads

create table testtable (
	id bigint primary key auto_increment comment '主键',
	bus1 int comment '业务字段1',
	bus2 varchar(20) comment '业务字段2'
);

Dirty read

Under the read uncommitted isolation level, transaction A can read the uncommitted data of transaction B. Assuming that transaction B is finally rolled back, the data read by transaction A is regarded as dirty read because this data is not persisted.

-- 事务A insert一条语句,不进行commit 
-- 设置当前session事务不自动提交
set SESSION autocommit = 0;

START TRANSACTION;
insert into testtable values(NULL,2,'我是事务A2');

bus1 = 2Data read by transaction B

-- 事务B
-- 设置当前session的数据库隔离级别为读未提交
set session transaction isolation level read uncommitted;
select * FROM  testtable where bus1 = 2;

Result:

Insert picture description here


You can see that the data of transaction A that has not yet been committed is queried. At this time, we roll back transaction A, execute transaction B again, select rollback;and execute:

Insert picture description here


at this time, execute the query of transaction B:

Insert picture description here


at this time, the uncommitted data of transaction A cannot be queried. This is a dirty read. Transaction B reads the data, but in the end the data is not committed but rolled back. We can't find this data in the library.

Non-repeatable

Under the read-committed isolation level, transaction A performs an update (add, modify, delete) operation on a certain piece of data, and the data read by transaction B before transaction A is committed is inconsistent with the data read after transaction A is committed.
Let's reproduce this scenario:
first set the transaction isolation level to read committed:

set session transaction isolation level read committed;

Then execute a query:

select * FROM  testtable where bus1 = 1;

The result at this time is:

Insert picture description here


Then transaction A starts and commits:

-- 设置当前session事务不自动提交
set SESSION autocommit = 0;

START TRANSACTION;
update testtable set bus2  = '我是事务A提交后' where bus1  = 1;

commit;
Insert picture description here


Then transaction B reads again with the same conditions: it is

Insert picture description here


found that the data submitted by transaction A has been read. In this case, transaction B first reads the data, then transaction A executes and commits, and then transaction B reads the data again , It is found that the data read at this time is inconsistent with the data read last time. This is non-repeatable reading.

Phantom reading

In the repeatable read transaction isolation level, when the same range is queried twice in the same transaction, the row that did not exist in the previous query is found in the next query.
There are two points different from non-repeatable reading

  1. The phantom reading is only for the number of rows, not for whether the content in each row is updated
  2. The isolation level of phantom read is repeatable read, while the isolation level of non-repeatable read is read submitted.
    Let’s reproduce this scenario:
    In order to remove the interference, I empty the data table and insert a piece of data:
TRUNCATE testtable; 
START TRANSACTION;
insert into testtable values(NULL,1,'我是事务A');
COMMIT;

Then write a range query to compare the data before and after:

-- 事务B
-- 设置当前session的事务隔离级别为可重复读
set SESSION autocommit = 0;
set session transaction isolation level repeatable read;
START TRANSACTION;
select testtable.* FROM  testtable where bus1 >= 1 and bus1  <= 3;
-- 每遇到一条记录 就sleep 30秒  当前条件中会查出一条记录
select SLEEP(30) ,testtable.* from testtable where bus1 = 1; 
select testtable.* FROM  testtable where bus1 >= 1 and bus1  <= 3;
commit;

After execution, it will select SLEEP(30) ,testtable.* from testtable where bus1 = 1;sleep here for 30 seconds, and then execute:

-- 设置当前session事务不自动提交
set SESSION autocommit = 0;
START TRANSACTION;
insert into testtable values(NULL,2,'我是事务A在事务B中插入的数据');
COMMIT;

After waiting for 30 seconds, the result of the first select statement of transaction B is found: the result of the

Insert picture description here


second select statement is: the result of the

Insert picture description here


third select statement is:

Insert picture description here


aside the second select is used for sleep for 30 seconds, the first One is the same as the third, what about a good phantom reading ? ? ? ? ? ? ? ?

Innodb's MVCC mechanism

If the storage engine is Innodb, the above-mentioned phantom read under the repeatable read isolation level does not exist, because Innodb uses MVCC multi-version concurrency control. In the same transaction, the first select statement will let Innodb compare the following from from The table takes a snapshot, and the reading of this table is frozen at the moment when the snapshot is taken. No matter how the table is read in the future, the data after the snapshot cannot be read. Therefore, the phantom reading is under the Innodb storage engine. nonexistent.

Primary key conflict scenario

There are opinions on the Internet that when transaction A is queried, a certain primary key does not exist, transaction B inserts this primary key, and then transaction A also inserts this primary key, causing conflicts. This is phantom reading. Reproduce this scenario.

  1. Empty the table testtable, and then insert 4 pieces of data:
insert into testtable values(1,11,'第一条数据');
insert into testtable values (3,22,'第二条数据');
insert into testtable values (4,33,'第三条数据');
insert into testtable values (10,44,'第四条数据');
  1. Query all data in transaction A, then update the data with id=3, and then query all the data without committing the transaction. Transaction B adds a new piece of data with primary key id=2, submit the transaction, and look at the result:
-- 事务A
-- 设置当前session的事务隔离级别为可重复读
set SESSION autocommit = 0;
set session transaction isolation level repeatable read;
START TRANSACTION;
select * from testtable ;
-- select * from testtable t2 ;
-- insert into testtable values(2, '66','第六条数据');
-- commit;
-- 事务B
-- 设置当前session的事务隔离级别为可重复读
set SESSION autocommit = 0;
set session transaction isolation level repeatable read;
START TRANSACTION;
insert into testtable values(2, '55','第五条数据');
commit;

Execute transaction A first, then execute transaction B, and then execute the last 3 lines of transaction A:

-- 事务A
-- 设置当前session的事务隔离级别为可重复读
-- set SESSION autocommit = 0;
-- set session transaction isolation level repeatable read;
-- START TRANSACTION;
-- select * from testtable ;
select * from testtable t2 ;
insert into testtable values(2, '66','第六条数据');
commit;

In the query statement of transaction A, the result is the same as the previous query:

Insert picture description here


However, when it is executed to insert, the primary key conflict is reported.

Insert picture description here


I don’t know if this is considered a phantom read, but it can be explained that under the RR isolation level, because the first select of transaction A is before transaction B, a snapshot will be formed at this time, so the operation of transaction B is invisible (MVCC Mechanism), if transaction A is in the following situation, you can see the data submitted by transaction B:

-- 事务A
-- 设置当前session的事务隔离级别为可重复读
 set SESSION autocommit = 0;
 set session transaction isolation level repeatable read;
 START TRANSACTION;

update testtable set bus1 = '77' where id = 3;
 -- select * from testtable ;
--  insert into testtable values(2, '66','第六条数据');

-- commit;

The operation steps are the same as above. The difference between this sql and the previous transaction A is that the first select is replaced with update, the reason is that the transaction will be officially started in the first sql, if only START TRANSACTION;it will not start. After executing transaction B, when you execute the commented code, you can see the data submitted by transaction B.

Insert picture description here


At this time, it is obvious that the primary key conflicts when executing the following insert.
As for whether this is considered a phantom reading, the benevolent sees the benevolent and the wise.
But can this kind of primary key conflict problem be solved? The answer is yes.

select lock in share mode and select for update

The select lock in share statement will take the qualified data as the sharing mode, and it is the current read (that is, reading the latest database data, not snapshot reading). Updates are not allowed in the sharing mode.

  1. Clear data and reinsert
TRUNCATE table testtable;

insert into testtable values(1,11,'第一条数据');
insert into testtable values (3,22,'第二条数据');
insert into testtable values (4,33,'第三条数据');
insert into testtable values (10,44,'第四条数据');
  1. Compare snapshot read and current read
  • This is a snapshot read, read the data in the current table
-- 事务A
set autocommit = 0;
start transaction;

select * from testtable t ;


-- select * from testtable lock in share mode;
Insert picture description here


Then update with another session

-- 事务B
set autocommit = 0;
start transaction;

update testtable set bus2 = '更新更新更新' where id = 3;

commit;

Now go back to transaction A and execute the commented statement:

 select * from testtable lock in share mode;

It can be seen that the latest data is read: it is different

Insert picture description here


from the above without lock in share mode, then we will read the full table data again in the current transaction, without lock in share mode, this is the case:

Insert picture description here


this time is a snapshot Read, the read is the same as the first read. This is the difference between the current read and the snapshot read.
At this time, all data is locked by lock in share mode. Let's try to update it. I modified the sql of transaction B and re-executed it. The result suggested: the

Insert picture description here


modified sql is as follows:

-- 事务B
set autocommit = 0;
start transaction;

update testtable set bus2 = '我是lock in share mode更新测试' where id = 3;

commit;

In this case, all rows in the filter range can be locked. If the transaction A is longer, it will particularly affect the update performance, and it will not be affected if it is read. There is also a select for update, which is called an exclusive lock, which means
complete Exclusive, not even reading, this kind of lock is similar to pessimistic lock, even more ruthless than lock in share mode.

So is there a way to solve this problem? Yes, it is Next-Key Lock

Row lock, gap lock, Next-Key Lock

Row lock: The lock is added directly to a row.
Gap lock: The lock is added to the non-existent free space, which may be between two index records, or the space before the first index record or after the last index record.
Next-Key lock: It is used together with row lock + gap lock.
For example, in the above example, if you lock id=3, then **[1,3),(3,4]** will be added a gap lock, the lock itself does not lock the current row, the current row is locked by the row lock.

By default, Mysql does not turn on Next-Key Lock, you can use the following command to view

show variables like 'innodb_locks_unsafe_for_binlog';
Insert picture description here


I personally think that the reasons why Mysql does not open the gap lock by default are as follows:

  1. Gap lock affects performance. If the lock locks the largest id in the table, then the largest id is locked to positive infinity, which affects the normal insertion of data by other transactions, resulting in performance degradation
  2. The scenario of first querying whether it gradually exists, and then inserting it is not very likely to appear in the database. One reason is that databases designed with self-incrementing primary keys generally do not insert data in the gap, and let the database self-increase, and the other One reason is that if this kind of problem does occur, it is generally solved through distributed locks, and is rarely solved at the database level. After all, this is an operation that greatly affects database throughput.