MySQL log system: how a SQL "update statement" is executed

redo log and binlog:

Execute the following update statement:

mysql> create table T (ID INT PRIMARY KEY, c INT);
mysql> update T set c = c+1 where ID = 2;

Unlike the query process, the update process also involves two important log modules:
redo log (redo log) and binlog (archive log).

The technology used by redo log becomes the'WAL technology' , Write-Ahead Logging. Its key point is: 'write the log first, then write to the disk' .

The redo log is unique to the InnoDB storage engine. With the redo log, InnoDB can guarantee that even if the database is restarted abnormally, the previously submitted records will not be lost. This capability becomes'crash-safe' .

There are three differences between binlog and redo log:

① redo log is unique to InnoDB storage engine, binlog is implemented by MySQL Server layer, all storage engines can be used;
② redo log is a physical log, which records "what is modified on a certain data page" ( What is the result of the modification), binlog is the logic log, which records the original logic of the statement (what is the operation statement);
for example: assuming c=0 in the original library, after executing the update statement:
The content stored in the redo log is "c =1", "update c = c + 1" is stored in the binlog;
③ The redo log is written cyclically, and overwriting will occur when the space is full; the binlog is additional writing and will not be overwritten.

The execution flow chart of an update statement:

Insert picture description here

The execution process of an update statement: (refined process of two-phase commit)

① The executor first calls the interface of the storage engine "fetch the row ID=2". If the data page of the row ID=2 is in the memory, the storage engine directly returns it to the executor; otherwise, the storage engine needs to go first Read the memory into the disk, and then return;
② After the executor gets the row of data returned by the storage engine, it updates it, adds 1 to the value of c to get the new data, and then calls the interface of the storage engine to write Enter this row of data;
③ After the storage engine receives the new result of the row of data written by the executor, it first saves the update record in the memory, and writes the update record to the redo log to update the status of the redo log For prepare . Then it returns the result to the executor;
④ After the executor knows that the storage engine has successfully written the update record to the redo log, it starts to write the update record to the binlog;
⑤ redo log calls fsync to write to disk;
⑥ binlog call fsync writes to the disk;
⑦ After the executor successfully writes to the binlog, the storage engine updates the status of the redo log to commit , and the update is complete.

Two-phase commit:

"Two-phase commit" refers to the "prepare + commit" two-phase state of redo log. The purpose is to prevent unexpected downtime between binlog and redo log, resulting in inconsistent content in the two logs, and ensure that binlog and redo log are all written Submit after completion.

Why are there two kinds of logs?

Speaking of history, the storage engine MySQL first used was MyISAM, and binlog was at the server layer. All storage engines can be used. MyISAM does not support crash-safe, and binlog can only be used for archiving. Later, InnoDB was introduced as a third-party plug-in, and redo log was implemented by itself to support crash-safe.

Why does redo log have crash-safe capability, but binlog does not?

This is determined by the characteristics of the two. Redo log is a fixed-size, "circular write" log file, which records the physical log; binlog is an unlimited size, "additional write" log file, which records the logical log.

A big difference between redo log and binlog is that one is circular writing and the other is additional writing. In other words, redo log will only record logs that have not been flushed, and the data that has been flushed to disk will be deleted from the limited-sized log file of redo log; binlog is an additional log, which stores the full amount of logs.

When the data crashes and you want to restore the data that has not been flushed but has been written to the redo log and binlog to the memory, the binlog cannot be restored. Although binlog has a full amount of logs, there is no sign for InnoDB to determine which data has been flushed and which data has not.

For example, suppose two logs recorded by binlog:

update T set c = c + 1 where ID = 2;
update T set c = c + 1 where ID = 2;

If the database crash occurs when the first update statement has been flushed but the second update statement is not flushed, after restarting, the database cannot determine which of the two statements has been written to the disk and which is not. Into the disk, it is impossible to restore the database state based on the binlog.

But the redo log is different. As long as the data written to the disk is erased from the redo log, after the database is restarted, the data in the redo log can be directly restored to the memory. This is why redo log has crash-safe capabilities, but binlog does not.

Reference content: