Why is the standby database delayed for several hours?

The cause of the accident mentioned in the previous article, whether it is occasional query pressure or backup, the impact on the standby database delay is generally on the order of minutes, and it can catch up after the standby database returns to normal.

However, if the execution log speed of the standby database is continuously lower than the log generation speed of the main database, the delay may be hours. And for a main library that is under constant high pressure, the standby library may never catch up with the rhythm of the main library.

This involves the topic of this article: the parallel replication capability of the standby database.

  • Main and standby flow chart

The two black arrows in the picture:

  • One represents that the client writes to the main library, and the
    parallelism is higher than the next
  • The other represents the sql_thread execution transfer log on the standby database (relay log)

The reason that affects the concurrency on the main library is the lock. Since the InnoDB engine supports row locks, except for extreme scenarios where all concurrent transactions are updating the same row (hot row), its support for business concurrency is still possible. Therefore, when you perform performance testing, you will find that the concurrent stress test thread 32 has a higher overall throughput than the single thread.

The log is executed in the standby database, that is, the logic of sql_thread updating data (DATA) on the standby database in the figure. If a single thread is used, it will cause the standby database to apply logs not fast enough, causing delays in the active and standby.

Prior to version 5.6, MySQL only supported single-threaded replication, which resulted in severe master-slave delays when the main database concurrency was high and the TPS was high.

The evolution history of MySQL multithreaded replication

All multi-threaded replication mechanisms are to split the sql_thread with only one thread in the picture into multiple threads, that is, all conform to the following model:

coordinator is the original sql_thread , but it no longer updates the data directly, it is only responsible Read transit logs and distribute transactions. The one that actually updates the log becomes a worker thread. The number of work threads is determined by the parameter slave_parallel_workers . It is recommended to set it as the best between 8~16 (32-core physical machine). After all, the standby database may also need to provide read queries, which cannot take up the CPU.

Can the transaction be distributed to the workers by polling?

No way. Because, after the transaction is distributed to the workers, different workers are executed independently. However, due to the scheduling strategy of the CPU, the second transaction may be executed before the first transaction. At this time, these two transactions update the same row, that is, the two transactions on the same row are executed in the opposite order on the main database and the standby database, resulting in inconsistency between the main and standby databases.

Can multiple update statements of the same transaction be distributed to different workers for execution?

No way. For example, a transaction updates each row in table t1 and table t2. If the two update statements are assigned to different workers, although the final result is the same as the master and slave, if the execution of table t1 is completed, the standby database has a query, then You will see that this transaction "updates half of the results", destroying the isolation of the transaction logic.

Therefore, when the coordinator is distributed, it needs to meet:

  • Can not cause update coverage. This requires that two transactions that update the same row must be distributed to the same worker
  • The same transaction cannot be separated and must be placed in the same worker

Parallel replication strategy of MySQL 5.5

The official MySQL 5.5 version does not support parallel replication. But someone wrote: a distribution strategy by table and a distribution strategy by row to help understand the iteration of the parallel replication strategy of the official version of MySQL.

Distribution strategy by table

If two transactions update different tables, they can be parallelized. Because the data is stored in the table, it is distributed according to the table to ensure that two workers will not update the same row.

Of course, if there is a cross-table transaction, the two tables should be considered together

  • Table Parallel Copy Process Model

Each worker thread corresponds to a hash table, which stores the tables related to the transactions currently in the "execution queue" of this worker. hash

  • key:库名.表名
  • value: number, how many transactions in the queue modify this table

When a transaction is assigned to a worker, the tables involved in the transaction will be added to the corresponding hash table. After the worker is executed, this table will be removed from the hash table.

Figure above hash_table_1 said it worker_1 the "pending transaction queue", there are four transactions involving db1.t1 table, there is a transaction involving db2.t2 table.

Assuming the situation in the figure, the coordinator reads a new transaction T from the transit log, and the rows modified by the transaction involve tables t1 and t3.

Now use the distribution process of transaction T to look at the distribution rules:

  1. Since transaction T involves modifying t1, and there are already other transactions Tx in the worker_1 queue modifying t1, T and the Tx transaction in the queue need to modify the data of the same table: T conflicts with worker_1
  2. According to this logic, judge whether T and each worker queue conflict in sequence, and you will find that transaction T also conflicts with worker_2
  3. T conflicts with more than one worker, and the coordinator thread enters waiting
  4. Each worker continues to execute while modifying hash_table. Assuming that the transaction that involves modifying t3 in hash_table_2 is completed, the hash_table_2 will be removeddb1.t3
  5. In this way, the coordinator will find that the worker conflicting with T is only worker_1 (no more than 1), so it will be assigned to worker_1
  6. The coordinator continues to read the next transit log and continues to allocate transactions

That is, when each transaction is distributed, the conflict relationship with all workers is as follows:

  • If it does not conflict with all workers, the
    coordinator thread will assign this transaction to the idlest worker;
  • The
    coordinator thread that conflicts with more than one worker enters the waiting state until there is only one worker that conflicts with this transaction.
  • The
    coordinator thread that only conflicts with a worker will assign the transaction to the worker with the conflicting relationship.

The distribution scheme by table works well in scenarios where multiple tables are evenly loaded. But if you encounter a hot table, for example, when all update transactions involve a certain table, all transactions will be assigned to the same worker, which becomes single-threaded replication.

Distribute by line

To solve the problem of parallel replication of hot tables, a parallel replication solution by row is needed.

  • Ideas
    If the two transactions do not update the same row, they can be executed in parallel on the standby database. So this mode requires binlog to be in row format.

At this time, to determine whether a transaction T and a worker conflict, the rule used is not "modify the same table", but "modify the same row".

The data structure of row-by-row replication is similar to that of table-based replication, and each worker is assigned a hash.
The key just distributed by line is 库名+表名+唯一键的值.

However, the unique key that only the primary key id enough, consider the following scenario, t1 addition to the main key, and a unique index a:

  `id` int(11) NOT NULL,
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `a` (`a`)

insert into t1 values(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5);

To perform two transactions in the main library:

update t1 set a=6 where id=1;
update t1 set a=1 where id=2;

The primary keys of the rows to be updated by these two transactions are different, but if they are assigned to different workers, session2 may be executed first. At this time, the value of a of the row with id=1 is still 1, and a unique key conflict will be reported.

Therefore, based on the row strategy, the unique key needs to be considered in the transaction hash table, that is, the key should be 库名+表名+索引a的名字+a的值.

Execute at t1

update t1 
set a=1 
where id=2

Record the value of each field before and after the modification of the entire row of data in the binlog. Therefore, when the coordinator parses the binlog of the statement, there are three items in the hash table of the transaction:

  • key=hash_func(db1+t1+"PRIMARY"+2), value=2
    value=2 is because the row id value before and after the modification is unchanged, and it appears twice
  • key=hash_func(db1+t1+"a"+2), value=1
    will affect the rows of this table a=2
  • key=hash_func(db1+t1+"a"+1), value=1
    will affect the row of this table a=1

Compared with the table-based parallel distribution strategy , the row-based parallel strategy consumes more calculations when deciding thread distribution.
These two schemes have some constraints:

  • To be able to parse out the table name, primary key value and unique index value from binlog. That is, the main library binlog must be row
  • The table must have a primary key
  • No foreign key
    If the table has a foreign key, the rows updated by cascade will not be recorded in the binlog, and the conflict detection will be inaccurate.

Fortunately, these are the DB production and use specifications.

The parallelism of the row-by-row distribution strategy is higher. However, if it is a large transaction, the line distribution strategy has the following problems:

  • Consumption of memory. For
    example, if a statement needs to delete 1 million rows of data, then the hash table has to record 1 million items.
  • It consumes CPU to
    parse binlog and then calculate the hash value. For large transactions, the cost is very high

Therefore, the distribution strategy by row needs to set a threshold. If a single transaction exceeds the set threshold for the number of rows (for example, if the number of rows updated by a single transaction exceeds 10w rows), it will temporarily degenerate to a single-threaded mode. The logic of the degeneration process is roughly like this of:

  1. coordinator temporarily hold this business
  2. Wait for all workers to complete execution and become an empty queue
  3. The coordinator directly executes this transaction
  4. Restore parallel mode

MySQL 5.6 official parallel replication strategy

The supported granularity is parallel by library. After understanding the two strategies mentioned earlier, you will know that in the hash table used to determine the distribution strategy, the key is the database name.

The parallel effect of this strategy depends on the pressure model. If there are multiple DBs in the main database, and the pressure of each DB is balanced, the effect of using this strategy will be very good.

Compared with distributing by table and row, this strategy has the following advantages:

  • The construction of the hash value is very fast.
    Only the library name is needed, and there are not many DBs on an instance.
  • The binlog format is not required
    because the binlog in the statement format can also easily get the library name

Failure in the following scenarios:

  • The tables on your main database are all placed in the same DB
  • Different DBs have different hotspots. For example, one is a business logic library and the other is a system configuration library, which can not achieve parallel effects.

In theory, you can create different DBs, evenly divide the tables of the same heat into these different DBs, and force this strategy. Not because of the need to move data specifically, this strategy is not used much.

MariaDB's parallel replication strategy

MariaDB's parallel replication strategy takes advantage of redo log group commit (group commit) optimization:

  • Transactions that can be submitted in the same group will never modify the same row
  • Transactions that can be executed in parallel on the main database, must also be executed in parallel on the standby database

In terms of implementation, MariaDB does this:

  1. The transactions submitted together in a group have the same commit_id, and the next group is commit_id+1
  2. commit_id is written directly to binlog
  3. When transferred to the standby database application, the transaction with the same commit_id is distributed to multiple workers for execution
  4. After all the execution of this group is completed, the coordinator goes to take the next batch.

The strategy was quite amazing at the time. Because the previous thinking in the industry was to "analyze binlog and split it into workers". The goal of MariaDB's strategy is to "simulate the parallel mode of the main library."

But there is a problem, it does not achieve the goal of "true simulation of the main library concurrency". On the main library, when a group of transactions are committing, the next group of transactions is in the "executing" state at the same time.

Looking at the figure below, assuming the execution of the three groups of transactions in the main library, trx4, trx5 and trx6 are being executed when trx1, trx2 and trx3 are submitted. In this way, when the first set of transactions is committed, the next set of transactions will soon enter the commit state.

  • Main library parallel transaction

According to MariaDB's parallel replication strategy, the execution effect on the standby database is as follows:

When the standby database is executed, the second set of transactions can only be executed after the first set of transactions are completely executed, so that the system throughput is insufficient.

In addition, the program is easily dragged down by big issues. Assuming trx2 is a very large transaction, then in the standby database application, after trx1 and trx3 are executed, they can only wait for trx2 to be completely executed before the next group can start execution. During this time, only one worker thread is working, which is a waste of resources.

But even so, this strategy is still a beautiful innovation. Because it has very little modification to the original system, and the realization is elegant.

Parallel replication strategy of MySQL 5.7

After the implementation of MariaDB parallel replication, the official MySQL 5.7 version also provides similar functions. The parameter slave-parallel-type controls the parallel replication strategy:

    means using the parallel strategy by library of MySQL 5.6 version;
    similar to MariaDB's strategy. However, the MySQL 5.7 strategy is optimized for parallelism. This optimization idea is also very interesting.

Can all transactions that are in the "executing state" at the same time be parallel?
Can't. Because, there may be transactions in a lock waiting state due to lock conflicts. If these transactions are allocated to different workers on the standby database, there will be inconsistencies between the standby and main databases.

The core of MariaDB's strategy is that "all transactions in the commit" state can be parallelized. The transaction is in the commit state, indicating that it has passed the lock conflict check.

  • Review the two-phase commit.

You don't have to wait until the commit phase. As long as you can reach the redo log prepare phase, it means that the transaction has passed the lock conflict check.

Therefore, the idea of ​​MySQL 5.7 parallel replication strategy is:

  • Transactions that are in the prepare state at the same time can be parallelized when the standby database is executed
  • The transaction in the prepare state and the transaction in the commit state can also be parallel when the standby database is executed

Parameters submitted by the binlog group:

  • binlog_group_commit_sync_delay
  • binlog_group_commit_sync_no_delay_count

These two parameters are used to deliberately lengthen the time of binlog from write to fsync, so as to reduce the number of binlog writes. In the parallel replication strategy of MySQL 5.7, they can be used to create more "transactions in the prepare phase at the same time." This increases the parallelism of standby database replication. That is to say, the two parameters can be used to make the submission of the main library slower "intentionally", and to make the standby database execute faster. When MySQL 5.7 deals with the delay of the standby database, you can consider adjusting the values ​​of these two parameters to achieve the purpose of improving the replication concurrency of the standby database.

Parallel replication strategy for MySQL 5.7.22

In the MySQL 5.7.22 version released in April 2018, MySQL added a new parallel replication strategy, parallel replication based on WRITESET .

A new parameter binlog-transaction-dependency-tracking has been added to control whether to enable this policy:

    represents the strategy introduced earlier, and judges whether it can be parallelized by entering prepare and commit at the same time.
    means that for each row involved in the transaction, the hash value of this row is calculated to form a set writeset. If two transactions do not operate on the same row, that is to say, their writesets have no intersection, they can be parallelized.
    has an additional constraint on the basis of WRITESET, that is, two transactions executed by the same thread in the main library should be executed in the same sequence when the standby library is executed.

For unique identification, this hash value is passed 库名+表名+索引名+值. If there are other unique indexes on a table in addition to the primary key index, then for each unique index, the writeset corresponding to the insert statement must add one more hash.

This is similar to the row-by-row distribution strategy based on MySQL 5.5 that we introduced earlier. However, the official MySQL implementation still has great advantages:

  • The writeset is written directly to the binlog after the main library is generated, so that when the standby library is executed, there is no need to parse the binlog content (the row data in the event), which saves a lot of calculation
  • There is no need to scan the binlog of the entire transaction to determine which worker to distribute to, which saves memory
  • Since the distribution strategy of the standby database does not depend on the binlog content, it is also possible that the binlog is in the statement format.

Therefore, the parallel replication strategy of MySQL 5.7.22 is guaranteed in terms of versatility.

Of course, for the scenarios of "no primary key on the table" and "foreign key constraint", the WRITESET strategy cannot be parallelized, and it will temporarily degenerate into a single-threaded model.


Why should there be multi-threaded replication? This is because the ability of single-threaded replication is lower than that of multi-threaded replication. For the main library with high update pressure, the standby library may not catch up with the main library. From the perspective of the phenomenon, the value of seconds_behind_master on the standby database is getting larger and larger.

After introducing each parallel replication strategy, I also shared with you the advantages and disadvantages of different strategies:

If you are a DBA, you need to choose different strategies according to different business scenarios;
if you are a business developer, I hope you can get inspiration from it and use it in your usual development work.
From these analysis, you will also find that large transactions will not only affect the main database, but also one of the main reasons for the delay of standby database replication. Therefore, in your usual development work, I suggest that you try to reduce the operation of large transactions and break them into small ones.

The new standby database parallel strategy of the official MySQL 5.7 version has modified the content of binlog, which means that the binlog protocol is not upward compatible. This factor needs to be taken into account when switching between active and standby and version upgrades.