MySQL8.0 asynchronous replication source asynchronous_connection_failover

MySQL replication source, in simple terms, is to configure multiple available replication sources. When a replication source is unavailable (downtime, replication link interruption), it automatically selects a new source according to the weight to continue synchronization. A bit similar to MongoDB's replication mechanism, but there are still some differences.

1. The copy source starts

The following official explanation:
Starting from MySQL8.0.22, asynchronous connection failover mechanism can be used. After an existing connection from a replica to its source fails, an asynchronous (source-to-replica) replication connection to the new source is automatically established according to the weight. Asynchronous connection failover mechanism can be used to synchronize with multiple MySQL servers or server groups that share data and keep replicas in sync. The list information of the source server is stored on the target node.

MySQL 8.0.23 version supports group replication topology, by automatically monitoring changes to group members and distinguishing between primary and secondary servers. When a group member is added to the source list and defined as part of a managed group, the asynchronous connection failover mechanism will update the source list to keep it consistent with member changes, and automatically add and remove group members when they join or leave . Only the majority of online group members are used to connect and obtain status. If the last member of the management group leaves the group, the member will not be deleted automatically so that the configuration of the management group is preserved.

2. Implementation

Before understanding the replication source, I still remember the master slave terminology event in July 2020. In response to this part, the official gave a new naming method. The MySQL version 8.0.23 started to be replica and source. Of course, the previous method is also supported. The 5.7 version does not have this treatment.
The following is the new copy command mode:

image.png
image.png
Copy source:

To activate the copy channel, you need to turn on this key parameter

 

SOURCE_CONNECTION_AUTO_FAILOVER=1

Switching principle

When an existing connection to the source fails, the replica first retries the same connection, using the SOURCE_CONNECT_RETRY and SOURCE_RETRY_COUNT parameters. When these attempts are exhausted, the asynchronous connection failover mechanism will take over.

The asynchronous connection failover mechanism is activated after the connection of the replica to the source fails, and it issues a START REPLICA statement to try to connect to a new source.

Currently, if the replication I/O thread is stopped due to source stop or network failure, the connection will be interrupted. In any other case, such as when the replication thread is stopped by a STOP REPLICA statement, the connection will not fail over.

3 key points of parameters

  • SLAVE_NET_TIMEOUT: indicates that the slave has not received any data (including binlog, heartbeat) from the master within the slave_net_timeout time. The slave thinks that the connection is broken and will reconnect.
  • SOURCE_CONNECT_RETRY parameter specified. The default value is 60s. Indicates the time interval for reconnection. After slave_net_timeout times out, reconnect immediately,
  • SOURCE_RETRY_COUNT indicates the maximum number of reconnections. The default value is 86400 times.

Suitable values ​​are SOURCE_RETRY_COUNT=3 to retry the connection 3 times and S
OURCE_CONNECT_RETRY=10 with an interval of 10 seconds.

interface

Add and delete copy source operations:

  • Master-slave replication function:
    asynchronous_connection_failover_add_source
    asynchronous_connection_failover_delete_source
#asynchronous_connection_failover_add_source(channel, host, port, network_namespace, weight) SELECT asynchronous_connection_failover_add_source('channel2', '127.0.0.1', 3310, '', 80);SELECT asynchronous_connection_failover_delete_source('channel2', '127.0.0.1', 3310, '');
  • MGR function:
    asynchronous_connection_failover_add_managed
    asynchronous_connection_failover_delete_managed
#asynchronous_connection_failover_add_managed(channel, managed_type, managed_name, host, port, network_namespace, primary_weight, secondary_weight)SELECT asynchronous_connection_failover_add_managed('channel2', 'GroupReplication', 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa', '127.0.0.1', 3310, '', 80, 60);SELECT asynchronous_connection_failover_delete_managed('channel2', 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa');

The host name, port number, network namespace, and weighted priority of the MySQL instance (1-100, 100 is the highest priority) in order to be added or deleted from the source list of the channel.
For MGR, you can also specify the type of management service (currently only "group replication" is available) and the identifier of the management group (for "group replication", this is the value of the system variable group_replication_group_name), you only need to add a group member, the copy will The remaining members of the current group members are automatically added.

System view:

The source information is stored in the mysql library
1.mysql.replication_asynchronous_connection_failover
2.mysql.replication_asynchronous_connection_failover_managed
3. TRUNCATE TABLE is not allowed in the official tips of the above two tables. Test the actual situation can be truncate.

The view is under Performance Schema
1.
replication_asynchronous_connection_failover
2. replication_asynchronous_connection_failover_managed 3. The copy uses a monitoring thread to track the membership of the managed group and update the source list (thread/sql/replica_monitor).

Once you understand the key indicators and interfaces, you can actually move it.

3 common scenarios:

Set the source list on the copy of the channel in the copy area. You can use asynchronous_connection_failover_add_source and asynchronous_connection_failover_delete_source udf to set and manage the source list to add and delete a single replication source server

Add source:

SELECT asynchronous_connection_failover_add_source('channel_rpl', '10.100.20.206', 3306, '', 70);SELECT asynchronous_connection_failover_add_source('channel_rpl', '10.100.20.206', 3307, '', 80);
image.png

Delete source:

SELECT asynchronous_connection_failover_delete_source('channelS206', '10.100.20.206', 3307, '');

Configuration replication:

SET GLOBAL slave_net_timeout=2; CHANGE REPLICATION SOURCE TO  SOURCE_HOST='10.100.20.206',  SOURCE_USER='repl',  SOURCE_PASSWORD='123456',  SOURCE_PORT=3306,  SOURCE_CONNECTION_AUTO_FAILOVER=1,   SOURCE_AUTO_POSITION = 1,  SOURCE_CONNECT_RETRY = 1,  SOURCE_RETRY_COUNT = 6  FOR CHANNEL 'channel_rpl' ;
image.png

Problems:

  • Under the master-slave architecture, if the master library M suddenly goes down, the slave library S will be the new replication source. The slave library S has been promoted to the master, but at this time the master library suddenly recovers, and the copy source will refer to the original M again.
  • Before switching, if it is written from the transaction before the library S and is purged, the replication will fail after the replication source is switched.

With the introduction of the replication source, under the master-slave architecture, it is necessary to ensure the control of the replication source change (the current script + mysql original table directly updata mode), which should be more convenient.
It can be effectively switched under the dual main structure, as long as it is guaranteed that there is no split brain, it is very useful.

Add source:

#确认组名SELECT @@group_replication_group_name;#集群节点信息select * from performance_schema.replication_group_members;#复制源添加集群SELECT asynchronous_connection_failover_add_managed('channel_rpl', 'GroupReplication', 'd150c201-c02c-11eb-a374-38f3ab632406', '10.132.20.206', 3306, '', 80, 60);
image.png

Delete source:

SELECT asynchronous_connection_failover_delete_managed('channel_rpl', '02d95097-c036-11eb-9dff-38f3ab632406');
image.png

Remarks: From one MGR to another MGR, configuration items need to be copied from multiple sources when configuring values

CHANGE REPLICATION FILTER filter[, filter]	[, ...] [FOR CHANNEL channel]

Use mysql-shell to create a cluster. Meta-information is in the mysql_innodb_cluster_metadata database. Changed information will also be synchronized. In addition, the internal users of the mysql database will also be synchronized. It is not allowed between different MGR clusters. Unless you do not use the shell to create.

Problems:

  • group_replication_exit_state_action: more critical, it is recommended to choose ABORT_SERVER
parametersignificance
READ_ONLYDisable write operations on the server
ABORT_SERVERShut down the server
OFFLINE_MODEClose all links and prohibit users without CONNECTION_ADMIN or SUPER authority from establishing new connections

After the target-side MGR cluster is destroyed, the target-side master node replicates the configuration and needs to be changed to a new node. Currently, there is no global configuration method, and then it needs to be processed manually.

After the source-end MGR cluster is destroyed, it may be connected to the disconnected node, and it needs to be tested and verified whether there are bugs or the like.

Dual-master V dual-master synchronization is controlled by different replication source channels, but the scene is too complicated and difficult to control, so you can give up.

image.png

Precautions:

  • The same replication user account and password must exist on all source servers in the source list of the channel. This account is used to connect to each source.
  • The replication user account must have SELECT permissions on the Performance Schema table, for example, GRANT SELECT ON performance_schema.* TO'repl';
  • The replication source is an asynchronous mechanism, so it is normal that there is a delay.
  • Another available server on the source list has a higher priority (weight) setting, and the asynchronous connection failover mechanism will also fail on the connection.

to sum up:

From one cluster (master-slave, dual-master, MGR) to another master-slave cluster, it can be used well, but there are some problems. As mentioned above, when a cluster splits multiple times, whether the synchronization source is valid or not is impossible. Effective judgment. Need to be artificially dry.

It is not recommended to use it from MGR to MGR. Heartbeat problems cause split brain and other aspects cannot be effectively confirmed. There is always a process of familiarity and understanding for new things

In terms of stability, there are currently no cases. It can achieve error-free heartbeat detection, and normal switching under various scenes is at least as effective as master-slave, and can only be optimized slowly.

try harder!