This article introduces tools for verifying and repairing MySQL master-slave data consistency, covering basic usage, practical tips, and the underlying principles and mechanisms. I hope it is helpful to you.
If you are an operations engineer, if you run MySQL in your production environment, and if your online MySQL is built on a master-slave replication architecture, then congratulations: sooner or later it may bring you the "bad luck" of inconsistent master-slave data.
Because of how MySQL replication natively works, data is not copied to the slave synchronously, so some delay is unavoidable. Even when the server is not busy and the slave can keep up with the master, the data on the slave can still diverge from the master because of slave server crashes, unclean shutdowns, program bugs, and similar factors. When this happens, MySQL has no built-in mechanism to detect the inconsistency, so as a user you simply do not know that the master and slave data differ.
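To see why a dedicated tool is needed, here is a naive manual check one might script by hand. This is a sketch only, not how pt-table-checksum works internally; it uses this article's example hosts and the `monitor` credentials created later in the article, and MySQL's built-in CHECKSUM TABLE only compares whole tables without saying which rows differ.

```shell
#!/bin/sh
# Naive manual consistency check (illustration only): compare the result of
# CHECKSUM TABLE on the master and on the slave.

# compare_sums prints "consistent" when the two checksum values match.
compare_sums() {
  if [ "$1" = "$2" ]; then echo "consistent"; else echo "INCONSISTENT"; fi
}

# table_checksum fetches the numeric checksum of mydb1.tb1 from one host;
# -N suppresses the header row, awk keeps only the checksum column.
table_checksum() {  # usage: table_checksum <host>
  mysql -h "$1" -umonitor -p111111 -N -e 'CHECKSUM TABLE mydb1.tb1;' | awk '{print $2}'
}

# Example wiring (requires both servers to be reachable):
# compare_sums "$(table_checksum 127.0.0.1)" "$(table_checksum 192.168.0.202)"
```

Besides scanning the whole table on every run, this approach compares the two servers at different moments, so replication delay alone can make it report a difference; pt-table-checksum avoids that by letting the checksum queries themselves flow through replication.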
A tool is therefore needed to solve this kind of problem, and the pt-table-checksum tool in the percona-toolkit tool set is an efficient one: it detects master-slave data inconsistencies without noticeably affecting MySQL performance. When an inconsistency really does occur, the pt-table-sync tool in the same tool set can repair the inconsistent data, avoiding the trouble of redeploying the slave server.
However, in a real production environment these two tools still have certain limitations. To be precise, the limitations come from MySQL's asynchronous replication architecture: because the slave lags behind the master, checksumming a table on the master and then on the slave at the same moment will always show differences, and those differences are caused by replication delay rather than by real inconsistency. These two tools are therefore best used in the following scenarios:
a) When a slave is promoted to master, checking that the new master's data is consistent with the old master's before bringing it online.
b) After a data migration, checking data consistency.
c) After a mistaken operation on the slave has changed its data, checking consistency.
d) Routine, planned data consistency checks.
The demonstration below is based on the master-slave replication environment built in an earlier article.
2. Installing percona-toolkit
First install the dependent packages and percona-toolkit:
[root@master ~]# yum -y install perl perl-devel libaio libaio-devel perl-Time-HiRes perl-DBD-MySQL perl-IO-Socket-SSL
[root@master ~]# yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
[root@master ~]# yum install percona-toolkit
3. Creating a new user
Create a non-root user to perform the data consistency check and related tasks. pt-table-sync needs to connect to the slave to read and, when repairing, modify data, so create a user like this on the master:
mysql> GRANT select,insert,update,delete,create,process,super,replication slave ON *.* TO 'monitor'@'192.168.0.%' IDENTIFIED BY '111111';
Query OK, 0 rows affected (0.01 sec)

mysql> GRANT select,insert,update,delete,create,process,super,replication slave ON *.* TO 'monitor'@'127.0.0.1' IDENTIFIED BY '111111';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
After creating the user, test that both the master and the slave can be accessed with it (the tool also needs to connect to the master, which is why both of the users above were created). This user needs quite a lot of privileges, and there is no way around that: they are required by how the two percona-toolkit tools work. If you want to save trouble, you can simply grant ALL privileges.
4. Using pt-table-checksum
Use the tb1 table in the mydb1 database as a test. View the contents of tb1 on the master:
mysql> SELECT * FROM mydb1.tb1;
+----+-------+------+
| id | name  | age  |
+----+-------+------+
|  1 | tom   |   12 |
|  2 | jem   |   23 |
|  3 | jason |   29 |
|  4 | aaa   |   30 |
|  5 | b     |   69 |
+----+-------+------+
5 rows in set (0.01 sec)
View the contents of mydb1.tb1 on the slave:
mysql> select * from mydb1.tb1;
+----+-------+------+
| id | name  | age  |
+----+-------+------+
|  1 | tom   |   12 |
|  2 | jem   |   23 |
|  3 | jason |   29 |
|  4 | aaa   |   30 |
|  5 | b     |   69 |
+----+-------+------+
5 rows in set (0.00 sec)
At this point the master and slave data are consistent. Run the pt-table-checksum tool and see what it outputs:
[root@master ~]# pt-table-checksum --nocheck-replication-filters --replicate=mydb1.checksums --databases=mydb1 h=127.0.0.1,u=monitor,p=111111
Replica slave has binlog_format ROW which could cause pt-table-checksum to break replication.  Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation.  If you understand the risks, specify --no-check-binlog-format to disable this check.
# The command errors out: this MySQL environment uses row-based binary logging ('binlog_format=ROW'). In a row-based replication environment, Percona officially advises against using pt-table-checksum for consistency checks, but the tool provides an option to skip this check.
Commonly used options:
--nocheck-replication-filters : do not check replication filters; we use --databases to specify the databases to check
--replicate                   : write the checksum information into the specified table
--no-check-binlog-format      : do not check the binary log format
--replicate-check-only        : only display information about inconsistent data
--databases                   : databases to check, comma-separated
--tables                      : tables to check, comma-separated
h                             : host, i.e. the master server IP
u                             : user name
p                             : password
Run it again after adding the --no-check-binlog-format option:
[root@master ~]# pt-table-checksum --nocheck-replication-filters --replicate=mydb1.checksums --no-check-binlog-format --databases=mydb1 h=127.0.0.1,u=monitor,p=111111
            TS ERRORS  DIFFS  ROWS  CHUNKS SKIPPED    TIME TABLE
04-21T18:00:59      0      0     5       1       0   0.280 mydb1.tb1
04-21T18:00:59      0      0     2       1       0   0.331 mydb1.tb2
# Watch the DIFFS column: if any data is inconsistent, its value will not be 0.
Running the above command may produce an error like "no slaves were found", which means the tool could not connect to the slave server. After connecting to the master, pt-table-checksum finds the slaves recursively: it first runs "show processlist;" and then "show slave hosts" to discover them. If it still cannot connect to the slave, add "report_host=<slave server IP>" to the slave's my.cnf so that the slave actively reports itself to the master, and add the --recursion-method=hosts option to the command, so that the tool can use the "show slave hosts" command on the master to obtain the slave's IP address.
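For reference, the slave-side change described above is just an addition to my.cnf (the IP below is this article's example slave; mysqld must be restarted for it to take effect):

```ini
# /etc/my.cnf on the slave server
[mysqld]
report_host=192.168.0.202
report_port=3306
```

After the restart, "show slave hosts" on the master should list this slave.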
Now let's deliberately make the master and slave inconsistent by changing the age column of the row with id 5 in the tb1 table to 20 on the slave server:
mysql> update mydb1.tb1 set age=20 where id=5;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from mydb1.tb1;
+----+-------+------+
| id | name  | age  |
+----+-------+------+
|  1 | tom   |   12 |
|  2 | jem   |   23 |
|  3 | jason |   29 |
|  4 | aaa   |   30 |
|  5 | b     |   20 |
+----+-------+------+
5 rows in set (0.00 sec)
The master and slave data are now inconsistent. Run the pt-table-checksum tool on the master server again:
[root@master ~]# pt-table-checksum --nocheck-replication-filters --replicate=mydb1.checksums --no-check-binlog-format --recursion-method=hosts --databases=mydb1 h=127.0.0.1,u=monitor,p=111111
            TS ERRORS  DIFFS  ROWS  CHUNKS SKIPPED    TIME TABLE
04-21T18:27:00      0      1     5       1       0   0.307 mydb1.tb1
04-21T18:27:01      0      0     2       1       0   0.048 mydb1.tb2
# The DIFFS value for the tb1 table has changed to 1.
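The per-chunk details behind the DIFFS column are stored in the mydb1.checksums table that --replicate writes. A query along these lines (the pattern suggested in the pt-table-checksum documentation), run on the slave, lists the inconsistent tables directly:

```sql
-- Run on the slave: tables whose slave checksums differ from the master's
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM mydb1.checksums
WHERE (master_cnt <> this_cnt
       OR master_crc <> this_crc
       OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;
```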
Next, the pt-table-sync tool comes on stage.
5. Using pt-table-sync
[root@master ~]# pt-table-sync --replicate=mydb1.checksums h=127.0.0.1,u=monitor,p=111111 h=192.168.0.202,u=monitor,p=111111 --charset=utf8 --print
REPLACE INTO `mydb1`.`tb1`(`id`, `name`, `age`) VALUES ('5', 'b', '69') /*percona-toolkit src_db:mydb1 src_tbl:tb1 src_dsn:A=utf8,h=127.0.0.1,p=...,u=monitor dst_db:mydb1 dst_tbl:tb1 dst_dsn:A=utf8,h=192.168.0.202,p=...,u=monitor lock:1 transaction:1 changing_src:mydb1.checksums replicate:mydb1.checksums bidirectional:0 pid:3205 user:root host:master*/;
# The output above says that on the slave, the age value of the row with id=5 should be 69.
# The command contains two "h=,u=,p=" groups: the first points to the master server, the second to the slave server.
Commonly used options:
--replicate=  : repair the problem data based on the checksums table generated by pt-table-checksum
--databases=  : databases to synchronize, comma-separated
--tables=     : tables to synchronize, comma-separated
h=            : server host name
u=            : user name
p=            : password
--print       : only print the statements without executing them
--execute     : execute the statements
After confirming that the data really is inconsistent, replace --print with --execute to apply the statements:
[root@master ~]# pt-table-sync --replicate=mydb1.checksums h=127.0.0.1,u=monitor,p=111111 h=192.168.0.202,u=monitor,p=111111 --charset=utf8 --execute
The data on the slave has now been corrected. Run the pt-table-checksum tool again to confirm that the data in the two tables is now consistent:
[root@master ~]# pt-table-checksum --replicate=mydb1.checksums --nocheck-replication-filters --no-check-binlog-format --databases=mydb1 h=127.0.0.1,u=monitor,p=111111
            TS ERRORS  DIFFS  ROWS  CHUNKS SKIPPED    TIME TABLE
04-21T21:42:31      0      0     5       1       0   0.309 mydb1.tb1
04-21T21:42:32      0      0     2       1       0   0.304 mydb1.tb2
In this way, the data is successfully corrected.
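For routine, planned checks (scenario d above), the checksum step can be wrapped in a small script that flags tables whose DIFFS column is non-zero. This wrapper is hypothetical, not part of percona-toolkit:

```shell
#!/bin/sh
# Hypothetical wrapper for scheduled consistency checks: print only the
# tables whose DIFFS column is non-zero in pt-table-checksum's output.

# diff_tables reads checksum output on stdin; data rows start with a digit
# (the TS column), DIFFS is the 3rd field and TABLE is the last field.
diff_tables() {
  awk '$1 ~ /^[0-9]/ && $3 != 0 { print $NF }'
}

# Example wiring with this article's options and credentials:
# pt-table-checksum --nocheck-replication-filters --no-check-binlog-format \
#   --replicate=mydb1.checksums --databases=mydb1 \
#   h=127.0.0.1,u=monitor,p=111111 | diff_tables
```

Feeding the earlier run's output through diff_tables would print only mydb1.tb1; a cron job could mail the list whenever it is non-empty.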
6. Summary
These two tools are usually used together to make up for MySQL's lack of a built-in data consistency checking mechanism, making a master-slave replication architecture easier to operate and maintain. According to Percona's official guidance, pt-table-checksum works best with statement-based replication. Statement-based and row-based replication each have their own advantages and disadvantages, but if you expect to use pt-table-checksum regularly for maintenance, I think binlog_format should be set to statement or mixed.
The last thing to say: if master-slave data inconsistency really does occur in production, and it is not just caused by replication delay, then remember to back up the source data before using these tools to manipulate it, regardless of whether the source data is merely incomplete or some of it has been damaged. Always make a backup of the source data before doing any data repair work. In a high-pressure situation like data recovery, who can guarantee that every operation you perform is standardized and correct? If you make a mistake, at least you still have a chance to roll back.
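As a concrete form of that advice, dumping the affected table on both servers before running pt-table-sync --execute gives you a rollback point. This is a sketch; the /backup path and the file-naming scheme are made up for illustration:

```shell
#!/bin/sh
# Back up the table to be repaired on BOTH servers before syncing, so a
# mistaken repair can be rolled back. Path and naming are illustrative.

# backup_name builds a per-host, per-table, dated dump file name.
backup_name() {  # usage: backup_name <host> <db.table> <date>
  echo "/backup/${1}_$(echo "$2" | tr '.' '_')_${3}.sql"
}

# Example with this article's slave and credentials:
# mysqldump -h 192.168.0.202 -umonitor -p111111 --single-transaction \
#   mydb1 tb1 > "$(backup_name 192.168.0.202 mydb1.tb1 "$(date +%F)")"
```

--single-transaction takes a consistent InnoDB snapshot without locking the table for the duration of the dump.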