[Alibaba database interview questions solution] MySQL high availability principle

In a master-backup relationship, each backup database receives and executes the binlog of the master database.

Under normal circumstances, as long as all the binlogs generated by the main library perform the update can be transferred to the standby database and be executed correctly, the standby database can reach the state consistent with the main database, which is the final consistency .

But for MySQL to provide high-availability capabilities, only eventual consistency is not enough. why?

  • MySQL master/backup switching process-double M structure

Active/standby delay

Active/standby switching may be:

  • Active operation and maintenance actions,
    such as software upgrades, the machine where the main library is located offline as planned, etc.
  • Passive operation,
    such as power failure of the machine where the main library is located.

Synchronization delay

The time points related to data synchronization mainly include the following three:

  1. The main library A completes a transaction and writes it to binlog. This time is recorded as t1
  2. Then it is passed to standby database B, and the time when standby database B finishes receiving the binlog is recorded as t2
  3. The standby database B completes the transaction, and this time is recorded as t3

Standby delay, is the same transaction, the standby database execution completion time and the main library execution completion time between a difference, i.e., t3-t1.

You can execute show slave status in the standby database , and its return result will display SBM (SBM for short), indicating how many seconds the current standby database is delayed.
SBM calculation method:

  1. The binlog of each transaction has a time field to record the time written on the main library
  2. The standby database takes out the value of the time field of the currently executing transaction, calculates the difference between it and the current system time, and obtains the SBM .

In fact, SBM is t3-t1. Therefore, SBM can be used as the main and standby delay value, and the time precision of this value is s.

  • If the system time settings of the main and standby machines are inconsistent, won't the values ​​of the main and standby delays be inaccurate?
    Will not. Because, when the standby library is connected to the main library, it will obtain the current main library system time by executing the SELECT UNIX_TIMESTAMP() function. If it is found that the system time of the main database is inconsistent with itself at this time, the standby database will automatically deduct the difference when performing the SBM calculation.

When the network is normal, the time required for the log to be transmitted from the main database to the standby database is very short, that is, t2-t1 is very small. That is, under normal network conditions, the main source of delay between the active and standby databases is the time difference between the completion of the binlog received by the standby database and the completion of the transaction.

Therefore, the most direct manifestation of the delay between the active and standby databases is that the speed at which the standby database consumes relay logs is slower than the speed at which the main database produces binlogs. What could be the cause of this?

The source of the active and standby delay

备库所在机器的性能 < 主库所在的机器性能

The person who deploys will think, anyway, there is no request for the backup database, so you can use almost a machine. Or put 20 main libraries on 4 machines, and concentrate the standby libraries on one machine.

However, there is no difference between the pressure of the update request on the IOPS between the main database and the standby database. Therefore, when doing this kind of deployment, the standby database is generally set to "non-double 1" mode.

But in fact, a large number of read operations are also triggered during the update process. Therefore, when multiple standby databases on the standby database host are competing for resources, it may cause delays in the active and standby.

This deployment is now less. Because the main and standby databases may switch, and the standby database may become the main database at any time, the main and standby database must use the same specification machines and be deployed symmetrically.

We also did a symmetric deployment, but there was still a delay, why?

It is likely that the pressure on the backup database is high. Since the main library provides write capabilities, the standby library can provide some read capabilities. Or some analysis statements required by the operation background cannot affect normal business, so they can only be run on the standby database.

Since the main library directly affects the business, you will be more restrained in using it, but neglect the pressure control of the standby library. As a result, the query on the standby database consumes a lot of CPU, which affects the synchronization speed = "the delay of the master and the standby.

At this time, it can generally be handled like this:

  • One master, multiple slaves
    In addition to the standby library, you can connect several more slave libraries to share the reading pressure. This scheme is mostly adopted because the database system must ensure that it has the capability of regular full backup. The slave library is very suitable for backup.
  • Output to external systems
    such as Hadoop through binlog , allowing external systems to provide statistical query capabilities.
The slave library and the standby library are actually similar in concept. Generally, the one that will be selected as the new master database during the HA process is called the standby database, and the others are called the slave database.

We have also adopted one master and multiple slaves to ensure that the pressure of the standby database will not exceed the main database, but the delay between the master and the standby database, why?

It may be a big deal. Because in the main database, the binlog must be written after the transaction is completed, and then passed to the standby database. Therefore, if a statement of the main library is executed for 10 minutes, the transaction may cause a delay of 10 minutes from the library.

delete deletes too much data at once

For example, for some archived data, the historical data is not deleted at ordinary times. When the space is almost full, the SE will delete a large amount of historical data at one time. It is also necessary to avoid peak periods, so these large amounts of data deletion will be performed at night.

As a result, the DBA received a delayed alarm in the middle of the night. Then, when the DBA requires you to delete data later, you must control the amount of data deleted by each transaction and divide it into multiple deletions.

Big table DDL

For the planned DDL, it is recommended to use the gh-ost solution

Our main library doesn't have any major affairs, so how come the main and backup delays are paid back?

Probably because of the parallel replication capability of the standby database.

Other situations

TODO.

Due to the existence of the active/standby delay, it is different when the active/standby switchover occurs

Strategy

Reliability priority strategy

For example, the initial dual M architecture, the switching process is as follows:

  1. Determine the current SBM of standby database B. If it is less than a certain value (for example, 5s), continue to the next step, otherwise continue to retry the step
  2. Change the main library A to read-only state, that is, set readonly to true
  3. Determine the SBM value of standby database B until the value = 0
  4. Change the standby database B to a readable and writable state: set readonly to false
  5. Cut the business request to standby database B

The handover is generally completed by the HA system.

MySQL reliability priority primary/standby switching process

There is an unavailable time in this switching process. Because after step2, both A and B are readonly, the system is not writable at this time, and will not be restored until step5 is completed.

In this unavailable process, step 3 is more time-consuming, which may take a few seconds. This is why you must make a judgment in step 1 to ensure that the SBM is small enough.

If the main-standby delay is as long as 30 minutes at the beginning, and the system is unavailable for as long as 30 minutes without judgment first, the general business is not acceptable.

The unavailable time of the system is determined by the strategy of giving priority to the reliability of the data. You can also choose a strategy that prioritizes availability to reduce this unavailability time to almost zero.

Usability Priority Strategy

If I forcibly adjust steps 4 and 5 to the very beginning, that is to say, without waiting for the synchronization of the main and standby data, directly switch the connection to the standby database B and make the standby database B available for reading and writing, then the system will have almost no unavailable time Up.

We refer to this switching process as the availability priority process for the time being. The cost of this switching process is that there may be data inconsistencies.

Next, I will share with you an example of data inconsistency produced by the availability priority process. Suppose there is a table t:

CREATE TABLE `t` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `c` int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

insert into t(c) values(1),(2),(3);

After initializing the data, there are 3 rows of data on the main database and the standby database. Next, the business person should continue to execute the two insert statement commands on the table t, which are in order:

insert into t(c) values(4);
insert into t(c) values(5);

Suppose that there are a large number of updates to other data tables on the main database, causing the delay of the main and standby to reach 5s. After inserting a statement with c=4, a master/backup switch was initiated.

Availability priority strategy, and the switching process and data results when binlog_format=mixed.

step2: After the main library A executes the insert, a row of data (4,4) is inserted, and then the master/backup switch is started

step3: Due to the 5s delay between the active and standby, the standby database B has not had time to apply the "insert c=4" transfer log, and it starts to receive the client's "insert c=5" command

step4: The standby database B inserts data (4,5), and sends the binlog to the main database A

Step5: The standby database B executes the transfer log "insert c=4" and inserts a row of data (5, 4). The "insert c=5" statement executed directly in the standby database B is passed to the main database A, and a new row of data (5, 5) is inserted.

In the end, the two rows of inconsistent data on A and B are caused by the availability priority process.

What if I still want to use the availability priority strategy , but set binlog_format=row ?

When the row format records binlog, all field values ​​of the newly inserted row will be recorded, so there will only be one row inconsistent at the end. And the application threads that are synchronized between the two sides will report a duplicate key error and stop. That is to say, in this case, the two rows of data of B's ​​(5,4) and A's (5,5) will not be executed by the other party:

  • Availability priority strategy, and binlog_format=row

So using the row format, data inconsistencies are easier to be found. With mixed and statement, the data is likely to be inconsistent quietly. If you find that the data is inconsistent after a long time, you may have to delete the database and run away.

The availability priority strategy of active /standby switchover will cause data inconsistency. Therefore, it is more recommended to use the reliability first strategy . After all, for data services, data reliability> availability .

Is there any situation where the priority of data availability is higher?
The role of a library is to record operation logs. At this time, if the data is inconsistent, it can be repaired through binlog, and this short-term inconsistency will not cause business problems.
At the same time, the business system relies on the writing logic of this log. If the library is not writable, online business operations cannot be executed.
At this time, you may need to forcibly switch first, and then fill in the data afterwards.
After reviewing the situation, I thought of an improvement measure: Let business logic not rely on the writing of this type of log. That is, the log writing to this logical module should be downgradable, such as writing to a local file or another temporary library.

In this scenario, the reliability priority strategy can be used .

According to reliability first, what will be the effect of abnormal switching?
Assume that the main-standby delay between the main library A and the standby library B is 30 minutes. At this time, the main library A is powered off, and the HA system needs to switch B as the main library. In the active handover, you can wait until the active-standby delay is less than 5s before starting the handover, but at this time there is no choice.

  • Reliability priority strategy, the main library is not available

The reliability priority strategy is adopted , and the switch can only be performed after the SBM of standby database B is =0. But now it is more serious than just before. It is not that the system is read-only and not writable, but that the system is completely unusable. Because, after the main library A is powered off, our connection has not been switched to the standby library B.

  • Can I switch directly to standby database B, but keep B read-only?
    No way. Because, during this period of time, the application of the transfer log has not been completed. If the active/standby switch is initiated directly, the client query cannot see the previously executed transaction, and it will be considered "data loss".

Although the data will be restored as the transfer log continues to be used, for some businesses, the "temporary data loss status" cannot be accepted.

On the premise of satisfying data reliability, the availability of MySQL's high-availability system depends on the delay of active and standby. The smaller the delay, the shorter the service recovery time and the higher the availability when the main library fails.