Redis: Sentinel Mechanism

Redis adopts the master-slave mode. If a slave library fails, the client can continue to send requests to the master library or to the other slave libraries. But if the master library fails, writes are directly affected and the slave libraries can no longer synchronize, because they have no master library to replicate data from. If Redis fails in the middle of the night, a human has to perform the switchover manually, and the cost of manual operations is too high. We therefore need a high-availability solution that tolerates node failures: when a failure occurs, the master is switched over automatically, the application does not need to be restarted, and the on-call engineer can keep sleeping as if nothing happened. Redis officially provides such a solution: Redis Sentinel.

In master-slave mode, if the master library fails, the write service is interrupted and a new master library must be promoted, which inevitably raises three questions:

  • Is the master library really down?
  • Which slave library should be chosen as the new master library?
  • How to notify the slave libraries and the client of the new master library's information?

The basic process of the sentinel mechanism


The sentinel mechanism works as follows: a Sentinel system composed of one or more Sentinel instances can monitor any number of master servers, along with all the slave servers under those masters. When a monitored master server enters the offline state, the system automatically promotes one of that master's slave servers to be the new master server, and the new master server takes over processing command requests in place of the offline master.

As the figure below shows, once the master node goes down, the original master-slave replication is broken and the client loses its connection to the failed master. A slave node is promoted to be the new master, the other slave nodes establish replication with the new master, and the client continues to interact through the new master. Sentinel keeps monitoring the failed master so that, when it comes back online, it can be demoted to a slave of the new master.

[Figure: after the master fails, a slave is promoted to the new master and the client reconnects to it]

Therefore, the sentinel mechanism is mainly responsible for the following three tasks:

  • Monitoring: the sentinel periodically sends PING commands to all master and slave libraries to check whether they are still online. If a slave library does not respond to the sentinel's PING within the specified time, the sentinel marks it as "offline"; similarly, if the master library does not respond within the specified time, the sentinel judges the master library to be offline and starts the process of automatically switching the master library;
  • Master selection: after the master library fails, the sentinel selects, according to certain rules, one instance from among the slave libraries to serve as the new master library;
  • Notification: the sentinel sends the connection information of the new master library to the other slave libraries, telling them to execute the replicaof command, connect to the new master library, and replicate its data. At the same time, the sentinel notifies the client of the new master library's connection information, so that requests can be sent to the new master library;

The sentinel's judgment that the master library is offline comes in two forms: "subjective offline" and "objective offline". So why are there two judgments? What are their differences and connections?

Subjective offline and objective offline

By default, Sentinel sends a PING command once per second to every instance it has established a command connection with (including master servers, slave servers, and other Sentinels), and judges from the PING reply whether the instance is online.

The down-after-milliseconds option in the Sentinel configuration file specifies how long it takes for Sentinel to judge that an instance has entered the subjective offline state: if an instance keeps returning invalid replies to Sentinel for down-after-milliseconds milliseconds, Sentinel sets the SRI_S_DOWN flag in the flags attribute of the instance structure corresponding to that instance, to indicate that the instance has entered the subjective offline state.

The instance's response to the PING command can be divided into the following two situations:

  • Valid reply: the instance returns one of the three replies +PONG, -LOADING, or -MASTERDOWN;
  • Invalid reply: the instance returns any reply other than +PONG, -LOADING, and -MASTERDOWN, or returns no reply within the specified time limit;

If the detected instance is a slave library, the sentinel can simply mark it as subjectively offline, because taking a slave library offline has limited impact. If the detected instance is the master library, however, a master-slave switch cannot be triggered by a simple subjective-offline mark: if the master library is in fact not faulty, a misjudgment by the sentinel would bring unnecessary computation and communication overhead.
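To make the down-after-milliseconds timing concrete, here is a minimal sketch of the subjective-offline check in Python; the class and its fields are illustrative stand-ins, not Redis's actual internals:

```python
import time

DOWN_AFTER_MILLISECONDS = 30_000  # down-after-milliseconds from sentinel.conf

VALID_REPLIES = {"+PONG", "-LOADING", "-MASTERDOWN"}

class InstanceState:
    """Illustrative per-instance state, standing in for Redis's instance structure."""

    def __init__(self):
        self.last_valid_reply = time.monotonic()
        self.subjectively_down = False  # stands in for the SRI_S_DOWN flag

    def on_ping_reply(self, reply: str):
        # Only the three replies above count as valid; anything else is invalid.
        if reply in VALID_REPLIES:
            self.last_valid_reply = time.monotonic()
            self.subjectively_down = False

    def check(self) -> bool:
        # Mark subjectively offline once only invalid (or no) replies have been
        # seen for longer than down-after-milliseconds.
        elapsed_ms = (time.monotonic() - self.last_valid_reply) * 1000
        if elapsed_ms > DOWN_AFTER_MILLISECONDS:
            self.subjectively_down = True
        return self.subjectively_down
```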

To avoid misjudgment, the sentinel mechanism is usually deployed as a cluster of multiple instances. Having multiple sentinel instances judge together prevents a single sentinel from wrongly concluding that the master library is offline just because its own network is bad. Since the probability of multiple sentinels having unstable networks at the same time is small, deciding together reduces the misjudgment rate. The judgment logic of the sentinel cluster is shown in the figure below:

[Figure: multiple sentinel instances jointly judging whether the master library is offline]

Objective offline:
When there are N sentinel instances, it takes N/2 + 1 instances judging the master library as "subjectively offline" before the master library is finally determined to be "objectively offline". This reduces the probability of misjudgment and avoids the unnecessary master-slave switches that misjudgment would cause. (How many instances must make the "subjective offline" judgment can be set by the Redis administrator; this is the quorum parameter in the configuration.)
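As a toy illustration of the quorum arithmetic (the function and its inputs are hypothetical; real sentinels exchange these judgments with the SENTINEL is-master-down-by-addr command shown later):

```python
def objectively_down(subjective_votes: int, quorum: int) -> bool:
    # The master becomes objectively offline once at least `quorum` sentinels
    # (including this one) have judged it subjectively offline.
    return subjective_votes >= quorum

n = 5                # sentinels monitoring the master
quorum = n // 2 + 1  # the recommended majority: 3 out of 5
print(objectively_down(3, quorum))  # True: the switching process can start
```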

Electing the leader Sentinel

When a master server is judged objectively offline, the Sentinels monitoring that offline master negotiate and elect a leader Sentinel, and the leader Sentinel performs the failover operation on the offline master server.

  • Any one of the multiple online Sentinels monitoring the same master server may become the leader Sentinel.
  • After each leader Sentinel election, whether or not the election succeeds, every Sentinel's configuration epoch value is incremented once. The configuration epoch is simply a counter, nothing special.
  • Every Sentinel that finds the master server has entered the objective offline state asks the other Sentinels to set it as their local leader Sentinel;
  • When a Sentinel (the source Sentinel) sends a SENTINEL is-master-down-by-addr command to another Sentinel (the target Sentinel), and the runid parameter in the command is not the * symbol but the source Sentinel's run ID, it means the source Sentinel is asking the target Sentinel to set the former as the latter's local leader Sentinel;
  • The rule for setting the local leader Sentinel is first-come, first-served: the first source Sentinel to send a setting request to the target Sentinel becomes the target Sentinel's local leader Sentinel, and all setting requests received afterwards are rejected by the target Sentinel.
  • After receiving the SENTINEL is-master-down-by-addr command, the target Sentinel returns a reply to the source Sentinel. The leader_runid and leader_epoch parameters in the reply record, respectively, the run ID and the configuration epoch of the target Sentinel's local leader Sentinel.
  • After the source Sentinel receives the reply, it checks whether the leader_epoch value in the reply matches its own configuration epoch. If so, it then reads the leader_runid parameter; if that value matches the source Sentinel's own run ID, it means the target Sentinel has set the source Sentinel as its local leader Sentinel.
  • If some Sentinel is set as the local leader Sentinel by more than half of the Sentinels, that Sentinel becomes the leader Sentinel. For example, in a system composed of 10 Sentinels, as long as at least 10/2 + 1 = 6 Sentinels set a given Sentinel as their local leader Sentinel, that Sentinel becomes the leader Sentinel.
  • If no leader Sentinel is elected within a given time limit, each Sentinel will hold the election again after a while, until a leader Sentinel is elected.

SENTINEL is-master-down-by-addr <ip> <port> <current_epoch> <runid>

  • ip: the IP of the master server this sentinel has subjectively judged offline
  • port: the port number of the subjectively offline master server
  • current_epoch: the configuration epoch, used to elect the leader sentinel
  • runid: either the * symbol, or the run ID of the sending sentinel
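For illustration, this vote-query exchange can be replayed by hand with redis-py's generic command interface. This is a sketch against a hypothetical sentinel at 127.0.0.1:26379; the three-part reply layout follows the leader_runid/leader_epoch description above:

```python
import redis

# Connect to one sentinel instance (address is illustrative).
sentinel = redis.Redis(host="127.0.0.1", port=26379)

# Ask this sentinel whether it considers the master at 127.0.0.1:6379 down.
# A runid of "*" only queries the down state; sending a real run ID would
# additionally request a leader vote, as described above.
reply = sentinel.execute_command(
    "SENTINEL", "is-master-down-by-addr", "127.0.0.1", "6379", "0", "*"
)
down_state, leader_runid, leader_epoch = reply
print(down_state)    # 1 if this sentinel sees the master as down, else 0
print(leader_runid)  # "*" when no vote was requested
print(leader_epoch)  # the leader_epoch described above
```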

Electing the new master library

After the leader Sentinel is elected, it performs the failover operation on the offline master server. The failover includes the following three steps:

1) Among all the slave servers under the offline master server, select a slave server and convert it to the master server.

2) Change all the slave servers under the offline master server to replicate the new master server.

3) Set the offline master server as the slave server of the new master server. When the old master server comes back online, it will become the slave server of the new master server.

The leader Sentinel will save all the slave servers of the offline master server into a list, and then filter the list one by one according to the following rules:

1) Delete all the slave servers that are offline or disconnected in the list, which can ensure that the remaining slave servers in the list are all online normally.

2) Delete from the list all slave servers that have not replied to the leader Sentinel's INFO command within the last five seconds, which ensures that the remaining slave servers in the list have all communicated successfully recently.

3) Delete all slave servers that have been disconnected from the offline master server for more than down-after-milliseconds * 10 milliseconds. The down-after-milliseconds option specifies the time needed to judge the master server offline, so removing slave servers whose disconnection lasted longer than down-after-milliseconds * 10 milliseconds guarantees that none of the remaining slave servers was disconnected from the master server too early. In other words, the data held by the remaining slave servers is relatively fresh.
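A minimal sketch of these three screening rules applied to a candidate list; the dict fields are illustrative, not Redis internals:

```python
def screen_candidates(slaves, down_after_ms, now_ms):
    """Apply the three screening rules to a list of candidate slaves."""
    survivors = []
    for s in slaves:
        if not s["online"]:                                  # rule 1: must be online
            continue
        if now_ms - s["last_info_reply_ms"] > 5_000:         # rule 2: recent INFO reply
            continue
        if s["master_link_down_ms"] > down_after_ms * 10:    # rule 3: not disconnected too long
            continue
        survivors.append(s)
    return survivors
```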

The sentinel's process of selecting the new master library is one of **"screening + scoring"**. Among the multiple slave libraries, those that do not meet the conditions are first screened out; then the remaining slave libraries are scored one by one according to certain rules, and the highest-scoring slave library becomes the new master library, as shown in the figure below:

[Figure: the "screening + scoring" process for selecting the new master library]


Filtering conditions:
we must first ensure that the selected slave library is still running online. However, being online at master-selection time only indicates that the slave library's current state is good; it does not mean it is the most suitable candidate for master.

Imagine that a slave library is running normally when the master is selected, so we choose it as the new master library and start using it; but its network soon fails, and we are forced to elect a master all over again. That is obviously not the result we want.

When selecting the master, in addition to checking the slave library's current online status, we also need to judge its past network connection status. If the slave library keeps disconnecting from the master library, and the number of disconnections exceeds a certain threshold, we have reason to believe its network condition is not very good, and it can be screened out.

This uses the configuration item down-after-milliseconds * 10. Here, down-after-milliseconds is the maximum timeout after which we consider the master-slave connection broken: if the master and slave nodes fail to communicate over the network for down-after-milliseconds milliseconds, we consider them disconnected. If the number of disconnections exceeds 10, the slave library's network condition is considered poor and it is not suitable to be the new master library.

Scoring

Three rounds of scoring are carried out in turn, according to three rules: slave library priority, slave library replication progress, and slave library ID number. In any round, as soon as one slave library has the highest score, it becomes the new master library and the selection process ends. If no single slave library scores highest, the next round begins.

  • First round: the slave library with the highest priority scores highest. Users can assign different priorities to different slave libraries through the slave-priority configuration item. For example, with two slave libraries of different memory sizes, you can manually give the instance with more memory a higher priority. During master selection, the sentinel gives a high score to the high-priority slave library; if one slave library has the highest priority, it becomes the new master library. If the priorities are equal, the sentinel moves to the second round of scoring.
  • Second round: the slave library whose replication progress is closest to the old master library scores highest. Choosing the slave library closest to the old master library means the new master library holds the most recent data. The master library uses master_repl_offset to record the latest write position in repl_backlog_buffer, and each slave library uses slave_repl_offset to record its current replication progress. The slave library we want is the one whose slave_repl_offset is closest to master_repl_offset; if one slave library's slave_repl_offset is closest to master_repl_offset among all slave libraries, it scores highest and becomes the new master library.

As shown in the figure below, the old master library's master_repl_offset is 1000, and the slave_repl_offset values of slaves 1, 2, and 3 are 950, 990, and 900 respectively. Slave 2 should therefore be selected as the new master. Of course, if two slave libraries have the same slave_repl_offset value (for example, slave 1 and slave 2 both at 990), a third round of scoring is needed.

[Figure: comparing each slave's slave_repl_offset with the old master's master_repl_offset of 1000; slave 2 at 990 is closest]
  • Third round: the slave library with the smallest ID number scores highest. Every instance has an ID, similar to a slave library number. Redis currently has a default rule in master selection: when priority and replication progress are equal, the slave library with the smallest ID number gets the highest score and is selected as the new master library (see the sketch after the summary below).

To recap: the sentinel first screens out slave libraries that fail the online-status and network-status checks, then scores the remaining slave libraries by priority, replication progress, and ID number; as soon as a highest-scoring slave library emerges, it is selected as the new master library.
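Putting the three rounds together, here is a minimal scoring sketch over the screened candidates. The dict fields are illustrative; note that with Redis's actual replica-priority setting, a smaller value means more preferred, which is how the comparison is written here:

```python
def pick_new_master(candidates):
    """Rank screened candidates by the three scoring rounds.

    Round 1: priority (a smaller value means more preferred, as with
             Redis's replica-priority option).
    Round 2: replication progress (a larger slave_repl_offset is closer
             to the old master's master_repl_offset, hence better).
    Round 3: the smaller run ID wins as the final tie-breaker.
    """
    return min(
        candidates,
        key=lambda s: (s["priority"], -s["slave_repl_offset"], s["runid"]),
    )

slaves = [
    {"runid": "c", "priority": 100, "slave_repl_offset": 950},
    {"runid": "a", "priority": 100, "slave_repl_offset": 990},
    {"runid": "b", "priority": 100, "slave_repl_offset": 990},
]
# Equal priorities; offset 990 beats 950; run ID "a" beats "b".
print(pick_new_master(slaves)["runid"])  # a
```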

Sentinel Cluster

The sentinel mechanism can switch between master and slave libraries automatically, and deploying multiple instances to judge jointly reduces the misjudgment rate for the master library going offline. Once multiple instances form a sentinel cluster, even if one sentinel instance fails, the other sentinels can carry on cooperating to complete the master-slave switch, including judging whether the master library is offline, selecting the new master library, and notifying the slave libraries and the client.

When configuring sentinel information, each sentinel only needs the following connection information, which sets the IP and port of the master library:

sentinel monitor <master-name> <ip> <redis-port> <quorum> 

But then, how do sentinels that know nothing of each other's configuration form a cluster? The composition and operating mechanism of the sentinel cluster are introduced below.
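From the client's side this is also all that is needed: hand the client the sentinel addresses and the master name, and let it discover the current master. A minimal sketch using redis-py's Sentinel helper (the addresses and the master name "mymaster" are illustrative):

```python
from redis.sentinel import Sentinel

# Addresses of the sentinel instances (illustrative); the master name must
# match the <master-name> used in `sentinel monitor`.
sentinel = Sentinel([("127.0.0.1", 26379), ("127.0.0.1", 26380)],
                    socket_timeout=0.5)

print(sentinel.discover_master("mymaster"))  # current (ip, port) of the master

master = sentinel.master_for("mymaster", socket_timeout=0.5)
master.set("key", "value")  # writes follow the master, even after a failover
```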

Sentinel cluster composition based on pub/sub mechanism

The connections between sentinel instances are established through Redis's pub/sub (publish/subscribe) mechanism.

When writing an application, you can publish and subscribe to messages through Redis. To separate messages from different applications, Redis manages them under different channels; only parties on the same channel exchange information through its published messages;

When Sentinel establishes a subscription connection with a master or slave server, it sends the following command to the server over that connection. In a master-slave cluster, the master library carries a channel named "__sentinel__:hello", through which different sentinels discover and communicate with each other.

SUBSCRIBE __sentinel__:hello

Sentinel's subscription to the __sentinel__:hello channel lasts until its connection to the server is broken. This means that for every server it connects to, Sentinel both sends messages to the __sentinel__:hello channel through the command connection and, in turn, receives messages from the __sentinel__:hello channel through the subscription connection.

For example, suppose three Sentinels, sentinel1, sentinel2, and sentinel3, monitor the same server. When sentinel1 sends a message to the server's __sentinel__:hello channel containing its own IP and port (172.16.19.3:26579), every Sentinel subscribed to __sentinel__:hello (including sentinel1 itself) receives the message and learns sentinel1's IP address and port number.

Sentinels 2 and 3 can then establish network connections with Sentinel 1, and likewise with each other, and so a cluster of sentinels is formed. They communicate over these connections, for example to negotiate a joint judgment on whether the master library is offline.
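You can observe this discovery traffic yourself by subscribing to the hello channel on the monitored master. A short redis-py sketch, with an assumed master at 127.0.0.1:6379:

```python
import redis

# Subscribe to the hello channel on the monitored master (address illustrative).
master = redis.Redis(host="127.0.0.1", port=6379)
pubsub = master.pubsub()
pubsub.subscribe("__sentinel__:hello")

# Each sentinel periodically announces itself here; the payload carries its
# IP, port, and run ID, along with the master's current address and epoch.
for message in pubsub.listen():
    if message["type"] == "message":
        print(message["data"])
```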

[Figure: sentinels discovering each other through the __sentinel__:hello channel on the master]

In addition to connecting with each other to form a cluster, the sentinels also need to establish connections with the slave libraries. This is because the sentinel's monitoring task requires heartbeat checks on both master and slave libraries, and after a master-slave switch it must also notify the slave libraries to synchronize with the new master library.

This is done by sending the INFO command to the master library. After receiving the command, the master library returns its slave library list to the sentinel. The sentinel can then establish a connection with each slave library based on the connection information in that list, and keep monitoring the slave libraries over those connections. Sentinels 1 and 3 establish connections with the slave libraries the same way.
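A sketch of that discovery step with redis-py (master address assumed); redis-py parses each slaveN line of INFO replication into a dict:

```python
import redis

master = redis.Redis(host="127.0.0.1", port=6379)  # address illustrative

info = master.info("replication")
for key, value in info.items():
    # redis-py parses each "slaveN" line into a dict with ip/port/state fields.
    if key.startswith("slave") and isinstance(value, dict):
        print(value["ip"], value["port"], value["state"])
```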

[Figure: the sentinel connecting to the slave libraries discovered via the master's INFO reply]


Through the pub/sub mechanism, the sentinels can form a cluster. In addition, a sentinel obtains the slave libraries' connection information through the INFO command, so it can establish connections with the slave libraries and monitor them as well.

When actually using sentinels, we sometimes run into this problem: how can the client observe the sentinels' master-slave switching process, for example, how far the switch has progressed? Essentially, the client needs a way to obtain the events that occur during the sentinel cluster's monitoring, master selection, and switching.

Here, too, we can rely on the pub/sub mechanism to synchronize information between the sentinel and the client.

Client event notification based on pub/sub mechanism

Each sentinel instance also provides a pub/sub mechanism, and the client can subscribe to messages from the sentinel. Sentinel provides many message channels; different channels carry different key events in the master-slave switching process, such as the master library offline judgment, the selection of the new master library, and the reconfiguration of the slave libraries.

[Figure: clients subscribing to the sentinel's event channels]


Concretely, after reading the sentinel's configuration file, the client obtains the sentinel's address and port and establishes a network connection with it. Then it can run subscription commands on that connection to receive the different event messages.

SUBSCRIBE +odown // subscribe to events where an instance enters the "objectively offline" state

PSUBSCRIBE * // subscribe to all events

switch-master <master name> <oldip> <oldport> <newip> <newport> // event announcing the switch to the new master library

When the sentinel has selected the new master library, the client sees the switch-master event, which indicates that the master library has switched and carries the IP address and port of the new master library. Through this pub/sub mechanism, connections are established between sentinels, between sentinels and slave libraries, and between sentinels and clients.
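A minimal client-side listener with redis-py (sentinel address assumed); the +switch-master channel carries the event shown above:

```python
import redis

# Connect to a sentinel instance (address illustrative) and listen for events.
sentinel = redis.Redis(host="127.0.0.1", port=26379)
pubsub = sentinel.pubsub()
pubsub.subscribe("+switch-master")  # fires once a failover has completed

for message in pubsub.listen():
    if message["type"] != "message":
        continue
    # Payload: "<master name> <oldip> <oldport> <newip> <newport>"
    name, old_ip, old_port, new_ip, new_port = message["data"].decode().split()
    print(f"{name}: master moved from {old_ip}:{old_port} to {new_ip}:{new_port}")
```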

Which sentinel performs the master-slave switch

This comes down to how the sentinel cluster selects the leader sentinel that will perform the master-slave switch.

Any sentinel instance that judges the master library to be "subjectively offline" sends an is-master-down-by-addr command to the other instances. The other instances then reply Y or N based on their own connection with the master library: Y is an affirmative vote, and N is a negative vote.

When there are N sentinel instances, it takes N/2 + 1 of them judging the master library as "subjectively offline" before the master library is finally determined to be "objectively offline". At that point, a sentinel can send a command to the other sentinels declaring that it wants to perform the master-slave switch itself and asking them to vote for it. This voting process is called "Leader election".

The conditions for becoming a leader are as follows:

  • Get more than half of the affirmative votes;
  • The number of votes must also be greater than or equal to the quorum value in the sentinel configuration file;
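Both conditions in one small check (a sketch; the actual vote counting happens through the is-master-down-by-addr exchanges described earlier):

```python
def becomes_leader(votes: int, total_sentinels: int, quorum: int) -> bool:
    # Condition 1: strictly more than half of all sentinels voted Y.
    # Condition 2: the vote count also reaches the configured quorum.
    return votes > total_sentinels / 2 and votes >= quorum

print(becomes_leader(votes=2, total_sentinels=3, quorum=2))  # True
print(becomes_leader(votes=1, total_sentinels=2, quorum=2))  # False: 2 sentinels need 2 votes
```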

The election process is shown in the figure below:

[Figure: leader election timeline among Sentinels 1, 2, and 3, from T1 to T5]
  • Time T1: Sentinel 1 judges the master library "objectively offline", first votes Y for itself, and then sends requests to Sentinel 2 and Sentinel 3 expressing its wish to become the leader;
  • Time T2: Sentinel 3 also judges the master library "objectively offline" and wants to become the leader; it likewise votes for itself first, and then sends its leader requests to Sentinel 1 and Sentinel 2;
  • Time T3: Sentinel 1 receives Sentinel 3's leader vote request. Because Sentinel 1 has already voted Y for itself, it cannot vote for any other sentinel, so it replies N. Meanwhile, Sentinel 2 receives the leader vote request that Sentinel 3 sent at T2. Because Sentinel 2 has not voted yet, it replies Y to the first sentinel that asks it and N to any later requests; so at T3, Sentinel 2 replies to Sentinel 3, agreeing that Sentinel 3 becomes the leader.
  • Time T4: Sentinel 2 receives the vote request that Sentinel 1 sent back at T1. Because Sentinel 2 already agreed to Sentinel 3's request at T3, it replies N to Sentinel 1, refusing to make Sentinel 1 the leader. This can happen because the network between Sentinel 3 and Sentinel 2 was working normally, while the network between Sentinel 1 and Sentinel 2 happened to be congested, so Sentinel 1's vote request traveled slowly.
  • Time T5: Sentinel 3 has won more than half of the leader votes and has also reached the preset quorum value (quorum = 2), so it finally becomes the leader. Sentinel 3 then starts the master selection operation, and once the new master library is chosen, notifies the other slave libraries and the clients of the new master library's information.

In fact, sentinel voting is first-come, first-served: a sentinel replies Y to the first vote request it receives. If Sentinel 3 had not obtained 2 Y votes, this round would produce no Leader, and the sentinel cluster would wait for a period of time (twice the sentinel failover timeout) before electing again. This is because a successful election depends largely on the election messages propagating normally over the network: under high network pressure or transient congestion, it can easily happen that no sentinel obtains more than half of the votes. Voting again after the congestion eases has a higher probability of success.

Note that if the sentinel cluster has only 2 instances, a sentinel needs 2 votes, not 1, to become the leader. If one of the two sentinels then fails, the cluster can no longer perform a master-slave switch. This is why we usually configure at least 3 sentinel instances.

Electing the leader is an instance of consensus in distributed systems: how multiple nodes in a cluster reach agreement on one question. Consensus algorithms such as Paxos and Raft will be introduced in later articles.

One more question: if an instance in the sentinel cluster goes down, what happens? Will it affect the judgment of the master library's status and the selection of the new master?

Answer: this is a classic problem in distributed systems: if a distributed system contains faulty nodes, can the cluster as a whole still provide service, and is the service it provides correct? The most famous formulation in this area is the "Byzantine Generals Problem", which covers not only failed nodes but also nodes that behave incorrectly; it will be introduced in subsequent articles. For the Redis sentinel cluster, as long as the majority of nodes in the cluster are in a normal state, the cluster can still provide service.

References

1. Geek Time: Redis Core Technology and Practice
2. Redis Design and Implementation