Redis series (7): Redis Cluster

Redis Cluster is the distributed solution for Redis. Introduced in version 3.0, it meets Redis's need for distribution: when a single instance runs into memory or concurrency bottlenecks, you can use Redis Cluster to get past them.

1 Distributed database concept:
1. A distributed database maps the whole data set onto multiple nodes according to partition rules, so that each node is responsible for a subset of the data. For example, if our database holds 900 user records and there are 3 Redis nodes, the 900 records are divided into 3 parts and stored on the 3 nodes respectively.

2. Partition rules:

The common partitioning rules are hash partitioning and sequential partitioning. Redis Cluster uses hash partitioning; sequential partitioning is not used, so it is not described here.

Hash partitioning itself comes in three variants: node remainder partitioning (for example, 900 records spread over redis-1, redis-2 and redis-3 by taking the hash modulo the number of nodes), consistent hash partitioning, and virtual slot partitioning. Redis Cluster uses virtual slot partitioning. The other two variants are not introduced here; if you are interested, you can look them up on Baidu.
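To make node remainder partitioning concrete, here is a minimal Python sketch built on the 900-record example. The node names come from the text; using the numeric user ID directly as the hash value, the extra node redis-4 and the helper name node_for are assumptions made only for illustration.

```python
from collections import Counter

NODES = ["redis-1", "redis-2", "redis-3"]

def node_for(user_id: int, nodes: list[str]) -> str:
    """Node remainder partitioning: hash value modulo the node count."""
    return nodes[user_id % len(nodes)]

user_ids = range(1, 901)  # the 900 user records from the example
placement = Counter(node_for(uid, NODES) for uid in user_ids)

# Adding a node changes the divisor, so many records change owner.
bigger = NODES + ["redis-4"]  # hypothetical fourth node
moved = sum(node_for(uid, NODES) != node_for(uid, bigger) for uid in user_ids)

print(placement)
print("records that change node:", moved)
```

In this sketch every node starts with 300 records, but growing from 3 to 4 nodes relocates 675 of the 900 records. That reshuffling cost is the main weakness of plain remainder partitioning.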

3. Virtual slot partitioning

Redis Cluster uses this scheme. Every key is mapped by a hash function to one of 16384 slots, numbered 0-16383, and each node maintains a subset of the slots together with the key-value data mapped to those slots.

Hash function: slot = CRC16(key) & 16383 (& is bitwise AND)
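The slot formula can be tried out with a short Python sketch. It assumes the CRC16 in question is the XMODEM variant (polynomial 0x1021, initial value 0), which is what the Redis Cluster specification describes; the function names are made up for this example, and hash tags are ignored here.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16, polynomial 0x1021, initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """slot = CRC16(key) & 16383, always in the range 0..16383."""
    return crc16_xmodem(key.encode("utf-8")) & 16383

for key in ("foo", "hello", "user:900"):
    print(key, "->", key_slot(key))
```

Because 16384 is a power of two, `& 16383` gives the same result as `% 16384` but is cheaper to compute.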

The relationship between slots and nodes is as follows

Redis partitions with virtual slots because doing so decouples data from nodes: keys map to slots rather than to nodes, each node maintains its own slot mapping, and storage is distributed slot by slot.
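As an illustration of a slot-to-node mapping, the Python sketch below splits the 16384 slots into contiguous, nearly equal ranges for three masters. The even split and the helper names are assumptions for illustration; in a real cluster the assignment is whatever the tooling or the administrator chooses. The labels 6379, 6380 and 6381 simply stand for three master nodes.

```python
TOTAL_SLOTS = 16384

def split_slots(masters: list[str]) -> dict[str, range]:
    """Divide slots 0..16383 into contiguous, nearly equal ranges."""
    per_node = TOTAL_SLOTS / len(masters)
    table = {}
    first = 0
    for i, name in enumerate(masters):
        if i == len(masters) - 1:
            last = TOTAL_SLOTS - 1  # last master takes whatever remains
        else:
            last = round(per_node * (i + 1)) - 1
        table[name] = range(first, last + 1)
        first = last + 1
    return table

def owner_of(slot: int, table: dict[str, range]) -> str:
    """Look up which master is responsible for a slot."""
    return next(name for name, slots in table.items() if slot in slots)

table = split_slots(["6379", "6380", "6381"])
for name, slots in table.items():
    print(name, slots.start, "-", slots.stop - 1)
```

Moving data between nodes then only means reassigning slots in this table; the key-to-slot hash never changes.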

4. Limitations of Redis Cluster:

a. Batch operations on keys have limited support. Commands such as mset and mget are rejected if the keys map to different slots.

b. Transactions on keys have limited support. A transaction cannot be used when its keys are spread across different nodes; it works only when all keys are on the same node.

c. The key is the smallest unit of partitioning, so a single large key-value pair cannot be split across nodes.

d. Multiple databases are not supported. Only database 0 is available (select 0).

e. Replication supports only a single-level structure, not a tree structure.
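Limitation (a) can be handled on the client side by checking which slot each key maps to before issuing a multi-key command. The Python sketch below is one way to do that. It uses `binascii.crc_hqx(data, 0)` from the standard library as the CRC16 function, and it also applies the `{...}` hash-tag rule (only the text inside the first pair of braces is hashed), a Redis Cluster feature that this article does not otherwise cover. `group_by_slot` is a hypothetical helper, not part of any client library.

```python
from binascii import crc_hqx
from collections import defaultdict

def key_slot(key: str) -> int:
    """CRC16(key) & 16383, hashing only the {tag} part if a non-empty tag exists."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # there is non-empty content between the braces
            data = data[start + 1:end]
    return crc_hqx(data, 0) & 16383

def group_by_slot(keys):
    """Group keys so that each group can go into one multi-key command."""
    groups = defaultdict(list)
    for key in keys:
        groups[key_slot(key)].append(key)
    return dict(groups)

print(group_by_slot(["foo", "hello"]))  # two slots, so two commands
print(group_by_slot(["{user1000}.following", "{user1000}.followers"]))
```

Keys that share a hash tag land in the same slot, so they can be used together in mset, mget or a transaction.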

2 Set up a cluster environment:

1. Work in the /usr/local/bin/clusterconf directory

6389 is the slave node of 6379, 6390 is the slave node of 6380, and 6391 is the slave node of 6381

2. Modify the configuration files of 6379, 6380, 6381, 6389, 6390 and 6391 respectively

port 6379 //Node port

cluster-enabled yes //Enable cluster mode

cluster-node-timeout 15000 //Node timeout time (time to receive pong message reply)

cluster-config-file /usr/local/bin/cluster/data/nodes-6379.conf //Cluster-internal configuration file

The other nodes are configured the same way; just change the port.
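Since the six configuration files differ only in the port, they can be generated rather than edited by hand. The following Python sketch renders the four settings shown above for each port. The data directory value and the redisPORT.conf file naming are assumptions based on paths used elsewhere in this article, so adjust them to your own layout; the sketch only builds the text and does not write any files.

```python
CONFIG_TEMPLATE = """\
port {port}
cluster-enabled yes
cluster-node-timeout 15000
cluster-config-file {data_dir}/nodes-{port}.conf
"""

def render_config(port: int, data_dir: str) -> str:
    """Return the cluster-related settings for one node."""
    return CONFIG_TEMPLATE.format(port=port, data_dir=data_dir)

PORTS = [6379, 6380, 6381, 6389, 6390, 6391]
DATA_DIR = "/usr/local/bin/clusterconf/data"  # assumed location

configs = {f"redis{port}.conf": render_config(port, DATA_DIR) for port in PORTS}
print(configs["redis6380.conf"])
```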

3. After configuration, start 6 redis services

4. Automatic setup: create a new directory named ruby under /usr/local

Download link: Password: n3pc

Download ruby-2.3.1.tar.gz and redis-3.3.0.gem from this link

tar -zxvf ruby-2.3.1.tar.gz

a, cd ruby-2.3.1

b, ./configure --prefix=/usr/local/ruby

c, make && make install //The process will be a bit slow, about 5-10 minutes

d, then gem install -l redis-3.3.0.gem //If the gem command is missing, install it first with yum install gem

e. Prepare the 6 nodes (be careful not to set requirepass) and delete the cluster config files under /usr/local/bin/clusterconf/data. Start the 6 nodes in turn with ./redis-server clusterconf/redis6379.conf and so on. If a node already holds data, clear it with flushall. (Pitfall: there is no need to run cluster meet by hand.)

f, cd /usr/local/bin and run the following command, where 1 is the number of slave nodes per master: ./redis-trib.rb create --replicas 1 192.168.0.111:6389

Master-slave allocation: 6389 is the slave node of 6379

It seems that only the master nodes serve reads and writes, not the slave nodes

After the master node dies, the slave node becomes the master node

g, Cluster health check: redis-trib.rb check (Note: comment out requirepass in the Redis configuration first, otherwise the tool cannot connect)

The check reported a problem: slot 5798 on 6379 was left open

The solution is as follows:

Some slots on 6379, 6380 and 6381 were open. Connect to each of these nodes and run cluster setslot <slot> stable for every open slot, for example:

6380:> cluster setslot 1180 stable
6380:> cluster setslot 2998 stable
6380:> cluster setslot 11212 stable

Do the same on the other nodes. After each node has been repaired:

The health check is back to normal.

When 6379 is stopped, 6389 will become the master node after a while

Note: add -c when querying with the command-line client

./redis-cli -h -p 6379 -c

mset aa bb cc dd sets a batch of keys that map to different slots, so it fails; this is the batch-operation limitation

14. After the cluster starts normally, add in each redis.conf

masterauth "12345678"

requirepass "12345678"

When the master node goes offline, the slave node becomes the master node. Because any node may end up in either role, both the masterauth and requirepass passwords are required and must be set to the same value

15. The setup above has one slave per master. Can one master have multiple slaves? Yes, by raising --replicas:

./redis-trib.rb create --replicas 2

3 Communication between nodes

1. Nodes communicate with each other using the Gossip protocol, which means that nodes continuously exchange information with one another

When master and slave roles change or new nodes are added, the nodes ping/pong each other until every node knows the latest state of all nodes, which keeps the cluster in sync

2. Gossip messages

The main job of the Gossip protocol is information exchange, and the carriers of that information are the Gossip messages sent between nodes. The common Gossip messages are ping, pong, meet and fail.

Meet message: announces a new node. The sender asks the receiver to join the current cluster; once the meet exchange completes, the receiving node is part of the cluster and takes part in the periodic ping/pong exchange.

Ping message: the most frequently exchanged message in the cluster. Every second, each node sends ping messages to other nodes to check whether they are online and to exchange status information. A ping message carries the status data of the sender and of some other nodes.

Pong message: sent back to the sender in response to a ping or meet message to confirm that communication is working. A pong message also carries the sender's own status data.

Fail message: when a node decides that another node in the cluster is offline, it broadcasts a fail message to the cluster. This is discussed later.

3. Message analysis process

Every message consists of a message header and a message body. The message header contains the status data of the sending node (node ID, slot mapping, node role, offline flag and so on), so the receiving node can learn about the sender from the header alone.

Message analysis process:

4. Selecting nodes and sending ping messages:

Gossip-style information exchange is naturally distributed, but ping/pong traffic is frequent. Frequent exchange keeps each node's view of the others close to real time, yet it costs bandwidth and CPU, so each round deliberately contacts only some of the nodes. Contacting too few, on the other hand, slows down failure detection. Redis Cluster's Gossip implementation balances the two, see the following figure:

As the node selection process shows, the cost of message exchange comes mainly from two things: the number of nodes that messages are sent to, and the amount of data carried in each message

Flow Description:

A. Choosing how many nodes to message: each node runs a maintenance timer task, 10 times per second by default. Once per second it randomly picks 5 nodes and sends a ping to the one it has gone longest without hearing from, which keeps the exchange random. In addition, every 100 milliseconds it scans its local node list and immediately pings any node whose last pong was received more than cluster-node-timeout/2 ago, so that no node's information stays stale for too long. When bandwidth is tight, you can raise cluster-node-timeout in redis.conf from 15000 to 30 seconds, but do not raise it excessively

B. Message data: each message carries the sender's own node information plus information about other nodes
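The two selection rules in step A can be sketched in Python as follows. This is a simplified model, not Redis source code: the dictionary of last-pong timestamps, the function names and the millisecond clock are all assumptions made for illustration.

```python
import random

NODE_TIMEOUT_MS = 15000  # cluster-node-timeout

def pick_random_ping_target(last_pong_ms: dict[str, int], sample_size: int = 5, rng=random):
    """Once per second: sample a few nodes and ping the one with the oldest pong."""
    if not last_pong_ms:
        return None
    candidates = sorted(last_pong_ms)
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    return min(sample, key=lambda node: last_pong_ms[node])

def overdue_nodes(last_pong_ms: dict[str, int], now_ms: int) -> list[str]:
    """Every 100 ms: nodes whose last pong is older than cluster-node-timeout / 2."""
    limit = NODE_TIMEOUT_MS // 2
    return [node for node, seen in last_pong_ms.items() if now_ms - seen > limit]

last_pong = {"6380": 9000, "6381": 500, "6389": 9900}
print(pick_random_ping_target(last_pong))
print(overdue_nodes(last_pong, now_ms=10000))
```

Raising cluster-node-timeout widens the cluster-node-timeout/2 window, which is why it reduces traffic but also delays failure detection.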

5. Cluster expansion

This is one of the most common requirements of distributed storage. When storage runs short, expansion takes the following steps:

A. Prepare the new nodes

B. Join them to the cluster, then migrate slots and data

1), add redis6382.conf and redis6392.conf in the same directory and start the two new Redis nodes

./redis-server clusterconf/redis6382.conf & (new master node)

./redis-server clusterconf/redis6392.conf & (new slave node)

2), add the master node with ./redis-trib.rb add-node, passing the address of the new master node (6382) followed by the address of an existing master node (6379)

3), add the slave node

redis-trib.rb add-node --slave --master-id 03ccad2ba5dd1e062464bc7590400441fafb63f2

--slave means that the node is added as a slave

--master-id 03ccad2ba5dd1e062464bc7590400441fafb63f2 is the node ID of master 6382; it is followed by the address of the new slave node and then the address of an existing node in the cluster

4), redis-trib.rb reshard //Re-allocate slots to the new master node
How many slots do you want to move (from 1 to 16384)? 1000 //Number of slots to move
What is the receiving node ID? 464bc7590400441fafb63f2 //Node ID of the new node
Source node #1:all //Take the slots from all existing master nodes

The addition is complete!
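When all is given as the source, the 1000 slots have to come from the existing masters in some proportion. The Python sketch below shows one plausible scheme, in which each source gives up slots in proportion to how many it holds and the largest source absorbs the rounding. This is an assumption for illustration, not a description taken from this article, and the slot counts per master are example values.

```python
import math

def plan_reshard(source_slot_counts: dict[str, int], slots_to_move: int) -> dict[str, int]:
    """Decide how many slots each source node gives to the receiving node."""
    total = sum(source_slot_counts.values())
    # Largest source first; it is rounded up, the others are rounded down.
    # In general the rounded total can differ slightly from the request.
    ordered = sorted(source_slot_counts.items(), key=lambda kv: kv[1], reverse=True)
    plan = {}
    for i, (node, count) in enumerate(ordered):
        share = slots_to_move * count / total
        plan[node] = math.ceil(share) if i == 0 else math.floor(share)
    return plan

sources = {"6379": 5461, "6380": 5462, "6381": 5461}  # example slot counts
plan = plan_reshard(sources, 1000)
print(plan, sum(plan.values()))
```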

6. Shrinking the cluster:

The cluster also supports taking nodes offline

The offline process is as follows:

Flow description:

A. Check whether the node going offline holds any slots. If it does, migrate them to other nodes first so that the cluster's slot-to-node mapping stays complete;

B. Once the node holds no slots, or if it is a slave node, the other nodes in the cluster can be told to forget it. After every node has forgotten it, the node can be shut down normally.
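The decision in steps A and B can be written down as a small Python sketch. The node dictionaries and the step descriptions are hypothetical; the function only lists what has to happen and does not talk to a real cluster.

```python
def offline_steps(node: dict) -> list[str]:
    """List the steps needed to take a node offline safely."""
    steps = []
    # Step A: a master that still owns slots must hand them over first.
    if node["role"] == "master" and node["slots"] > 0:
        steps.append("migrate slots to other nodes")
    # Step B: with no slots left (or for a slave), forget the node, then stop it.
    steps.append("tell the other nodes to forget the node")
    steps.append("shut the node down")
    return steps

print(offline_steps({"port": 6382, "role": "master", "slots": 1000}))
print(offline_steps({"port": 6392, "role": "slave", "slots": 0}))
```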

There are two cases when deleting nodes:

One is the master node 6382, and the other is the slave node 6392.

The slave node 6392 has no hash slots assigned, so run ./redis-trib.rb del-node directly. It takes two parameters, ip:port and node ID (here 7668541151b4c37d2d9). The slave node 6392 is then removed from the cluster.

Steps to delete the master node 6382:

1. Run ./redis-trib.rb reshard. It asks how many hash slots to move; because we just allocated 1000 slots to this node, enter 1000 here

2. Finally, delete the node:

./redis-trib.rb del-node 3e50c6398c75e0088a41f908071c2c2eda1dc900

At this point, the node is offline...

Request routing redirection

In Redis Cluster mode, a node that receives any key-related command first computes the key's CRC16 value, derives the slot from it, and then looks up the node responsible for that slot. If the node is itself, it executes the command directly; if not, it replies with a redirection to the node that owns the slot. This process is called MOVED redirection
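The redirection flow can be simulated in Python without a real server. In the sketch below, FakeNode is an in-memory stand-in for a cluster node, the addresses and slot ranges are example values, and the Moved exception models a reply of the form "MOVED <slot> <ip:port>". None of these names come from a real client library.

```python
from binascii import crc_hqx

RANGES = {  # example slot assignment for three masters
    "127.0.0.1:6379": range(0, 5461),
    "127.0.0.1:6380": range(5461, 10923),
    "127.0.0.1:6381": range(10923, 16384),
}

def key_slot(key: str) -> int:
    return crc_hqx(key.encode("utf-8"), 0) & 16383

def owner_of(slot: int) -> str:
    return next(addr for addr, slots in RANGES.items() if slot in slots)

class Moved(Exception):
    """Models the MOVED redirection reply."""
    def __init__(self, slot: int, address: str):
        super().__init__(f"MOVED {slot} {address}")
        self.slot, self.address = slot, address

class FakeNode:
    """In-memory stand-in for one cluster node (not a real Redis server)."""
    def __init__(self, address: str):
        self.address = address
        self.data = {}

    def get(self, key: str):
        slot = key_slot(key)
        owner = owner_of(slot)
        if owner != self.address:
            raise Moved(slot, owner)  # not my slot: redirect the client
        return self.data.get(key)

def cluster_get(nodes, start: str, key: str, max_redirects: int = 5):
    """Follow MOVED replies until the owning node answers."""
    address = start
    for _ in range(max_redirects):
        try:
            return nodes[address].get(key), address
        except Moved as moved:
            address = moved.address
    raise RuntimeError("too many redirects")

nodes = {addr: FakeNode(addr) for addr in RANGES}
nodes["127.0.0.1:6381"].data["foo"] = "bar"  # "foo" hashes to slot 12182
print(cluster_get(nodes, "127.0.0.1:6379", "foo"))
```

This is the behavior that the -c option of redis-cli enables: the client follows the redirection instead of showing the MOVED reply to the user.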


Redis Cluster provides high availability: when a small number of nodes in the cluster fail, failover keeps the cluster serving clients normally.

Nodes use ping/pong messages to find out whether another node is healthy or has failed. Failure detection goes through two stages: subjective offline and objective offline.

Subjective offline: one node considers another node unavailable. This is not the final verdict; it only represents that one node's opinion, and it may be a misjudgment.

Subjective offline process:

A. Node a sends a ping message to node b. If communication is normal, node a receives a pong message and updates the time of its last communication with node b;

B. If there is a communication problem between node a and node b, the connection is dropped and re-established on the next attempt. As long as communication keeps failing, the last communication time is not updated;

C. When the timer task on node a detects that the time since its last communication with node b exceeds cluster-node-timeout, it marks node b as subjectively offline (pfail) in its local state
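Steps A to C amount to a timestamp that is refreshed by pongs and checked by a timer. A minimal Python sketch, with class and function names chosen only for this example and a millisecond clock passed in explicitly:

```python
NODE_TIMEOUT_MS = 15000  # cluster-node-timeout

class PeerState:
    """What node a remembers about node b."""
    def __init__(self, now_ms: int):
        self.last_pong_ms = now_ms
        self.pfail = False

def on_pong(peer: PeerState, now_ms: int) -> None:
    """Step A: a pong arrived, so refresh the last communication time."""
    peer.last_pong_ms = now_ms
    peer.pfail = False

def cron_check(peer: PeerState, now_ms: int) -> bool:
    """Step C: mark the peer pfail once the timeout is exceeded."""
    if now_ms - peer.last_pong_ms > NODE_TIMEOUT_MS:
        peer.pfail = True
    return peer.pfail

b = PeerState(now_ms=0)
print(cron_check(b, 10000))  # still within the timeout
print(cron_check(b, 15001))  # no pong for too long
```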

Objective offline: the node really is offline. Multiple nodes in the cluster consider it unavailable and reach a consensus to take it offline. If the offline node is a master node, a failover must follow.

Once node a has marked node b as subjectively offline, it spreads node b's status to other nodes in its messages. When node c receives such a message and parses the message body, it finds the pfail status of node b, which triggers the objective offline process;

Only reports from master nodes that hold slots are counted. When more than half of those master nodes report the node as offline, it is marked as objectively offline.
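The majority rule can be expressed in a few lines of Python. The sets of node names are example data and the function name is hypothetical; reports from nodes that are not slot-holding masters are simply ignored.

```python
def is_objectively_offline(reporters: set[str], slot_masters: set[str]) -> bool:
    """True when more than half of the slot-holding masters report the node offline."""
    valid_reports = reporters & slot_masters  # only slot-holding masters count
    return len(valid_reports) > len(slot_masters) / 2

masters = {"6379", "6380", "6381"}
print(is_objectively_offline({"6380"}, masters))
print(is_objectively_offline({"6380", "6381"}, masters))
print(is_objectively_offline({"6380", "6390"}, masters))  # 6390 is a slave
```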


When the node that went offline is a master node, one of its slave nodes must be chosen to replace it so that the cluster stays highly available. The failover process is as follows:

1. Qualification check: each slave node checks whether it is qualified to replace the failed master node. A slave node that has been disconnected from the master node for too long is not eligible for failover;

2. Preparing the election time: once a slave node is eligible, it sets the time at which it will trigger the failover election. The rest of the process runs only after that time is reached;

3. Initiating the election: when the election time is reached, the slave node starts the election;

4. Election voting: only master nodes that hold slots have a vote, and they are the ones that process failover election messages. The vote is in effect a leader election in which one slave node is elected leader, and each master node can give its single vote to only one slave node. When a slave node has collected enough votes (at least N/2+1, where N is the number of master nodes holding slots), it replaces the master node: it takes over the slots of the failed master node and broadcasts a message announcing the takeover to all nodes in the cluster.
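Steps 2 and 4 can be sketched in Python. The delay formula (a fixed 500 ms, plus a random 0-500 ms, plus 1000 ms per rank, where rank 0 is the slave with the most up-to-date replication data) follows the Redis Cluster specification rather than this article, and the function names are made up for the example.

```python
import random

def election_delay_ms(rank: int, rng=random) -> int:
    """Step 2: how long a slave waits before asking for votes."""
    return 500 + rng.randint(0, 500) + rank * 1000

def wins_election(votes: int, slot_masters: int) -> bool:
    """Step 4: a slave wins with at least N/2+1 votes (integer division)."""
    return votes >= slot_masters // 2 + 1

print(election_delay_ms(rank=0), election_delay_ms(rank=1))
print(wins_election(votes=2, slot_masters=3))
print(wins_election(votes=1, slot_masters=3))
```

The rank-based delay gives the best-synchronized slave a head start, so it usually collects the majority before the others even begin.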