Getting up close with your own Redis Cluster

The author has maintained thousands of Redis instances. They all use a simple master-slave structure, and clustering is handled mainly by a client-side jar package. At first I did not like Redis Cluster much, because its routing is rigid and it is complicated to operate and maintain.

But it is the solution the official project is pushing, so it is bound to be used more and more widely, which is already noticeable in everyday conversations with other engineers. Whatever its shortcomings, they cannot hold back the weight of the official choice. As Redis Cluster becomes more and more stable, it is time to get properly acquainted with it.

Introduction

Redis Cluster is Redis's native clustering solution. It has made great progress in high availability and stability. According to statistics and observations, more and more companies and communities are adopting the Redis Cluster architecture, and it has become the de facto standard. Its main feature is that it is decentralized and needs no proxy. One of its main design goals is linear scalability.

The Redis Cluster server alone cannot deliver the officially promised functionality. In a broad sense, Redis Cluster includes both the Redis server and client implementations such as Jedis. They form a whole.

Distributed storage is nothing more than dealing with sharding and replicas. For Redis Cluster, the core concept is the slot. Once you understand it, you basically understand how the cluster is managed.

Pros and cons

Once you understand these features, operation and maintenance become easier. Let us first look at the more obvious advantages and disadvantages.

Advantages

1. No separate Sentinel cluster is needed, which gives users one consistent solution and reduces the learning cost.
2. The architecture is decentralized and all nodes are peers; a cluster can support up to about a thousand nodes.
3. Slots are an abstraction, and operation and maintenance actions are performed on slots.
4. Replication enables automatic failover, with no manual intervention needed in most cases.

Disadvantages

1. The client must cache part of the cluster data (such as the slot routing table) and implement the Cluster protocol, which is relatively complicated.
2. Data is replicated asynchronously, so strong consistency cannot be guaranteed.
3. Resource isolation is difficult and traffic is often unbalanced, especially when several businesses share one cluster. You do not know where the data lives, and hot data cannot be given targeted optimization.
4. Slaves are pure cold standbys and do not share read traffic, which is wasteful. Letting them serve reads takes extra work.
5. Support for multi-key operations and pipelines is limited. Be careful when upgrading old code.
6. Data migration works key by key rather than slot by slot, so the process is slow.

Basic principles

Locating a key is a two-level routing process: from key to slot, and from slot to node.

Key routing

Redis Cluster has nothing to do with the commonly used consistent hashing. It relies on the concept of a hash slot. When you access a key, the Redis client first computes a CRC16 value for the key and then takes that value modulo 16384.

crc16(key) mod 16384

Therefore each key falls into exactly one hash slot. 16384 equals 2^14 (16k). When a Redis node sends a heartbeat packet, it has to include all of its slot information, so the packet size has to be kept as small as possible. If you are interested, you can look up why the default number of slots is 16384.
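The formula above can be tried out with a minimal Python sketch. It assumes the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0) and ignores hash tags; the function names are made up for illustration.

```python
HASH_SLOTS = 16384  # 2**14, the slot count given in the text


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16, XMODEM variant: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to its hash slot (hash tags are deliberately ignored)."""
    return crc16_xmodem(key.encode("utf-8")) % HASH_SLOTS


print(key_slot("foo"))  # 12182
```

Because the slot depends only on the key, every client and every node arrives at the same answer without talking to each other.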

Simple server-side principle

As mentioned above, Redis Cluster defines 16384 slots in total, and all cluster operations are coded around this slot data. The server stores this information in a simple array.

For yes-or-no information, a bitmap is the most space-efficient storage. Redis Cluster uses an array called slots to record whether the current node holds each slot.

As shown in the figure, the length of this array is 16384/8 = 2048 bytes, so a single bit, 0 or 1, marks whether the node owns a given slot.
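To make the bitmap idea concrete, here is a small Python sketch of a 2048-byte slot bitmap. The class and method names are hypothetical, and the bit order inside each byte is an arbitrary choice of this sketch, not necessarily the one the server uses.

```python
HASH_SLOTS = 16384


class SlotBitmap:
    """One bit per slot: 1 means this node owns the slot."""

    def __init__(self) -> None:
        self.bits = bytearray(HASH_SLOTS // 8)  # 2048 bytes

    def add(self, slot: int) -> None:
        self.bits[slot >> 3] |= 1 << (slot & 7)

    def remove(self, slot: int) -> None:
        self.bits[slot >> 3] &= ~(1 << (slot & 7)) & 0xFF

    def owns(self, slot: int) -> bool:
        return bool(self.bits[slot >> 3] & (1 << (slot & 7)))

    def count(self) -> int:
        """Number of owned slots (what a numslots counter would cache)."""
        return sum(bin(byte).count("1") for byte in self.bits)


bitmap = SlotBitmap()
bitmap.add(12182)
print(len(bitmap.bits), bitmap.owns(12182), bitmap.owns(1))  # 2048 True False
```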

In fact, the first copy of the data, held in clusterState, would be enough to do the job. The slots array is kept along another dimension because it is convenient for coding and storage. In addition to recording the slots it is responsible for in two places (slots and numslots in the clusterNode structure), a node also sends its slots array to the other nodes in the cluster through messages, telling them which slots it currently owns.

When all 16384 slots are handled by some node, the cluster is online (ok). If any slot is not handled, the cluster is offline (fail).
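The ok/fail rule can be expressed as a simple coverage check. The following Python sketch assumes slot ownership is given as inclusive ranges per node; the function name and input shape are made up for illustration.

```python
from typing import Dict, List, Tuple

HASH_SLOTS = 16384


def cluster_state(assignments: Dict[str, List[Tuple[int, int]]]) -> str:
    """Return "ok" when every slot is covered by some node, else "fail".

    assignments maps a node name to inclusive (first, last) slot ranges.
    """
    covered = [False] * HASH_SLOTS
    for ranges in assignments.values():
        for first, last in ranges:
            for slot in range(first, last + 1):
                covered[slot] = True
    return "ok" if all(covered) else "fail"


full = {"a": [(0, 5460)], "b": [(5461, 10922)], "c": [(10923, 16383)]}
print(cluster_state(full))  # ok
```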

When a client sends a key-related command to a node, the receiving node works out which slot the key belongs to and checks whether that slot is assigned to itself. If it is not, the node directs the client to the correct node.

Therefore, the client can complete the operation by connecting to any machine in the cluster.
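Redis Cluster signals this redirection with a MOVED error of the form `MOVED <slot> <host>:<port>`. The Python sketch below shows how a client might parse it and refresh its local routing table; `SlotCache` and its methods are hypothetical names, not the API of any real client library.

```python
from typing import Dict, Optional, Tuple


def parse_moved(error: str) -> Optional[Tuple[int, str, int]]:
    """Parse "MOVED <slot> <host>:<port>"; return None for other errors."""
    parts = error.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    host, _, port = parts[2].rpartition(":")
    return int(parts[1]), host, int(port)


class SlotCache:
    """Hypothetical client-side routing table: slot -> (host, port)."""

    def __init__(self) -> None:
        self.table: Dict[int, Tuple[str, int]] = {}

    def handle_error(self, error: str) -> bool:
        """Update the table from a MOVED error; True means retry elsewhere."""
        moved = parse_moved(error)
        if moved is None:
            return False
        slot, host, port = moved
        self.table[slot] = (host, port)
        return True


cache = SlotCache()
cache.handle_error("MOVED 12182 127.0.0.1:7002")
print(cache.table[12182])  # ('127.0.0.1', 7002)
```

A real client would retry the command against the returned address and usually refresh the whole slot map rather than a single entry.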

Install a 6-node cluster

Preparation

Suppose we want to assemble a 3-shard cluster in which each shard has one replica. That makes 3*2=6 node instances in total. Redis can be started with a specified configuration file, so all we need to do is modify the configuration files.

Copy the default configuration file 6 times.

for i in {0..5}
do
  cp redis.conf redis-700$i.conf
done

Modify the content of each configuration file. Taking redis-7000.conf as an example, we need to enable cluster mode.

cluster-enabled yes
port 7000
cluster-config-file nodes-7000.conf

nodes-7000.conf stores the cluster information of the current node, so each node needs its own file.

Startup & shutdown

Similarly, we use a script to start the nodes.

for i in {0..5}
do
  nohup ./redis-server redis-700$i.conf &
done

For demonstration purposes, we shut them down forcibly.

ps -ef | grep redis | grep -v grep | awk '{print $2}' | xargs kill -9

Assemble the cluster

We use redis-cli to assemble the cluster. Redis completes the process automatically; under the hood it consists of a series of commands sent to each node.

./redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1
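With 6 nodes and one replica per master, three masters have to share the 16384 slots. The Python sketch below shows one way to split the slot space evenly. It is an illustration only: `split_slots` is a made-up name and the rounding rule is an assumption, so it should not be read as the exact algorithm inside redis-cli.

```python
from typing import List, Tuple

HASH_SLOTS = 16384


def split_slots(master_count: int) -> List[Tuple[int, int]]:
    """Split 0..16383 into contiguous, near-equal inclusive ranges."""
    per_master = HASH_SLOTS / master_count
    ranges = []
    cursor = 0.0
    for index in range(master_count):
        first = round(cursor)
        last = round(cursor + per_master - 1)
        # The last master always takes whatever remains.
        if index == master_count - 1 or last > HASH_SLOTS - 1:
            last = HASH_SLOTS - 1
        ranges.append((first, last))
        cursor += per_master
    return ranges


print(split_slots(3))  # [(0, 5460), (5461, 10922), (10923, 16383)]
```

For three masters the result matches the ranges commonly seen when a three-master cluster is created.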

Several advanced principles

Node failure

Each node in the cluster periodically sends ping messages to the other nodes to check whether they are online. If the node receiving a ping does not return a pong within the specified time, the sender marks the receiver as suspected offline (PFAIL).

If more than half of the master nodes in a cluster report a master node x as suspected offline, x is marked as offline (FAIL). The node that marked x as FAIL broadcasts a FAIL message about x to the cluster, and every node that receives this message immediately marks x as FAIL.
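The promotion from PFAIL to FAIL is essentially a majority count. The following Python sketch models only that counting step; `FailureDetector` is a hypothetical name, only masters are allowed to vote here, and details such as report expiry and the gossip transport are left out.

```python
from typing import Dict, Set


class FailureDetector:
    """Counts PFAIL reports per suspect and applies a majority rule."""

    def __init__(self, masters: Set[str]) -> None:
        self.masters = set(masters)
        self.reports: Dict[str, Set[str]] = {}  # suspect -> reporting masters

    def report_pfail(self, reporter: str, suspect: str) -> str:
        """Record that reporter suspects suspect; return the resulting state."""
        if reporter in self.masters and reporter != suspect:
            self.reports.setdefault(suspect, set()).add(reporter)
        return self.state(suspect)

    def state(self, suspect: str) -> str:
        votes = len(self.reports.get(suspect, ()))
        if votes > len(self.masters) // 2:  # more than half of the masters
            return "FAIL"
        return "PFAIL" if votes else "OK"


detector = FailureDetector({"a", "b", "c"})
print(detector.report_pfail("a", "c"))  # PFAIL
print(detector.report_pfail("b", "c"))  # FAIL
```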

You may notice that this process is similar to node election in Elasticsearch and ZooKeeper. The decision is made only when more than half of the nodes agree, so the number of master nodes is generally odd. Since there is no minimum-quorum configuration, split brain is theoretically possible (I have not encountered it yet).

Master-slave switch

When the slaves of a master find that it has entered the FAIL state, one of them is selected, executes the slaveof no one command, and becomes the new master.

After the new master completes its own slot assignment, it broadcasts a pong message to the cluster so that the other nodes learn of the change immediately. It tells them: I am now the master, and I have taken over from the failed node as its stand-in.

Redis manages the cluster internally with many of these predefined commands. They are not only for us to use from the command line, but also for Redis itself.

Data synchronization

When a slave connects to its master, it sends a sync command. On receiving it, the master starts a save process in the background. When that finishes, the master transfers the entire database file to the slave, completing the first full synchronization.

After that, the master forwards the write commands it receives to the slave in order, so that the data eventually converges. Since Redis 2.8, master-slave replication has supported resuming after an interruption. If the network connection breaks during replication, the slave can continue from where it left off instead of copying everything from the beginning.
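Resuming replication depends on the master keeping a window of recent writes and on the slave remembering how far it got. The Python sketch below illustrates that decision with a hypothetical `ReplicationBacklog` class; the names and the byte-offset model are assumptions, and other checks a real server performs (such as verifying that the slave is talking to the same master history) are omitted.

```python
from typing import Tuple


class ReplicationBacklog:
    """Fixed-size buffer holding the most recent bytes of the write stream."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffer = bytearray()
        self.master_offset = 0  # total bytes ever fed into the stream

    def feed(self, data: bytes) -> None:
        self.buffer += data
        self.master_offset += len(data)
        overflow = len(self.buffer) - self.capacity
        if overflow > 0:
            del self.buffer[:overflow]  # oldest bytes fall out of the window

    def resync(self, slave_offset: int) -> Tuple[str, bytes]:
        """Decide between partial and full sync for a reconnecting slave."""
        first_available = self.master_offset - len(self.buffer)
        if first_available <= slave_offset <= self.master_offset:
            start = slave_offset - first_available
            return "partial", bytes(self.buffer[start:])
        return "full", b""


backlog = ReplicationBacklog(capacity=10)
backlog.feed(b"abcdef")
print(backlog.resync(4))  # ('partial', b'ef')
```

If the slave has been away so long that its offset has fallen out of the window, the only option left is a full synchronization.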

Data loss

Nodes in a Redis cluster replicate asynchronously, and there is no acknowledgment concept like the acks in Kafka. Nodes exchange status information through the gossip protocol and use a voting mechanism to promote a slave to master. This process inevitably takes time. During a failure there is a window in which written data can easily be lost. Consider the following two situations.

1. A command has reached the master, and the master replies OK to the client before the data has been synchronized to the slave. If the master goes down at this moment, the data is lost. Redis avoids many problems by working this way, but it is intolerable for a system that requires high data reliability.

2. The routing table is stored on the client, so it can be stale. Suppose a partition makes a master unreachable and one of its slaves is promoted, but the original master becomes reachable again before failover has completed. If the client's routing table has not been updated by then, it writes data to the wrong node, and that data is lost.

So Redis Cluster runs very well under normal circumstances. In extreme cases some values are lost, and there is currently no solution.
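The first situation can be reproduced with a toy model. The Python sketch below is a simulation under simplifying assumptions (one master, one slave, replication triggered by hand); all class and function names are invented for the illustration and no real Redis instance is involved.

```python
from collections import deque


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}


class AsyncMaster(Node):
    """Acknowledges writes immediately and replicates them later."""

    def __init__(self, name: str, slave: Node) -> None:
        super().__init__(name)
        self.slave = slave
        self.pending = deque()  # writes not yet shipped to the slave

    def write(self, key: str, value: str) -> str:
        self.data[key] = value
        self.pending.append((key, value))
        return "OK"  # the client sees OK before the slave has the write

    def replicate(self) -> None:
        while self.pending:
            key, value = self.pending.popleft()
            self.slave.data[key] = value


def demo() -> list:
    slave = Node("slave")
    master = AsyncMaster("master", slave)
    master.write("a", "1")
    master.replicate()
    master.write("b", "2")  # acknowledged, then the master crashes
    promoted = slave        # failover: the slave becomes the new master
    return sorted(promoted.data)


print(demo())  # ['a']
```

The client was told that both writes succeeded, yet only the first one survives the failover.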

Complex operation and maintenance

Operating and maintaining Redis Cluster is very complicated. Although it has been abstracted, the process is still far from simple. Some commands can only be used with confidence once you understand in detail how they are implemented.

The above figure shows some of the commands used for capacity expansion. In practice these commands may have to be entered many times, with the cluster state monitored in between, so running them by hand is basically impractical.

There are two entry points for operation and maintenance. One is to connect to any node with redis-cli and send commands that begin with CLUSTER. Most of these commands operate on slots. When a cluster is first assembled, they are called repeatedly to carry out the specific logic.

The other entry point is the redis-cli command with the --cluster parameter and a subcommand. This form is mainly used to manage cluster node information, such as adding and deleting nodes, and it is the recommended one.

Redis Cluster provides very complicated commands that are hard to operate and remember. A management tool such as CacheCloud is recommended.

Here are a few examples.


By sending the CLUSTER MEET command to node A, a client can make node A add another node B to the cluster that A currently belongs to:

CLUSTER MEET 127.0.0.1 7006

The CLUSTER ADDSLOTS command assigns one or more slots to a node.

127.0.0.1:7000> CLUSTER ADDSLOTS 0 1 2 3 4 . . . 5000
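The dots in the command above stand for every slot number in between, since CLUSTER ADDSLOTS takes individual slot numbers as arguments. A small Python helper (a hypothetical convenience, not part of any Redis tool) can generate the full command text for a range:

```python
def addslots_command(first: int, last: int) -> str:
    """Build a CLUSTER ADDSLOTS command listing every slot in first..last."""
    if not 0 <= first <= last <= 16383:
        raise ValueError("slots must satisfy 0 <= first <= last <= 16383")
    slots = " ".join(str(slot) for slot in range(first, last + 1))
    return "CLUSTER ADDSLOTS " + slots


print(addslots_command(0, 3))  # CLUSTER ADDSLOTS 0 1 2 3
```

This is one reason the text recommends not typing these commands by hand.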

Set up the slave node.

CLUSTER REPLICATE <node_id>

redis-cli --cluster

redis-trib.rb used to be the official Redis Cluster management tool, but recent versions recommend redis-cli instead.

Add a new node to the cluster (the first address is the new node, the second is a node that already belongs to the cluster)

redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000

Remove a node from the cluster

redis-cli --cluster del-node 127.0.0.1:7006 54abb85ea9874af495057b6f95e0af5776b35a52

Migrate slots to a new node

redis-cli --cluster reshard 127.0.0.1:7006 --cluster-from 54abb85ea9874af495057b6f95e0af5776b35a52 --cluster-to 895e1d1f589dfdac34f8bdf149360fe9ca8a24eb --cluster-slots 108

There are many similar subcommands.

create: create a cluster
check: check the cluster
info: view cluster information
fix: repair the cluster
reshard: migrate slots online
rebalance: balance the number of slots across cluster nodes
add-node: add a new node
del-node: delete a node
set-timeout: set the node timeout
call: execute a command on all nodes in the cluster
import: import external Redis data into the cluster

Overview of other options

Master-slave mode

The earliest mode supported by Redis is the master-slave (MS) mode, that is, one master and multiple slaves. A single Redis machine can reach 100,000+ QPS, but in some high-traffic scenarios that is still not enough. In general, read-write separation is used: slaves are added to reduce the pressure on the master.

Being a master-slave architecture, it faces synchronization problems. Synchronization in master-slave mode is divided into full synchronization and partial synchronization. A newly created slave inevitably performs a full synchronization first. Once that is done, it enters the incremental synchronization phase. This is no different from Redis Cluster.

This mode is relatively stable, but it requires some extra work. Users have to build master-slave switching themselves, that is, use Sentinel to monitor the health of each instance and then change the cluster state through commands.

As the cluster grows, master-slave mode soon hits a bottleneck. So client-side hashing is generally used to scale out, including consistent hashing similar to what memcached clients use.
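For comparison with hash slots, here is a minimal Python sketch of client-side consistent hashing with virtual nodes. It is not the author's jar package; the `HashRing` class, the SHA-256 hash, and the 100 virtual nodes per server are all assumptions chosen for the illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring: a key goes to the next virtual node clockwise."""

    def __init__(self, nodes, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self.ring:
            raise LookupError("the ring has no nodes")
        index = bisect.bisect_right(self.ring, (self._hash(key), ""))
        return self.ring[index % len(self.ring)][1]  # wrap around the ring


ring = HashRing(["redis-1", "redis-2", "redis-3"])
print(ring.node_for("user:1000"))
```

When a node is added, only the keys that now fall to the new node move; all other keys keep their old placement, which is the property that makes this scheme attractive for scaling out.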

Client-side hash routing can get very complicated. The meta information is usually maintained by publishing jar packages or configuration, which adds a lot of uncertainty to the production environment.

However, by adding an active notification mechanism similar to ZooKeeper's and keeping the configuration in the cloud, the risk can be reduced significantly. The thousands of Redis nodes the author has maintained are managed this way.

Proxy mode

Before Redis Cluster appeared, proxy mode was very popular, with codis as an example. The proxy layer presents itself as a Redis server, receives requests from clients, and then shards and migrates data according to custom routing logic, so the business side does not need to change any code. Besides smooth scaling, master-slave switching and failover are also handled at the proxy layer, and the client may not even notice. This kind of program is also called distributed middleware.

A typical implementation is shown in the figure below, and the Redis clusters behind the proxy can even be of mixed types.

But the shortcomings of this approach are also obvious. First, it introduces a new proxy layer, which complicates both the architecture and operations. A lot of coding is required for failover, read-write separation, data migration, and so on. In addition, the proxy layer brings a corresponding performance loss.

Multiple proxies are generally fronted by a load balancer such as LVS. If the proxy layer has few machines and the back-end Redis traffic is high, the network card becomes the main bottleneck.

Nginx can also serve as the proxy layer for Redis, which is more professionally called a Smart Proxy. This approach is rather niche, but if you are familiar with nginx it is an elegant one.

Usage restrictions and pitfalls

Redis is extremely fast. Generally, the faster something is, the more serious the consequences when it goes wrong. Not long ago I wrote a Redis specification: "This is probably the most relevant Redis specification". A specification is like an architecture: the one that suits your company's environment is the best, but this one provides some basic ideas.

The things that are strictly forbidden are generally the pits that predecessors fell into. On top of that specification, the following points apply to Redis Cluster.

1. Redis Cluster claims to support 1k nodes, but you had better not do that. Once the number of nodes grows to 10, you can already feel some jitter in the cluster. A cluster that large shows that your business is doing very well; consider client-side sharding.

2. Avoid creating hot spots. If all the traffic hits one node, the consequences are generally very serious.

3. Do not store big keys in Redis. They produce many slow queries and affect normal ones.

4. Unless you use Redis as storage, every cache entry must have an expiration time. Data that occupies space without being used is very annoying.

5. Under heavy traffic, do not enable AOF; use RDB instead.

6. When operating on Redis Cluster, use pipelines and multi-key commands sparingly. They produce many unpredictable results.
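One reason behind the last point is that the keys of a multi-key command or a pipeline may map to different slots, and therefore to different nodes. The Python sketch below groups keys by slot so that each group could be sent to a single node; it uses `binascii.crc_hqx` with an initial value of 0 as the CRC16 function, ignores hash tags, and the helper names are made up for illustration.

```python
import binascii
from collections import defaultdict
from typing import Dict, Iterable, List

HASH_SLOTS = 16384


def key_slot(key: str) -> int:
    """CRC16 of the key modulo 16384 (hash tags ignored in this sketch)."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % HASH_SLOTS


def group_by_slot(keys: Iterable[str]) -> Dict[int, List[str]]:
    """Group keys by hash slot, preserving their original order."""
    groups = defaultdict(list)
    for key in keys:
        groups[key_slot(key)].append(key)
    return dict(groups)


groups = group_by_slot(["foo", "hello", "foo"])
print(len(groups))  # 2
```

If the result has more than one group, the operation cannot be handled by a single node in one step and has to be split up on the client side.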

The above are some supplements; for the full list, refer to the specification "This is probably the most relevant Redis specification".

End

The Redis code base is so small that it will certainly not implement very complex distributed capabilities. Redis positions itself around performance, horizontal scalability, and availability, which is sufficient for simple applications with ordinary traffic. Production is no small matter. For complex, highly concurrent applications, the answer is bound to be a combination of optimizations.
