Chapter 6 Redis Cluster

1. Introduction

In practical terms, Redis Cluster provides:

  • The ability to automatically split a data set among multiple nodes.
  • The ability to continue operating when a subset of the nodes fail or cannot communicate with the rest of the cluster.

Each Redis Cluster node needs two open TCP ports: the ordinary Redis TCP port used to serve clients, such as 6379, and a second port obtained by adding 10000 to the first, so 16379 in this example.

The second, high port is used for the cluster bus, a node-to-node communication channel that uses a binary protocol. Nodes use the cluster bus for failure detection, configuration updates, failover authorization, and so on. Clients should never try to communicate with the cluster bus port and should always use the ordinary Redis command port. Make sure to open both ports in your firewall, however; otherwise the Redis Cluster nodes will not be able to communicate.

The offset of the command port and the cluster bus port is fixed and is always 10000.
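
As a small illustration of the port rule above, the following sketch derives the cluster bus port from a client port and lists the ports a firewall would need to allow. The function names are hypothetical and chosen only for this example; the offset of 10000 is the one stated in the text.

```python
CLUSTER_BUS_OFFSET = 10000  # fixed offset described in the text


def cluster_bus_port(client_port):
    """Return the cluster bus port paired with a client (command) port."""
    bus_port = client_port + CLUSTER_BUS_OFFSET
    # Both ports must be valid TCP ports, so the client port cannot be too high.
    if not 0 < client_port < bus_port <= 65535:
        raise ValueError(f"client port {client_port} leaves no valid bus port")
    return bus_port


def ports_to_open(client_port):
    """Ports that must be reachable for one cluster node (hypothetical helper)."""
    return [client_port, cluster_bus_port(client_port)]


print(ports_to_open(6379))  # [6379, 16379]
```

Note that a client port above 55535 cannot be used with this rule, because the derived bus port would exceed the TCP port range.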

The cluster bus uses a different, binary protocol for node-to-node data exchange, which is better suited to exchanging information between nodes using little bandwidth and processing time.

Currently Redis Cluster does not support NATted environments, nor, more generally, environments where IP addresses or TCP ports are remapped.

Redis Cluster cannot guarantee strong consistency. In practice, this means that in some cases, Redis Cluster may lose writes that the system has confirmed to the client.

The first reason why Redis Cluster can lose writes is that it uses asynchronous replication.

Redis Cluster supports synchronous writes when absolutely needed, through the WAIT command. This makes losing writes much less likely. Note, however, that Redis Cluster does not achieve strong consistency even with synchronous replication: in more complex failure scenarios, a replica that did not receive the write can still be elected master.

There is another notable scenario in which Redis Cluster loses writes: during a network partition in which a client is isolated with a minority of instances that includes at least one master.

After the node timeout has elapsed, a master node is considered to be failing and can be replaced by one of its replicas. Similarly, if the node timeout elapses without a master node being able to reach the majority of the other master nodes, it enters an error state and stops accepting writes.

Note that the smallest cluster that works as expected needs to contain at least three master nodes.

Redis Cluster has the following design goals:

  • High performance and linear scalability up to 1000 nodes. There are no proxies, asynchronous replication is used, and no merge operations are performed on values.
  • Acceptable degree of write safety: the system tries (in a best-effort way) to retain all writes originating from clients connected to the majority of the master nodes. There are usually small windows in which acknowledged writes can be lost. These windows are larger when the client is in a minority partition.
  • Availability: Redis Cluster is able to survive partitions in which the majority of the master nodes are reachable and there is at least one reachable slave for every master node that is no longer reachable. In addition, with replica migration, a master that is no longer replicated by any slave will receive one from a master that is covered by multiple slaves.

By default, a Redis instance has only 16 databases.

2. Data sharding

Redis Cluster does not use consistent hashing, but a different form of sharding, in which each key is conceptually part of what we call a hash slot.

There are 16384 hash slots in a Redis cluster. To compute the hash slot of a given key, take the CRC16 of the key modulo 16384.

Each node in a Redis cluster is responsible for a subset of the hash slots. This makes it easy to add and remove nodes in the cluster.
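
The slot computation can be sketched in a few lines of Python. This is an illustration rather than the server's implementation: it assumes the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0), which is what `binascii.crc_hqx(data, 0)` from the standard library computes, and it ignores hash tags. The names `crc16` and `key_slot` are chosen for this example.

```python
import binascii

HASH_SLOTS = 16384


def crc16(data):
    """CRC16 of a byte string (assumed variant: XMODEM, initial value 0)."""
    return binascii.crc_hqx(data, 0)


def key_slot(key):
    """Hash slot of a key: CRC16(key) modulo 16384 (hash tags not handled)."""
    return crc16(key.encode("utf-8")) % HASH_SLOTS


print(key_slot("foo"))  # 12182
```

On a running cluster, the result can be compared with the value reported by the `CLUSTER KEYSLOT` command for the same key.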

Since moving a hash slot from one node to another node does not require any downtime, adding and removing nodes or changing the percentage of hash slots held by a node does not require any downtime.
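
To make the idea of "a subset of the hash slots per node" concrete, here is a minimal sketch that divides the 16384 slots into one contiguous range per node. The even split and the function name `split_slots` are assumptions made for this example; a real cluster may use any assignment, and the ranges produced here are not necessarily the ones a cluster creation tool would pick.

```python
HASH_SLOTS = 16384


def split_slots(node_names):
    """Divide the slot space into one contiguous, inclusive range per node."""
    n = len(node_names)
    if n == 0:
        raise ValueError("at least one node is required")
    ranges = {}
    for i, name in enumerate(node_names):
        start = i * HASH_SLOTS // n
        end = (i + 1) * HASH_SLOTS // n - 1  # inclusive upper bound
        ranges[name] = (start, end)
    return ranges


layout = split_slots(["A", "B", "C"])
for name, (start, end) in layout.items():
    print(name, start, end, end - start + 1)
```

With three nodes this yields the ranges 0-5460, 5461-10921 and 10922-16383, which together cover every slot exactly once.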

Redis Cluster supports multi-key operations, as long as all keys involved in a single command execution (or the entire transaction, or Lua script execution) belong to the same hash slot. Users can use a concept called hash tags to force multiple keys to be part of the same hash slot.
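
The following sketch shows how hash tags can force keys into the same slot. It assumes the rule given in the Redis Cluster specification: if the key contains a `{` followed later by a `}` with at least one character between them, only that substring is hashed. The helper names are hypothetical, and the CRC16 variant (XMODEM, via `binascii.crc_hqx`) is likewise an assumption of this example.

```python
import binascii

HASH_SLOTS = 16384


def hash_tag(key):
    """Return the part of the key that is hashed (assumed hash tag rule)."""
    start = key.find("{")
    if start == -1:
        return key                      # no opening brace: hash the whole key
    end = key.find("}", start + 1)
    if end == -1 or end == start + 1:
        return key                      # no closing brace, or empty tag "{}"
    return key[start + 1:end]


def key_slot(key):
    """Hash slot of a key, taking hash tags into account."""
    return binascii.crc_hqx(hash_tag(key).encode("utf-8"), 0) % HASH_SLOTS


# Both keys share the tag "user1000", so they land in the same slot and can
# be used together in a multi-key command.
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))  # True
```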

Partitioning data in this way serves two main purposes:

  • It allows much larger databases, using the sum of the memory of multiple computers. Without partitioning, you are limited by the amount of memory that a single computer can support.
  • It allows computing power to scale across multiple cores and multiple computers, and network bandwidth to scale across multiple computers and network adapters.

3. Master-slave mode

To remain available when a subset of the master nodes fail or cannot communicate with the majority of nodes, Redis Cluster uses a master-slave model in which every hash slot has from 1 (the master itself) to N replicas (N-1 additional slave nodes).

Redis Cluster does not support multiple databases like the standalone version of Redis. There is only database 0 and the SELECT command is not allowed.

In a Redis cluster, nodes are responsible for holding the data and for tracking the state of the cluster, including the mapping of keys to the correct nodes. Cluster nodes can also automatically discover other nodes, detect non-working nodes, and promote slave nodes to master when needed, so that the cluster keeps operating when a failure occurs.

To perform their tasks, all cluster nodes are connected using a TCP bus and a binary protocol, called the Redis Cluster Bus. Every node is connected to every other node in the cluster through the cluster bus. Nodes use a gossip protocol to propagate information about the cluster in order to discover new nodes, send ping packets to make sure all the other nodes are working properly, and send the cluster messages needed to signal specific conditions. The cluster bus is also used to propagate Pub/Sub messages across the cluster and to orchestrate manual failovers when requested by users (a manual failover is not initiated by the Redis Cluster failure detector, but directly by the system administrator).

Each node has a unique name in the cluster. The node name is the hexadecimal representation of a 160-bit random number, obtained the first time the node is started (usually from /dev/urandom). The node saves its ID in the node configuration file and keeps using the same ID forever, or at least as long as the system administrator does not delete the node configuration file or request a hard reset via the CLUSTER RESET command.
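
A node name of this shape can be produced as follows. This is only a sketch of the format described above (160 random bits rendered as hexadecimal); the function name is hypothetical, and `os.urandom` is used here as the operating system's random source.

```python
import os


def new_node_id():
    """Generate a 160-bit random identifier as a 40-character hex string."""
    return os.urandom(20).hex()  # 20 bytes = 160 bits = 40 hex digits


node_id = new_node_id()
print(node_id, len(node_id))
```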

The node ID is used to identify each node in the entire cluster. A given node can change its IP address without changing the node ID. The cluster can also detect IP/port changes and reconfigure using the gossip protocol running on the cluster bus.

The node ID is not the only information associated with each node, but it is the only one that is always globally consistent. Each node also has the following set of associated information. Some of it concerns the cluster configuration details of this specific node and is eventually consistent across the cluster. Other information, such as the last time a node was pinged, is local to each node.

Each node maintains the following information about the other nodes it knows in the cluster: the node ID, the node's IP and port, a set of flags, the node's master if it is flagged as a slave, the last time the node was pinged and the last time a pong was received, the node's current configuration epoch, the state of the link, and finally the set of hash slots served.

Each Redis cluster node has an additional TCP port for receiving incoming connections from other Redis cluster nodes. This port has a fixed offset from the normal TCP port used to receive incoming connections from clients. To get the Redis Cluster port, you need to add 10000 to the common command port. For example, if the Redis node is listening for client connections on port 6379, the cluster bus port 16379 will also be open.

Node-to-node communication uses only the cluster bus and the cluster bus protocol: a binary protocol composed of frames of different types and sizes. The cluster bus binary protocol is not publicly documented, because external software is not meant to talk to Redis Cluster nodes using it.

The Redis cluster is a full mesh network, where each node is connected to each other node using a TCP connection.

In a cluster of N nodes, each node has N-1 outgoing TCP connections and N-1 incoming connections.

These TCP connections are kept alive all the time and are not created on demand. When a node expects a pong reply in response to a ping on the cluster bus, it first tries to refresh the connection with that node by reconnecting from scratch, before waiting long enough to mark the node as unreachable.

Although Redis Cluster nodes form a full mesh, they use a gossip protocol and a configuration update mechanism to avoid exchanging too many messages under normal conditions, so the number of messages exchanged does not grow exponentially.

A node accepts another node as part of the cluster in only two ways:

  • If a node presents itself with a MEET message. A MEET message is exactly like a PING message, but it forces the receiver to accept the node as part of the cluster. A node sends MEET messages to other nodes only when the system administrator requests it with the following command: CLUSTER MEET ip port
  • If an already trusted node gossips about another node, the receiving node also registers that node as part of the cluster. So if A knows B and B knows C, B will eventually send gossip messages to A about C. When this happens, A registers C as part of the network and tries to connect to C.

This means that as long as we connect the nodes in any connection graph, they will eventually automatically form a fully connected graph. This means that the cluster can automatically discover other nodes, but only if there is a trust relationship enforced by the system administrator.
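
The convergence to a fully connected graph can be illustrated with a toy simulation. This is a simplified model, not the cluster bus protocol: each node is represented only by the set of peers it knows, a "round" lets every node tell its peers about the other nodes it knows, and gossip is accepted only from a sender the receiver already knows. The function names and the round-based scheme are assumptions made for this example.

```python
def gossip_round(known):
    """One round: every node tells each of its peers about its other peers."""
    updated = {node: set(peers) for node, peers in known.items()}
    for sender, peers in known.items():
        for receiver in peers:
            # The receiver only trusts gossip from a node it already knows.
            if sender in known[receiver]:
                updated[receiver] |= peers - {receiver}
    return updated


def converge(known, max_rounds=100):
    """Repeat gossip rounds until nothing changes; return the final view."""
    for _ in range(max_rounds):
        nxt = gossip_round(known)
        if nxt == known:
            return known
        known = nxt
    raise RuntimeError("views did not stabilise")


# Views after the administrator ran CLUSTER MEET along a chain A-B, B-C, C-D.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
for node, peers in sorted(converge(chain).items()):
    print(node, sorted(peers))
```

In this model, two groups of nodes that were never introduced to each other stay separate, which mirrors the requirement of an administrator-established trust relationship.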

This mechanism makes the cluster more robust and prevents different Redis clusters from accidentally mixing after IP address changes or other network-related events.

Redis Cluster supports the ability to add and delete nodes while the cluster is running. Adding or deleting nodes is abstracted as the same operation: moving the hash slot from one node to another. This means that the same basic mechanism can be used to rebalance the cluster, add or remove nodes, etc.

  • To add a new node to the cluster, an empty node is added to the cluster and a set of hash slots is moved from existing nodes to the new node.
  • To remove a node from the cluster, the hash slots assigned to that node are moved to other existing nodes.
  • To rebalance the cluster, a given set of hash slots is moved between nodes.

The core of the implementation is the ability to move the hash slot.
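
The three operations above can be expressed with a single primitive in a toy model. The sketch below represents the cluster as a list mapping each slot to its owning node and ignores the transfer of the keys themselves; `slots_of`, `move_slots` and the slot counts are hypothetical choices for this example.

```python
HASH_SLOTS = 16384


def slots_of(owner, node):
    """All slots currently assigned to `node`."""
    return [slot for slot, name in enumerate(owner) if name == node]


def move_slots(owner, source, target, count):
    """Reassign `count` slots from `source` to `target` in the slot map."""
    candidates = slots_of(owner, source)[:count]
    if len(candidates) < count:
        raise ValueError(f"{source} owns fewer than {count} slots")
    for slot in candidates:
        owner[slot] = target


# Two-node cluster: A owns the first half of the slots, B the second half.
owner = ["A" if slot < 8192 else "B" for slot in range(HASH_SLOTS)]

# Add node C: it starts empty and receives slots from the existing nodes.
move_slots(owner, "A", "C", 2731)
move_slots(owner, "B", "C", 2731)

# Remove node A: all of its remaining slots go to another node.
move_slots(owner, "A", "B", len(slots_of(owner, "A")))

print({node: len(slots_of(owner, node)) for node in "ABC"})
```

Adding, removing and rebalancing differ only in which source, target and count are passed to the same function.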

Redis Cluster nodes continuously exchange ping and pong packets. The two kinds of packets have the same structure and both carry important configuration information; the only actual difference is the message type field. We refer to ping and pong packets collectively as heartbeat packets.


4. Configuration parameters

| Parameter name | Value | Description |
| --- | --- | --- |
| cluster-enabled | yes/no | If yes, enables Redis Cluster support in this Redis instance. Otherwise the instance starts as a standalone instance as usual. |
| cluster-config-file | filename | Despite the name of this option, this is not a user-editable configuration file. It is the file in which a Redis Cluster node automatically persists the cluster configuration (its state) every time there is a change, so that it can be re-read at startup. The file lists the other nodes in the cluster, their state, persistent variables, and so on. It is often rewritten and flushed to disk as a result of receiving certain messages. |
| cluster-node-timeout | milliseconds | The maximum amount of time a Redis Cluster node can be unavailable without being considered as failing. If a master node is not reachable for more than the specified time, it will be failed over by its slaves. This parameter also controls other important things in Redis Cluster. Notably, every node that cannot reach the majority of master nodes for the specified time will stop accepting queries. |
| cluster-slave-validity-factor | factor | If set to zero, a slave always considers itself valid and therefore always tries to fail over its master, regardless of how long the link between the master and the slave has been disconnected. If the value is positive, a maximum disconnection time is calculated as the node timeout multiplied by the factor provided with this option, and a slave whose master link has been disconnected for longer than that will not attempt a failover. For example, if the node timeout is 5 seconds and the validity factor is 10, a slave disconnected from its master for more than 50 seconds will not try to fail over its master. Note that any non-zero value may leave the Redis cluster unavailable after a master failure if no slave is able to fail it over. In that case the cluster becomes available again only when the original master rejoins the cluster. |
| cluster-migration-barrier | count | The minimum number of slaves a master must remain connected to in order for one of its slaves to migrate to a master that is no longer covered by any slave. |
| cluster-require-full-coverage | yes/no | If set to yes, which is the default, the cluster stops accepting writes when some percentage of the key space is not covered by any node. If set to no, the cluster keeps serving queries even if only requests about a subset of keys can be processed. |
| cluster-allow-reads-when-down | yes/no | If set to no, which is the default, a node in a Redis cluster stops serving all traffic when the cluster is marked as failed, either because the node cannot reach a quorum of masters or because full coverage is not met. This prevents reading potentially inconsistent data from a node that is unaware of changes in the cluster. Setting the option to yes allows reads from a node during the fail state, which is useful for applications that want to prioritize read availability but still want to prevent inconsistent writes. It can also be used with a Redis Cluster of only one or two shards, because it allows the nodes to continue serving writes when a master fails but automatic failover is impossible. |
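
The arithmetic behind cluster-slave-validity-factor can be sketched as follows. The sketch uses only the rule stated in the description above (node timeout multiplied by the factor, with zero meaning "no limit"); the function names are hypothetical, and any additional terms a real server may add to this limit are not modelled.

```python
def max_disconnection_ms(node_timeout_ms, validity_factor):
    """Longest master-link outage after which a slave may still fail over.

    Returns None when the factor is zero, meaning there is no limit.
    """
    if validity_factor == 0:
        return None
    return node_timeout_ms * validity_factor


def slave_may_failover(disconnected_ms, node_timeout_ms, validity_factor):
    """True if a slave disconnected for `disconnected_ms` may try a failover."""
    limit = max_disconnection_ms(node_timeout_ms, validity_factor)
    return limit is None or disconnected_ms <= limit


# Example from the table: 5 second node timeout, validity factor 10.
print(max_disconnection_ms(5000, 10))        # 50000
print(slave_may_failover(60000, 5000, 10))   # False
print(slave_may_failover(60000, 5000, 0))    # True
```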

5. Master-slave replication

Redis replication relies on three main mechanisms:

  • When a master and a replica instance are well connected, the master keeps the replica updated by sending it a stream of commands that replicates the effects on the data set happening on the master side, caused by client writes, keys being expired or evicted, or any other operation that changes the master data set.
  • When the link between the master and the replica breaks, because of network issues or because a timeout is detected on the master or the replica, the replica reconnects and attempts a partial resynchronization: it tries to obtain only the part of the command stream it missed while disconnected.
  • When a partial resynchronization is not possible, the replica asks for a full resynchronization. This involves a more complex process in which the master creates a snapshot of all its data, sends it to the replica, and then continues sending the command stream as the data set changes.
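
The choice between partial and full resynchronization can be sketched as a simple decision function. The model below is an assumption-laden illustration: it supposes that the master identifies its replication history with an ID, numbers the command stream with offsets, and keeps only a bounded backlog of recent commands. All class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    replication_id: str  # identifies the history of the data set (assumed)
    backlog_start: int   # oldest stream offset still held in the backlog
    offset: int          # offset of the most recent command in the stream


def choose_resync(master, replica_replid, replica_offset):
    """Return "partial" if the missed commands can be replayed, else "full"."""
    same_history = replica_replid == master.replication_id
    in_backlog = master.backlog_start <= replica_offset <= master.offset
    return "partial" if same_history and in_backlog else "full"


master = MasterState(replication_id="abc", backlog_start=1000, offset=5000)
print(choose_resync(master, "abc", 4200))  # partial: only 4200..5000 is missing
print(choose_resync(master, "abc", 200))   # full: offset 200 left the backlog
print(choose_resync(master, "xyz", 4200))  # full: different history
```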

Redis uses asynchronous replication by default, which, being low latency and high performance, is the natural replication mode for the vast majority of Redis use cases. However, replicas periodically and asynchronously acknowledge to the master the amount of data they have received. The master therefore does not wait for replicas to process each command, but it knows, when needed, which replica has already processed which command. This makes optional synchronous replication possible.

Clients can use the WAIT command to request synchronous replication of certain data. However, WAIT can only ensure that the specified number of acknowledged copies exist in other Redis instances. It does not turn a set of Redis instances into a strongly consistent CP system: acknowledged writes can still be lost during a failover, depending on the exact configuration of Redis persistence. With WAIT, however, the probability of losing a write after a failure event is greatly reduced, to certain failure modes that are hard to trigger.
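
The bookkeeping that makes this possible can be illustrated with a toy model. This is not how WAIT is implemented and it does not block or talk to a server: it only shows a master that replies to writes immediately, records the offsets acknowledged by replicas, and can report how many replicas have confirmed a given write. The class and method names are hypothetical.

```python
class ToyMaster:
    """Toy model of a master tracking replica acknowledgements."""

    def __init__(self, replica_names):
        self.offset = 0                               # bytes written so far
        self.acked = {name: 0 for name in replica_names}

    def write(self, nbytes):
        """Apply a write and return its offset without waiting for replicas."""
        self.offset += nbytes
        return self.offset

    def ack(self, replica, offset):
        """Record an asynchronous acknowledgement from a replica."""
        self.acked[replica] = max(self.acked[replica], offset)

    def replicas_acked(self, offset):
        """Number of replicas that have confirmed data up to `offset`."""
        return sum(1 for acked in self.acked.values() if acked >= offset)


master = ToyMaster(["R1", "R2"])
target = master.write(100)
print(master.replicas_acked(target))  # 0: the write is acknowledged by no replica yet
master.ack("R1", target)
print(master.replicas_acked(target))  # 1
```

A client that requires one confirmed copy would treat the write as safe only after the second call; if the master failed before any acknowledgement arrived, the write could be lost.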

The following are some important facts about Redis replication:

  • Redis uses asynchronous replication, with replicas asynchronously acknowledging to the master the amount of data they have processed.
  • A master can have multiple copies.
  • Replicas can accept connections from other replicas. Besides connecting several replicas to the same master, replicas can also be connected to other replicas in a cascading structure. Starting with Redis 4.0, all sub-replicas receive exactly the same replication stream from the master.
  • Redis replication is non-blocking on the primary side. This means that when one or more replicas perform initial synchronization or partial resynchronization, the master node will continue to process queries.
  • Replication is also largely non-blocking on the replica side. While the replica is performing the initial synchronization, it can handle queries using the old version of the data set, provided you configured Redis to do so in redis.conf. Otherwise, you can configure Redis replicas to return an error to clients while the replication stream is down. However, after the initial synchronization, the old data set must be deleted and the new one must be loaded. The replica blocks incoming connections during this brief window (which can be as long as several seconds for very large data sets). Starting with Redis 4.0, Redis can be configured so that the deletion of the old data set happens in a different thread, but loading the new initial data set still happens in the main thread and blocks the replica.
  • Replication can be used for scalability to provide multiple replicas for read-only queries (for example, slow O(N) operations can be offloaded to replicas), or it can only be used to improve data security and high availability.
  • Replication can be used to avoid the cost of having the master write the full data set to disk: a typical technique is to configure the master's redis.conf to avoid persisting to disk at all, then connect a replica configured to save from time to time, or with AOF enabled. However, this setup must be handled with care, since a restarting master will start with an empty data set: if the replica tries to synchronize with it, the replica will be emptied as well.

6. Master-slave synchronization

Master-slave synchronization mechanism
  • Among the data servers there is only one master server. In this setup the master server handles only writes and does not serve reads to external programs.
  • There are multiple slave servers, and the slave servers do not write data, but are only responsible for synchronizing the data of the master server and allowing external programs to read the data.
  • After the master server writes the data, it immediately sends the data write command to the slave server, so that the master-slave data is synchronized.
  • The application program can randomly read data from a certain slave server, thus sharing the pressure of reading data.
  • When a slave server fails, the system as a whole is not affected; when the master server fails, one of the slave servers can be promoted to master.
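
The read/write splitting described in this list can be sketched as a small router on the application side. This is an illustration under stated assumptions, not a client library: the class name, the node address strings and the short list of write commands are hypothetical, and no network connection is made.

```python
import random

# Illustrative subset of write commands; a real client knows the full list.
WRITE_COMMANDS = {"SET", "DEL", "INCR", "LPUSH", "EXPIRE"}


class ReadWriteRouter:
    """Send writes to the master and spread reads over the slaves."""

    def __init__(self, master, slaves, rng=None):
        self.master = master
        self.slaves = list(slaves)
        self.rng = rng or random.Random()

    def node_for(self, command):
        """Pick the server that should execute `command`."""
        if command.upper() in WRITE_COMMANDS or not self.slaves:
            return self.master
        return self.rng.choice(self.slaves)  # random slave shares the read load

    def promote(self, slave):
        """After a master failure, make one of the slaves the new master."""
        self.slaves.remove(slave)
        self.master = slave


router = ReadWriteRouter("master:6379", ["slave1:6379", "slave2:6379"])
print(router.node_for("SET"))  # master:6379
print(router.node_for("GET"))  # one of the two slaves
```

Because replication is asynchronous, a read routed to a slave may briefly return data older than the latest write sent to the master.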

7. Sentinel

Redis Sentinel provides high availability for Redis.

At a high level, Sentinel offers the following capabilities:

  • Monitoring. Sentinel constantly checks whether your master and replica instances are working as expected.
  • Notification. Sentinel can notify the system administrator, or other computer programs, via an API, that something is wrong with one of the monitored Redis instances.
  • Automatic failover. If a master is not working as expected, Sentinel can start a failover process in which a replica is promoted to master, the other replicas are reconfigured to use the new master, and the applications using the Redis server are informed of the new address to use when connecting.
  • Configuration provider. Sentinel acts as the authoritative source for client service discovery: clients connect to Sentinels to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels report the new address.

A configuration file must be used when running Sentinel because the system will use this file to save the current state that will be reloaded upon restart. If no configuration file is given or the configuration file path is not writable, Sentinel will simply refuse to start.

By default, Sentinel listens for connections on TCP port 26379, so for Sentinels to work, port 26379 of your servers must be open to receive connections from the IP addresses of the other Sentinel instances. Otherwise Sentinels cannot talk to each other and cannot agree on what to do, so failover will never be performed.

Some fundamental things to know about Sentinel before deploying it:

  • You need at least three Sentinel instances for a robust deployment.
  • The three Sentinel instances should be placed on computers or virtual machines that are believed to fail independently, for example different physical servers or virtual machines running in different availability zones.
  • The Sentinel + Redis distributed system does not guarantee that acknowledged writes are retained during failures, because Redis uses asynchronous replication. However, there are ways to deploy Sentinel that limit the window for losing writes to certain moments, and there are other, less secure ways to deploy it.
  • Your clients need Sentinel support. Popular client libraries have Sentinel support, but not all of them.
  • No HA setup is safe unless you test it from time to time in development environments or, even better if you can, in production environments, to check that it works. You may have a misconfiguration that becomes apparent only when it is too late.
  • Sentinel, Docker, and other forms of network address translation or port mapping should be mixed with care: Docker performs port remapping, which breaks Sentinel's auto-discovery of other Sentinel processes and of the list of replicas for a master.

You only need to specify the masters to monitor, giving each separate master (which may have any number of replicas) a different name. There is no need to specify replicas, which are auto-discovered. Sentinel automatically updates the configuration with additional information about the replicas (in order to retain that information across restarts). The configuration is also rewritten every time a replica is promoted to master during a failover and every time a new Sentinel is discovered.

  • The master nodes are called M1, M2, M3,..., Mn.
  • The replicas are called R1, R2, R3, ..., Rn (R stands for replica).
  • The sentinels are called S1, S2, S3, ..., Sn.
  • The clients are called C1, C2, C3,..., Cn.
  • When an instance changes its role due to Sentinel actions, we put it in square brackets, so [M1] means an instance that is now master due to Sentinel intervention.

7.1. Distributed features

Having multiple Sentinel processes cooperate has the following advantages:

  • Failure detection is performed when multiple Sentinels agree that a given master is no longer available. This lowers the probability of false positives.
  • Sentinel works even if not all the Sentinel processes are working, which makes the system robust against failures. After all, there is no point in having a failover system that is itself a single point of failure.
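
The agreement rule in the first point can be sketched as a vote count. This is a simplified model: it assumes a configured threshold, called `quorum` here, for how many Sentinels must report the master as unreachable, and it leaves out everything else a real failover involves. The function name and the report format are hypothetical.

```python
def master_considered_down(reports, quorum):
    """True if at least `quorum` Sentinels report the master as unreachable.

    `reports` maps a Sentinel name to True when that Sentinel cannot reach
    the master. Sentinels that are not running are simply absent.
    """
    votes = sum(1 for unreachable in reports.values() if unreachable)
    return votes >= quorum


# One Sentinel with a local network problem is not enough to flag the master.
print(master_considered_down({"S1": True, "S2": False, "S3": False}, quorum=2))  # False

# Two Sentinels agree while S3 is not running: detection still works.
print(master_considered_down({"S1": True, "S2": True}, quorum=2))  # True
```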