Sliced cluster: As the data grows, should you add memory or add instances?

1. What is a sliced cluster?

A sliced cluster, also called a sharded cluster, consists of multiple Redis instances started together as one cluster. The incoming data is split into multiple parts according to a rule, and each part is stored by one instance.

2. Options for storing more data

  • Vertical expansion (scaling up): upgrade the resources of a single Redis instance, for example by adding memory and disk capacity or moving to a more powerful CPU.
  • Horizontal expansion (scaling out): increase the number of Redis instances.


Vertical expansion:

The advantage is that it is simple and straightforward to implement.
It has two drawbacks:

  • When RDB is used for persistence, a larger dataset requires more memory, and the main thread may block while it forks the child process; the bigger the instance, the longer the fork can take. If you don't need to persist Redis data, however, vertical scaling can be a good choice.
  • Vertical expansion is limited by hardware and cost.

Horizontal expansion:
When serving millions or tens of millions of users, a horizontally scaled Redis sliced cluster is a much better choice.

It raises two questions:

  • After the data is sliced, how is it distributed across multiple instances?
  • How does the client determine which instance the data to be accessed is on?

3. How data slices map to instances

Starting with Redis 3.0, Redis officially provides a solution called Redis Cluster for implementing sliced clusters.

Redis Cluster defines the rules that map data to instances.

Specifically, Redis Cluster uses hash slots to map data to instances. A sliced cluster has 16384 hash slots in total. Hash slots are similar to data partitions: each key-value pair is mapped to one hash slot according to its key.

The mapping takes two steps:
first, compute a 16-bit value from the key of the key-value pair using the CRC16 algorithm; then take that 16-bit value modulo 16384. Each result, from 0 to 16383, is the number of the corresponding hash slot.
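The two steps above can be sketched in Python. This is a minimal illustration that assumes the CRC16 variant Redis uses is CRC-16/XMODEM (polynomial 0x1021, initial value 0). This is the same checksum Python's standard `binascii.crc_hqx(data, 0)` computes. The helper names `crc16` and `key_slot` are hypothetical. Real Redis also honors hash tags (when a key contains `{...}`, only the part inside the braces is hashed), and this sketch ignores them.

```python
import binascii


def crc16(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            # Shift left one bit; XOR in the polynomial if the top bit was set.
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots (hash tags not handled)."""
    return crc16(key.encode("utf-8")) % 16384


if __name__ == "__main__":
    # The hand-written CRC agrees with the standard-library implementation.
    assert crc16(b"123456789") == binascii.crc_hqx(b"123456789", 0) == 0x31C3
    print(key_slot("foo"))  # 12182
```

On a live cluster, the CLUSTER KEYSLOT command returns the slot for a given key, which is a convenient way to cross-check a sketch like this.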

So how do these hash slots map to specific Redis instances?

When deploying Redis Cluster, you can create the cluster with the cluster create command (redis-cli --cluster create in Redis 5.0 and later). Redis then automatically distributes the slots evenly across the cluster instances.

If the cluster has N instances, each instance holds roughly 16384/N slots.
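To show roughly what an even split looks like, the hypothetical helper below divides the 16384 slots into N contiguous ranges whose sizes differ by at most one. It assumes contiguous ranges, and the exact boundaries redis-cli chooses may differ slightly from these.

```python
from typing import List, Tuple

TOTAL_SLOTS = 16384


def even_slot_ranges(n: int) -> List[Tuple[int, int]]:
    """Split slots 0..16383 into n inclusive, contiguous ranges of near-equal size."""
    if n <= 0:
        raise ValueError("need at least one instance")
    base, extra = divmod(TOTAL_SLOTS, n)
    ranges = []
    start = 0
    for i in range(n):
        # The first `extra` instances take one extra slot each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    print(even_slot_ranges(3))  # [(0, 5461), (5462, 10922), (10923, 16383)]
```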

Alternatively, you can use the cluster meet command to connect instances manually and form a cluster, then use the cluster addslots command to specify which hash slots each instance holds.

When assigning hash slots manually, however, you must assign all 16384 slots; otherwise the Redis cluster cannot work normally.
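Before running cluster addslots by hand, it can help to check that a planned assignment covers every slot exactly once. The sketch below is a hypothetical offline check, not a Redis command, and the instance names in it are placeholders.

```python
from typing import Dict, List, Tuple

TOTAL_SLOTS = 16384


def check_assignment(
    assignment: Dict[str, List[Tuple[int, int]]]
) -> Tuple[List[int], List[int]]:
    """Return (unassigned slots, slots assigned to more than one instance)."""
    owners: Dict[int, List[str]] = {}
    for instance, ranges in assignment.items():
        for start, end in ranges:  # inclusive slot ranges
            for slot in range(start, end + 1):
                owners.setdefault(slot, []).append(instance)
    missing = [s for s in range(TOTAL_SLOTS) if s not in owners]
    duplicated = sorted(s for s, who in owners.items() if len(who) > 1)
    return missing, duplicated


if __name__ == "__main__":
    plan = {
        "instance-a": [(0, 8000)],
        "instance-b": [(8001, 16000)],  # slots 16001-16383 were forgotten
    }
    missing, duplicated = check_assignment(plan)
    print(len(missing), duplicated)  # 383 []
```

A plan that reports no missing and no duplicated slots is one the cluster can serve completely.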


4. How does the client locate the data?

To locate a key-value pair, the client can compute the key's hash slot itself when it sends a request. To reach the right instance, however, it also needs to know which instance holds that hash slot.

Each Redis instance sends its own hash slot information to the instances it is connected to, so the slot distribution spreads throughout the cluster. Once the instances are connected to one another, every instance knows the mapping for all hash slots.
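The spreading process can be illustrated with a toy model. Each instance starts out knowing only its own slots, and in every round it merges the tables of the instances it is directly connected to. This is a simplified, hypothetical model. The real cluster bus exchanges binary messages between nodes and uses configuration epochs to resolve conflicting claims, and none of that is modeled here.

```python
from typing import Dict, List, Tuple

SlotRange = Tuple[int, int]
View = Dict[str, List[SlotRange]]  # owner instance -> slot ranges it holds


def propagate(
    own: Dict[str, List[SlotRange]],
    links: Dict[str, List[str]],
    rounds: int,
) -> Dict[str, View]:
    """Return each instance's view of the slot map after `rounds` exchanges."""
    # Initially every instance only knows about its own slots.
    views: Dict[str, View] = {node: {node: ranges} for node, ranges in own.items()}
    for _ in range(rounds):
        # Snapshot first, so every instance shares what it knew at the start of the round.
        snapshot = {node: dict(view) for node, view in views.items()}
        for node, peers in links.items():
            for peer in peers:
                views[node].update(snapshot[peer])
    return views


if __name__ == "__main__":
    own = {"A": [(0, 5460)], "B": [(5461, 10922)], "C": [(10923, 16383)]}
    links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}  # A and C are not directly linked
    for rounds in (1, 2):
        views = propagate(own, links, rounds)
        print(rounds, all(view == own for view in views.values()))
    # 1 False
    # 2 True
```

Even though A and C never talk directly, both learn the full map after two rounds because B relays the information.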

When the client receives this hash slot information, it caches it locally. For each key-value request, the client first computes the hash slot of the key and then sends the request to the instance that owns that slot.
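The following sketch shows what a minimal client-side routing table might look like. The class name `SlotCache` and the addresses are hypothetical, and the sketch does not cover how the table is filled in the first place (for example, from a CLUSTER SLOTS reply). The slot function calls `binascii.crc_hqx` with an initial value of 0, which computes the same CRC16 variant Redis uses, and it ignores hash tags.

```python
import binascii
from typing import Dict, List, Tuple

Address = Tuple[str, int]  # (host, port)


def key_slot(key: str) -> int:
    # CRC16 (XMODEM) of the key, modulo 16384; hash tags are not handled.
    return binascii.crc_hqx(key.encode("utf-8"), 0) % 16384


class SlotCache:
    """Locally cached mapping from hash slot to the instance that owns it."""

    def __init__(self, ranges: List[Tuple[int, int, Address]]) -> None:
        self.owner: Dict[int, Address] = {}
        for start, end, addr in ranges:  # inclusive slot ranges
            for slot in range(start, end + 1):
                self.owner[slot] = addr

    def route(self, key: str) -> Address:
        """Compute the key's slot, then look up the instance to send to."""
        return self.owner[key_slot(key)]


if __name__ == "__main__":
    cache = SlotCache([
        (0, 5460, ("127.0.0.1", 7000)),
        (5461, 10922, ("127.0.0.1", 7001)),
        (10923, 16383, ("127.0.0.1", 7002)),
    ])
    print(cache.route("foo"))    # ('127.0.0.1', 7002), slot 12182
    print(cache.route("hello"))  # ('127.0.0.1', 7000), slot 866
```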

5. How is the client notified when the mapping between hash slots and instances changes?

In a cluster, however, the mapping between instances and hash slots is not fixed. The two most common changes are:

  • Instances are added to or removed from the cluster, so the hash slots must be reallocated.
  • For load balancing, the hash slots are redistributed across all instances.

The instances can exchange messages with one another to obtain the latest hash slot allocation, but the client cannot detect these changes on its own. As a result, the allocation information cached on the client can become inconsistent with the latest allocation. How is this resolved?

**The Redis Cluster solution provides a redirection mechanism.** Redirection means that when a client sends a read or write command to an instance that does not hold the corresponding data, the client must resend the command to a different instance.

When it is redirected, how does the client learn the address of the new instance?

When the client sends a key-value request to an instance that does not hold the hash slot the key maps to, the instance returns a MOVED error response of the form MOVED <slot> <ip>:<port>. This response contains the address of the instance that now holds the slot.

On receiving the MOVED response, the client also updates its local cache with the new mapping between the hash slot and the instance.
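Handling a MOVED reply on the client side can be sketched as follows. The helper names and addresses are hypothetical. The sketch assumes the reply text has the form MOVED <slot> <host>:<port>, possibly with the leading `-` of a raw RESP error, and it updates only the one slot named in the reply. Some real clients re-fetch the full slot map at this point instead.

```python
from typing import Dict, Tuple

Address = Tuple[str, int]  # (host, port)


def parse_moved(reply: str) -> Tuple[int, Address]:
    """Parse an error reply such as 'MOVED 3999 127.0.0.1:6381'."""
    kind, slot, addr = reply.lstrip("-").split()
    if kind != "MOVED":
        raise ValueError(f"not a MOVED reply: {reply!r}")
    host, port = addr.rsplit(":", 1)  # split on the last colon only
    return int(slot), (host, int(port))


def apply_moved(slot_cache: Dict[int, Address], reply: str) -> Address:
    """Record the slot's new owner in the local cache and return its address."""
    slot, addr = parse_moved(reply)
    slot_cache[slot] = addr  # later requests for this slot go straight to addr
    return addr


if __name__ == "__main__":
    cache: Dict[int, Address] = {3999: ("127.0.0.1", 6380)}
    target = apply_moved(cache, "MOVED 3999 127.0.0.1:6381")
    print(target, cache[3999])  # ('127.0.0.1', 6381) ('127.0.0.1', 6381)
```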


If a hash slot is being migrated and only part of its data has been moved, a request for a key that has already been moved gets an ASK error, which contains the address of the instance the slot is migrating to.
Unlike MOVED, an ASK response does not update the hash slot allocation cached on the client.
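The sketch below shows one way a client could follow an ASK reply. It assumes the reply has the form ASK <slot> <host>:<port>. Following the Redis Cluster specification, the client sends the ASKING command to the target instance immediately before retrying the original command there. The `send` callback is a hypothetical stand-in for a real connection. The slot cache is not touched, matching the ASK behavior described above.

```python
from typing import Callable, List, Tuple

Address = Tuple[str, int]  # (host, port)
Send = Callable[[Address, List[str]], str]  # hypothetical: send a command, return the reply


def parse_ask(reply: str) -> Tuple[int, Address]:
    """Parse an error reply such as 'ASK 12182 127.0.0.1:7003'."""
    kind, slot, addr = reply.lstrip("-").split()
    if kind != "ASK":
        raise ValueError(f"not an ASK reply: {reply!r}")
    host, port = addr.rsplit(":", 1)
    return int(slot), (host, int(port))


def follow_ask(reply: str, command: List[str], send: Send) -> str:
    """Retry `command` once on the instance named in an ASK reply."""
    _slot, target = parse_ask(reply)
    # ASKING lets the target serve this one command for a slot it is still importing.
    send(target, ["ASKING"])
    # The slot cache is deliberately left unchanged: the migration is not finished.
    return send(target, command)


if __name__ == "__main__":
    log: List[Tuple[Address, List[str]]] = []

    def fake_send(addr: Address, cmd: List[str]) -> str:
        log.append((addr, cmd))
        return "OK" if cmd[0] == "ASKING" else "bar"

    print(follow_ask("ASK 12182 127.0.0.1:7003", ["GET", "foo"], fake_send))  # bar
    print(log)
```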