# Redis miscellaneous notes

## Redis miscellaneous notes

It is not easy to organize, please indicate: https://blog.csdn.net/zhangjingao/article/details/117601983

### Consistent Hash Algorithm

The consistent hash algorithm divides the space into a ring space. In the ring space, each machine node is distributed at different points on the ring. When the key value is calculated to a certain point, it will look for a node clockwise. , The data will be saved there . When a machine is down or a new machine is added, it will only affect a part of the nearby keys, and will not affect all subsequent keys, so as to reduce the impact of deleting or adding machines on node rehash as much as possible. This feature solves the monotonicity and load balancing characteristics.

If there are few nodes, then there are two problems:
1. There is a data skew problem . After the node is hashed, it is placed on the ring. If the result of the hash is not uniform, there will be a data skew problem. For example: there are two nodes, the value after the hash is
2, there is a cache avalanche problem , if one of the nodes is deleted, then there will be a large number of key failures, then there will be an avalanche problem.

In order to solve the above problems, the consistent hash algorithm introduces the concept of virtual nodes. A real node is divided into many virtual nodes, and the virtual nodes are placed on the ring. Then when adding or deleting nodes, the affected The point will be greatly reduced.
As shown in the figure below: v100 and v101 are both virtual nodes of node 1, v200 and v201 are virtual nodes of node 2, and v300 and v301 are virtual nodes of node 3. At this time, if node 1 is deleted, the corresponding v100 and v101 will disappear, and the only affected data is k1 mapped to v301 and k4 mapped to v200.

The consistent hash algorithm virtualizes a lot of virtual nodes for each machine node, which are evenly distributed on the ring. This solves the balance and will not cause a node to cache too many keys, and some nodes Too few keys in the cache are unbalanced.

Note: Real nodes will not be placed on the hash ring, only virtual nodes.

### Hash slot algorithm (currently used)

The algorithm currently used by the redis cluster is a hash slot algorithm, not a consistent hash algorithm.
The hash slot algorithm is no longer a ring, but a linear interval. There are a total of 16384 (2^14=2 8 1024) slots. When the cluster is created, which slots each master node is responsible for have already been allocated. When adding and deleting nodes later, all that is done is the migration of the slots. .
Put a classic diagram of keys, slots, nodes, and clusters.

Maintaining the relationship between slots and nodes is an array structure, the key is the slot subscript, and the value is the node object. In this way, the corresponding processing node can be found quickly according to the slot.

### Performance optimization

Pressure test command: ./redis-benchmark -h 127.0.0.1, I used 133s for get and set100000 pieces of data on my computer

Redis has limited memory, which is related to the computer, 64G or 128G

Memory is limited, you can use sub-database and sub-table to increase memory, and sub-server storage for different keys based on modulo

In the case of more reads and less writes, use read-write separation to achieve speed optimization

### RESP protocol

The communication protocol of Redis's java client jedis to communicate with redis. Based on TCP application layer protocol.

### Cache penetration

#### solution

##### Cache empty key

Redis caches this non-existent key, and the value is set to null. When it is not found in the database, redis caches this non-existent key and sets the expiration time.

##### Use BloomFilter.

BloomFilter is a component that can determine whether the key exists. Before loading the cache, if the key does not exist in the bloomfilter, it will directly return null. If it exists, check the cache and then check the db.

##### Comparison of the two programs

When the client may often use non-existent key queries with a high repetition rate, we can use the first scheme,

When the repetition rate of non-existent keys used by the client is low, the second scheme is used.

### Cache breakdown

#### solution

When there are multiple threads requesting the database, the mutex lock is used to lock the request. When the first thread requests the data, the data can be stored in the cache, and then other threads can directly use the cache.

### Cache avalanche

#### solution

##### beforehand

Use cluster caching to ensure high availability of cache services. In redis, you can use master-slave + sentinel, redis cluster to avoid redis crashing.

##### In the matter

Ehcache local cache + Hystrix current limit & downgrade to avoid MySQL being killed

##### afterwards

Turn on the redis persistence mechanism to restore the data in the memory.

### Hot data failure

#### solution

##### Set different expiration time

Set different expiration times for different hotspot data. The expiration time can be a certain certain time + an uncertain random value that fluctuates in a small range.

##### Mutex

Similar to the cache breakdown solution, the requesting thread is locked. When the first thread requests the database in the database, the thread requesting the same data will not go to the database again, but this solution will reduce the system throughput. Used in combination with the situation.

### Persistence

There are two ways to persist: RDB and AOF

#### RDB

Principle : Redis calls the fork thread, and the child thread will write data changes to the rdb file, overwriting the old file, which is a snapshot form of persistence

1. RDB will configure the time point. When the time point is reached, the change amount will start to persist when the required data is reached, such as save 60 100. If the change amount reaches 100 within 60s, it will be backed up. There can be multiple time point configurations.

2. Sub-thread backup is used, without affecting the continued service of the main thread

3. In the case of a large amount of data, RDB can start faster than AOF