Redis miscellaneous notes
These notes are loosely organized; if you repost them, please credit the source: https://blog.csdn.net/zhangjingao/article/details/117601983
Consistent Hash Algorithm
The consistent hash algorithm maps the hash space onto a ring. Each machine node is hashed to points on the ring; when a key is hashed to a point, we walk clockwise and store the data on the first node reached. When a machine goes down or a new machine is added, only the keys near that node are affected, not all subsequent keys, which minimizes the rehashing caused by removing or adding machines. This gives the scheme monotonicity and balance.
If there are few nodes, two problems arise:
1. Data skew. After the nodes are hashed onto the ring, an uneven hash distribution skews the data. For example, with only two nodes, an unlucky hash can place them close together on the ring, so most keys land on one of them.
2. Cache avalanche. If one of the nodes is removed, a large number of keys are remapped at once, which can trigger an avalanche.
To solve these problems, the consistent hash algorithm introduces virtual nodes: each real node is split into many virtual nodes, and the virtual nodes are placed on the ring. Adding or removing a node then affects far fewer keys.
As shown in the figure below: v100 and v101 are virtual nodes of node 1, v200 and v201 of node 2, and v300 and v301 of node 3. If node 1 is deleted, v100 and v101 disappear, and the only affected data are k1, which remaps to v301, and k4, which remaps to v200.
Because each machine gets many virtual nodes spread evenly around the ring, the load is balanced: no node caches far too many keys while another caches far too few.
Note: real nodes are never placed on the hash ring; only virtual nodes are.
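The ring described above can be sketched in a few lines of Python. This is a minimal illustration, not Redis's implementation: the class name, the MD5 hash, and the 100-virtual-nodes default are all my own choices.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (a sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual nodes per real node
        self.ring = {}         # hash point on the ring -> real node name
        self.sorted_keys = []  # sorted hash points
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        # Only virtual nodes are placed on the ring, never the real node itself.
        for i in range(self.vnodes):
            point = self._hash(f"{node}#v{i}")
            self.ring[point] = node
            bisect.insort(self.sorted_keys, point)

    def remove_node(self, node: str):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#v{i}")
            del self.ring[point]
            self.sorted_keys.remove(point)

    def get_node(self, key: str) -> str:
        # Walk clockwise: first virtual node at or after the key's hash point.
        point = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, point) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]
```

Removing a node only remaps the keys that landed on that node's virtual nodes; every other key keeps its old placement, which is exactly the monotonicity property described above.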
Hash slot algorithm (currently used)
Redis Cluster currently uses a hash slot algorithm, not consistent hashing.
The hash slot algorithm uses a linear interval rather than a ring. There are 16384 (2^14 = 16 × 1024) slots in total; a key maps to a slot via CRC16(key) mod 16384. When the cluster is created, each master node is assigned a range of slots; adding or removing nodes later only requires migrating slots between nodes.
(Insert the classic diagram of keys, slots, nodes, and clusters here.)
The slot-to-node mapping is maintained as an array: the index is the slot number and the value is the node object, so the node responsible for a slot can be looked up directly.
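As a sketch, the key-to-slot mapping can be reproduced with the CRC16 variant (CCITT/XMODEM) that Redis Cluster uses. Hash-tag handling (`{...}` in keys) is omitted here for brevity.

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Redis Cluster: slot = CRC16(key) mod 16384
    return crc16(key.encode()) % 16384

# The Redis Cluster spec gives CRC16("123456789") = 0x31C3 as a check value.
```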
Benchmark command: ./redis-benchmark -h 127.0.0.1. On my machine, GETs and SETs of 100,000 records took 133 s.
Redis memory is limited by the machine's RAM, e.g. 64 GB or 128 GB.
When memory is limited, you can shard: split data across databases/tables to increase total capacity, and route different keys to different servers by taking the key's hash modulo the server count.
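The modulo routing just described can be sketched as follows. CRC32 is only an example hash and the server list is hypothetical; note the classic drawback that resizing the server list remaps almost every key.

```python
import zlib

def pick_server(key: str, servers):
    # Naive modulo sharding: the same key always maps to the same server,
    # but adding or removing a server remaps almost every key.
    return servers[zlib.crc32(key.encode()) % len(servers)]
```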
When reads far outnumber writes, use read-write separation (replicas serve reads) to optimize speed.
Jedis, Redis's Java client, communicates with the server over TCP using Redis's application-layer protocol.
Cache penetration
Reason: a client repeatedly queries keys that do not exist, so every query misses the cache and hits the database.
Cache empty key
When a lookup also misses the database, Redis caches the non-existent key with a null value and a short expiration time, so repeated queries for it stop reaching the database.
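The empty-key scheme can be sketched like this. The dicts stand in for Redis and MySQL, and all names (CACHE, DB, the TTL values) are made up for illustration.

```python
import time

CACHE = {}                  # key -> (value, expire_at); stands in for Redis
DB = {"user:1": "alice"}    # stands in for the database
DATA_TTL = 3600             # normal TTL for real data
NULL_TTL = 60               # short TTL for non-existent keys

def get(key):
    entry = CACHE.get(key)
    if entry is not None:
        value, expire_at = entry
        if time.time() < expire_at:
            return value    # may be None: a cached "empty key"
    value = DB.get(key)     # cache miss -> hit the database
    ttl = DATA_TTL if value is not None else NULL_TTL
    CACHE[key] = (value, time.time() + ttl)  # cache even when value is None
    return value
```

After the first miss, repeated lookups of a non-existent key are served from the cache until the short TTL expires.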
Bloom filter
A Bloom filter is a component that can tell whether a key may exist. Before hitting the cache, if the Bloom filter says the key does not exist, return null immediately; if it may exist, check the cache and then the database.
Comparing the two schemes:
When clients often repeat queries for the same non-existent keys (high repetition rate), use the first scheme (cache the empty key).
When the repetition rate of non-existent keys is low, use the second scheme (Bloom filter).
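A minimal Bloom filter sketch, assuming k double-salted MD5 hashes over an m-bit array (both parameters chosen arbitrarily here). A Bloom filter never gives false negatives, but may give false positives, which is why a "may exist" answer still requires checking the cache and database.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter sketch: k hash positions over an m-bit array."""

    def __init__(self, m=8192, k=4):
        self.m, self.k = m, k
        self.bits = 0   # an int used as a bit array

    def _positions(self, key: str):
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits & (1 << pos) for pos in self._positions(key))
```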
Cache breakdown
Reason: many threads request the same key at the exact moment it expires, so a burst of requests hits the database simultaneously and database load spikes.
Use a mutex: when multiple threads would hit the database, lock the rebuild. The first thread loads the data and writes it to the cache; the other threads then read from the cache directly.
Cache avalanche
Reason: a large amount of cached data becomes unavailable at once, for example when the cache service suddenly crashes; all requests then go straight to the database, which may not cope.
Beforehand: run the cache as a highly available cluster so Redis itself rarely goes down; in Redis, use master-slave + sentinel or Redis Cluster.
During the incident: use an Ehcache local cache plus Hystrix rate limiting and degradation so MySQL is not overwhelmed.
Afterwards: enable Redis's persistence mechanism so the in-memory data can be restored on restart.
Hot data failure
Reason: we usually set an expiration time on cached entries. If many hot entries expire at the same moment, the resulting requests all reach the database at once and load spikes.
Set different expiration times
Give different hot keys different expiration times: a fixed base time plus a small random offset, so they do not all expire together.
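The "fixed time plus small random value" idea is just a couple of lines; the base TTL and jitter range below are arbitrary example values.

```python
import random

BASE_TTL = 3600   # fixed base expiry, in seconds
JITTER = 300      # small random spread

def ttl_with_jitter() -> int:
    # Fixed time + a small random offset, so hot keys don't all expire at once.
    return BASE_TTL + random.randint(0, JITTER)
```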
Similar to the cache breakdown solution: lock the requesting threads so that while the first thread queries the database, the other threads asking for the same data do not hit the database again. This reduces system throughput, so use it only where the situation warrants.
There are two ways to persist: RDB and AOF
RDB
Principle: Redis forks a child process (a process, not a thread); the child writes a point-in-time snapshot of the dataset to the RDB file, replacing the old file.
Advantages:
1. RDB triggers on configured save points: persistence starts once the required number of changes occurs within the configured window. For example, save 60 100 means snapshot if at least 100 changes happen within 60 s. Multiple save points can be configured.
2. The backup runs in a forked child process, so the main process keeps serving requests.
3. With large datasets, restarting from an RDB file is faster than replaying an AOF.
Disadvantages:
1. If the save interval is long and Redis goes down, all changes since the last snapshot are lost.
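For reference, save points live in redis.conf, and the stock configuration file ships with these defaults:

```
# Snapshot if, within the window, at least N keys changed:
save 900 1       # 900 s and >= 1 change
save 300 10      # 300 s and >= 10 changes
save 60 10000    # 60 s and >= 10000 changes
```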
AOF
Principle: more reliable. Every command that modifies data is appended to the AOF file; when Redis restarts, it replays the AOF file to rebuild the dataset.
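AOF is enabled and tuned in redis.conf; the fsync policy controls the durability/throughput trade-off:

```
appendonly yes          # enable AOF
appendfsync everysec    # fsync once per second (the usual trade-off)
# alternatives: always (safest, slowest) / no (leave flushing to the OS)
```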