Distributed development (3) --- Redis essentials you must know

Redis is widely used in distributed development because of its high performance and high concurrency: it can easily support tens of thousands of QPS. This article is a summary, so that your knowledge of Redis goes beyond simple Set and Get operations.

1. What is Redis?

Redis is an open source (BSD licensed), in-memory data structure store that can be used as a database, cache, and message broker. It supports multiple data structures, such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, and geospatial indexes with radius queries. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and it provides high availability through Redis Sentinel and automatic partitioning with Redis Cluster.

It boils down to one sentence: Redis is a memory-based, high-performance KV database that supports rich data types (strings, hashes, lists, sets, sorted sets, etc.).

2. Why is Redis so fast?

Let's first look at the test results of Alibaba Cloud on Redis QPS

Why can single-threaded Redis support 100,000+ QPS?
There are three main reasons:

  • Pure memory operation
  • Single-threaded operation avoids context switching problems in multi-threaded situations
  • Adopts non-blocking I/O multiplexing mechanism

The emphasis here is on the non-blocking I/O multiplexing mechanism.

Non-blocking I/O multiplexing

image

Referring to the figure above, in simple terms: Redis handles network communication with an event handler based on the Reactor pattern. An I/O multiplexing program monitors multiple sockets at the same time and puts the sockets that have pending events into a queue. The file event dispatcher then takes them from the queue one by one and forwards each to the matching event handler.
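
The flow described above can be sketched with Python's standard selectors module. This is only an illustration of the Reactor idea, not Redis's actual implementation: the handler name handle_readable, the "+" reply prefix, and the use of socket pairs as stand-ins for client connections are all assumptions made for the example.

```python
import selectors
import socket


def run_reactor_demo(messages):
    """Echo each message through a tiny single-threaded event loop."""
    sel = selectors.DefaultSelector()

    # Hypothetical event handler: read one request and write back a reply.
    def handle_readable(conn):
        data = conn.recv(1024)
        conn.sendall(b"+" + data)

    # Each socket pair stands in for one client connection to the server.
    pairs = [socket.socketpair() for _ in messages]
    for _client, server in pairs:
        server.setblocking(False)
        # Register the server-side socket and attach its handler.
        sel.register(server, selectors.EVENT_READ, handle_readable)

    # Every "client" sends its request before the loop starts.
    for (client, _server), msg in zip(pairs, messages):
        client.sendall(msg)

    handled = 0
    while handled < len(messages):
        # The multiplexer reports every socket that is ready to read.
        for key, _events in sel.select(timeout=1):
            key.data(key.fileobj)  # dispatch to the handler stored at registration
            handled += 1

    replies = []
    for client, server in pairs:
        replies.append(client.recv(1024))
        sel.unregister(server)
        client.close()
        server.close()
    sel.close()
    return replies
```

One thread serves all connections here: the loop only touches sockets that the selector reports as ready, so it never blocks waiting on an idle client.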

3. Common data types and corresponding scenarios

For each type, the features and typical usage scenarios are listed below.

strings
  • Features: The most basic type. The value can be a string or a number, and a single value can hold up to 512MB.
  • Common commands: get, set, incr, decr, mget, etc.
  • Usage scenarios: cache, atomic counter, etc.

hashes
  • Features: A mapping between string fields and string values. A hash is particularly suitable for storing objects.
  • Usage scenarios: storing data as an object (such as user information).

lists
  • Features: An ordered list that allows duplicates, kept in insertion order.
  • Usage scenarios: simple message queue.

sets
  • Features: An unordered collection without duplicates.
  • Usage scenarios: global de-duplication; computing common preferences, all preferences, and your own unique preferences through intersection, union, and difference.

sorted sets
  • Features: An ordered set in which each element carries a score.
  • Usage scenarios: ranking applications, TOP N queries.
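
To make the set and sorted-set scenarios concrete, here is a small model in plain Python. It does not talk to Redis; it only mirrors the semantics of the intersection, union, and difference operations (SINTER, SUNION, SDIFF in Redis) and of a TOP N query on a sorted set. The user names, preferences, and scores are made up for illustration.

```python
import heapq

# Sets: each user's preferences.
alice = {"go", "redis", "mysql"}
bob = {"redis", "kafka", "mysql"}

common = alice & bob       # common preferences (intersection)
everything = alice | bob   # all preferences (union)
only_alice = alice - bob   # alice's unique preferences (difference)

# Sorted set: member -> score.
scores = {"u1": 90, "u2": 75, "u3": 98, "u4": 60}


def top_n(board, n):
    """Return the n highest-scoring (member, score) pairs, best first."""
    return heapq.nlargest(n, board.items(), key=lambda kv: kv[1])
```

For example, top_n(scores, 2) returns the two members with the highest scores, which is the shape of result a leaderboard page needs.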

4. Redis persistence

Redis provides different levels of persistence:

  1. RDB persistence takes a snapshot of your data at specified time intervals.
  2. AOF persistence records every write operation received by the server. When the server restarts, these commands are re-executed to restore the original data. AOF entries are saved in the Redis protocol format, and each write operation is appended to the end of the file. Redis can also rewrite the AOF file in the background so that it does not grow too large.
  3. If you only want your data to exist while the server is running, you can also use no persistence at all.
  4. You can also enable both persistence methods at the same time. In this case, when Redis restarts, it loads the AOF file first to restore the original data, because the data set saved in the AOF file is normally more complete than the one saved in the RDB file.

The difference between RDB and AOF persistence:

RDB

RDB persistence periodically dumps the in-memory Redis data set to an RDB file on disk.

Advantages:
1. When saving an RDB file, the only thing the parent process needs to do is fork a child process. The child does the rest of the work, and the parent performs no other disk I/O, so RDB persistence keeps Redis performance as high as possible.
2. Compared with AOF, RDB is faster when restoring large data sets.

Disadvantages:
RDB saves at time intervals, for example one complete save every 5 minutes or longer. If Redis goes down unexpectedly, you may lose several minutes of data.

AOF

AOF persistence appends the log of Redis write operations to a file.

Advantages:
1. Using AOF makes your Redis data more durable. You can choose between different fsync strategies: no fsync, fsync every second, or fsync on every write. With the default strategy of fsync every second, Redis performance is still very good (fsync is handled by a background thread, and the main thread does its best to keep serving client requests). If a failure occurs, you lose at most about 1 second of data.
2. Redis can automatically rewrite the AOF file in the background when it becomes too large.

Disadvantages:
For the same data set, the volume of the AOF file is usually larger than the volume of the RDB file.
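
The append-and-replay idea behind AOF can be sketched with a toy store. This is a simplified model under stated assumptions: the class TinyStore, the JSON line format, and the file name are invented for the example (real Redis writes commands in its own protocol format), and the sketch uses the "fsync on every write" strategy because it is the simplest to show.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store that logs every write to an append-only file."""

    def __init__(self, aof_path):
        self.data = {}
        self.aof_path = aof_path
        # On startup, rebuild the state by replaying the log in order.
        if os.path.exists(aof_path):
            with open(aof_path, encoding="utf-8") as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, cmd):
        op, key, *rest = cmd
        if op == "SET":
            self.data[key] = rest[0]
        elif op == "DEL":
            self.data.pop(key, None)

    def execute(self, *cmd):
        self._apply(list(cmd))
        # Append the write operation to the end of the file.
        with open(self.aof_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(cmd) + "\n")
            f.flush()
            os.fsync(f.fileno())  # "fsync on every write" strategy


def aof_roundtrip():
    """Write a few commands, then simulate a restart by reloading the log."""
    path = os.path.join(tempfile.mkdtemp(), "appendonly.log")
    store = TinyStore(path)
    store.execute("SET", "a", "1")
    store.execute("SET", "b", "2")
    store.execute("DEL", "a")
    return TinyStore(path).data
```

The log keeps all three commands even though only one key survives, which also shows why an AOF file tends to be larger than a snapshot of the same data and why background rewriting is useful.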

5. Expired key processing strategy

Redis clears expired keys with a combination of two strategies: periodic deletion + lazy deletion.

Periodic deletion strategy: By default, every 100ms Redis randomly samples some of the keys that have an expiration time set, checks whether they have expired, and deletes those that have. (Because only a sample is checked, many keys are not deleted at the moment they expire.)
Lazy deletion strategy: When a key is accessed, Redis first checks whether it has an expiration time and whether that time has passed. If it has expired, it is deleted at that moment. (If a key is never requested, lazy deletion alone will never remove it.)
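
The two strategies can be sketched together in a small in-memory store. This is a model, not Redis's code: the class ExpiringStore, the injectable clock, and the default sample size are assumptions chosen to keep the example short and testable.

```python
import random


class ExpiringStore:
    """Toy store combining lazy deletion and periodic sampling."""

    def __init__(self, clock):
        self.clock = clock   # callable returning the current time in seconds
        self.data = {}
        self.expires = {}    # key -> absolute expiry time

    def set(self, key, value, ttl=None):
        self.data[key] = value
        if ttl is None:
            self.expires.pop(key, None)
        else:
            self.expires[key] = self.clock() + ttl

    def _expire_if_needed(self, key):
        deadline = self.expires.get(key)
        if deadline is not None and deadline <= self.clock():
            del self.data[key]
            del self.expires[key]
            return True
        return False

    def get(self, key):
        # Lazy deletion: expiry is checked only when the key is accessed.
        self._expire_if_needed(key)
        return self.data.get(key)

    def periodic_sweep(self, sample_size=20):
        # Periodic deletion: test a random sample of the keys that have a TTL.
        keys = list(self.expires)
        sample = random.sample(keys, min(sample_size, len(keys)))
        return sum(self._expire_if_needed(k) for k in sample)
```

Calling periodic_sweep on a timer plays the role of the 100ms background check, while get covers any expired key the sampling happened to miss.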

According to the above analysis, some expired keys will escape both strategies and remain in the Redis server, so Redis memory usage keeps growing. This is where the memory eviction mechanism comes in.
There is a line of configuration in redis.conf:

# maxmemory-policy volatile-lru

Redis provides 8 memory eviction policies:

volatile-lru : from the keys with an expiration time set (server.db[i].expires), evict the least recently used
volatile-ttl : from the keys with an expiration time set (server.db[i].expires), evict the ones closest to expiring
volatile-random : from the keys with an expiration time set (server.db[i].expires), evict arbitrary keys
allkeys-lru : when memory is not enough to hold newly written data, remove the least recently used key from the whole key space (this is the most commonly used)
allkeys-random : evict arbitrary keys from the whole data set (server.db[i].dict)
noeviction : never evict data. This is the default configuration: when memory is not enough to hold newly written data, new write operations report an error. No one should use this!
Version 4.0 added the following two:
volatile-lfu : from the keys with an expiration time set (server.db[i].expires), evict the least frequently used
allkeys-lfu : when memory is not enough to hold newly written data, remove the least frequently used key from the whole key space
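
The difference between allkeys-lru and noeviction can be sketched as follows. Note the assumptions: the class BoundedCache is invented for this example, the limit is a number of keys rather than bytes of memory, and the LRU here is exact, whereas Redis uses an approximated LRU based on sampling.

```python
from collections import OrderedDict


class BoundedCache:
    """Toy cache with a key-count limit and two eviction policies."""

    def __init__(self, max_keys, policy="allkeys-lru"):
        self.max_keys = max_keys
        self.policy = policy
        self.data = OrderedDict()  # least recently used first, newest last

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key not in self.data and len(self.data) >= self.max_keys:
            if self.policy == "noeviction":
                # New writes are rejected instead of evicting anything.
                raise MemoryError("write rejected under noeviction")
            self.data.popitem(last=False)  # evict the least recently used key
        self.data[key] = value
        self.data.move_to_end(key)
```

With the LRU policy a full cache silently drops the coldest key to make room, while noeviction surfaces the problem to the writer as an error.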

6. What are the disadvantages of using Redis?

1. Double-write consistency of cache and database

Consistency is a common problem in distributed systems. If data is written to both the database and the cache, inconsistencies will inevitably occur, and we can only guarantee eventual consistency. Therefore, data with strong consistency requirements should not be cached.

Here is a brief note on the update strategy. First, update the database, and then delete the cache.

Second, because deleting the cache may fail, a message queue can be used to compensate by retrying the delete.
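
A minimal sketch of this update strategy, using plain Python objects as stand-ins: db for the database, cache for Redis, and retry_queue for the message queue. The function names and the cache_down flag that simulates a failed delete are hypothetical.

```python
from collections import deque

db = {}                # stand-in for the database
cache = {}             # stand-in for Redis
retry_queue = deque()  # stand-in for a message queue


def delete_from_cache(key, fail=False):
    """Hypothetical cache delete; `fail` simulates a network error."""
    if fail:
        raise ConnectionError("cache unreachable")
    cache.pop(key, None)


def update(key, value, cache_down=False):
    db[key] = value  # 1. update the database first
    try:
        delete_from_cache(key, fail=cache_down)  # 2. then delete the cache
    except ConnectionError:
        retry_queue.append(key)  # 3. queue the failed delete for a retry


def drain_retry_queue():
    """Consumer side: retry the deletes that failed earlier."""
    while retry_queue:
        delete_from_cache(retry_queue.popleft())


def read(key):
    if key not in cache:  # cache miss: load from the database and repopulate
        cache[key] = db.get(key)
    return cache[key]
```

Between the failed delete and the retry, reads still return the old value, which is exactly the window in which only eventual consistency holds.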

2. Cache penetration and cache avalanche issues

Cache penetration: deliberately requesting data that does not exist in the cache (such as data with an id of "-1"), so that every request goes to the database and exhausts or breaks its connections.

Solution:
1. Add validation at the interface layer, for example rejecting requests with id<=0 directly;
2. Use a bloom filter that maintains the set of legal, valid keys, and reject requests for keys it does not contain.
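
Both solutions can be sketched in a few lines. The BloomFilter class below is a minimal illustration with assumed parameters (1024 bits, 3 hash functions derived from SHA-256), and lookup and load_from_db are hypothetical names. A bloom filter can report false positives but never false negatives, so a "no" answer is always safe to reject.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter over a fixed-size bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting the hash input.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def lookup(user_id, valid_ids, load_from_db):
    if user_id <= 0:  # interface-layer validation
        return None
    if not valid_ids.might_contain(user_id):
        return None   # definitely not a valid key, skip the database
    return load_from_db(user_id)
```

In this sketch the database is consulted only for ids that pass both checks, so requests for obviously invalid or unknown ids never reach it.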

Cache avalanche: a large part of the cache expires at the same time, and a wave of requests arrives just then. As a result, the requests all go to the database, which leads to excessive database pressure and connection errors.

Solution:
1. Add a random value to each cache entry's expiration time so that entries do not all expire together.
2. Use a mutex lock so that only one request rebuilds an expired entry while the others wait.
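
Both ideas can be sketched as follows. The names jittered_ttl, get_or_rebuild, and local_cache are hypothetical, the 300-second jitter range is an arbitrary choice, and the lock here is an in-process threading.Lock. Across several application servers a distributed lock would be needed instead, which this sketch does not cover.

```python
import random
import threading


def jittered_ttl(base_seconds, max_jitter_seconds=300):
    """Base TTL plus a random offset so keys do not all expire together."""
    return base_seconds + random.randint(0, max_jitter_seconds)


local_cache = {}  # stand-in for Redis
_rebuild_lock = threading.Lock()


def get_or_rebuild(key, load_from_db):
    value = local_cache.get(key)
    if value is not None:
        return value
    with _rebuild_lock:
        # Re-check: another thread may have rebuilt the entry while we waited.
        value = local_cache.get(key)
        if value is None:
            value = load_from_db(key)
            local_cache[key] = value
    return value
```

When many threads miss the same key at once, only the first one to take the lock queries the database; the rest find the rebuilt value on the second check.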