Redis summary


Overview

What is Redis? Briefly describe its advantages and disadvantages?

Redis is essentially a Key-Value in-memory database, much like Memcached. The entire dataset is held and operated on in memory, and is periodically saved to disk through asynchronous operations.

Because it operates purely in memory, Redis has excellent performance: it can handle more than 100,000 read and write operations per second, making it one of the fastest Key-Value stores available.

Advantages :

  • Extremely high read and write performance: Redis can reach roughly 110,000 reads/s and 81,000 writes/s.
  • Supports data persistence through two mechanisms: AOF and RDB.
  • Supports transactions. Every individual Redis command is atomic, and multiple commands can be packaged with the MULTI and EXEC commands so they run as a single, uninterrupted batch.
  • Rich data structures: besides string values, it also supports hash, set, zset (sorted set), list and other structures.
  • Supports master-slave replication: the master automatically synchronizes data to its slaves, so reads and writes can be separated.
  • Rich features: Redis also supports publish/subscribe, notifications, key expiration and more.

Disadvantages :

  • The database capacity is limited by physical memory, so Redis cannot be used for high-performance reads and writes over truly massive datasets. Its suitable scenarios are therefore mainly high-performance operations and computations on datasets that fit in memory.

Why use Redis / a cache?

  • High performance: if every request for data goes to the database, access is slow. If the data fetched on the first access is cached in memory, subsequent accesses can read it straight from memory, which is very fast.
  • High concurrency: under high concurrency, hitting the database directly puts it under heavy pressure, so part of the data can be kept in the cache and served from there. A cache can sustain far more concurrent requests than a database.

Why is Redis so fast?

  • Memory storage: Redis stores data in memory, without the overhead of disk I/O. Lookups behave like a HashMap, whose advantage is O(1) search and update.
  • Single-threaded implementation (before Redis 6.0): Redis handles requests with a single thread, avoiding the overhead of thread switching and lock contention between threads. Note: "single thread" means the core network model uses one thread, i.e. one thread processes all network requests.
  • Non-blocking IO: Redis uses I/O multiplexing, with epoll as the implementation. Redis's own event-handling model turns connections, reads/writes and closes in epoll into events, so little time is wasted blocking on network I/O.
  • Optimized data structures: Redis ships with many carefully optimized data structure implementations that the application layer can use directly for better performance.
  • A different underlying model: Redis built its own VM (virtual memory) mechanism to separate hot and cold data, keeping hot data in memory and moving cold data to disk, which avoids slowdowns caused by insufficient memory.

There are two ways for Redis to increase database capacity: one is to split the data across multiple Redis servers; the other is to use virtual memory to swap rarely accessed data out to disk. Note that Redis does not use the Swap provided by the OS but implements this mechanism itself.

What are the advantages of Redis over Memcached?

  • Data types: all Memcached values are plain strings, while Redis supports richer data types: string, list, set, sorted set, hash, and so on.
  • Persistence: Redis supports persisting data to disk, so the in-memory data can be kept on disk and reloaded after a restart. Memcached does not support persistence.
  • Cluster mode: Redis provides master-slave replication as well as Cluster deployment, which can offer highly available services. Memcached has no native cluster mode and relies on the client to shard data across nodes.
  • Performance comparison: Redis is faster than Memcached.
  • Network IO model: Redis uses a single-threaded I/O multiplexing model (before 6.0), while Memcached uses a multi-threaded, non-blocking IO model.
  • Server-side data operations: compared with Memcached, Redis has more data structures and supports richer operations on them. With Memcached, you usually have to fetch the data to the client, modify it there and set it back,
    which greatly increases the number of network round trips and the amount of data transferred. In Redis, these complex operations are usually as efficient as a plain GET/SET. So if the cache needs to support more complex structures and operations, Redis is a good choice.

What are the common scenarios of Redis?

1. Cache

Caching is the go-to technique of almost all medium and large websites today. Used well, a cache not only speeds up site access but also greatly reduces the pressure on the database. Redis provides key expiration and flexible key eviction policies, so it is widely used as a cache.
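
To make the cache-read path concrete, here is a minimal cache-aside sketch in Python with the redis-py client; the connection settings, the article:{id} key naming and the load_article_from_db helper are illustrative assumptions, not something prescribed by Redis itself.

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_article_from_db(article_id):
    # Placeholder for the real database query.
    return {"id": article_id, "title": "hello"}

def get_article(article_id):
    key = f"article:{article_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)               # cache hit: served from memory
    article = load_article_from_db(article_id)  # cache miss: go to the database
    r.set(key, json.dumps(article), ex=300)     # cache it for 5 minutes
    return article
```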

2. Ranking

Many websites have ranking features, such as JD's monthly sales ranking or rankings of newly listed products ordered by time. The sorted set data structure provided by Redis can implement all kinds of complex ranking features.
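
As a rough sketch of a leaderboard built on a sorted set (redis-py; the key rank:monthly_sales and the product IDs are illustrative):

```python
import redis

r = redis.Redis(decode_responses=True)

# Each sale bumps the product's score in the sorted set.
r.zincrby("rank:monthly_sales", 1, "product:1001")
r.zincrby("rank:monthly_sales", 3, "product:1002")

# Top 10 products by sales, highest first, with their scores.
print(r.zrevrange("rank:monthly_sales", 0, 9, withscores=True))
# e.g. [('product:1002', 3.0), ('product:1001', 1.0)]
```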

3. Counter

A counter is, for example, the view count of products on an e-commerce site or the play count of videos on a video site. To keep the figures real-time, every view must add 1, and issuing a database write for every view is a real challenge under high concurrency. Redis's INCR command implements a counter as a pure in-memory operation with excellent performance, which makes it a very good fit for these counting scenarios.
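
A minimal counter sketch with redis-py (the key pattern is an assumption for illustration); INCR is atomic, so concurrent viewers never lose an update:

```python
import redis

r = redis.Redis(decode_responses=True)

def record_view(product_id):
    # INCR creates the key at 0 if missing and increments it atomically.
    return r.incr(f"product:views:{product_id}")

print(record_view(42))  # 1, 2, 3, ... on successive calls
```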

4. Distributed Session

In cluster mode, when there are only a few application instances, the session replication built into the container is generally sufficient. When the number of instances grows and the topology becomes more complex, a session service centred on an in-memory database such as Redis is usually built instead: sessions are no longer managed by the container but by the session service backed by the in-memory database.

5. Distributed lock

Distributed architectures are used in many Internet companies, and one of the challenges they bring is concurrent access to the same resource, e.g. global IDs, stock decrements and flash sales. In low-concurrency scenarios, database pessimistic or optimistic locks can be used, but under high concurrency, controlling access with database locks is not ideal and badly hurts database performance. Redis's SETNX (or SET ... NX) can be used to build a distributed lock: if the set returns 1 the lock is acquired, otherwise acquisition fails. Real-world implementations need to consider quite a few more details, such as expiry and ownership on release.
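
Below is a minimal sketch of such a lock using the SET ... NX EX form (the modern equivalent of SETNX plus an expiry), with a random token so that only the owner can release it; key names and TTLs are illustrative, and production code would add retries and more careful error handling.

```python
import uuid

import redis

r = redis.Redis(decode_responses=True)

# Release the lock only if we still own it, atomically, via a small Lua script.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def acquire(lock_key, ttl_seconds=10):
    token = str(uuid.uuid4())
    # Succeeds only if the key does not exist yet; expires automatically.
    if r.set(lock_key, token, nx=True, ex=ttl_seconds):
        return token
    return None

def release(lock_key, token):
    return r.eval(RELEASE_SCRIPT, 1, lock_key, token) == 1

token = acquire("lock:stock:1001")
if token:
    try:
        pass  # critical section: decrement stock, generate an ID, etc.
    finally:
        release("lock:stock:1001", token)
```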

6. Social network

Likes, dislikes, following/followers, mutual friends and so on are the basic features of social networking sites. Their traffic is usually high, and traditional relational databases are not a good fit for storing this kind of data, whereas Redis data structures such as hashes and sets make these features easy to build. For example, mutual followings on Weibo can be obtained easily with Redis sets.
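
For instance, mutual followings reduce to a set intersection; a small redis-py sketch with illustrative key names:

```python
import redis

r = redis.Redis(decode_responses=True)

# Each user's followings are kept in a set.
r.sadd("follows:alice", "carol", "dave", "erin")
r.sadd("follows:bob", "dave", "erin", "frank")

# Mutual followings are simply the intersection of the two sets.
print(r.sinter("follows:alice", "follows:bob"))  # {'dave', 'erin'}
```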

7. The latest list

With the Redis list structure, LPUSH inserts a content ID at the head of the list and LTRIM caps the list length, so the list always holds the latest N IDs. There is no need to query the database for the latest list: the corresponding content page can be fetched directly by ID.
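
A small sketch of this pattern in redis-py (the key name and list length are assumptions):

```python
import redis

r = redis.Redis(decode_responses=True)

LATEST_KEY = "latest:articles"
MAX_IDS = 100  # keep only the newest 100 IDs

def publish(article_id):
    r.lpush(LATEST_KEY, article_id)       # newest ID goes to the head
    r.ltrim(LATEST_KEY, 0, MAX_IDS - 1)   # drop everything beyond the first N

def latest_page(page=0, per_page=10):
    start = page * per_page
    return r.lrange(LATEST_KEY, start, start + per_page - 1)
```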

8. Message system

A message queue is essential middleware for large websites; popular choices include ActiveMQ, RabbitMQ and Kafka. It is mainly used for decoupling services, smoothing traffic peaks and asynchronously processing work with low real-time requirements. Redis provides publish/subscribe and blocking list operations, which are enough to implement a simple message queue system; that said, it is no substitute for dedicated messaging middleware.
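
As an illustration of the blocking-queue approach (redis-py; the queue name and job format are assumptions), a producer pushes with LPUSH and a consumer blocks on BRPOP:

```python
import redis

r = redis.Redis(decode_responses=True)

QUEUE = "queue:emails"

# Producer: push a job onto the list.
r.lpush(QUEUE, "send-welcome-mail:user:42")

# Consumer: block for up to 5 seconds waiting for the next job.
item = r.brpop(QUEUE, timeout=5)
if item is not None:
    _, job = item          # BRPOP returns (queue_name, value)
    print("processing", job)
```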

Data types

What are the data types of Redis?

Redis has five commonly used data types: String, List, Set, Zset (Sorted Set) and Hash.

| Data type | Values it can store | Operations | Application scenarios |
| --- | --- | --- | --- |
| String | Strings, integers or floating-point numbers | Operate on the whole string or part of it; increment or decrement integers and floats | Conventional key-value caching; plain counting, e.g. number of Weibo posts or followers |
| Hash | Unordered hash table of key-value pairs | Add, delete, update and read a single field; get all fields | Structured data, such as an object |
| List | Ordered list | First-in-first-out; backed by a doubly linked list, so it supports traversal in both directions | List-like data, such as fan lists or article comment lists |
| Set | Unordered collection | Add, delete, update and read individual elements; compute intersection, union and difference; get random elements | Features such as mutual friends and common followings |
| ZSet (Sorted Set) | Ordered set | Add, get and delete elements; fetch elements by score range or by member; compute a member's rank | Leaderboards, message queues with weights, etc. |

Three special data types:

  • Bitmap: think of a Bitmap as an array whose unit is the bit; each cell can only hold 0 or 1, and the array index is called the offset. Bitmaps are ideal for saving space when you only need a binary state per item, e.g. whether a product is in stock or whether a user has signed in, because each item needs only a single bit.
  • HyperLogLog: a probabilistic structure for counting cardinality. Its advantage is that no matter how many or how large the input elements are, the space needed is fixed and small: each HyperLogLog key needs only 12 KB of memory to count the cardinality of close to 2^64 distinct elements.
    Scenario: counting the UV of a web page (Unique Visitors: a person who visits a site many times is still counted once).
    Note that HyperLogLog is probability-based, so its results carry some error; the standard error rate is 0.81%.
  • Geospatial: mainly used to store geographic coordinates and run queries over them, suitable for scenarios such as friends' locations, nearby people and ride-hailing distance calculations. (A small sketch of the Bitmap and HyperLogLog commands follows this list.)
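
A minimal redis-py sketch of the Bitmap and HyperLogLog commands mentioned above (key names and IDs are illustrative):

```python
import redis

r = redis.Redis(decode_responses=True)

# Bitmap: one bit per user ID to mark who signed in today.
r.setbit("signin:2024-01-01", 1001, 1)      # user 1001 signed in
r.setbit("signin:2024-01-01", 1002, 1)
print(r.getbit("signin:2024-01-01", 1001))  # 1
print(r.bitcount("signin:2024-01-01"))      # 2 users signed in

# HyperLogLog: approximate unique-visitor counting in about 12 KB per key.
r.pfadd("uv:2024-01-01", "user:1", "user:2", "user:1")
print(r.pfcount("uv:2024-01-01"))           # ~2, with ~0.81% standard error

# Geospatial data is handled server-side with GEOADD / GEODIST / GEOSEARCH.
```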

Thread model

Why does Redis choose single thread?

Before Redis 6.0, Redis's core network model (the network event handler, i.e. the file event handler) was implemented with a single thread.

The reasons are as follows:

  • Avoid excessive context switching overhead
  • Avoid the overhead of synchronization mechanism
  • Simple and maintainable: all underlying data structures do not need to be implemented as thread-safe

Is Redis really single threaded?

Redis 4.0

The file event dispatcher is single threaded

I/O multiplexing:

  • Redis needs to suspend the handling of one IO event at the right moment and switch to handling another. Like a switch, whichever IO event the "switch" is currently connected to is the one being processed, while the others are paused; this is IO multiplexing. There are three common implementations: select, poll and epoll.

At the same time, Redis 4.0 introduced extra threads to handle asynchronous tasks, such as releasing resources, cleaning up dirty data and deleting big keys.


Redis 6.0

Redis 6.0 keeps the multiplexed event loop but adds multithreaded I/O: several I/O threads read requests from and write responses to client sockets, while command execution itself remains single-threaded.

Why does Redis 6.0 introduce multithreading?

In single-threaded mode the system spends a lot of CPU time on network I/O, which limits throughput. The solution is to take advantage of multiple cores to parallelize network I/O.

Deletion strategy for expired keys

Redis expired key deletion strategy?

Redis combines two expiry-deletion strategies: lazy deletion and periodic deletion.

Lazy deletion:

  • A key is only checked for expiry when it is accessed: if it has expired it is removed, otherwise nothing is done and the original command proceeds.
  • Advantages: saves as much CPU as possible.
  • Disadvantages: very unfriendly to memory. In the extreme case, a large number of expired keys are never accessed again, so they are never cleared and keep occupying memory, effectively a memory leak.

Periodic deletion:

  • At regular intervals, Redis scans a certain number of keys in the expires dictionaries of a certain number of databases and clears the expired ones.
  • Advantages: the impact on the CPU can be bounded by limiting the duration and frequency of the scan, and expired keys' memory is effectively reclaimed.
  • Disadvantages: the right duration and frequency are hard to choose. Also, if periodic deletion were used alone, a key whose expiry has already passed but that has not yet been scanned could still be returned on read, which most businesses cannot tolerate; that is why the two strategies are combined (a small client-side example follows this list).
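
From the client's point of view, an expired key is reported as missing as soon as it is accessed, whether or not the periodic scan has already reclaimed it; a tiny redis-py sketch (the key name is illustrative):

```python
import time

import redis

r = redis.Redis(decode_responses=True)

r.set("session:abc", "payload", ex=1)  # expire after 1 second
print(r.ttl("session:abc"))            # remaining lifetime in seconds, e.g. 1

time.sleep(1.5)
# Lazy deletion: the access itself notices the key has expired.
print(r.get("session:abc"))            # None
```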

What are the deletion strategies for expired keys?

Timed deletion

  • When setting a key's expiration time, also create a timer that deletes the key the moment the expiration time arrives.
  • Advantages: the most memory-friendly option, since the memory held by a key is released immediately once it expires.
  • Disadvantages: the least CPU-friendly. When there are many expired keys, deleting them takes CPU time and affects the server's response time and throughput.

Lazy deletion and periodic deletion

  • These are the same two strategies described in the previous question: lazy deletion checks and removes a key only when it is accessed, while periodic deletion scans a sample of keys at intervals; their advantages and disadvantages are as listed above.

Memory related

There are 20 million rows of data in MySQL, but Redis can only hold 200,000 of them. How do you ensure the data in Redis is all hot data?

Limit Redis memory with maxmemory and pick an LRU-based eviction policy (such as allkeys-lru): whenever memory usage exceeds maxmemory, Redis actively evicts the least recently used keys, so what remains in memory is the hot data.

Redis memory elimination mechanism?

Eviction from the entire key space

  • no-eviction: when memory cannot hold newly written data, new write operations return an error.
  • allkeys-lru: when memory cannot hold newly written data, remove the least recently used key from the whole key space (LRU; this is the most commonly used policy).
  • allkeys-lfu: when memory cannot hold newly written data, remove the least frequently used key from the whole key space (LFU).
  • allkeys-random: when memory cannot hold newly written data, remove a random key from the whole key space.

Eviction only from keys with an expiration time set

  • volatile-lru: when memory cannot hold newly written data, remove the least recently used key among keys that have an expiration time set.
  • volatile-lfu: when memory cannot hold newly written data, remove the least frequently used key among keys that have an expiration time set.
  • volatile-random: when memory cannot hold newly written data, remove a random key among keys that have an expiration time set.
  • volatile-ttl: when memory cannot hold newly written data, remove the key with the nearest expiration time first among keys that have an expiration time set.
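
Following on from the policies above, a small sketch of selecting one at runtime via redis-py; in practice these are usually set in redis.conf, and the values shown are only examples:

```python
import redis

r = redis.Redis(decode_responses=True)

# Equivalent to the maxmemory / maxmemory-policy directives in redis.conf.
r.config_set("maxmemory", "100mb")
r.config_set("maxmemory-policy", "allkeys-lru")

print(r.config_get("maxmemory-policy"))  # {'maxmemory-policy': 'allkeys-lru'}
```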

How does Redis optimize memory?

Make good use of collection types such as Hash, List, Sorted Set and Set, because many small key-value pairs can usually be stored together in a much more compact way. Prefer hashes where possible: a small hash (one whose number of fields stays small) uses very little memory, so try to model your data as hashes. For example, for a user object in a web system, do not create a separate key for the user's name, surname, email and password; store all of the user's fields in a single hash.
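
A quick sketch of the hash-based layout described above (redis-py; the key and field names are illustrative):

```python
import redis

r = redis.Redis(decode_responses=True)

# One hash per user instead of four separate string keys.
r.hset("user:1001", mapping={
    "name": "Ada",
    "surname": "Lovelace",
    "email": "ada@example.com",
    "password": "<hashed>",
})

print(r.hget("user:1001", "email"))
print(r.hgetall("user:1001"))
```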

Persistence

Redis persistence mechanism?

To be able to reuse Redis data after a restart or to survive system failures, the data in Redis has to be written to disk, i.e. persisted. Redis provides two different persistence mechanisms for storing data on disk: one is snapshotting (RDB), the other is the append-only file (AOF).

RDB

  • Redis's default persistence mechanism.
  • Writes a snapshot of the in-memory dataset to disk at configured intervals; on recovery, the snapshot file is read straight back into memory.
  • Advantages: well suited to large-scale data recovery; fine when the requirements on data integrity and consistency are low.
  • Disadvantages: backups are taken at intervals, so if Redis goes down unexpectedly, all changes since the last snapshot are lost.

AOF

AOF records every write operation as a log entry: all write commands executed by Redis are appended (reads are not recorded), and the file is only ever appended to, never rewritten in place. When Redis restarts, it replays the commands in the log from start to finish to rebuild the dataset.

Because AOF only appends, the file keeps growing. To avoid this, a rewrite mechanism was added: when the AOF file exceeds a configured threshold, Redis compacts its contents, keeping only the minimal set of commands needed to rebuild the current data.

  • Advantages: different sync (fsync) policies protect data integrity to different degrees:
  • Sync on every modification (appendfsync always): every change is written to disk immediately.
  • Sync once per second (appendfsync everysec).
  • No explicit sync (appendfsync no): syncing is left to the operating system.
  • Disadvantages: for the same dataset the AOF file is much larger than the RDB file and recovery is slower. AOF generally runs slower than RDB; the per-second sync policy performs well, and with syncing disabled the efficiency is about the same as RDB.

How to choose RDB and AOF?

  • If the data is not that sensitive and can be regenerated from elsewhere, persistence can be turned off.
  • If the data matters and you do not want to fetch it from elsewhere, but you can tolerate losing a few minutes of it (for example a cache), then RDB alone is enough.
  • If Redis is used as an in-memory database and persistence is required, it is recommended to enable both RDB and AOF, or to periodically run bgsave for snapshot backups. RDB is better suited to backups, while AOF ensures data is not lost (see the sketch below).
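
For completeness, a tiny redis-py sketch of triggering the background persistence operations mentioned above; whether and when to call them depends on your deployment:

```python
import redis

r = redis.Redis(decode_responses=True)

r.bgsave()           # fork a background RDB snapshot
r.bgrewriteaof()     # compact the AOF file in the background
print(r.lastsave())  # time of the last successful RDB save
```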

Redis 4.0's optimization of the persistence mechanism?

The first half of a Redis 4.0 AOF file is full data in RDB format, and the second half is incremental data in AOF format:

  • Advantages: hybrid persistence combines the strengths of RDB and AOF. Since most of the file is in RDB format, loading is fast, while the AOF tail preserves incremental data, so less data is lost.
  • Disadvantages: poor compatibility. Once hybrid persistence is enabled, the AOF file is not recognized by versions before 4.0, and because the first part is in RDB format, it is less human-readable.

How to expand Redis persistent data and cache?

If Redis is used purely as a cache, consistent hashing can be used to scale the number of nodes up and down dynamically.
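
A toy consistent-hash ring in Python, only to illustrate the idea (the node names, the number of virtual replicas and the MD5-based hash are arbitrary choices; real deployments would use a mature client or proxy):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps each cache node to many virtual points on a hash ring."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = {}          # point on the ring -> node name
        self.sorted_points = []
        for node in nodes:
            self.add_node(node)

    def _hash(self, value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self.ring[point] = node
            bisect.insort(self.sorted_points, point)

    def get_node(self, key):
        # Walk clockwise to the first virtual point at or after the key's hash.
        idx = bisect.bisect(self.sorted_points, self._hash(key)) % len(self.sorted_points)
        return self.ring[self.sorted_points[idx]]

ring = ConsistentHashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
print(ring.get_node("article:42"))  # the node responsible for this key
```

Adding or removing a node only remaps the keys between neighbouring points on the ring, which is what makes dynamic scaling cheap for a cache.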

If Redis is used as persistent storage, a fixed key-to-node mapping must be used, and the number of nodes cannot change once it is decided. Otherwise (i.e. when nodes do need to change dynamically), a system that can rebalance data at runtime is required, and currently only Redis Cluster can do that.

Cache exception

Cache penetration

Under high concurrency, users request data that exists neither in Redis nor in the database, so every request falls through to the database, which collapses under the sudden load.

Solutions:

  • Add validation at the interface layer, e.g. user authentication checks and basic sanity checks on the id, rejecting id <= 0 outright.
  • Cache empty results: if a key is found neither in the cache nor in the database, still write the key with a null placeholder value and a short TTL, e.g. 30 seconds (setting it too long would prevent the key from working normally once real data appears). This prevents an attacker from brute-forcing the same id over and over.
  • Use a Bloom filter: hash all possibly existing data into a bitmap (each element records only 1 bit, 0 or 1); data that definitely does not exist is blocked by the bitmap, shielding the underlying storage from those queries.

A Bloom filter introduces k (k > 1) mutually independent hash functions so that membership checks can be done within a given space and false-positive rate. Its space efficiency and query time far exceed those of ordinary algorithms; its drawbacks are a certain false-positive rate and the difficulty of deletion.
The core idea of the Bloom filter is to use several different hash functions to cope with hash "collisions": two different values can produce the same hash, so several hashes are used to reduce conflicts. If any one of the hash positions shows an element is not in the set, the element is definitely not in the set; only when all hash functions report the element as present can it be considered (probably) present.
Bloom filters are generally used to decide whether an element exists in a very large data set.
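
A toy in-process Bloom filter to illustrate the idea (the bit-array size, the number of hashes and the SHA-256-based hashing are arbitrary choices; real systems often use RedisBloom or a dedicated library instead):

```python
import hashlib

class BloomFilter:
    """k independent hashes over an m-bit array; no deletion supported."""

    def __init__(self, m_bits=8192, k=3):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
for existing_id in ("1001", "1002", "1003"):
    bf.add(existing_id)

print(bf.might_contain("1002"))  # True
print(bf.might_contain("9999"))  # almost certainly False: skip the database
```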

Cache breakdown

Under high concurrency, a large number of users simultaneously request data that is not in the cache (for example, a hot key has just expired) but does exist in the database, so the requests all land on the database and overload it.

Solutions:

  • Set hotspot data to never expire.
  • Add a mutex lock: when the data is missing from the cache, acquire a lock so that only one request goes to the database; it updates the cache once the data is fetched, and the other requests then read the refreshed cache (see the sketch below).
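
A rough sketch of the mutex approach with redis-py (the key patterns, TTLs and the load_from_db placeholder are assumptions; a production version would cap the retries):

```python
import json
import time

import redis

r = redis.Redis(decode_responses=True)

def load_from_db(key):
    return {"key": key, "value": "from-db"}   # placeholder for the real query

def get_with_mutex(key, ttl=300, lock_ttl=5):
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    # Only the request that wins the lock rebuilds the cache entry.
    if r.set(f"lock:{key}", "1", nx=True, ex=lock_ttl):
        try:
            value = load_from_db(key)
            r.set(key, json.dumps(value), ex=ttl)
            return value
        finally:
            r.delete(f"lock:{key}")
    # Everyone else waits briefly and retries the (now warm) cache.
    time.sleep(0.05)
    return get_with_mutex(key, ttl, lock_ttl)
```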

Cache avalanche

When the cache server restarts, or a large number of cached keys expire within the same short period, user requests go straight to the database and put it under excessive pressure.

Solutions:

  • For crashes/restarts: add multiple cache layers, e.g. Redis plus Memcached.
  • Add a random offset to the expiration time of cached data to prevent a large number of keys from expiring at the same moment.
  • When concurrency is not extremely high, the most common solution is locking and queueing.
  • Set hotspot data to never expire.