Redis (REmote DIctionary Server) is essentially a key-value storage system: a cross-platform, non-relational database. The official definition of Redis is:
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker.
In other words, Redis is an open-source (BSD-licensed), in-memory data structure store that can serve as a database, a cache, and a message broker. As the official introduction suggests, Redis offers a rich set of features, of which caching is by far the most commonly used.
It supports multiple data structures, including string, hash, list, set, and sorted set. Beyond these basic types, it also offers range queries, bitmaps, HyperLogLogs, and geospatial indexes with radius queries. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and it provides high availability through Redis Sentinel and automatic partitioning with Redis Cluster.
Compared with other key-value storage systems, Redis has the following characteristics:
- Supports data persistence: in-memory data can be saved to disk and loaded again on restart. The on-disk files are generated by appending, so no random access is required.
- Supports not only simple key-value caching but also the storage of list, set, zset, hash, and other structures. All data types are transparent to the programmer, so no additional abstraction layer is required.
- Supports data backup in master-slave mode.
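As a rough analogy only (not how Redis stores things internally; Redis uses its own encodings such as SDS strings, listpacks, and skiplists), the core value types map onto familiar Python built-ins:

```python
# Illustrative only: modeling Redis's core value types with Python
# built-ins. The key names here are made up for the example.

database = {
    "page:title": "Hello",                         # string
    "queue:jobs": ["job1", "job2", "job3"],        # list (push/pop at both ends)
    "user:1000": {"name": "alice", "age": "30"},   # hash (field -> value)
    "tags:redis": {"db", "cache", "nosql"},        # set (unique members)
    "leaderboard": {"alice": 42.0, "bob": 17.5},   # sorted set (member -> score)
}

# A sorted-set range query (ZRANGE-like) is just a sort by score here:
top = sorted(database["leaderboard"].items(), key=lambda kv: -kv[1])
```

The real commands (SET, LPUSH, HSET, SADD, ZADD) operate on these abstractions without exposing the underlying encoding.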
In general, Redis has the following advantages:
- Very high performance: roughly 110,000 reads/s and 81,000 writes/s in common benchmarks.
- Rich data types: strings, lists, hashes, sets, sorted sets, and more.
- Atomicity: each individual operation is atomic, and multiple operations can be grouped into a transaction with the MULTI and EXEC commands.
- Rich features: Redis also supports publish/subscribe, keyspace notifications, key expiration, and more.
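The MULTI/EXEC behavior can be sketched as a command queue that is executed in one uninterrupted batch. The toy model below (a plain dict standing in for the keyspace, with made-up class and method names) only shows the queuing semantics; in real Redis, the single-threaded server is what guarantees no other client's command interleaves with the batch:

```python
class ToyTransaction:
    """Toy model of Redis MULTI/EXEC: commands issued after MULTI are
    queued rather than run; EXEC executes the whole queue back-to-back
    and returns one result per queued command."""

    def __init__(self, store):
        self.store = store      # dict standing in for the keyspace
        self.queue = []

    def multi(self):
        self.queue = []

    def set(self, key, value):
        self.queue.append(("SET", key, value))
        return "QUEUED"

    def incr(self, key):
        self.queue.append(("INCR", key))
        return "QUEUED"

    def exec(self):
        results = []
        for cmd, *args in self.queue:
            if cmd == "SET":
                key, value = args
                self.store[key] = value
                results.append("OK")
            elif cmd == "INCR":
                (key,) = args
                self.store[key] = int(self.store.get(key, 0)) + 1
                results.append(self.store[key])
        self.queue = []
        return results
```

Just as in Redis, each command replies QUEUED until EXEC runs the batch.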
Redis supports data persistence: in-memory data can be saved to disk and loaded again on restart. Redis provides two persistence methods, RDB (Redis DataBase) and AOF (Append Only File).
RDB (Redis DataBase)
The RDB method generates a snapshot of the Redis dataset at a point in time and stores it on disk or other media. During persistence, Redis first writes the data to a temporary file; when the process finishes, the temporary file replaces the previous snapshot file. The snapshot file is therefore always fully usable, so backup and restore operations can be performed at any time.
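Snapshot timing is controlled in redis.conf; the values below are the long-standing defaults (take a snapshot after 900 seconds if at least 1 key changed, and so on):

```
# redis.conf -- RDB snapshot triggers: save <seconds> <min-changed-keys>
save 900 1
save 300 10
save 60 10000

dbfilename dump.rdb   # snapshot file name
dir ./                # directory the snapshot is written to
```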
RDB has a clear performance advantage: Redis forks a child process to do the persistence, so the main process performs no disk I/O and Redis's performance is unaffected. When large-scale data recovery is needed and full recovery integrity is not critical, RDB is also more efficient than AOF.
Relatively speaking, however, RDB offers weaker data-integrity guarantees. If snapshots are taken every 10 minutes, a Redis failure can lose up to 10 minutes of data. In that situation AOF is the better choice.
AOF (Append Only File)
As the name implies, AOF persists data by appending: it records every write command that is executed, and on recovery it replays all the commands in order.
Since AOF only appends, the AOF file keeps growing over time. For this reason Redis provides an AOF rewrite mechanism: when the file exceeds a configured threshold, Redis compacts its contents, retaining only the minimal set of commands that can rebuild the data. Put simply, several commands are merged into one while guaranteeing that executing the result produces the same final state.
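The "minimal instruction set" idea can be sketched as follows: replay the log into a final state, then emit one write per surviving key. This is a deliberate simplification of the real rewrite (which also handles expirations, splits large collections across several commands, and supports every command type), covering just SET/INCR/DEL for illustration:

```python
def rewrite_aof(commands):
    """Toy AOF rewrite: fold an append-only command log down to the
    smallest set of commands that reproduces the same final state."""
    state = {}
    for cmd, key, *rest in commands:
        if cmd == "SET":
            state[key] = rest[0]
        elif cmd == "INCR":
            state[key] = int(state.get(key, 0)) + 1
        elif cmd == "DEL":
            state.pop(key, None)
    # One SET per surviving key is enough to rebuild this state.
    return [("SET", key, value) for key, value in state.items()]
```

Six logged commands can collapse to two, yet replaying either log yields the same dataset.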
The default AOF policy flushes buffered commands to disk every second, so even if Redis crashes, at most about one second of data is lost. An AOF rewrite likewise writes to a temporary file first and swaps it in afterwards, so the existing AOF file stays usable even if the disk fills up, inodes run out, or power is lost mid-rewrite; and if an AOF file does get damaged, Redis ships the redis-check-aof tool to repair the log.
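The per-second flush policy described above corresponds to the appendfsync directive in redis.conf:

```
# redis.conf -- AOF settings
appendonly yes                    # enable AOF persistence
appendfilename "appendonly.aof"

# fsync policy: always (every write), everysec (default; at most
# about one second of data lost on crash), or no (leave it to the OS)
appendfsync everysec
```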
When an AOF rewrite runs, Redis first forks a child process to do the work. The child reads the existing AOF file, analyzes the commands it contains, compacts them, and writes the result to a temporary file. Meanwhile, the main process accumulates newly received write commands in an in-memory buffer while continuing to append them to the original AOF file; this keeps the original file usable if anything goes wrong during the rewrite. When the child finishes, it signals the main process, which appends the buffered write commands to the new AOF file and then replaces the original file with it. Subsequent commands are appended to the new AOF file.
In fact, RDB and AOF can be combined, which is also the official recommendation. On restart, Redis prefers the AOF file for recovery because it offers better data integrity.
As the official introduction says, Redis is essentially a key-value storage system, in effect one large hash table. In practice we operate on this hash table through the Redis command line or a Redis API in some language: the side issuing the operations is the Redis client, and the side holding the hash table is the Redis server. Redis is therefore a client/server (C/S) system, and the client and server may run on the same machine or on different machines.
Single-threaded Redis
Note that the Redis server is single-threaded and processes client requests with an event-loop model. The advantages of a single thread are:
- No thread-safety concerns: most operations need no locking, which simplifies development and improves performance.
- No thread locks, and no performance cost from context switching between threads.
Redis was originally designed to be single-threaded because its operations are all memory-based, so the CPU is rarely the bottleneck. To utilize the CPU fully, you can run multiple Redis instances, i.e. multiple processes, on one machine. Since Redis is a non-relational database with no constraints between data, the client only needs to know which keys map to which Redis process.
Single-threaded Redis executes commands quickly because it uses non-blocking I/O multiplexing: a single thread polls the file descriptors and turns connection open, close, read, and write into events, eliminating the context switching and contention that come with multiple threads. By reusing the same thread across many connections, the system's throughput is preserved.
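The event-loop idea can be sketched with Python's selectors module, which wraps the same OS facilities (epoll/kqueue) that Redis uses. This is a toy echo server, not the Redis protocol: one thread multiplexes the listening socket and every client socket, dispatching whichever is ready.

```python
import selectors
import socket
import threading

def run_event_loop(server_sock, stop_event):
    """One thread, one selector, many sockets: the selector reports
    which sockets are ready, so a single thread serves every
    connection without ever blocking on any one of them."""
    sel = selectors.DefaultSelector()
    server_sock.setblocking(False)
    sel.register(server_sock, selectors.EVENT_READ, data="listener")

    while not stop_event.is_set():
        for key, _events in sel.select(timeout=0.1):
            if key.data == "listener":            # event: new connection
                conn, _addr = key.fileobj.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="client")
            else:                                 # event: client readable
                conn = key.fileobj
                chunk = conn.recv(1024)
                if chunk:
                    conn.sendall(b"+ECHO " + chunk)
                else:                             # event: client closed
                    sel.unregister(conn)
                    conn.close()
    sel.close()
```

Every accept, read, and close arrives as an event on the same thread, which is why no locks are needed.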
The single-threaded core is very efficient, but as Redis clients multiply, one machine's memory and throughput are finite; even a multi-threaded single machine often cannot meet business demand. A cluster of single-threaded servers is therefore the common solution today: client requests are distributed across Redis servers by a load-balancing algorithm (usually consistent hashing), reducing the pressure on any single server. The cluster strategy effectively expands cache capacity and raises overall throughput.
However, starting with Redis 6.0 the strictly single-threaded design was relaxed and a multi-threaded model was introduced, because in some respects a single thread no longer held the advantage.
The main case is network I/O: reading and writing network data and parsing the protocol consume much of the CPU in Redis, and handling them in separate threads greatly improves performance. Note that Redis's multi-threading is used only for network reads and writes and protocol parsing; command execution remains single-threaded. This gains the performance of multi-threaded I/O while keeping the advantages of the original single-threaded design.
Redis master-slave synchronization
The clustering solution mentioned above improves Redis's working efficiency, but some problems remain, for example:
- Poor data availability. If one Redis instance fails, all the data cached on it is lost; that data can then only be fetched from the database, increasing the database's load.
- Slow queries for hot data. When one piece of data receives a very large number of identical queries, the Redis machine holding it sees very high traffic, and with insufficient throughput it struggles to sustain the load.
In view of the above problems, we can analyze step by step:
- To solve availability, we can borrow the master-slave pattern common in databases and give each Redis machine a slave for data backup. If the master goes down, the slave can be promoted to master, keeping the data highly available. When the master receives too much traffic, part of the requests can be forwarded to the slave; that is, the master handles writes (or reads and writes) while the slave handles only reads.
- To make the master-slave mode stronger, we can attach more slaves to each master, but then the master's replication workload grows: every added slave is one more copy the master must feed. A chained master-slave topology solves this by letting each slave replicate to the slaves below it, relieving the master.
Based on the above analysis, we arrive at a final master-slave architecture shaped like a tree.
In this way, several first-level slaves help the master guarantee data availability and query efficiency while reducing the number of replicas the master must synchronize directly; replication to lower levels is delegated to the slaves themselves, easing the master's backup pressure.
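The chained topology can be sketched as a tree in which each node forwards every write it applies to its own slaves, so the master only ever talks to its first-level slaves. (A toy model with made-up names, simplifying away Redis's real replication stream and partial resynchronization.)

```python
class Node:
    """Toy replication node: applies a write locally, then forwards
    it one level down to its direct slaves (chained replication)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.slaves = []

    def attach(self, slave):
        """Initial sync: the new slave copies the current dataset,
        then receives every subsequent write."""
        self.slaves.append(slave)
        slave.data = dict(self.data)

    def write(self, key, value):
        self.data[key] = value
        for slave in self.slaves:   # propagate only one level down
            slave.write(key, value)
```

A write to the master reaches a second-level slave through its first-level parent, never directly from the master.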