Redis persistence strategy
AOF (Append Only File)
Three write-back strategies for AOF
When AOF writes commands to disk
The AOF mechanism offers three choices, which are the three possible values of the appendfsync configuration item.
always: synchronous write-back. After each write command is executed, the log is written back to disk immediately and synchronously.
everysec: write-back every second. After each write command is executed, the log is first written to the memory buffer of the AOF file, and the buffer is written to disk once per second.
no: write-back controlled by the operating system. After each write command is executed, the log is first written to the memory buffer of the AOF file, and the operating system decides when to write it to disk (on Linux this typically happens about every 30 seconds, depending on kernel tuning).
|Configuration value||Write-back timing||Advantage||Disadvantage|
|always||Synchronous write-back||Reliable; at most one command's data can be lost||Every write command must reach the disk, which has a large performance impact|
|everysec||Write-back every second||Moderate performance||Up to one second of data is lost on a crash|
|no||Write-back controlled by the operating system||Highest performance||A crash may lose a lot of data|
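The difference between the three values in the table above comes down to when fsync is called. The following is a minimal sketch, not Redis's implementation: the AofWriter class and its fsync_count field are hypothetical names for illustration, and everysec is approximated by checking the clock on each append, whereas Redis uses a background thread for this.

```python
import os
import time


class AofWriter:
    """Toy append-only log illustrating the three appendfsync policies."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.file = open(path, "ab")
        self.last_fsync = time.monotonic()
        self.fsync_count = 0  # bookkeeping for this demo only

    def _fsync(self):
        self.file.flush()             # user-space buffer -> kernel page cache
        os.fsync(self.file.fileno())  # kernel page cache -> disk
        self.last_fsync = time.monotonic()
        self.fsync_count += 1

    def append(self, command):
        self.file.write(command.encode() + b"\n")
        if self.policy == "always":
            # Pay the disk latency on every single write command.
            self._fsync()
        elif self.policy == "everysec":
            # Sync at most once per second (checked lazily in this sketch).
            if time.monotonic() - self.last_fsync >= 1.0:
                self._fsync()
        else:
            # "no": hand the data to the OS and never call fsync ourselves.
            self.file.flush()

    def close(self):
        self.file.close()
```

With always, three appended commands cost three fsync calls; with no, they cost none, which is where both the performance gain and the risk of data loss come from.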
AOF log rewriting mechanism
Rewriting reduces the size of the AOF log file. For example, of the three commands set key 1; set key 2; set key 3, only set key 3 needs to be kept.
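The idea can be sketched in a few lines: replay the log to get the final state, then emit one command per surviving key. This is a simplified illustration that assumes a log containing only set and del commands with values that have no spaces; replay and rewrite are hypothetical helper names, not Redis functions.

```python
def replay(commands):
    """Rebuild the key space by executing logged commands in order."""
    store = {}
    for line in commands:
        op, key, *rest = line.split()
        if op == "set":
            store[key] = rest[0]
        elif op == "del":
            store.pop(key, None)
        else:
            raise ValueError(f"unsupported command in this sketch: {op}")
    return store


def rewrite(commands):
    """Return a shorter log that rebuilds the same final state."""
    return [f"set {key} {value}" for key, value in replay(commands).items()]


log = ["set key 1", "set key 2", "set key 3"]
compact = rewrite(log)
print(compact)  # ['set key 3']
```

Replaying the compact log yields the same data as replaying the original one, only with fewer commands to execute.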
AOF rewrite trigger mechanism
Manually execute the bgrewriteaof command.
Redis triggers rewriting automatically according to the following rule
An AOF rewrite is triggered when aof-current-size >= auto-aof-rewrite-min-size and (aof-current-size - aof-base-size) / aof-base-size * 100 >= auto-aof-rewrite-percentage.
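The rule above can be written as a small predicate. This is a sketch under assumptions: should_rewrite is a hypothetical name, the growth ratio is converted to a percentage before comparison, and a base size of 0 is treated as 1 to avoid dividing by zero.

```python
MB = 1024 * 1024


def should_rewrite(current_size, base_size, min_size, percentage):
    """Return True when both automatic AOF rewrite conditions hold."""
    if current_size < min_size:
        return False                 # file is still too small to bother
    base = base_size or 1            # guard against division by zero
    growth = (current_size - base) * 100 / base
    return growth >= percentage


# The file has doubled since the last rewrite: 100% growth.
print(should_rewrite(128 * MB, 64 * MB, min_size=64 * MB, percentage=100))  # True
# Only 56.25% growth so far.
print(should_rewrite(100 * MB, 64 * MB, min_size=64 * MB, percentage=100))  # False
```

The min-size check keeps small files from being rewritten again and again even when they grow quickly in relative terms.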
Implementation process of AOF log rewriting
Fork a child process (bgrewriteaof) to do the rewriting, so that the main process is not affected
Fork uses copy-on-write (COW)
Create a new AOF log file and use the child process to rewrite the new AOF log
A new AOF log file is used because
A failed rewrite does not affect the old AOF log file
Multiple processes writing to the same file would compete with each other.
Meanwhile, every new write operation that Redis receives is also added to the buffer of the new AOF log, so that the rewritten log does not miss the latest operations
After the rewrite succeeds, the new AOF file replaces the old one
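The steps above can be sketched as follows. This is a single-process simulation, not Redis's code: the fork is left out, snapshot stands for the data the child process sees, commands_during_rewrite stands for the buffer of the new AOF log, and rewrite_aof and the temp-file prefix are hypothetical names.

```python
import os
import tempfile


def rewrite_aof(aof_path, snapshot, commands_during_rewrite):
    """Build a new AOF next to the old one, then swap it in atomically."""
    directory = os.path.dirname(os.path.abspath(aof_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="aof-rewrite-")
    try:
        with os.fdopen(fd, "w") as tmp:
            # Step 1: one command per key, generated from the snapshot.
            for key, value in snapshot.items():
                tmp.write(f"set {key} {value}\n")
            # Step 2: append the commands that arrived during the rewrite.
            for command in commands_during_rewrite:
                tmp.write(command + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        # Step 3: replace the old file only after the new one is complete.
        os.replace(tmp_path, aof_path)
    except BaseException:
        # A failed rewrite leaves the old AOF file untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the old file is only replaced in the last step, a crash or error at any earlier point leaves the original log intact.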
Problems solved by AOF log rewriting
An AOF log file that is too large causes performance problems
The file system itself limits file size, so files that are too large cannot be saved;
Appending command records to a very large file is less efficient;
After a crash, the commands recorded in the AOF must be re-executed one by one to recover. If the log file is too large, the whole recovery process is very slow, which affects the normal use of Redis.
The snapshot file is called an RDB file; RDB stands for Redis DataBase
RDB snapshot trigger mechanism
The save command blocks the main thread until the RDB snapshot has been generated, so it is rarely used.
The bgsave command, like AOF rewriting, does its work in a forked child process. The fork itself is usually fast, so the main thread is blocked only briefly.
Redis also triggers snapshots automatically, mainly in the following three ways:
The save m n configuration: a snapshot is triggered automatically when at least n modifications have been made within m seconds
Master-slave full replication
Executing the shutdown command to close Redis triggers a snapshot if the AOF log is not enabled.
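The save m n rule in the list above can be sketched as a predicate over a list of save points. The function name should_snapshot and the sample save points are illustrative assumptions, not values read from any particular redis.conf.

```python
def should_snapshot(save_points, seconds_since_last_save, changes_since_last_save):
    """Return True if any (m seconds, n changes) save point is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


# Sample save points, equivalent to "save 900 1", "save 300 10", "save 60 10000".
points = [(900, 1), (300, 10), (60, 10000)]

print(should_snapshot(points, 61, 10000))  # True: busy minute
print(should_snapshot(points, 61, 5))      # False: too few changes so far
print(should_snapshot(points, 900, 1))     # True: one change in 15 minutes
```

Several save points are usually combined so that heavy write traffic is snapshotted soon while a nearly idle instance is still saved eventually.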
Full snapshot bgsave
Fork a bgsave child process
Fork uses copy-on-write (COW): when the main process modifies data in memory, the affected data is copied first, so the child process still sees the data as it was at fork time. The writes of the main process and the snapshot of the bgsave child process therefore do not affect each other
The bgsave child process reads the memory data it shares with the main process and writes it to the RDB file.
Data can be modified while the snapshot is being taken, and the saved RDB file still contains the data as it was when the snapshot started
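The point-in-time behaviour described above can be observed with a real fork. This sketch assumes a Unix-like system where os.fork is available; snapshot_with_fork is a hypothetical helper, and JSON sent over a pipe stands in for writing an RDB file.

```python
import json
import os


def snapshot_with_fork(data, mutate):
    """Fork, let the parent modify data, and return (child_view, parent_view)."""
    go_r, go_w = os.pipe()    # parent -> child: "you may dump now"
    out_r, out_w = os.pipe()  # child -> parent: serialized snapshot
    pid = os.fork()
    if pid == 0:
        # Child: wait until the parent has modified its copy, then dump ours.
        try:
            os.close(go_w)
            os.close(out_r)
            os.read(go_r, 1)
            os.write(out_w, json.dumps(data).encode())
        finally:
            os._exit(0)
    # Parent: keep serving "writes" while the child takes the snapshot.
    os.close(go_r)
    os.close(out_w)
    mutate(data)
    os.write(go_w, b"1")
    os.close(go_w)
    chunks = []
    while True:
        chunk = os.read(out_r, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(out_r)
    os.waitpid(pid, 0)
    return json.loads(b"".join(chunks)), data


child_view, parent_view = snapshot_with_fork({"key": 1}, lambda d: d.update(key=2))
print(child_view)   # {'key': 1}  (data as of the fork)
print(parent_view)  # {'key': 2}  (modified during the snapshot)
```

Even though the parent changes the value before the child serializes anything, the child still writes out the value from the moment of the fork.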
Incremental snapshot: after a full snapshot is taken, subsequent snapshots record only the modified data, which avoids the overhead of taking a full snapshot every time.
After the first full snapshot, when snapshots are taken at T1 and T2, only the modified data needs to be written to the snapshot file. The premise, however, is that we remember which data has been modified. Don't underestimate this "remembering": it requires additional metadata to record which data has been modified, which brings additional space overhead.
If we keep one record for each modified key-value pair, then 10,000 modified key-value pairs need 10,000 additional records. Moreover, a key-value pair is sometimes very small, for example only 32 bytes, while the metadata recording its modification may take 8 bytes. In this way, the additional space overhead introduced just to "remember" the modifications is relatively large. For Redis, where memory is precious, the costs outweigh the gains.
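The numbers in the example above work out as follows. This is just the arithmetic from the text, with tracking_overhead as a hypothetical helper name.

```python
def tracking_overhead(modified_pairs, pair_bytes, record_bytes):
    """Return (extra bytes for tracking, overhead relative to the data)."""
    extra_bytes = modified_pairs * record_bytes
    ratio = record_bytes / pair_bytes
    return extra_bytes, ratio


extra, ratio = tracking_overhead(modified_pairs=10_000, pair_bytes=32, record_bytes=8)
print(extra)  # 80000 bytes of metadata
print(ratio)  # 0.25, i.e. 25% on top of the data being tracked
```

For small key-value pairs, the tracking metadata alone adds a quarter of the size of the data it describes.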
RDB snapshot + AOF hybrid persistence
Hybrid persistence is available starting with Redis 4.0. Memory snapshots are taken at a certain frequency, and between two snapshots the AOF log records all command operations during that period.
Turn on hybrid persistence
aof-use-rdb-preamble yes
Hybrid persistence process
Take a full snapshot at intervals, and clear the AOF log file once the snapshot is complete
Every operation after the snapshot is recorded in the AOF log
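Recovery under this scheme means loading the snapshot first and then replaying only the commands logged after it. The sketch below is a toy model, not the real file format: one JSON line stands in for the binary RDB snapshot, only set and del are supported, and write_hybrid and load_hybrid are hypothetical names.

```python
import json


def write_hybrid(path, snapshot, tail_commands):
    """Write a snapshot preamble followed by the commands logged after it."""
    with open(path, "w") as f:
        f.write(json.dumps(snapshot) + "\n")  # stand-in for the RDB part
        for command in tail_commands:
            f.write(command + "\n")           # AOF part


def load_hybrid(path):
    """Load the snapshot, then replay the command tail on top of it."""
    with open(path) as f:
        store = json.loads(f.readline())
        for line in f:
            line = line.strip()
            if not line:
                continue
            op, key, *rest = line.split()
            if op == "set":
                store[key] = rest[0]
            elif op == "del":
                store.pop(key, None)
            else:
                raise ValueError(f"unsupported command in this sketch: {op}")
    return store
```

Loading the snapshot is a bulk operation, and only the short tail has to be re-executed command by command, which is why recovery is faster than with a pure AOF log.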
Comparison of persistence methods
|Comparison item||RDB||AOF||AOF+RDB hybrid|
|Space used||Small (data is compressed)||Large (commands are stored uncompressed)||Slightly larger than RDB|
|Speed of a single save||Slow||Fast|
|Recovery speed||Fast||Slow||Somewhat faster than AOF alone|
|Data safety||Data since the last snapshot is lost||Depends on the appendfsync strategy||Depends on the AOF strategy used|
|Default state||On by default||Off by default||Off by default|
When data must not be lost, combining memory snapshots with AOF is a good choice;
If minute-level data loss is acceptable, RDB alone is enough;
If only AOF is used, prefer the everysec option, because it strikes a balance between reliability and performance.