Redis persistence strategy

AOF (Append Only File)

Three write-back strategies for AOF

When AOF writes commands to disk

The AOF mechanism offers three choices, which are the three possible values of the AOF configuration item appendfsync.

Always: synchronous write-back. After each write command is executed, the log is written back to disk synchronously and immediately.

Everysec: write back every second. After each write command is executed, the log is first written to the memory buffer of the AOF file, and the buffer is written to disk once per second.

No: after each write command is executed, the log is first written to the memory buffer of the AOF file, and the operating system decides when to write the buffer back to disk. (To do: how does the operating system control this?)

| Configuration item | Write-back timing | Advantage | Disadvantage |
| --- | --- | --- | --- |
| Always | Synchronous write-back | Reliable; at most one piece of data can be lost | Every write command must reach the disk, which has a large performance impact |
| Everysec | Write back every second | Moderate performance | Up to one second of data is lost on a crash |
| No | Write-back controlled by the operating system | Highest performance | A crash may lose a lot of data |
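The difference between the three policies comes down to when fsync is called after a write. The following is a minimal sketch of that idea, not how Redis implements it: the class name AppendLog and its fields are hypothetical, and the everysec branch checks the elapsed time on each append, whereas Redis performs that flush in the background.

```python
import os
import time


class AppendLog:
    """Toy append-only log illustrating the three appendfsync policies."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError("unknown policy: " + policy)
        self.policy = policy
        # O_APPEND makes every write land at the end of the file.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_fsync = time.monotonic()
        self.fsync_count = 0

    def append(self, command):
        # write() only hands the bytes to the operating system's buffer.
        os.write(self.fd, (command + "\n").encode("utf-8"))
        if self.policy == "always":
            self._fsync()
        elif self.policy == "everysec":
            # Simplification: flush if at least one second has passed.
            if time.monotonic() - self.last_fsync >= 1.0:
                self._fsync()
        # policy == "no": never call fsync; the OS flushes when it chooses.

    def _fsync(self):
        # fsync() asks the OS to push buffered data to the disk.
        os.fsync(self.fd)
        self.last_fsync = time.monotonic()
        self.fsync_count += 1

    def close(self):
        os.close(self.fd)
```

With policy "always" every append pays for one fsync, which is where the performance cost in the table comes from; with "no" the program never waits for the disk at all.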

AOF log rewriting mechanism

Rewriting reduces the size of the AOF log file. For example, of the three commands set key 1; set key 2; set key 3, only set key 3 needs to be kept.
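The compaction idea can be sketched in a few lines: replay the commands into a dictionary, then emit one set per surviving key. This is an illustration only; the function name rewrite_commands and the (op, key, value) tuple format are assumptions made for the example, not the AOF file format.

```python
def rewrite_commands(commands):
    """Replay (op, key, value) commands and emit one set per surviving key."""
    state = {}
    for op, key, value in commands:
        if op == "set":
            state[key] = value        # later writes overwrite earlier ones
        elif op == "del":
            state.pop(key, None)      # deleted keys need no command at all
        else:
            raise ValueError("unsupported command: " + op)
    return [("set", key, value) for key, value in state.items()]


log = [("set", "key", 1), ("set", "key", 2), ("set", "key", 3)]
print(rewrite_commands(log))  # [('set', 'key', 3)]
```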

AOF rewrite trigger mechanism

Manually execute the bgrewriteaof command.

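Redis triggers a rewrite automatically according to the following rule:

An AOF rewrite is triggered when aof-current-size >= auto-aof-rewrite-min-size and (aof-current-size - aof-base-size) / aof-base-size >= auto-aof-rewrite-percentage, with the ratio expressed as a percentage.

auto-aof-rewrite-min-size: the minimum file size at which an AOF rewrite can be triggered.

aof-base-size: the file size after the last rewrite.

auto-aof-rewrite-percentage: the growth of the current file size (aof-current-size) relative to the file size after the last rewrite (aof-base-size).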
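The trigger rule can be written as a small predicate. This is a sketch under stated assumptions: the function name should_rewrite is hypothetical, the default arguments mirror the usual redis.conf defaults (64mb and 100), and treating a base size of 0 as 1 byte is an assumption made here to avoid dividing by zero.

```python
def should_rewrite(current_size, base_size,
                   min_size=64 * 1024 * 1024, percentage=100):
    """Return True if the automatic AOF rewrite rule above is satisfied."""
    if percentage <= 0:
        return False                  # a percentage of 0 disables the rule
    if current_size < min_size:
        return False                  # file is still too small to bother
    base = base_size if base_size > 0 else 1   # assumption: avoid dividing by 0
    growth = (current_size - base) * 100 / base
    return growth >= percentage


MB = 1024 * 1024
print(should_rewrite(128 * MB, 64 * MB))  # True: the file doubled (100% growth)
print(should_rewrite(100 * MB, 64 * MB))  # False: only about 56% growth
```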

Implementation process of AOF log rewriting

Fork a child process (bgrewriteaof) to do the rewriting, so that the main process is not affected.

Fork uses copy-on-write.

Create a new AOF log file and let the child process write the rewritten log into it.

A new AOF log file is used because:

A failed rewrite does not affect the old AOF log file.

Multiple processes writing to the same file would compete with each other.

Meanwhile, every new operation that Redis executes is also appended to the buffer of the new AOF log, so that the rewritten log does not lose the latest operations.

After the rewrite succeeds, the new AOF file replaces the old one.
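The steps above can be simulated in a single process. This is a toy model, not Redis code: the class ToyAof and its methods are hypothetical, a plain dictionary copy stands in for the fork, and lists stand in for the log files and the rewrite buffer.

```python
class ToyAof:
    """Single-process simulation of AOF rewriting with a rewrite buffer."""

    def __init__(self):
        self.data = {}
        self.aof = []               # the current (old) AOF log
        self.rewrite_buffer = None  # None while no rewrite is running
        self._snapshot = None

    def set(self, key, value):
        self.data[key] = value
        command = ("set", key, value)
        self.aof.append(command)    # the old log keeps receiving commands
        if self.rewrite_buffer is not None:
            # During a rewrite, new commands also go to the rewrite buffer.
            self.rewrite_buffer.append(command)

    def start_rewrite(self):
        # Stands in for fork(): the "child" sees the data as of this moment.
        self._snapshot = dict(self.data)
        self.rewrite_buffer = []

    def finish_rewrite(self):
        if self._snapshot is None:
            raise RuntimeError("no rewrite in progress")
        # The child writes one command per key into a new log ...
        new_aof = [("set", k, v) for k, v in self._snapshot.items()]
        # ... then the buffered commands are appended so nothing is lost.
        new_aof.extend(self.rewrite_buffer)
        self.aof = new_aof          # the new log replaces the old one
        self.rewrite_buffer = None
        self._snapshot = None
```

Because the old log is only replaced in finish_rewrite, abandoning a rewrite halfway leaves self.aof untouched, which mirrors the first reason given for using a separate file.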

Problems solved by AOF log rewriting

An AOF log file that grows too large causes performance problems:

The file system itself limits file size, so a file that is too large cannot be saved;

Appending command records to a very large file becomes less efficient;

After a crash, the commands recorded in the AOF must be re-executed one by one to recover. If the log file is too large, the whole recovery process is very slow, which affects the normal use of Redis.

RDB snapshot

The snapshot file is called the RDB file; RDB stands for Redis DataBase.

RDB snapshot trigger mechanism

The save command blocks the main thread until the RDB snapshot has been generated, so it is rarely used.

The bgsave command, like the AOF rewrite, does its work in a forked child process. The fork itself is usually very fast, so the blocking time is relatively short.

Redis also triggers snapshots automatically, mainly in the following three ways:

The save m n configuration, which means a snapshot is triggered automatically when there are n modification operations within m seconds

Master-slave full replication

Executing the shutdown command to close Redis. The snapshot is triggered if the AOF log is not enabled.
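The save m n rule in the list above can be sketched as a check over a list of (seconds, changes) pairs. The function name should_bgsave is hypothetical and the save points in the usage lines are example values only; the exact boundary comparison Redis uses is not taken from the text above.

```python
def should_bgsave(save_points, seconds_since_last_save, changes_since_last_save):
    """Return True if any (seconds, changes) save point is satisfied."""
    return any(
        seconds_since_last_save >= seconds
        and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


# Example save points, as in "save 900 1", "save 300 10", "save 60 10000".
points = [(900, 1), (300, 10), (60, 10000)]
print(should_bgsave(points, 61, 10000))  # True: 10000 changes within a minute
print(should_bgsave(points, 61, 50))     # False: too few changes so far
```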

Full snapshot bgsave

Fork a bgsave child process.

Fork uses copy-on-write (COW): when the main process modifies data, the affected memory is copied first, so the bgsave child process still sees the data as it was at fork time. The writes of the main process and the snapshot taken by the child process therefore do not affect each other.

The bgsave child process reads the in-memory data it shares with the main process and writes it to the RDB file.

Modifications can be made while the snapshot is running, and the saved RDB file still contains the data as of the moment the snapshot started.
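The point-in-time behaviour can be observed with a real fork on a Unix system. This is a small demonstration of the principle rather than of Redis itself: the function name snapshot_with_fork is hypothetical, JSON sent over a pipe stands in for the RDB file, and os.fork is not available on Windows.

```python
import json
import os


def snapshot_with_fork(data, mutate):
    """Fork a child that serializes `data` while the parent keeps modifying it."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: its memory is a copy-on-write view taken at fork time.
        os.close(read_fd)
        with os.fdopen(write_fd, "w") as out:
            json.dump(data, out)
        os._exit(0)
    # Parent: keep serving "writes" while the child takes the snapshot.
    os.close(write_fd)
    mutate(data)
    with os.fdopen(read_fd) as inp:
        snapshot = json.load(inp)
    os.waitpid(pid, 0)
    return snapshot


store = {"key": 1}
snap = snapshot_with_fork(store, lambda d: d.update(key=2))
print(snap)   # {'key': 1}: the state at fork time
print(store)  # {'key': 2}: the parent's later write
```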

Incremental snapshot

After a full snapshot is taken, subsequent snapshots only record the modified data, which can avoid the overhead of each full snapshot.

After taking a full snapshot for the first time, when we take snapshots at T1 and T2 we only need to write the modified data into the snapshot file. However, this requires us to remember which data has been modified. Don't underestimate this "remember" function: it needs additional metadata to record which data has been modified, which brings additional space overhead.

If we keep a record for every modified key-value pair, then 10,000 modified key-value pairs need 10,000 additional records. Moreover, a key-value pair is sometimes very small, for example only 32 bytes, while the metadata recording its modification may need 8 bytes. In that case, the additional space overhead introduced just to "remember" the modifications is relatively large. For Redis, where memory is precious, the losses outweigh the gains.
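Working through the numbers in the paragraph above makes the overhead concrete. The helper name tracking_overhead is hypothetical; the figures are the ones quoted in the text.

```python
def tracking_overhead(modified_pairs, pair_bytes, record_bytes):
    """Return (metadata bytes, metadata as a fraction of the tracked data)."""
    data_bytes = modified_pairs * pair_bytes
    metadata_bytes = modified_pairs * record_bytes
    return metadata_bytes, metadata_bytes / data_bytes


meta, ratio = tracking_overhead(10_000, pair_bytes=32, record_bytes=8)
print(meta)   # 80000 bytes of metadata
print(ratio)  # 0.25, i.e. 25% on top of the 320000 bytes of data
```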

RDB snapshot + AOF hybrid persistence

Hybrid persistence is available starting with Redis 4.0. Memory snapshots are taken at a certain frequency, and between two snapshots the AOF log records all command operations during that period.

Configuration

Turn on hybrid persistence

aof-use-rdb-preamble yes

Hybrid persistence process

Take full snapshots at regular intervals, and clear the AOF log file after each snapshot is completed

Each operation between snapshots is recorded in the AOF log
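Recovery from a hybrid file then means loading the snapshot first and replaying only the commands recorded after it. The sketch below illustrates that order with a made-up text format: the functions write_hybrid and load_hybrid and the JSON-lines layout are assumptions for the example, not the on-disk format Redis uses.

```python
import json


def write_hybrid(snapshot, tail_commands):
    """Serialize a snapshot preamble followed by the commands issued after it."""
    lines = [json.dumps({"snapshot": snapshot})]
    lines.extend(json.dumps({"set": [key, value]}) for key, value in tail_commands)
    return "\n".join(lines) + "\n"


def load_hybrid(text):
    """Load the snapshot in one step, then replay the (short) command tail."""
    lines = text.splitlines()
    data = dict(json.loads(lines[0])["snapshot"])
    for line in lines[1:]:
        key, value = json.loads(line)["set"]
        data[key] = value
    return data


text = write_hybrid({"a": 1, "b": 2}, [("b", 3), ("c", 4)])
print(load_hybrid(text))  # {'a': 1, 'b': 3, 'c': 4}
```

Only the tail has to be replayed command by command, which is why recovery is faster than with a pure AOF log of the same history.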

Comparison of persistence methods

| Comparison item | RDB | AOF | AOF+RDB hybrid |
| --- | --- | --- | --- |
| Space used | Small (data is compressed) | Large (commands are not compressed) | Slightly larger than RDB |
| Speed of a single save | Slow | Fast | |
| Recovery speed | Fast | Slow | Slightly faster than AOF |
| Data safety | May lose data | Depends on the write-back strategy | Depends on the AOF strategy used |
| Default state | On by default | Off by default | Off by default |

When data must not be lost, combining memory snapshots with AOF is a good choice;

If minute-level data loss is acceptable, RDB alone is enough;

If only AOF is used, the everysec option is preferred because it strikes a balance between reliability and performance.