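Preface

As we all know, Redis is an in-memory key-value database. Because memory is volatile and loses its contents on power failure, Redis provides persistence mechanisms.

This article covers the two persistence methods Redis offers, RDB and AOF, and how each of them works.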

RDB

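RDB (Redis DataBase) takes a snapshot of the in-memory data at a given moment and stores it on disk as a dump.rdb file. Every RDB snapshot contains the full dataset held in Redis.

A snapshot can be generated with either of two commands, save and bgsave. Here are their descriptions: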

127.0.0.1:6379> help save

  SAVE -
  summary: Synchronously save the dataset to disk
  since: 1.0.0
  group: server

127.0.0.1:6379> help bgsave

  BGSAVE -
  summary: Asynchronously save the dataset to disk
  since: 1.0.0
  group: server

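Judging from the descriptions, the two commands do the same job. The only difference is that save writes to disk synchronously, while bgsave does it asynchronously ("bg" stands for background).

In practice, once save is called the Redis process blocks until the snapshot is finished and cannot serve requests in the meantime. bgsave instead calls the Linux fork() function to create a child process that generates the snapshot, so Redis can keep serving requests.

With the RDB commands covered, consider this question: suppose Redis holds 6 GB of data. A snapshot of 6 GB cannot be produced in an instant; it takes a while. Between the moment the snapshot starts (t1) and the moment it completes (t2), how should data modified in Redis be handled? Should the persisted data reflect t1 or t2?

With save, Redis serves no requests while the snapshot is being generated, so nothing is modified between t1 and t2. With bgsave, Redis keeps serving requests, so some data is very likely to change. The child process builds the snapshot from the data as of t1; changes made between t1 and t2 are only picked up by the next snapshot. Yet those changes must be visible to clients immediately. In other words, Redis has to keep both the values as of the snapshot point (t1) and the latest values. Taken naively, the 6 GB of data would instantly become 12 GB when a snapshot starts.

That is not what happens, though; Redis is known for its performance and would not tolerate it. So how is the problem solved? This is where the copy-on-write mechanism comes in.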

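Copy-on-write (COW) is an optimisation strategy in computer programming.

The core idea is that when multiple callers request the same resource (such as data in memory or on disk), they all receive the same pointer to the same resource. Only when one caller tries to modify the resource does the system make a private copy for that caller, while the resource seen by the other callers stays unchanged. The whole process is transparent to the other callers.

As mentioned earlier, bgsave calls the Linux fork() function to create a child process that generates the snapshot. On Linux, fork() is implemented with copy-on-write.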

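As the figure below shows, after Redis receives bgsave, bgsave calls fork. At that moment (t1), the data in memory is not duplicated for the two processes; instead, the pointers in both processes refer to the same memory addresses.

The child process now starts generating the snapshot. Suppose data in Redis is modified during that time and the value of k3 changes from c to d. The operating system copies only the memory page that holds k3; the pages holding the unchanged k1 and k2 are not copied. This is copy-on-write. The child process still sees the data as of t1, while Redis itself serves the latest data to clients.

The copy-on-write optimisation pays off when the snapshot takes a short time and only a small part of the data changes in the meantime. If all of the data changed during the snapshot, the 6 GB really would become 12 GB.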
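The behaviour described above can be observed with a small sketch. The function name `snapshot_with_fork` and the pipe-based handshake are illustrative assumptions, not Redis code; the sketch only shows that a forked child keeps seeing the data as of fork time (t1) while the parent keeps writing. It needs a Unix-like system where `os.fork` is available. Note that CPython touches object metadata (reference counts) on reads, so it shares fewer physical pages than a C program like Redis would, but the visible result is the same.

```python
import os


def snapshot_with_fork(data: dict) -> str:
    """Fork a child that serialises `data` as it was at fork time (t1)."""
    go_r, go_w = os.pipe()    # parent -> child: "I have modified the data"
    out_r, out_w = os.pipe()  # child -> parent: the serialised snapshot

    pid = os.fork()
    if pid == 0:
        # Child process: plays the role of the bgsave child.
        try:
            os.close(go_w)
            os.close(out_r)
            os.read(go_r, 1)  # wait until the parent has changed k3
            payload = ",".join(f"{k}={v}" for k, v in sorted(data.items()))
            os.write(out_w, payload.encode())
        finally:
            os._exit(0)       # leave without running the parent's cleanup

    # Parent process: keeps "serving writes" while the child snapshots.
    os.close(go_r)
    os.close(out_w)
    data["k3"] = "d"          # modification between t1 and t2
    os.write(go_w, b"x")
    os.close(go_w)
    with os.fdopen(out_r, "rb") as reader:
        snapshot = reader.read().decode()  # read until the child exits
    os.waitpid(pid, 0)
    return snapshot


if __name__ == "__main__":
    store = {"k1": "a", "k2": "b", "k3": "c"}
    print(snapshot_with_fork(store))  # child's view: k3 is still c
    print(store["k3"])                # parent's view: d
```

The child reads its data only after the parent has already changed k3, yet it still reports the old value, which is exactly the t1 snapshot semantics bgsave relies on.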

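Copy-on-write is a general optimisation idea, and implementations of it can be found in the JDK as well (for example java.util.concurrent.CopyOnWriteArrayList).

Configuration

Snapshots in RDB mode are generated by save and bgsave, as described above, but in day-to-day use nobody runs these commands by hand on a schedule. Snapshots can also be triggered automatically, and the relevant settings are in the configuration file: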

################################ SNAPSHOTTING  ################################
#
# Save the DB on disk:
#
#   save <seconds> <changes>
#
#   Will save the DB if both the given number of seconds and the given
#   number of write operations against the DB occurred.
#
#   In the example below the behaviour will be to save:
#   after 900 sec (15 min) if at least 1 key changed
#   after 300 sec (5 min) if at least 10 keys changed
#   after 60 sec if at least 10000 keys changed
#
#   Note: you can disable saving completely by commenting out all "save" lines.
#
#   It is also possible to remove all the previously configured save
#   points by adding a save directive with a single empty string argument
#   like in the following example:
#
#   save ""

save 900 1
save 300 10
save 60 10000

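Each save directive triggers bgsave. save 60 10000 means that if at least 10000 keys were modified within 60 seconds, bgsave is called once. Likewise, save 300 10 means that if at least 10 keys were modified within 300 seconds, bgsave is called once. Multiple save directives are not mutually exclusive: when several are configured, bgsave runs as soon as any one of them is satisfied. Configuring several rules lets Redis adapt to different write loads.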
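The "any rule matches" behaviour can be sketched as follows. The names `SAVE_POINTS` and `should_bgsave` are hypothetical, and the exact boundary comparison Redis uses internally is an assumption here; the sketch only mirrors the rule as stated in the config comments.

```python
# Save points from the config above, as (seconds, changes) pairs.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]


def should_bgsave(seconds_since_last_save: int,
                  changes_since_last_save: int,
                  save_points=SAVE_POINTS) -> bool:
    """Return True if at least one save point is satisfied.

    A save point (seconds, changes) is satisfied when that many seconds
    have passed since the last save AND at least that many changes
    happened in the meantime.
    """
    return any(
        seconds_since_last_save >= seconds
        and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


if __name__ == "__main__":
    print(should_bgsave(61, 10000))  # True: matches "save 60 10000"
    print(should_bgsave(61, 9999))   # False: too few changes for 60s, too early for 300s
    print(should_bgsave(301, 10))    # True: matches "save 300 10"
```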

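Configuring save "" or commenting out all save lines disables automatic RDB snapshots.

Looking at these save settings, if bgsave runs every 60 seconds and the server goes down at the 59th second, everything modified during those 59 seconds is lost because no snapshot was taken in time. Losing that much data is unacceptable for many workloads. For this reason Redis offers another persistence method: AOF.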

AOF

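AOF (Append Only File) records the commands that modify Redis in a specific format in a designated file. In other words, RDB records a snapshot of the data, while AOF records commands. AOF is disabled by default.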

############################## APPEND ONLY MODE ###############################

# By default Redis asynchronously dumps the dataset on disk. This mode is
# good enough in many applications, but an issue with the Redis process or
# a power outage may result into a few minutes of writes lost (depending on
# the configured save points).
#
# The Append Only File is an alternative persistence mode that provides
# much better durability. For instance using the default data fsync policy
# (see later in the config file) Redis can lose just one second of writes in a
# dramatic event like a server power outage, or a single write if something
# wrong with the Redis process itself happens, but the operating system is
# still running correctly.
#
# AOF and RDB persistence can be enabled at the same time without problems.
# If the AOF is enabled on startup Redis will load the AOF, that is the file
# with the better durability guarantees.
#
# Please check http://redis.io/topics/persistence for more information.

appendonly no

# The name of the append only file (default: "appendonly.aof")

appendfilename "appendonly.aof"

When AOF is enabled, the write commands are recorded in the appendonly.aof file.
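To give a feel for what "recording a command" looks like, here is a minimal sketch that encodes a command in the Redis serialization protocol (RESP), the format in which commands are written to the AOF. The helper name `encode_command` is hypothetical, and a real AOF contains additional entries (such as SELECT to pick the database) that this sketch leaves out.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    # "*<n>" announces an array of n elements.
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        raw = arg.encode("utf-8")
        # "$<len>" announces a bulk string of len bytes, followed by the bytes.
        parts.append(b"$" + str(len(raw)).encode() + b"\r\n" + raw + b"\r\n")
    return b"".join(parts)


if __name__ == "__main__":
    print(encode_command("SET", "key1", "a"))
    # b'*3\r\n$3\r\nSET\r\n$4\r\nkey1\r\n$1\r\na\r\n'
```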

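The contents of appendonly.aof themselves have to reach the disk. If the server goes down before they have been written to disk, part of appendonly.aof is lost, the modifying commands in it are lost, and so are the corresponding data changes.

For this reason Redis offers three policies for flushing appendonly.aof to disk: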

# The fsync() call tells the Operating System to actually write data on disk
# instead of waiting for more data in the output buffer. Some OS will really flush
# data on disk, some other OS will just try to do it ASAP.
#
# Redis supports three different modes:
#
# no: don't fsync, just let the OS flush the data when it wants. Faster.
# always: fsync after every write to the append only log. Slow, Safest.
# everysec: fsync only one time every second. Compromise.
#
# The default is "everysec", as that's usually the right compromise between
# speed and data safety. It's up to you to understand if you can relax this to
# "no" that will let the operating system flush the output buffer when
# it wants, for better performances (but if you can live with the idea of
# some data loss consider the default persistence mode that's snapshotting),
# or on the contrary, use "always" that's very slow but a bit safer than
# everysec.
#
# More details please check the following article:
# http://antirez.com/post/redis-persistence-demystified.html
#
# If unsure, use "everysec".

# appendfsync always
appendfsync everysec
# appendfsync no

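All three policies are selected through the appendfsync parameter.

no: this does not mean "no persistence". The data is only written to the OS buffer, and the operating system decides when to flush it to disk. This is the fastest option.
always: fsync() is called every time content is appended to appendonly.aof, forcing the data to disk. This is the slowest option but the safest.
everysec: the default. fsync() is called once per second to flush the data to disk. This is a compromise between the other two.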
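The three policies can be sketched with plain file descriptors. The class `AofWriter` and the helper `count_fsyncs` are hypothetical names for illustration only. One simplification to note: this sketch checks the one-second timer inside `append`, so in `everysec` mode it only calls fsync when a write arrives; Redis does not depend on incoming writes for that.

```python
import os
import tempfile
import time


class AofWriter:
    """Append-only writer with an appendfsync-like policy."""

    def __init__(self, path: str, policy: str = "everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_fsync = time.monotonic()
        self.fsync_calls = 0

    def append(self, data: bytes) -> None:
        os.write(self.fd, data)  # lands in the OS buffer, not yet on disk
        now = time.monotonic()
        due = self.policy == "always" or (
            self.policy == "everysec" and now - self.last_fsync >= 1.0
        )
        if due:
            os.fsync(self.fd)    # ask the OS to flush to disk
            self.last_fsync = now
            self.fsync_calls += 1
        # policy "no": never fsync here; the OS flushes when it wants.

    def close(self) -> None:
        os.close(self.fd)


def count_fsyncs(policy: str, commands: list) -> int:
    """Write commands to a temporary AOF and report how many fsyncs ran."""
    with tempfile.TemporaryDirectory() as tmp:
        writer = AofWriter(os.path.join(tmp, "appendonly.aof"), policy)
        try:
            for command in commands:
                writer.append(command)
            return writer.fsync_calls
        finally:
            writer.close()
```

With `always`, three appended commands cause three fsync calls; with `no`, none are issued by the writer at all.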

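From this configuration it follows that if appendonly.aof is flushed to disk once per second and the server goes down between two flushes, some commands can still be lost, and with them the corresponding data changes. Compared with RDB, however, the loss is very small: roughly one second of writes.

In addition, appendonly.aof is written by appending commands, so on a long-running server the file inevitably grows very large. If the server goes down and the data has to be restored from appendonly.aof, replaying all the commands recorded in it can take a long time.

To keep appendonly.aof from growing too large, Redis provides a mechanism called bgrewriteaof.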

bgrewriteaof

The bgrewriteaof command is described as follows:

127.0.0.1:6379> help bgrewriteaof

  BGREWRITEAOF -
  summary: Asynchronously rewrite the append-only file
  since: 1.0.0
  group: server

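This command calls fork() to create a child process that rewrites the appendonly.aof file. The rewrite is implemented differently before and after Redis 4.0.

Before Redis 4.0, the rewrite has two main effects: commands that cancel each other out are dropped, and repeated commands are merged. A pair such as set key1 a and del key1 cancels out and disappears entirely. A pair such as set key1 a and set key1 b is merged into the last write. Redis achieves this by having the child write out the minimal set of commands needed to rebuild the current in-memory dataset, rather than by editing the old file. After the rewrite, the AOF file can be much smaller.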
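The effect of the pre-4.0 rewrite can be sketched by replaying a command log into a dictionary and then emitting one command per surviving key. The function name `rewrite_log` is hypothetical, and only SET and DEL are handled; this illustrates the idea, not Redis's implementation.

```python
def rewrite_log(commands):
    """Return the minimal SET commands that rebuild the final state.

    `commands` is a list of tuples such as ("SET", "key1", "a") or
    ("DEL", "key1").
    """
    state = {}
    for command in commands:
        op, key = command[0].upper(), command[1]
        if op == "SET":
            state[key] = command[2]  # a later SET overwrites an earlier one
        elif op == "DEL":
            state.pop(key, None)     # SET followed by DEL cancels out
        else:
            raise ValueError(f"unsupported command: {op}")
    return [("SET", key, value) for key, value in state.items()]


if __name__ == "__main__":
    log = [
        ("SET", "key1", "a"),
        ("DEL", "key1"),
        ("SET", "key2", "a"),
        ("SET", "key2", "b"),
    ]
    print(rewrite_log(log))  # [('SET', 'key2', 'b')]
```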

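From Redis 4.0 onward, a hybrid RDB and AOF mode is available (controlled by the aof-use-rdb-preamble option). On rewrite, the existing data is written at the head of appendonly.aof in RDB format, and subsequent changes continue to be appended to the file in AOF format. The first part of appendonly.aof is therefore snapshot data and the rest is Redis commands.

This hybrid mode combines the strengths of RDB and AOF: it keeps data loss to a minimum and still lets Redis restore its data quickly after a restart.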
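One way to tell the two layouts apart is to look at the first bytes of the file: RDB data begins with the magic string "REDIS", whereas a plain AOF begins with a RESP array marker. The helper name `starts_with_rdb_preamble` is hypothetical, and the check is a rough sketch that applies to the single-file layout described in this article.

```python
def starts_with_rdb_preamble(head: bytes) -> bool:
    """Guess whether an AOF begins with an RDB snapshot.

    `head` is the first few bytes of the file, for example the result of
    open(path, "rb").read(5).
    """
    return head.startswith(b"REDIS")


if __name__ == "__main__":
    print(starts_with_rdb_preamble(b"REDIS0009"))          # True
    print(starts_with_rdb_preamble(b"*2\r\n$6\r\nSELECT"))  # False
```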

So when is bgrewriteaof triggered? Besides running it manually, the configuration file provides parameters for triggering it automatically:

# Automatic rewrite of the append only file.
# Redis is able to automatically rewrite the log file implicitly calling
# BGREWRITEAOF when the AOF log size grows by the specified percentage.
#
# This is how it works: Redis remembers the size of the AOF file after the
# latest rewrite (if no rewrite has happened since the restart, the size of
# the AOF at startup is used).
#
# This base size is compared to the current size. If the current size is
# bigger than the specified percentage, the rewrite is triggered. Also
# you need to specify a minimal size for the AOF file to be rewritten, this
# is useful to avoid rewriting the AOF file even if the percentage increase
# is reached but it is still pretty small.
#
# Specify a percentage of zero in order to disable the automatic AOF
# rewrite feature.

auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

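auto-aof-rewrite-min-size is set to 64mb, which means the AOF file must be larger than 64 MB before an automatic bgrewriteaof can run. This minimum applies to every automatic rewrite, not only the first one. Redis records the size of the AOF file after each bgrewriteaof and uses it as the base size; if no rewrite has happened since startup, the size of the AOF at startup is used instead.

auto-aof-rewrite-percentage is set to 100, which means bgrewriteaof is triggered once the current AOF file has grown by that percentage relative to the base size. If the AOF was 200 MB after the previous bgrewriteaof, it has to reach 400 MB before bgrewriteaof runs again.

Setting auto-aof-rewrite-percentage to 0 disables automatic bgrewriteaof. The purpose of auto-aof-rewrite-min-size is to prevent frequent rewrites while the AOF file is still small, when even modest growth would reach the percentage threshold.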
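The two parameters combine into a single check, sketched below as the config comments describe it: the file must exceed the minimum size, and its growth over the base size must reach the configured percentage. The function name `should_rewrite_aof` and its signature are hypothetical, and the integer arithmetic is an assumption about how the percentage is computed.

```python
MB = 1024 * 1024


def should_rewrite_aof(current_size: int,
                       base_size: int,
                       percentage: int = 100,
                       min_size: int = 64 * MB) -> bool:
    """Decide whether an automatic bgrewriteaof is due."""
    if percentage == 0:
        return False            # automatic rewrite disabled
    if current_size <= min_size:
        return False            # file still too small to bother
    base = base_size if base_size > 0 else 1  # avoid division by zero
    growth = current_size * 100 // base - 100  # growth in percent
    return growth >= percentage


if __name__ == "__main__":
    print(should_rewrite_aof(400 * MB, 200 * MB))  # True: grew by 100%
    print(should_rewrite_aof(399 * MB, 200 * MB))  # False: grew by 99%
    print(should_rewrite_aof(60 * MB, 10 * MB))    # False: below 64 MB
```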

no-appendfsync-on-rewrite

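When the main Redis process writes the AOF with the always or everysec policy while a child process is rewriting the AOF file, both generate heavy I/O. This can make fsync block for a long time. To mitigate the problem, Redis provides the no-appendfsync-on-rewrite parameter: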

# When the AOF fsync policy is set to always or everysec, and a background
# saving process (a background save or AOF log background rewriting) is
# performing a lot of I/O against the disk, in some Linux configurations
# Redis may block too long on the fsync() call. Note that there is no fix for
# this currently, as even performing fsync in a different thread will block
# our synchronous write(2) call.
#
# In order to mitigate this problem it's possible to use the following option
# that will prevent fsync() from being called in the main process while a
# BGSAVE or BGREWRITEAOF is in progress.
#
# This means that while another child is saving, the durability of Redis is
# the same as "appendfsync none". In practical terms, this means that it is
# possible to lose up to 30 seconds of log in the worst scenario (with the
# default Linux settings).
#
# If you have latency problems turn this to "yes". Otherwise leave it as
# "no" that is the safest pick from the point of view of durability.

no-appendfsync-on-rewrite no

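When this parameter is enabled, the main process does not call fsync() on the AOF while a bgsave or bgrewriteaof is in progress, which for that period is equivalent to configuring appendfsync no. Modifying commands may be lost as a result; with the default Linux settings, up to 30 seconds of data.

When this parameter is disabled, the main process keeps calling fsync() on the AOF during bgsave and bgrewriteaof and may block on it. This is the safest choice for durability: nothing is given up beyond what the appendfsync policy already allows.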