What are cache penetration, breakdown, and avalanche?

I believe anyone who has worked with caching has often heard the terms "cache penetration", "cache breakdown", and "cache avalanche". I have also been using caching in real development recently, and these are issues we have to consider whenever a cache is involved. This article walks through each of these terms.

Cache penetration

Definition

Cache penetration means querying for data that is certain not to exist: the request misses the cache and goes straight to the DB every time. When such traffic is large enough, it can bring the DB down.

Solution
  • If a key's lookup finds no corresponding value in the DB, write the empty value into the cache anyway, with a relatively short expiration time. This cuts down the pointless queries for non-existent data and thereby relieves pressure on the DB (see the sketch after this list).
  • Use a Bloom filter: a sufficiently large bitmap stores the keys that may legitimately be accessed, and keys that do not exist are filtered out directly, so the query never reaches the DB.
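
As a rough illustration of the first approach, here is a minimal sketch written in the same pseudocode style as the snippets later in this article; the redis and db clients, the NULL_PLACEHOLDER constant, and the expiration values are illustrative assumptions, not a specific library API.

public String get(String key) {
    String value = redis.get(key);
    if (value != null) {
        // a cached placeholder means "this key does not exist in the DB"
        return NULL_PLACEHOLDER.equals(value) ? null : value;
    }
    value = db.get(key);
    if (value == null) {
        // cache the miss with a short TTL (e.g. 60s) so that repeated lookups
        // for non-existent data stop hitting the DB
        redis.set(key, NULL_PLACEHOLDER, 60);
        return null;
    }
    redis.set(key, value, expire_secs);
    return value;
}
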
Bloom filter

Definition

In essence, a Bloom filter is a probabilistic data structure. It offers efficient insertion and lookup, and it can tell you that an element "definitely does not exist, or may exist". Compared with traditional structures such as List, Set, and Map, it is faster and uses far less space, but the trade-off is that its answers are probabilistic rather than exact. The underlying structure is a bit array with a maximum length of 2^32 bits, i.e. 512 MB. Each slot occupies only a single bit (0 or 1), so a typical business case that reserves about 10,000 bits needs only 10,000 / 8 = 1250 bytes; neither performance nor memory footprint is a concern.

Principle

(Figure: Bloom filter principle)


To add an element, it is run through several hash functions and the bit at each resulting position in the array is set to 1. To determine whether a value is in the Bloom filter, the same hash calculations are performed on it and each corresponding bit is checked: if every bit is 1, the value may be in the Bloom filter; if any bit is not 1, the element is definitely not in the Bloom filter.
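
For reference, here is a small sketch of the Bloom-filter idea using Guava's BloomFilter class; the expected insertion count and false-positive rate below are illustrative assumptions.

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class KeyFilter {
    // expect about 10,000 keys with roughly a 1% false-positive rate (assumed figures)
    private final BloomFilter<String> filter =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 10_000, 0.01);

    public void add(String key) {
        filter.put(key);
    }

    public boolean mightExist(String key) {
        // false: the key was definitely never added, so skip the cache and the DB entirely
        // true: the key may exist, so continue with the normal cache/DB lookup
        return filter.mightContain(key);
    }
}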

Cache breakdown

Definition

For an existing hot key, when its cache entry expires, a large number of requests arrive at the same moment; they all fall through to the DB, producing a burst of DB queries and a sudden spike in load. Put simply, the heavy concurrency that occurs during the window in which the hot key is missing from the cache is what drives the DB pressure up.

Solution
  • Use a distributed lock (mutex key)
    When the cache lookup misses (the value read back is empty), set a separate short-lived key to lock access to the current key. The thread that acquires the lock loads the data from the DB and resets the cache; all other threads retry the whole get-from-cache method. A code sketch is as follows:
public String get(String key) {
    String value = redis.get(key);
    if (value == null) {  // cache miss: the cached value has expired
        // give the mutex key a 2-minute timeout, so that if the del below ever fails,
        // the next cache expiration is not blocked from loading the DB forever
        String key_mutex = "mutex:" + key;
        if (redis.setnx(key_mutex, 1, 2 * 60) == 1) {  // the lock was acquired
            value = db.get(key);                 // load from the DB
            redis.set(key, value, expire_secs);  // reset the cache
            redis.del(key_mutex);                // release the lock
            return value;
        } else {
            // another thread has already loaded the DB and is resetting the cache;
            // back off briefly and retry reading the cache
            sleep(60);
            return get(key);  // retry
        }
    } else {
        return value;
    }
}

  • Make the key "never expire". Physically, no expiration time is set on the key at all, so the hot-key-expiration problem cannot occur at the cache level. But functionally, if it never expires, doesn't the data become permanently stale? So the expiration time is stored inside the value associated with the key; when a read finds that this logical expiration has passed, the cache is rebuilt by a background asynchronous thread, which is "logical" expiration. This approach is quite friendly to callers; the only drawback is that, while the cache is being rebuilt, other threads may read slightly stale data. The approximate code is as follows:
String get(final String key) {
    V v = redis.get(key);
    String value = v.getValue();
    long timeout = v.getTimeout();  // the logical expiration time stored inside the value
    if (timeout <= System.currentTimeMillis()) {
        // logically expired: rebuild the cache asynchronously in a background thread
        threadPool.execute(new Runnable() {
            public void run() {
                String keyMutex = "mutex:" + key;
                if (redis.setnx(keyMutex, "1")) {
                    // 3 min timeout to avoid the mutex never being released if its holder crashes
                    redis.expire(keyMutex, 3 * 60);
                    String dbValue = db.get(key);
                    // reset the cache; the new value should carry a refreshed logical timeout
                    redis.set(key, dbValue);
                    redis.delete(keyMutex);
                }
            }
        });
    }
    // a stale value may be returned while the rebuild is still in progress
    return value;
}

Cache avalanche

Definition

All cache entries are given the same expiration time, so they all become invalid at the same moment; every request then goes to the DB, the DB load spikes instantly, and an avalanche follows.

Solution

As the definition makes clear, the main cause of an avalanche is giving all keys the same expiration time. Add a random offset to each key's expiration time so that different keys expire at different moments, which greatly reduces the chance of simultaneous mass expiration (see the sketch below).
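
A minimal sketch of this idea, in the same pseudocode style as the snippets above; the base TTL and the jitter window are assumed values.

import java.util.concurrent.ThreadLocalRandom;

public void setWithJitter(String key, String value) {
    int baseTtlSecs = 30 * 60;  // shared base expiration, e.g. 30 minutes (assumed)
    int jitterSecs = ThreadLocalRandom.current().nextInt(0, 5 * 60);  // random 0-5 minute offset
    // staggering the TTLs keeps a whole batch of keys from expiring at the same moment
    redis.set(key, value, baseTtlSecs + jitterSecs);
}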

That is all I have to share this time. Comments are welcome!