How does Redis implement distributed locks

A distributed lock controls access to a shared resource across the processes of a distributed system. It addresses the case where different systems, or different hosts of the same system, share one resource, and it typically relies on mutual exclusion to keep the program's state consistent. That is the purpose and basic principle of a distributed lock.

There are four common implementations of distributed locks:

  • MySQL pessimistic locks. This approach is the least used, because it performs poorly and easily causes deadlocks.
  • Memcached. Use the add command; if add succeeds, the distributed lock has been created.
  • Redis, which is the focus of this lesson. The lock can be implemented with the setnx command.
  • ZooKeeper. The lock is implemented with ephemeral sequential nodes.

MySQL is held back by its execution speed, efficiency, and deadlock problems. Memcached and Redis are implemented in similar ways, but Redis is more popular and is usually preferred.

ZooKeeper can implement distributed locks very well, but it is not widely adopted in small and medium-sized companies, and companies outside the Java technology stack use it even less. Building a ZooKeeper cluster only to implement distributed locks makes the implementation and maintenance costs too high. For these reasons, this article uses Redis to implement distributed locks.

These four approaches work because each one is an "external system" that the program calls and that every node of the distributed program shares. That shared external system is the basic premise that makes a distributed lock possible.

Stand-alone lock

  • Pessimistic lock

A pessimistic lock takes a conservative view of concurrent modification: it assumes that other threads are likely to modify the data, so it holds the lock for the entire modification, and other threads can proceed only after the current thread has finished. A typical example is synchronized;

  • Optimistic lock

In contrast to a pessimistic lock, an optimistic lock assumes that data will normally not conflict when it is modified, so it does not lock before accessing the data. Conflicts are checked only when the change is submitted. Typical examples are CAS-based classes such as AtomicInteger (ReadWriteLock, by contrast, blocks and is therefore a pessimistic lock);

  • Reentrant lock

Also called recursive lock, it means that after the same thread acquires the lock in the outer function, then the inner function can continue to acquire the lock. In the Java language, ReentrantLock and synchronized are both reentrant locks;

  • Exclusive lock and shared lock

A lock that can be held by only one thread at a time is called an exclusive lock, and a lock that can be held by several threads at once is called a shared lock. For example, ReentrantLock is an exclusive lock, while the read lock of a ReadWriteLock allows multiple threads to read at the same time and is therefore a shared lock.
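To make the reentrant case above concrete, here is a minimal sketch using java.util.concurrent.locks.ReentrantLock. The class and method names (ReentrantDemo, outer, inner) are hypothetical and exist only for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class: the same thread acquires one lock twice.
class ReentrantDemo {
    static final ReentrantLock LOCK = new ReentrantLock();

    // Outer function: takes the lock, then calls the inner function.
    static int outer() {
        LOCK.lock();
        try {
            return inner();
        } finally {
            LOCK.unlock();
        }
    }

    // Inner function: takes the same lock again without blocking,
    // because the calling thread already holds it.
    static int inner() {
        LOCK.lock();
        try {
            // Number of holds the current thread has on this lock.
            return LOCK.getHoldCount();
        } finally {
            LOCK.unlock();
        }
    }
}
```

Calling ReentrantDemo.outer() reports the hold count seen inside inner(), and the lock is fully released once both finally blocks have run.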

Stand-alone locks cannot be used in distributed systems because they take effect only within a single server, while in a distributed system each request may be routed to a different server. A lock taken on one server is invisible to the others, so the locking code does not protect the shared resource and many abnormal problems follow.

Use Redis to implement distributed locks

A distributed lock can be implemented in Redis with setnx (set if not exists), as follows:

setnx lock true
# business logic
del lock

If setnx returns 1, the lock was created successfully; if it returns 0, the lock was not created. To release the lock, delete the key with del. When setnx fails for another program, the lock is in use, so this gives a simple distributed lock.

The code above has a problem: no timeout is set on the lock. If the holder fails before reaching del, the lock is never released, every other thread keeps waiting for it, and the program becomes unavailable. A first attempt at a fix is to set an expiry with expire right after setnx:

127.0.0.1:6379> setnx lock true
(integer) 1 # lock created
127.0.0.1:6379> expire lock 30 # set the lock's expiry (timeout) to 30s
(integer) 1
# business logic...
127.0.0.1:6379> del lock
(integer) 1 # lock released

However, setnx lock true and expire lock 30 are two separate commands, so together they are not atomic: the second runs only after the first has completed. If a failure occurs after setnx has executed, expire never runs, and the deadlock problem remains unsolved.

Before Redis 2.6.12 there was no effective way to deal with this problem inside Redis. The workaround at the time was to combine the operations atomically on the client side, and many client libraries were written for that purpose. This raised the cost of use, because in addition to a Redis client you needed another library just to handle the lock timeout. Redis 2.6.12 solved the problem in the server.

Since Redis 2.6.12, a single set command can store the key-value pair, check whether the key already exists, and set the timeout, as shown in the following code:

set lock true ex 30 nx

ex sets the timeout in seconds, and nx (not exists) makes the command succeed only if the key does not already exist. If the result is "OK", the lock was created successfully; otherwise (a nil reply) the lock is being used by someone else.
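The semantics of set with nx and ex can be illustrated without a Redis server. The sketch below is an in-memory stand-in, not a Redis client: the class FakeRedisLock, its methods, and the simulated clock are all hypothetical names introduced here, and only the behavior described above (set if absent, expire after a timeout) is modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of "SET key value EX seconds NX" and "DEL key".
// It is an illustration of the semantics only, not a Redis client.
class FakeRedisLock {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();
    private long nowSeconds = 0; // simulated clock, advanced by hand

    // Mimics SET ... EX ttl NX: returns "OK" if the key was set,
    // or null if a live key already exists.
    synchronized String setNxEx(String key, String value, long ttlSeconds) {
        evictIfExpired(key);
        if (values.containsKey(key)) {
            return null;
        }
        values.put(key, value);
        expiresAt.put(key, nowSeconds + ttlSeconds);
        return "OK";
    }

    // Mimics DEL: returns the number of keys removed (0 or 1).
    synchronized long del(String key) {
        evictIfExpired(key);
        expiresAt.remove(key);
        return values.remove(key) != null ? 1L : 0L;
    }

    // Moves the simulated clock forward.
    synchronized void advanceClock(long seconds) {
        nowSeconds += seconds;
    }

    // Drops the key once its deadline has passed, like a Redis expiry.
    private void evictIfExpired(String key) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && nowSeconds >= deadline) {
            values.remove(key);
            expiresAt.remove(key);
        }
    }
}
```

Because the existence check and the timeout are applied in one step, there is no window in which the key exists without an expiry, which is the property the separate setnx and expire commands lacked.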

Lock timeout

Suppose we set a 10s timeout on the lock, but the program needs 15s to finish. The lock is released at the 10th second because of the timeout, and thread 2 then acquires it normally with the set command. When thread 1 finishes at the 15th second, it deletes the lock, which by then belongs to thread 2. In other words, thread 1 loses its lock to the timeout, thread 2 takes the lock, and thread 1 then deletes thread 2's lock by mistake.

The way to avoid deleting someone else's lock is to store an owner identifier as the value when creating the lock with the set command. For example, store a UUID as the value, and before deleting the lock check whether the stored UUID belongs to the current thread. Delete the lock only if it does, which avoids the accidental deletion.

if (uuid.equals(get(lock))) { // check whether the lock is ours
	del(lock); // delete the lock
}
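The timeline above (10s timeout, 15s of work) can be replayed in a small simulation to compare a bare delete with an owner check. This is a sketch under stated assumptions: OwnerCheckDemo and all of its methods are hypothetical, the store is an in-memory map with a hand-driven clock rather than Redis, and a synchronized method stands in for whatever makes the check and the delete a single step on a real server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory replay of the lock-timeout scenario; not a Redis client.
class OwnerCheckDemo {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> deadlines = new HashMap<>();
    private long now = 0; // simulated seconds

    // Acquire the lock with an owner id and a timeout, if it is free.
    synchronized boolean tryLock(String key, String ownerId, long ttl) {
        expire(key);
        if (values.containsKey(key)) return false;
        values.put(key, ownerId);
        deadlines.put(key, now + ttl);
        return true;
    }

    // Unconditional delete, like a bare del.
    synchronized boolean unsafeUnlock(String key) {
        expire(key);
        deadlines.remove(key);
        return values.remove(key) != null;
    }

    // Delete only if the stored owner id matches; check and delete are one step.
    synchronized boolean safeUnlock(String key, String ownerId) {
        expire(key);
        if (!ownerId.equals(values.get(key))) return false;
        values.remove(key);
        deadlines.remove(key);
        return true;
    }

    synchronized String holder(String key) {
        expire(key);
        return values.get(key);
    }

    synchronized void advance(long seconds) {
        now += seconds;
    }

    private void expire(String key) {
        Long deadline = deadlines.get(key);
        if (deadline != null && now >= deadline) {
            values.remove(key);
            deadlines.remove(key);
        }
    }

    // Replays the timeline and reports whether thread 2 still holds the lock
    // after thread 1's late release.
    static boolean thread2KeepsLock(boolean checkOwner) {
        OwnerCheckDemo store = new OwnerCheckDemo();
        String thread1 = UUID.randomUUID().toString();
        String thread2 = UUID.randomUUID().toString();
        store.tryLock("lock", thread1, 10); // t=0: thread 1 acquires, 10s timeout
        store.advance(10);                  // t=10: thread 1's lock expires
        store.tryLock("lock", thread2, 10); // t=10: thread 2 acquires
        store.advance(5);                   // t=15: thread 1 finishes its work
        if (checkOwner) {
            store.safeUnlock("lock", thread1);
        } else {
            store.unsafeUnlock("lock");
        }
        return thread2.equals(store.holder("lock"));
    }
}
```

With the bare delete, thread2KeepsLock(false) shows thread 2 losing its lock at the 15th second; with the owner check, thread2KeepsLock(true) shows thread 1's release being rejected and thread 2 keeping the lock.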

The check and the delete must execute as one atomic unit; otherwise the lock could expire and be acquired by another thread between the two steps. A Lua script achieves this, because Redis executes the commands in a Lua script atomically. The implementation is as follows:

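import java.util.Collections;
import redis.clients.jedis.Jedis;

/**
 * Release the distributed lock.
 * @param jedis   Redis client
 * @param lockKey key of the lock
 * @param flagId  owner identifier of the lock
 * @return whether the lock was released
 */
public static boolean unLock(Jedis jedis, String lockKey, String flagId) {
    // Delete the key only if its value still matches our identifier.
    String script = "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end";
    Object result = jedis.eval(script, Collections.singletonList(lockKey), Collections.singletonList(flagId));
    // The script returns an integer, which Jedis surfaces as a Long, so compare with 1L rather than the string "1L".
    return Long.valueOf(1L).equals(result);
}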

Here, Collections.singletonList() wraps a single String in a List, because the last two parameters of jedis.eval() (the keys and the arguments) must be of type List.

The lock timeout problem itself can be mitigated in two ways:

  • Move time-consuming methods out of the locked section to shorten the time spent holding the lock, so that the code finishes before the lock times out;
  • Set a longer lock timeout. Normally we delete the lock manually once we are done with it, so the timeout can be set somewhat longer than the expected execution time.