Locking is a very common concept in programming. Wikipedia’s definition describes it pretty accurately:
In computer science, a lock is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. A lock is designed to enforce a mutual exclusion concurrency control policy.
Simply put, a lock is a single point of reference for multiple threads to check whether or not they are allowed to access a resource. So if, for example, a thread wants to write data somewhere, it must first check if a write lock already exists. If the write lock exists it must wait until the lock is released before it can obtain its own lock and perform its write. This way, the lock prevents multiple threads from writing simultaneously which might otherwise lead to adverse effects like corruption.
Modern operating systems have built-in functionality to help you with concurrency control, such as flock. But what if you are running multiple threads of your program on more than one machine? What if you need to control access to a resource across a distributed system?
Use a central lock server
First, we need something accessible from all our threads to store our lock. The lock can only exist in one place, to ensure there is only one authoritative place that defines whether or not the lock is set.
Redis is an ideal candidate for this. As a lightweight in-memory database, it is fast, transactional and consistent, which are the key qualities we require for a locking system.
Designing the lock
The lock itself is easy. It’s simply a key in a Redis database. Setting and unsetting the lock and making it crash safe is where it gets trickier. Let’s list some of the potential pitfalls:
- Your program interacts with Redis over a network, which means there is latency between a command being issued by your program and it being run by the Redis database. During this time, Redis will be running other commands, and the state of the data in Redis could diverge from what your program expects. How does one thread of your program establish a lock only once without clashing with other threads?
- What if your program crashes immediately after it sets the lock, and never unsets it? The lock might stay in place indefinitely and you end up with a deadlock.
Setting the lock
A first attempt might be to run a GET to check whether the lock key exists and, if it does not, run a SET to create it. That would be a simple method, but it does not guarantee an exclusive lock. Recall pitfall #1. Because there is latency between our GET and SET commands, we have no way of knowing if another thread managed to set the lock during the time it took our commands to reach and return from the Redis server. Sure, this is down to a matter of milliseconds and may have a fairly low chance of happening, but in a busy environment running lots of concurrent threads and commands, the likelihood of overlapping is not negligible.
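The race can be reproduced without a server. The sketch below uses a tiny in-memory stand-in for Redis (the `store`, `get` and `set` names are hypothetical, not a real client API) and interleaves two clients the way network latency might:

```javascript
// A tiny in-memory stand-in for Redis (hypothetical, for illustration only).
const store = new Map();
const get = (key) => (store.has(key) ? store.get(key) : null);
const set = (key, value) => { store.set(key, value); return 'OK'; };

// Both clients check for the lock before either has set it.
const seenByA = get('lock-key'); // null: A thinks the lock is free
const seenByB = get('lock-key'); // null: B's GET arrives before A's SET

// Each client acts on its stale view and sets the lock.
const aHoldsLock = seenByA === null && set('lock-key', 'A') === 'OK';
const bHoldsLock = seenByB === null && set('lock-key', 'B') === 'OK';

console.log(aHoldsLock, bHoldsLock); // true true
console.log(get('lock-key'));        // B
```

Both clients believe they hold the lock, and the second SET has silently overwritten the first.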
To help counter this, we should use SETNX (SET if Not eXists). SETNX eliminates the need for a preceding GET round-trip because it will succeed only if the key doesn’t already exist when the command runs. So that means only one thread will be able to run a successful SETNX while the others will fail and will need to retry until they establish the lock.
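As a rough illustration of that behaviour, here is a sketch against an in-memory stand-in for SETNX and DEL. The `setnx`, `del` and `acquire` helpers are hypothetical and only mimic the return values Redis gives (1 when the key was set, 0 when it already existed):

```javascript
// In-memory stand-in for Redis SETNX and DEL (hypothetical, for illustration only).
const store = new Map();
function setnx(key, value) {
  if (store.has(key)) return 0; // key exists: leave it untouched
  store.set(key, value);
  return 1;
}
function del(key) {
  return store.delete(key) ? 1 : 0;
}

// Try to take the lock a limited number of times.
// A real client would wait between attempts; this sketch retries immediately.
function acquire(key, owner, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (setnx(key, owner) === 1) return attempt; // lock established
  }
  return 0; // gave up
}

const a = acquire('lock-key', 'A', 3);      // 1: first attempt succeeds
const b = acquire('lock-key', 'B', 3);      // 0: A still holds the lock
del('lock-key');                            // A releases the lock
const bAgain = acquire('lock-key', 'B', 3); // 1: B can now take it
console.log(a, b, bAgain); // 1 0 1
```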
Unsetting the lock
Once your thread has run a successful SETNX command, it has established the lock and can do its work with the resource. After this work is completed, it should release the lock by deleting the Redis key, allowing other threads to establish the lock as soon as possible.
However, beware! Here lies pitfall #2. If there’s a crash in the thread, it never deletes the Redis key so the lock remains in place indefinitely and you get a deadlock. How do we prevent this?
We can impose a Time To Live (TTL) on the lock key so that it is automatically deleted by Redis if the TTL expires. Any locks that are left established by faulty threads will then release after a suitable timeout, protecting against deadlocks. This is purely a safety feature, however, and it is still far more efficient to ensure your threads release the lock as they should.
PEXPIRE lock-key 10000
However, this introduces another problem. The PEXPIRE command is unaware of the result of the SETNX command, and sets the TTL of the key regardless. If we’ve got a deadlock in place, and multiple threads keep updating the TTL of the key at high frequency each time they want to establish a lock, then they will be perpetually extending the TTL of the key and it will never actually expire. To resolve this issue, we need Redis to handle this logic in a single Redis command. We can achieve this with Redis scripting.
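To see why the TTL never runs out, consider this sketch. It models SETNX and PEXPIRE in memory with a manually advanced clock; the `now` variable and every helper here are hypothetical stand-ins, not Redis client calls:

```javascript
// In-memory stand-in for Redis with a manual clock (hypothetical).
let now = 0;
const store = new Map(); // key -> { value, expiresAt }

function expireIfDue(key) {
  const entry = store.get(key);
  if (entry && entry.expiresAt !== null && entry.expiresAt <= now) store.delete(key);
}
function setnx(key, value) {
  expireIfDue(key);
  if (store.has(key)) return 0;
  store.set(key, { value, expiresAt: null });
  return 1;
}
function pexpire(key, ms) {
  expireIfDue(key);
  const entry = store.get(key);
  if (!entry) return 0;
  entry.expiresAt = now + ms;
  return 1;
}

// Naive acquire: PEXPIRE runs whether or not SETNX succeeded.
function naiveAcquire(key, owner, ttlMs) {
  const lockSet = setnx(key, owner);
  pexpire(key, ttlMs);
  return lockSet;
}

naiveAcquire('lock-key', 'crashed-thread', 10000); // lock taken, then the owner crashes

// Other threads retry every 5 seconds; each retry pushes the expiry back.
const results = [];
for (let i = 0; i < 5; i++) {
  now += 5000;
  results.push(naiveAcquire('lock-key', 'waiter', 10000));
}
console.log(results.join(','));     // 0,0,0,0,0
console.log(store.has('lock-key')); // true, 25 seconds after a 10 second TTL was set
```

As long as the waiting threads retry more often than the TTL, the stale lock outlives any timeout.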
Note: this is also possible without a script by using the additional NX and PX arguments for SET in Redis versions >= 2.6.12, but we’re using a script instead because it’s compatible back to 2.6.0.
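For readers on a newer Redis, the single-command alternative might look like the sketch below. It assumes a node-redis v4-style client whose `set(key, value, { NX: true, PX: ms })` resolves to 'OK' on success and null otherwise; `acquireWithSet` is a hypothetical helper name, and other client libraries spell these options differently.

```javascript
// Equivalent raw command: SET lock-key <content> NX PX 10000
// Assumes a node-redis v4-style client (see the note above the block).
async function acquireWithSet(client, key, content, ttlMs) {
  // NX: only set if the key does not exist. PX: expire after ttlMs milliseconds.
  const reply = await client.set(key, content, { NX: true, PX: ttlMs });
  return reply === 'OK'; // null means another thread already holds the lock
}
```

Because the existence check and the TTL are applied by one command, a failed attempt cannot extend the TTL of an existing lock.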
With Redis scripting, you can write Lua scripts that run multiple Redis commands inside the Redis server itself. Your script is cached in the Redis server and run by your program with a single EVALSHA command. The power here is your program only has to run a single command (the script) to run multiple Redis commands in a transactional way that is immune to concurrency clashes since only one Redis script can run at a time.
Here’s a Lua script to set the lock with a TTL in Redis:
--
-- Set a lock
--
-- KEYS[1] - key
-- KEYS[2] - ttl in ms
-- KEYS[3] - lock content
local key     = KEYS[1]
local ttl     = KEYS[2]
local content = KEYS[3]

local lockSet = redis.call('setnx', key, content)

if lockSet == 1 then
  redis.call('pexpire', key, ttl)
end

return lockSet
It’s pretty clear to see from this script that we solve the unending TTL issue by only running PEXPIRE on a lock that didn’t previously exist.
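To call a cached script with EVALSHA, your program needs the SHA1 hex digest of the script source, which is the same value SCRIPT LOAD returns. The following is a minimal Node.js sketch of computing that digest and assembling the command arguments. The key name, TTL and `owner-id` content are placeholder values, and sending the command is left to whichever Redis client you use.

```javascript
const crypto = require('crypto');

// The lock script from this article, as a single string.
const script = [
  "local key = KEYS[1]",
  "local ttl = KEYS[2]",
  "local content = KEYS[3]",
  "local lockSet = redis.call('setnx', key, content)",
  "if lockSet == 1 then",
  "  redis.call('pexpire', key, ttl)",
  "end",
  "return lockSet",
].join('\n');

// Redis identifies a cached script by the SHA1 hex digest of its source.
const sha = crypto.createHash('sha1').update(script).digest('hex');

// EVALSHA <sha> <numkeys> <key...>: this script reads all three values from KEYS.
const command = ['EVALSHA', sha, '3', 'lock-key', '10000', 'owner-id'];
console.log(command.join(' '));
```

If the server has not cached the script yet, EVALSHA replies with a NOSCRIPT error; a client would typically load the script (SCRIPT LOAD or EVAL) and try again.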
Warlock: Battle-hardened distributed locking using Redis
Now that we’ve covered the theory of Redis-backed locking, here’s your reward for following along: an open source module! It’s called Warlock, it’s written in Node.js and it’s available on npm. It takes care of all of the plumbing involved in using Redis scripts as well as setting / unsetting the locks. We use it in our own projects at GoSquared to help us with distributed locking around caching, databases, job queues and other aspects sensitive to concurrency. Check out the README for usage instructions.