What Redis deployment do you need ?
Redis is an in-memory database that I really love. It’s one of the rare technologies that make both devs and ops happy. For those who don't know Redis already, here is a small introduction.
There are four main topologies of Redis, and each one has and uses different and incompatible features. Therefore, you need to understand all the trade-offs before choosing one.
Here we go:
Redis Standalone
The old classic. One big bag of RAM. Scale vertically, easy as pie, no availability, no resilience.
Pros:
- This is the most basic setup you can think of.
Cons:
- No resilience
- Scale only vertically (using bigger hardware for bigger workloads)
Remediations:
- As the Redis protocol is simple, you can use an external solution, a proxy like Twemproxy, to do the replication to other nodes (for resilience) and shard the keys yourself (for horizontal scalability)
- Use the built-in replicated Redis or Redis-Sentinel or Redis-Cluster
Redis replicated
There is a master and there are replicas. The master pushes data to replicas. Replicas don’t talk between themselves.
They are Read-Only and will tell you so if you try to do a SET on them (or any Write operation for what it’s worth).
Pros:
- Really easy to setup
- You always have a hot snapshot of your data
- It's an easy Disaster Recovery Plan (DRP) with Master/Stand-By
Cons:
- Write performance is bounded by the master
- In order to achieve resilience, you need manual operations (changing the master Redis manually and restarting the clients)
- The replicas are not used to their full potential
- The Redis client needs to know which Redis to ask for which operation (or you can just ask the master every time)
Remediations:
- Smart clients (eg: Lettuce) able to ask by themselves the right Redis for the right operation
- You can setup a sharding with an external solution like Twemproxy for horizontal scalability
This setup is so easy, you have no excuse to deploy in production a Redis instance without replication.
It's a cheap way to keep your service running when things have gone awry, with just a bit of manual operations. If you are a little to medium-sized organization and it's your first Redis deployment, this may be the best trade-off for you.
Redis Sentinel
Ok, a bit of history: Antirez, the Redis author, started to work on a concept for Redis clustering some years ago. It turns out, distributed systems are complex and the right tradeoff is hard to reach.
There are two problems with standalone Redis: Resilience, and Scaling. Ok, how about we only solve one?
Redis-sentinel is the solution to resilience, and it’s quite brilliant. It’s a monitoring system on a Redis replicated system (so, a master and N replicas) which aims to answer two questions:
- Who is the manager in charge in this cluster? (the current master)
- Oh damn! We lost contact with the current master, who will take its place?
(actually, it also takes care of reconfiguring the Redis instances on the fly so that the newly promoted master actually knows it can accept Write operations)
But, hey, and if we loose the Sentinel? We want resilience after all, so the Sentinels should also be resilient!
That’s why the Sentinels should always be used as a cluster. They have a built-in election system to track who is the master Sentinel (which can elect the master Redis when the current master is down). As it’s a quorum system, you need 3 nodes to support losing 1 (you need the majority of the cluster to be up for a successful election).
Recap: you have 3 nodes with Redis-Sentinel in a cluster, and 2 or more Redis in replicated mode.
You are smart, so you most likely have 3 machines, each hosting both a Redis instance and its Sentinel. This is my preferred topology. You need to deploy your sentinels first, and then you deploy your Redis instances, they register, and it works.
Pros:
- Redis-Sentinel is builtin in the Redis binary so it’s easy to setup
- Good trade-off in complexity
- Very stable. One of the few no-hassle distributed systems
- Automatic resilience
- It doesn't eat your RAM or CPU
Cons:
- Still a big step up compared to Redis or even Redis replicated alone
- The client MUST support Redis-Sentinel. Half the magic is in the client
- Not trivial to upgrade a “Twemproxy + Replica” cluster to a Redis-Sentinel cluster, as you need to orchestrate the upgrade with your consumers
- Doesn’t solve scale issues
- You may need to configure your firewall to open the flow between the Sentinels
Remediations:
- You can soften the operational complexity and resource usage by monitoring multiple redis clusters with 1 sentinel cluster. I am currently doing just that: dozen of api-dedicated mini-Redis clusters (100Mo LRU cache) monitored by 1 Sentinel cluster
- You can soften the client migration path to the Redis-Sentinel protocol with a little trick: you can have old clients that only query the master Redis and manually update the target Redis when Sentinels elect a new Redis master. It’s painful, but feasible.
This setup is less easy to put in place. Don’t do it if you are not ready to pay the price. Sadly, Twemproxy is not easier so if you need automated Resilience, this is still your best bet.
Redis Cluster
This is the Big Gun. The One we waited for so long. It aims to help large deployments, when you need both resilience AND scaling (up to 1000 nodes).
It’s a multi-master architecture: the data is partitioned (sharded) into 16k buckets, each bucket with an assigned master in the cluster, and typically replicated twice. It’s the same design as Kafka or CouchBase. When you SET mykey myvalue, to a Redis-Cluster node:
- the hash of mykey is computed, this gives us the bucket number
- if the current Redis node is the master of this bucket, it accepts the operation with OK
- if it’s not the master it answers MOVED with a destination node, then you must connect to this node, repeat your operation, and wait for the OK to complete your SET
There is a catch, though. A node is either a master node (it owns a subset of the 16k buckets, monitors other nodes of the cluster, and votes for a new master if one fails), or a replica node (exactly like a Redis replicated, but specific for exactly one master).
Thus, for a reliable cluster, you will need at least 6 nodes. 3 or more master nodes and 3 or more replicas nodes. You need to have the same number of replicas per master if you want to have homogeneous resiliency.
In practice, if you have a Write-heavy workload, keep 2 replicas per master and increase the master count, and if you have a Read-heavy workload increase the replicas per master and use a smart client that can load-balance between replicas (like lettuce).
Pros:
- Incredible documentation, see here. The Redis author, Antirez, is great at explaining his work
- It scales, it heals, it works
- It’s bundled in the redis binary
Cons:
- You need at least 6 nodes
- It works in a full-mesh fashion. Expect lots and lots of east-west traffic (intra-datacenter)
- Few clients support Redis Cluster. Check explicit support for your programming language
- It is definitely not compatible with the other Redis configurations. Neither standalone, nor replicated, nor sentinel Redis
- You won’t be able to upgrade from another typology, nor downgrade to another. It’s a one-way ticket
Remediations:
- None
Wrapping it up
- When it's so easy to have replicas, it's really a must have for any production deployment. Don't deploy Redis standalone.
- Redis Sentinel is simple and a very good trade-off when you need resilience
- Clients can't change easily between typologies so take your time before deploying one
So there you have it. You may have variations on these deployments but the meat will stay the same. I hope your future Redis deployment will at least be replicated!
Thanks to Sebastian Caceres, Florent Jaby, and Simon Denel for the review.