Continuing on my series of introductory posts on Redis DB, today i’ll address the subject of replication.
Definition:
- Replication is a method by which other servers receive a continuously updated copy of the data as it’s being written, so that the replicas can service read queries.
Basic info (redis.io):
- Redis uses asynchronous replication. Starting with Redis 2.8 there is however a periodic (one time every second) acknowledge of the replication stream processed by slaves.
- A master can have multiple slaves.
- Slaves are able to accept other slaves connections. Aside from connecting a number of slaves to the same master, slaves can also be connected to other slaves in a graph-like structure.
- Redis replication is non-blocking on the master side, this means that the master will continue to serve queries when one or more slaves perform the first synchronization.
- Replication is non blocking on the slave side: while the slave is performing the first synchronization it can reply to queries using the old version of the data set, assuming you configured Redis to do so in redis.conf. Otherwise you can configure Redis slaves to send clients an error if the link with the master is down. However there is a moment where the old dataset must be deleted and the new one must be loaded by the slave where it will block incoming connections.
- Replications can be used both for scalability, in order to have multiple slaves for read-only queries (for example, heavy SORT operations can be offloaded to slaves), or simply for data redundancy.
- It is possible to use replication to avoid the saving process on the master side: just configure your master redis.conf to avoid saving (just comment all the “save” directives), then connect a slave configured to save from time to time.
How Redis replication works (redis.io):
- If you set up a slave, upon connection it sends a SYNC command. And it doesn’t matter if it’s the first time it has connected or if it’s a re-connection.
- The master then starts background saving, and collects all new commands received that will modify the dataset. When the background saving is complete, the master transfers the database file to the slave, which saves it on disk, and then loads it into memory. The master will then send to the slave all accumulated commands, and all new commands received from clients that will modify the dataset. This is done as a stream of commands and is in the same format of the Redis protocol itself.
- You can try it yourself via telnet. Connect to the Redis port while the server is doing some work and issue the SYNC command. You’ll see a bulk transfer and then every command received by the master will be re-issued in the telnet session.
- Slaves are able to automatically reconnect when the master <-> slave link goes down for some reason. If the master receives multiple concurrent slave synchronization requests, it performs a single background save in order to serve all of them.
- When a master and a slave reconnects after the link went down, a full re-sync is always performed. However starting with Redis 2.8, a partial re-synchronization is also possible.
In order to configure the replication, all you have to do is to add the line below (or issue the same as a CLI command from slave) to the redis.conf file of the slave.
- SLAVEOF <master_ip> <master_port> (ex. SLAVEOF 127.0.0.1 6379)
to tune the replication process you can play with following options in the redis.conf file:
- requirepass <password> – Require clients to issue AUTH <PASSWORD> before processing any other commands. This might be useful in environments in which you do not trust (eg. don’t run your own servers) others with access to the host running redis-server
- masterauth <master-password> – If the master is password protected (using the “requirepass” configuration directive above) it is possible to tell the slave to authenticate before starting the replication synchronization process, otherwise the master will refuse the slave request
- slave-serve-stale-data <yes|no> – When a slave loses its connection with the master, or when the replication is still in progress, the slave can act in two different ways:
- still reply to client requests, possibly with out-of-date data (the default behavior if the switch is set to “yes”)
- or reply with an error “SYNC with master in progress” to all the kind of commands, except for to INFO and SLAVEOF (otherwise)
- slave-read-only <yes|no> – You can configure a slave instance to accept writes or not. Writing against a slave instance may be useful to store some ephemeral data (because data written on a slave will be easily deleted after re-sync with the master anyway), but may also cause problems if clients are writing to it because of a misconfiguration
- repl-ping-slave-period <seconds> – Slaves send PINGs to server in a predefined interval. It’s possible to change this interval with the repl_ping_slave_period option from CLI. The default value is 10 seconds.
- repl-timeout <seconds> – This option sets a timeout for both Bulk transfer I/O timeout and master data or ping response timeout. The default value is 60 seconds. It is important to make sure that this value is greater than the value specified for repl-ping-slave-period otherwise a timeout will be detected every time there is low traffic between the master and the slave.
- repl-disable-tcp-nodelay <yes|no> – Controls whether to disable TCP_NODELAY on the slave socket after SYNC. If you select “yes” Redis will use a smaller number of TCP packets and less bandwidth to send data to slaves. But this can add a delay for the data to appear on the slave side, up to 40 milliseconds with Linux kernels using a default configuration. If you select “no” the delay for data to appear on the slave side will be reduced but more bandwidth will be used for replication. Default value of “no” is an optimization for low latency, but in very high traffic conditions or when the master and slaves are many hops away, turning this to “yes” may be a good idea.
- slave-priority <integer> – The slave priority is an integer number published by Redis in the INFO output. It is used by Redis Sentinel in order to select a slave to promote into a master if the master is no longer working correctly. A slave with a low priority number is considered better for promotion, so for instance if there are three slaves with priority 10, 100, 25 Sentinel will pick the one with priority 10, that is the lowest. However a special priority of 0 marks the slave as not able to perform the role of master, so a slave with priority of 0 will never be selected by Redis Sentinel for promotion.
Allowing writes only with N attached replicas (redis.io):
- Starting with Redis 2.8 it is possible to configure a Redis master in order to accept write queries only if at least N slaves are currently connected to the master, in order to improve data safety.
- However because Redis uses asynchronous replication it is not possible to ensure the write actually received a given write, so there is always a window for data loss.
- This is how the feature works:
- Redis slaves ping the master every second, acknowledging the amount of replication stream processed.
- Redis masters will remember the last time it received a ping from every slave.
- The user can configure a minimum number of slaves that have a lag not greater than a maximum number of seconds.
- If there are at least N slaves, with a lag less than M seconds, then the write will be accepted.
- There are two configuration parameters for this feature:
- min-slaves-to-write
<number of slaves>
- min-slaves-max-lag
<number of seconds>
- min-slaves-to-write
Have a nice weekend!