Tag Archives: Scalability

Redis Replication

Continuing on my series of introductory posts on Redis DB, today i’ll address the subject of replication.



  • Replication is a method by which other servers receive a continuously updated copy of the data as it’s being written, so that the replicas can service read queries.


Basic info (redis.io):

  • Redis uses asynchronous replication. Starting with Redis 2.8 there is however a periodic (one time every second) acknowledge of the replication stream processed by slaves.
  • A master can have multiple slaves.
  • Slaves are able to accept other slaves connections. Aside from connecting a number of slaves to the same master, slaves can also be connected to other slaves in a graph-like structure.
  • Redis replication is non-blocking on the master side, this means that the master will continue to serve queries when one or more slaves perform the first synchronization.
  • Replication is non blocking on the slave side: while the slave is performing the first synchronization it can reply to queries using the old version of the data set, assuming you configured Redis to do so in redis.conf. Otherwise you can configure Redis slaves to send clients an error if the link with the master is down. However there is a moment where the old dataset must be deleted and the new one must be loaded by the slave where it will block incoming connections.
  • Replications can be used both for scalability, in order to have multiple slaves for read-only queries (for example, heavy SORT operations can be offloaded to slaves), or simply for data redundancy.
  • It is possible to use replication to avoid the saving process on the master side: just configure your master redis.conf to avoid saving (just comment all the “save” directives), then connect a slave configured to save from time to time.


How Redis replication works (redis.io):

  • If you set up a slave, upon connection it sends a SYNC command. And it doesn’t matter if it’s the first time it has connected or if it’s a re-connection.
  • The master then starts background saving, and collects all new commands received that will modify the dataset. When the background saving is complete, the master transfers the database file to the slave, which saves it on disk, and then loads it into memory. The master will then send to the slave all accumulated commands, and all new commands received from clients that will modify the dataset. This is done as a stream of commands and is in the same format of the Redis protocol itself.
  • You can try it yourself via telnet. Connect to the Redis port while the server is doing some work and issue the SYNC command. You’ll see a bulk transfer and then every command received by the master will be re-issued in the telnet session.
  • Slaves are able to automatically reconnect when the master <-> slave link goes down for some reason. If the master receives multiple concurrent slave synchronization requests, it performs a single background save in order to serve all of them.
  • When a master and a slave reconnects after the link went down, a full re-sync is always performed. However starting with Redis 2.8, a partial re-synchronization is also possible.


In order to configure the replication, all you have to do is to add the line below (or issue the same as a CLI command from slave) to the redis.conf file of the slave.

  • SLAVEOF <master_ip> <master_port>             (ex. SLAVEOF 6379)


to tune the replication process you can play with following options in the redis.conf file:

  • requirepass <password> – Require clients to issue AUTH <PASSWORD> before processing any other commands. This might be useful in environments in which you do not trust (eg. don’t run your own servers) others with access to the host running redis-server
  • masterauth <master-password> – If the master is password protected (using the “requirepass” configuration directive above) it is possible to tell the slave to authenticate before starting the replication synchronization process, otherwise the master will refuse the slave request
  • slave-serve-stale-data <yes|no> – When a slave loses its connection with the master, or when the replication is still in progress, the slave can act in two different ways:
    • still reply to client requests, possibly with out-of-date data (the default behavior if the switch is set to “yes”)
    • or reply with an error “SYNC with master in progress” to all the kind of commands, except for to INFO and SLAVEOF (otherwise)
  • slave-read-only <yes|no> – You can configure a slave instance to accept writes or not. Writing against a slave instance may be useful to store some ephemeral data (because data written on a slave will be easily deleted after re-sync with the master anyway), but may also cause problems if clients are writing to it because of a misconfiguration
  • repl-ping-slave-period <seconds> – Slaves send PINGs to server in a predefined interval. It’s possible to change this interval with the repl_ping_slave_period option from CLI. The default value is 10 seconds.
  • repl-timeout <seconds> – This option sets a timeout for both Bulk transfer I/O timeout and master data or ping response timeout. The default value is 60 seconds. It is important to make sure that this value is greater than the value specified for repl-ping-slave-period otherwise a timeout will be detected every time there is low traffic between the master and the slave.
  • repl-disable-tcp-nodelay <yes|no> – Controls whether to disable TCP_NODELAY on the slave socket after SYNC. If you select “yes” Redis will use a smaller number of TCP packets and less bandwidth to send data to slaves. But this can add a delay for the data to appear on the slave side, up to 40 milliseconds with Linux kernels using a default configuration. If you select “no” the delay for data to appear on the slave side will be reduced but more bandwidth will be used for replication. Default value of “no” is an optimization for low latency, but in very high traffic conditions or when the master and slaves are many hops away, turning this to “yes” may be a good idea.
  • slave-priority <integer> – The slave priority is an integer number published by Redis in the INFO output. It is used by Redis Sentinel in order to select a slave to promote into a master if the master is no longer working correctly. A slave with a low priority number is considered better for promotion, so for instance if there are three slaves with priority 10, 100, 25 Sentinel will pick the one with priority 10, that is the lowest. However a special priority of 0 marks the slave as not able to perform the role of master, so a slave with priority of 0 will never be selected by Redis Sentinel for promotion.


Allowing writes only with N attached replicas (redis.io):

  • Starting with Redis 2.8 it is possible to configure a Redis master in order to accept write queries only if at least N slaves are currently connected to the master, in order to improve data safety.
  • However because Redis uses asynchronous replication it is not possible to ensure the write actually received a given write, so there is always a window for data loss.
  • This is how the feature works:
    • Redis slaves ping the master every second, acknowledging the amount of replication stream processed.
    • Redis masters will remember the last time it received a ping from every slave.
    • The user can configure a minimum number of slaves that have a lag not greater than a maximum number of seconds.
    • If there are at least N slaves, with a lag less than M seconds, then the write will be accepted.
  • There are two configuration parameters for this feature:
    • min-slaves-to-write <number of slaves>
    • min-slaves-max-lag <number of seconds>


Have a nice weekend!

Redis data sharding – part 2 – hash-based keys

In my previous post on Redis data sharding i introduced the concept of data sharding/partitioning and provided a small Java code example to illustrate the idea. As you noticed i was creating fixed-size “emailbuckets”, containing 1024 emails each. An email address was the key of my hash, while user id was the value. For a shard identifier i used a simple integer value (shardKey) obtained as a result of “i mod shardSize” operation.

Whereas such approach illustrates the concept well, it’s impractical in “real life” applications. This is due to a simple reason – knowing the email address alone (which may often be the only thing you’d know at some point of the app execution flow; for example when you’re using Spring Security and requesting emails “as usernames” during sign-in process), you wouldn’t be able to retrieve the corresponding userId. If you would know the algorithm by which shardKey was generated, in this case – yes – you would be able to traverse emailbuckets one-by-one looking for the appropriate email, but without that knowledge you wouldn’t be able to tell which shard the email address you’re looking for, ended up in.

One solution to that problem is to use a different key for sharding data; something that is computed directly based on the data you’re interested in partitioning. An obvious candidate here is the email address itself. If you’d be able to generate shardKey based on email address you could reproduce the same scenario every time a user provides you with his email during signing in and retrieve his userId (which you could use later on (for example) to further query another hash in Redis – “users:id” – that stores complete user profile).


This seems like an ideal task for a hash function… Let’s start first with some background on hashing. According to Neil Coffey’s Javamex article Introduction to hashing:

  • Hashing means using some function or algorithm to map object data (eg. content of a String object) to some representative integer value. This so-called hash code (or simply hash) can then be used as a way to narrow down our search when looking for the item…


Also, when you search Wikipedia after Java hashCode() function, you’ll get the following definition:

  • In the Java programming language, every class must provide a hashCode() method which digests the data stored in an instance of the class into a single hash value (a 32-bit signed integer). This hash is used by other code when storing or manipulating the instance – the values are intended to be evenly distributed for varied inputs in order to use in clustering. This property is important to the performance of hash tables and other data structures that store objects in groups (“buckets”) based on their computed hash values.


Looks like this is exactly what we’re interested in – …evenly distributed values for varied inputs…, which …is important to the performance of data structures that store objects in groups (shards in our case) based on their computed hash values.

Conclusion: Java hashCode() function is what we’ll proceed with.


More from Wikipedia on Java hashCode():

  • Starting with Java version 1.2, the java.lang.String class implements its hashCode() using a product sum algorithm over the entire text of the string.
  • An instance s of the java.lang.String class, would have a hash code h(s) defined by:
    Java String hashCode()


  • where terms are summed using Java 32-bit int addition, s[i] denotes the i-th character of the string, and n is the length of s.


Now, looking at Java docs on String.hashCode() function we can read:

  • The hash code for a String object is computed as
    s[0]*31^(n-1) + s[1]*31^(n-2) + … + s[n-1]


  • using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)


Finally let’s take a look at some Java code of String object showing how hashCode() has actually been implemented:

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        hash = h;
    return h;


An alternative (faster) implementation may look like this (from Apache Harmony JDK):

public int hashCode() {
    if (hashCode == 0) {
        int hash = 0, multiplier = 1;
        for (int i = offset + count - 1; i >= offset; i--) {
            hash += value[i] * multiplier;
            int shifted = multiplier << 5;
            multiplier = shifted - multiplier;
        hashCode = hash;
    return hashCode;

what’s the difference between the two above code snippets? As you can see, multiplication can be replaced by a bitwise shift operation and a subtraction for better performance. “(multiplier << 5) – multiplier” is just 31*multiplier after all (however VMs nowadays do this optimization automatically). If you’re interested in good reading on the subject of Binary numbers i strongly recommend Neil Coffey’s Javamex article: Introduction to binary numbers.


OK, applying all this knowledge to our sharding code example results in the following implementation:

while(i<1000000) {
    String userId = String.valueOf(i++);
    String emailAddress = String.format("user_%s@mariuszprzydatek.com", userId);
    int shardKey = emailAddress.hashCode();
    redisTemplate.opsForHash().put(String.format("emailsbucket:%s", shardKey), emailAddress, userId);


Happy coding! 🙂




Redis data sharding – part 1

In one of my previous posts on Redis i provided a definition of data sharding, quoting a great book “Redis in Action” authored by Dr. Josiah L Carlson:

  • “Sharding is a method by which you partition your data into different pieces. In this case, you partition your data based on IDs embedded in the keys, based on the hash of keys, or some combination of the two. Through partitioning your data, you can store and fetch the data from multiple machines, which can allow a linear scaling in performance for certain problem domains.”


Today i’d like to elaborate some more on data sharding based on IDs embedded in the keys.


Let’s start with an example of a hypothetical data stored in an Redis instance:

redis>keys *
(empty list or set)
redis>set emails:1 me@mariuszprzydatek.com
redis>get emails:1

what i did here is to use the basic String data type to store an email of a user. As you can see, i embedded user id within the key (’emails:1′). Now, if a front-end application would ask for an email address of a user with id=1, on the back-end side i would concatenate the keyword which i usually use to denote keys i store emails under (ie. ’emails’) with the id of a user (‘1’), add a colon (‘:’) in between, and this way i’ll get the resulting key (’emails:1′) i should look after while making a call to the Redis instance.


This solution is nice but if i’ll have 1 million of users registered in my system and use Redis as the data store for keeping mappings between identifier of a user and his email, i will end up with 1 million keys (’emails:1′, ’emails:2′, ’emails:3′, etc.). This is a volume my Redis instance will easily handle (see my previous post on Redis Performance Basics) and it will use little more than 190MB to store everything in the memory (so much due to a lot of overhead when storing small keys and values; the ratio is much better with large keys/values), but this is only one attribute we’re talking about – and what about firstName, lastName, etc.?. Obviously, if my system will have millions of registered users and i’d use Redis as my primary data store for users-related info, i would be running multiple instances of Redis already and based on the id of a user, route the queries to a specific instance, but there’s still a lot we can do to optimize costs prior to thinking about scaling.


Small code snippet to generate 1M of emails stored in Redis using String data structure (and Spring Data Redis mentioned in my other post).

int i = 0;
while(i<1000000) {
    redisTemplate.opsForValue().set(String.format("emails:%s", i++), "me@mariuszprzydatek.com");

the loop above executes in 2 mins on my Windows 8 64bit i7 laptop and the ‘redis-server’ process allocates ca 190 MB of memory.


Now, what will happen if we change the data structure let’s say to a Redis Hash?

Next code snippet and we’re getting just that:

int i = 0;
while(i<1000000) {
    String userId = String.valueOf(i++);
    String emailAddress = String.format("user_%s@mariuszprzydatek.com", userId);
    redisTemplate.opsForHash().put("emails", emailAddress, userId)

2 mins and 165 MB of memory allocated – a 15 % gain absolutely for free.


Let’s try with data sharding/partitioning. Another code snippet using Redis Hash data structure and there you go:

int shardSize = 1024;
int i = 0;
while(i<1000000) {
    int shardKey = i/shardSize;
    String userId = String.valueOf(i++);
    String emailAddress = String.format("user_%s@mariuszprzydatek.com", userId);
    redisTemplate.opsForHash().put(String.format("emailsbucket:%s", shardKey), emailAddress, userId);

2 mins later and… only 30 MB allocated – now you’re talking Mariusz!

Staggering 530 % increase in memory allocation efficiency!


Hope you enjoyed the first part of this brief tutorial.






Redis performance basics

I wish i knew the below when starting my adventure with Redis…


General notes:

  • advantage of a in-memory database like Redis is that the memory representation of complex data structure is much simpler to manipulate compared to the same data structure on disk.
  • At the same time Redis’ on-disk data format does not need to be suitable for random access and therefore is compact and always generated in an append-only fashion
  • 1 Million keys with the key being the natural numbers from 0 to 999999 and the string “Hello World” as value use 100MB on a 32bit computer. The same data stored linearly in an unique string takes something like 16MB and this is because with small keys and values in Redis there is a lot of overhead. With large keys/values the ratio is much better.
  • 64 bit systems will use considerably more memory than 32 bit systems to store the same keys, especially if the keys and values are small, this is because pointers takes 8 bytes in 64 bit systems. But of course the advantage is that 64 bit systems can address/have a lot more of memory, so in order to run large Redis servers a 64 bit system is more or less required
  • because Redis stores everything in memory you cannot obviously have a dataset larger than the memory of your server. But Redis users like Craigslist and Groupon are distributing their data among multiple Redis nodes, using client-side hashing which can be an effective solution.
  • client libraries such Redis-rb (the Ruby client) and Predis (one of the most used PHP clients) are able to handle multiple Redis servers automatically using consistent hashing.
  • if you don’t want to use consistent hashing or data distribution across different nodes, you can consider using Redis as a secondary data store (metadata, small but often written info and all the other things that get accessed very frequently: user auth tokens, Redis Lists with chronologically ordered IDs of the last N-comments, N-posts, etc.).
  • Write scripts that will monitor your Redis servers (checking for critical conditions) using the INFO command that reports the memory utilization
  • Redis on-disk-snapshots are atomic – Redis background saving process is always fork(2)ed when the server is outside of the execution of a command, so every command reported to be atomic in RAM is also atomic from the point of view of the disk snapshot.
  • Although Redis is single threaded it’s very unlikely that CPU becomes a bottleneck, this is because Redis usually either memory or network bound. For example using pipelining Redis running on an average Linux system can deliver even 500k requests per second, so if an application mainly uses O(N) or O(log(N)) commands it is hardly going to use too much CPU. However, you can maximize CPU usage by starting multiple instances of Redis in the same box and treat them as different servers.
  • maximum number of keys a single Redis instance can hold is in theory up to 232. Redis was tested in practice to handle at least 250 million of keys per instance.
  • Every list, set, and sorted set, can hold in theory 232 – 1 (4294967295, more than 4 billion) elements as well.



  • are binary safe (ie. you can use any binary sequence as a key, including empty string)
  • too long keys (eg. 1kB) are a bad idea, both memory-wise as well as because the lookup of the key in the dataset may require several costly key-comparisons
  • the shorter the key the less memory it’ll consume, however it’s negligible compared to to the space used by the key object itself and the value object. Therefore go with human readable keys (eg. password:789 instead of p:789)
  • stick with a naming convention, eg. object-type:id:field or comment:1234:reply.to, etc.


Data structures:

  • Strings:
    • values can’t be bigger than 512 MB.
    • Keys can be anything including binary mp3
    • the INCR command parses the string as an integer, increments by one, and creates a new string
    • other related commands (INCRBY, DECR, DECRBY) all use internally INCR
    • INCR is atomic – multiple clients issuing INCR against the same key will never incur into a race condition
  • Hashes:
    • maps between string fields and string values (therefore a perfect data type to represent objects)
    • Hashes are encoded using a memory efficient data structure when they have a small number of entries, and the biggest entry does not exceed a given threshold. These thresholds can be configured in /redis.conf file with following directives:
      • hash-max-ziplist-entries (default value 512)
      • hash-max-ziplist-value (default value 64)
  • Lists:
    • are implemented via Linked Lists (not via Arrays)
    • in case of a list containing millions of elements, adding a new element in the head or in the tail of the list is still performed in constant time O(1)
    • main feature of lists from the point of time complexity is the support for constant time insertion and deletion of (even many millions) elements near the head and tail. Accessing elements is very fast near the extremes of the list but is slow if you try accessing the middle of a very big list, as it is an O(N) operation.
    • Accessing an element by index is not so fast in lists implemented by linked lists (Arrays are better in this case)
    • with LRANGE you can easily paginate results (constant length in constant time)
    • putting objects inside lists isn’t a good idea as you often need to access those objects and given that Redis is using Linked Lists as the underlying impl, it’s not efficient.
    • Lists are sortable
    • Similarly to hashes, small lists are also encoded in a special way in order to save a lot of space. The special representation is only used when you are under the following limits:
      • list-max-ziplist-entries 512
      • list-max-ziplist-value 64
  • Sets:
    • unordered collection of binary-safe strings
    • adding the same element multiple times will result in a set having a single copy of this element
    • are very good for expressing relations between objects (eg. to implement “tags”)
    • Sets are sortable
    • Sets have a special encoding in /redis.config in just one case: when a set is composed of just strings that happens to be integers in radix 10 in the range of 64 bit signed integers. The following configuration setting sets the limit in the size of the set in order to use this special memory saving encoding.
      • set-max-intset-entries 512
  • ZSets
    • sorted Sets are similar to Sets, collections of binary-safe strings, but this time with an associated score throughout the use of which you’re getting “sorting capabilities”
    • you can think of ZSet as of a equivalent of an Index in the SQL world
    • they are implemented via a dual-ported data structure containing both a skip list and a hash table, so every time you add an element Redis performs an O(log(N)) operation
    • ZSets have a “default” ordering but you are still free to call the SORT command against sorted sets to get a different ordering (but in this case the server will waste CPU). Solution for having multiple orders is to add every element in multiple sorted sets simultaneously.
    • calling again ZADD against an element already included in the sorted set will update its score (and position) in O(log(N)), so sorted sets are suitable even when there are tons of updates
    • Similarly to hashes and lists, sorted sets are also specially encoded in order to save a lot of space. This encoding is only used when the length and elements of a sorted set are below the following limits:
      • zset-max-ziplist-entries 128
      • zset-max-ziplist-value 64




Redis introduction

After reading another great book from Manning – Redis in Action, authored by Dr. Josiah L Carlson (whom you may know from the “Redis DB” mailing list on Google), i thought you may be interested in key takeaways.


Key facts:

  • NoSQL database (key/value store)
  • Stores a mapping of keys to five different types of values (strings, lists, sets, hashes, sorted sets)
  • Supports in-memory and and disk persistence
  • Supports master/slave replication to scale read performance
  • Support for client-side sharding to scale write performance
  • Publish/Subscribe capabilities
  • Support for scripting (equivalent of stored procedures)
  • Partial transaction support
  • Support of bulk operations

+ “Sharding is a method by which you partition your data into different pieces. In this case, you partition your data based on IDs embedded in the keys, based on the hash of keys, or some combination of the two. Through partitioning your data, you can store and fetch the data from multiple machines, which can allow a linear scaling in performance for certain problem domains.”


What happens when my server gets turned off?

  • Redis has two different forms of persistence available for writing in-memory data to disk in a compact format
    • point-in-time dump, either:
      • when certain conditions are met (a number of writes in a given period), or
      • when one of the two dump-to-disk commands is called.
    • append-only file that writes every command that alters data in Redis to disk as it happens. Depending on how careful you want to be with your data, append-only writing can be configured to:
      • never sync,
      • sync once per second, or
      • sync at the completion of every operation


Master/slave replication:

  • A good failover scenario if the server that Redis is running on crashes
  • Slaves connect to the master and receive an initial copy of the full database.
  • As writes are performed on the master, they’re sent to all connected slaves for updating the slave datasets in real time.
  • With continuously updated data on the slaves, clients can then connect to any slave for reads instead of making requests to the master.


Random writes:

  • are always fast, because data is always in memory,
  • queries to Redis don’t need to go through a typical query parser/optimizer


Five data structures available in Redis

Redis Data Types


Key takeaways:

  • Redis is:
    • fast – operates on in-memory data sets
    • remote – accessible to multiple clients/servers
    • persistent – opportunity to keep data on disk
    • scalable – via slaving and sharding