Redis performance basics

I wish I had known the following when I started my adventure with Redis…

 

General notes:

  • the advantage of an in-memory database like Redis is that the in-memory representation of a complex data structure is much simpler to manipulate than the same data structure on disk.
  • At the same time, Redis’ on-disk data format does not need to be suitable for random access, and is therefore compact and always generated in an append-only fashion.
  • 1 million keys, with the natural numbers from 0 to 999999 as keys and the string “Hello World” as the value, use 100MB on a 32-bit computer. The same data stored linearly in a single string takes something like 16MB; the difference is that with small keys and values Redis carries a lot of per-key overhead. With large keys/values the ratio is much better.
  • 64-bit systems will use considerably more memory than 32-bit systems to store the same keys, especially if the keys and values are small; this is because pointers take 8 bytes on 64-bit systems. But of course the advantage is that 64-bit systems can address a lot more memory, so in order to run a large Redis server a 64-bit system is more or less required.
  • because Redis stores everything in memory, you obviously cannot have a dataset larger than the memory of your server. But Redis users like Craigslist and Groupon distribute their data among multiple Redis nodes using client-side hashing, which can be an effective solution.
  • client libraries such as Redis-rb (the Ruby client) and Predis (one of the most used PHP clients) can handle multiple Redis servers automatically using consistent hashing.
  • if you don’t want to use consistent hashing or data distribution across different nodes, you can consider using Redis as a secondary data store (metadata, small but frequently written info, and all the other things that get accessed very frequently: user auth tokens, Redis Lists with chronologically ordered IDs of the last N comments, N posts, etc.).
  • Write scripts that monitor your Redis servers (checking for critical conditions) using the INFO command, which reports memory utilization (a minimal monitoring sketch follows this list).
  • Redis on-disk snapshots are atomic: the Redis background saving process is always fork(2)ed when the server is outside the execution of a command, so every command reported to be atomic in RAM is also atomic from the point of view of the disk snapshot.
  • Although Redis is single-threaded, it’s very unlikely that the CPU becomes a bottleneck, because Redis is usually either memory or network bound. For example, using pipelining, Redis running on an average Linux system can deliver as many as 500k requests per second (a pipelining sketch follows this list), so if an application mainly uses O(N) or O(log(N)) commands it is hardly going to use too much CPU. However, you can maximize CPU usage by starting multiple Redis instances on the same box and treating them as different servers.
  • the maximum number of keys a single Redis instance can hold is in theory 2^32. In practice, Redis was tested to handle at least 250 million keys per instance.
  • Every list, set, and sorted set can hold in theory 2^32 - 1 (4294967295, more than 4 billion) elements as well.
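
For the monitoring note above, here is a minimal sketch, assuming a local Redis instance, the Python redis-py client, and an arbitrary 80% threshold (neither the client choice nor the threshold is prescribed by Redis):

```python
import redis

r = redis.Redis(host="localhost", port=6379)
info = r.info()  # the INFO command, parsed into a dict by redis-py

used = info["used_memory"]
print("used memory:", info["used_memory_human"])

# alert when usage crosses 80% of maxmemory (only meaningful if maxmemory is set)
maxmemory = info.get("maxmemory", 0)
if maxmemory and used / maxmemory > 0.8:
    print("WARNING: Redis is using more than 80% of maxmemory")
```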
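
And a small pipelining sketch (again assuming redis-py; any client with pipelining support works the same way). The commands are buffered client-side and sent in batches instead of one round trip per command, which is what makes throughput like this possible:

```python
import redis

r = redis.Redis()
pipe = r.pipeline(transaction=False)  # plain pipelining, no MULTI/EXEC wrapper
for i in range(10000):
    pipe.set(f"key:{i}", "value")  # buffered locally, nothing sent yet
replies = pipe.execute()  # all 10k SETs are flushed to the server together
print(len(replies), "replies")
```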

 

Keys:

  • are binary safe (i.e. you can use any binary sequence as a key, including the empty string)
  • very long keys (e.g. 1kB) are a bad idea, both memory-wise and because the lookup of the key in the dataset may require several costly key comparisons
  • the shorter the key, the less memory it consumes; however, the saving is negligible compared to the space used by the key object itself and the value object. Therefore go with human-readable keys (e.g. password:789 instead of p:789)
  • stick with a naming convention, e.g. object-type:id:field or comment:1234:reply.to, etc. (a small sketch follows this list)
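
A tiny sketch of the naming convention in practice; the user/auth names are made up for illustration and the client is again redis-py:

```python
import redis

r = redis.Redis(decode_responses=True)

def user_key(user_id: int, field: str) -> str:
    # object-type:id:field, e.g. "user:789:auth"
    return f"user:{user_id}:{field}"

r.set(user_key(789, "auth"), "token-abc123")
print(r.get(user_key(789, "auth")))  # -> token-abc123
```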

 

Data structures:

  • Strings:
    • values can’t be bigger than 512 MB.
    • values are binary safe as well, so a string can hold any blob of data (e.g. the bytes of an mp3 file)
    • the INCR command parses the string as an integer, increments it by one, and sets the result as the new string value
    • other related commands (INCRBY, DECR, DECRBY) internally work just like INCR
    • INCR is atomic: multiple clients issuing INCR against the same key will never run into a race condition (see the counter sketch after this list)
  • Hashes:
    • maps between string fields and string values, and therefore a perfect data type to represent objects (see the hash sketch after this list)
    • Hashes are encoded using a memory-efficient data structure when they have a small number of entries and the biggest entry does not exceed a given threshold. These thresholds can be configured in the redis.conf file with the following directives:
      • hash-max-ziplist-entries (default value 512)
      • hash-max-ziplist-value (default value 64)
  • Lists:
    • are implemented via Linked Lists (not via Arrays)
    • even for a list containing millions of elements, adding a new element at the head or the tail of the list is still performed in constant time, O(1)
    • the main feature of lists from the point of view of time complexity is the support for constant-time insertion and deletion of elements near the head and tail, even with many millions of inserted elements. Accessing elements is very fast near the extremes of the list, but slow if you try to access the middle of a very big list, as that is an O(N) operation.
    • Accessing an element by index is not so fast in lists implemented via linked lists (arrays are better in this case)
    • with LRANGE you can easily paginate results, fetching a constant-length page in constant time near either end of the list (see the pagination sketch after this list)
    • putting objects inside lists isn’t a good idea, as you often need to access those objects directly, and given that Redis uses linked lists as the underlying implementation, that isn’t efficient
    • Lists are sortable
    • Similarly to hashes, small lists are also encoded in a special way in order to save a lot of space. The special representation is only used when you are under the following limits:
      • list-max-ziplist-entries (default value 512)
      • list-max-ziplist-value (default value 64)
  • Sets:
    • unordered collection of binary-safe strings
    • adding the same element multiple times will result in a set having a single copy of this element
    • are very good for expressing relations between objects (e.g. to implement “tags”; see the tagging sketch after this list)
    • Sets are sortable
    • Sets have a special encoding in just one case: when a set is composed of strings that happen to be integers in radix 10 in the range of 64-bit signed integers. The following redis.conf setting sets the limit on the size of the set for this special memory-saving encoding to be used:
      • set-max-intset-entries (default value 512)
  • ZSets:
    • Sorted Sets are, similarly to Sets, collections of binary-safe strings, but this time with an associated score through which you get “sorting capabilities”
    • you can think of a ZSet as the equivalent of an index in the SQL world
    • they are implemented via a dual-ported data structure containing both a skip list and a hash table, so every time you add an element Redis performs an O(log(N)) operation
    • ZSets have a “default” ordering, but you are still free to call the SORT command against a sorted set to get a different ordering (in this case the server will spend extra CPU). A solution for having multiple orderings is to add every element to multiple sorted sets simultaneously.
    • calling ZADD again for an element already included in the sorted set will update its score (and position) in O(log(N)), so sorted sets are suitable even when there are tons of updates (see the leaderboard sketch after this list)
    • Similarly to hashes and lists, sorted sets are also specially encoded in order to save a lot of space. This encoding is only used when the length and elements of a sorted set are below the following limits:
      • zset-max-ziplist-entries (default value 128)
      • zset-max-ziplist-value (default value 64)
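
Below are small sketches for the points flagged above, all assuming a local Redis and the Python redis-py client, with made-up key names. First, the atomic INCR counter: any number of processes can call this concurrently without a race condition:

```python
import redis

r = redis.Redis()

def record_page_view(page_id: int) -> int:
    # INCR parses the stored string as an integer, adds one, stores the
    # result back, and returns it; the whole step is atomic on the server.
    return r.incr(f"pageviews:{page_id}")

print(record_page_view(42))
```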
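
A hash used as an object (field names are illustrative):

```python
import redis

r = redis.Redis(decode_responses=True)

# one HSET call can set several field/value pairs on the same hash
r.hset("user:1000", mapping={"username": "alice", "visits": 10})
r.hincrby("user:1000", "visits", 1)  # numeric fields can be incremented in place
print(r.hgetall("user:1000"))  # {'username': 'alice', 'visits': '11'}
```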
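
Pagination over a list of recent comment IDs: LPUSH keeps the newest entry at the head, and a fixed-size LRANGE near the head stays fast no matter how long the list grows:

```python
import redis

r = redis.Redis(decode_responses=True)

for comment_id in range(100):
    r.lpush("comments:recent", comment_id)  # newest ID ends up at the head
r.ltrim("comments:recent", 0, 99)  # cap the list at the 100 newest IDs

def page(n: int, per_page: int = 10):
    start = n * per_page
    # a constant-length range near the head is served in constant time
    return r.lrange("comments:recent", start, start + per_page - 1)

print(page(0))  # the ten newest comment IDs
```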
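
Tagging with sets: SADD silently ignores duplicates, and SINTER answers “which items carry both tags?”:

```python
import redis

r = redis.Redis(decode_responses=True)

r.sadd("tag:redis:items", "post:1", "post:2")
r.sadd("tag:performance:items", "post:2", "post:3")
r.sadd("tag:redis:items", "post:1")  # duplicate add leaves the set unchanged

print(r.sinter("tag:redis:items", "tag:performance:items"))  # {'post:2'}
```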
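
Finally, a leaderboard with a sorted set: ZADD inserts or updates a member’s score in O(log(N)), and range queries come back already ordered:

```python
import redis

r = redis.Redis(decode_responses=True)

r.zadd("leaderboard", {"alice": 100, "bob": 85})
r.zadd("leaderboard", {"alice": 120})  # re-adding updates alice's score

# highest scores first, with the scores attached
print(r.zrevrange("leaderboard", 0, 9, withscores=True))
```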

 

 
