Category Archives: Databases

Amazon AWS – Installing Redis on EBS

In this step-by-step guide I’ll show you how to install Redis on AWS (Amazon Linux AMI).

I’ll assume you’re performing the steps below as the superuser (sudo -s).

  1. The first thing you need is to have the following tools installed:
    > gcc
    > gcc-c++
    > make

    yum -y install gcc gcc-c++ make


  2. Download Redis:
    cd /usr/local/src
    wget http://download.redis.io/releases/redis-2.8.12.tar.gz
    tar xzf redis-2.8.12.tar.gz
    rm -f redis-2.8.12.tar.gz


  3. Build it:
    cd redis-2.8.12
    make distclean
    make


  4. Create the following directories and copy the binaries:
    mkdir /etc/redis /var/redis
    cp src/redis-server src/redis-cli /usr/local/bin


  5. Copy the Redis template configuration file into /etc/redis/, using the instance’s port number as its name (a best practice recommended on the Redis site):
    cp redis.conf /etc/redis/6379.conf


  6. Create a directory inside /var/redis that will act as the working/data directory for this Redis instance:
    mkdir /var/redis/6379


  7. Edit the Redis config file and make the necessary changes:
    nano /etc/redis/6379.conf


  8. Make the following changes to 6379.conf (an example excerpt follows this list):
    > Set daemonize to yes (by default it is set to no).
    > Set pidfile to /var/run/redis_6379.pid
    > Set your preferred loglevel
    > Set logfile to /var/log/redis_6379.log
    > Set dir to /var/redis/6379
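
    For reference, the relevant lines of /etc/redis/6379.conf might then look like this (loglevel notice is just one possible choice):

    daemonize yes
    pidfile /var/run/redis_6379.pid
    loglevel notice
    logfile /var/log/redis_6379.log
    dir /var/redis/6379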


  9. Don’t copy the standard Redis init script from the utils directory into /etc/init.d (as it’s not Amazon Linux AMI/chkconfig compliant); instead, download the following:


  10. Move and chmod the downloaded redis init script:
    mv redis-server /etc/init.d
    chmod 755 /etc/init.d/redis-server


  11. Edit the redis-server init script and set the Redis conf file name as follows:
    > REDIS_CONF_FILE=”/etc/redis/6379.conf”

    nano /etc/init.d/redis-server


  12. Enable the Redis service to start automatically on boot:
    chkconfig --add redis-server
    chkconfig --level 345 redis-server on


  13. Start Redis:
    service redis-server start


  14. (optional) Add ‘vm.overcommit_memory = 1’ to /etc/sysctl.conf (otherwise a background save may fail under low-memory conditions – according to info on the Redis site):
    > vm.overcommit_memory = 1

    nano /etc/sysctl.conf


  15. Activate the new sysctl change:
    sysctl vm.overcommit_memory=1


  16. Try pinging your instance with redis-cli:
    /usr/local/bin/redis-cli ping


  17. Do a few tests with redis-cli and check that the dump file is correctly stored in /var/redis/6379/ (you should find a file called dump.rdb):
    >set testkey testval
    >get testkey
    >del testkey
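
    A session like the following is roughly what you can expect (the replies shown are the standard redis-cli responses):

    /usr/local/bin/redis-cli
    > set testkey testval
    OK
    > get testkey
    "testval"
    > del testkey
    (integer) 1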


  18. Check that your Redis instance is correctly logging to the log file:
    cat /var/log/redis_6379.log



And that’s basically it. Cheers.


Amazon AWS – SimpleDB – simple NoSQL

Basic information about Amazon SimpleDB Service:


AWS Free Tier availability:

  • 25 SimpleDB Machine Hours
  • 1GB of Storage


Developer Resources:



  • data sets are organized into domains (vs. tables in relational DBs)
  • Domains are collections of items that are described by attribute-value pairs
  • automatically creates an index for every field in a domain
  • no need to pre-define a schema
  • scale-out by creating new domains on different instances
  • stores multiple geographically distributed copies of each domain to enable high availability and data durability.
  • a successful write (using PutAttributes, BatchPutAttributes, DeleteAttributes, BatchDeleteAttributes, CreateDomain or DeleteDomain) means that all copies of the domain will durably persist
  • by default, GetAttributes and Select perform an eventually consistent read (details below).
    • a consistent read can potentially incur higher latency and lower read throughput therefore it is best to use it only when an application scenario mandates that a read operation absolutely needs to read all writes that received a successful response prior to that read. For all other scenarios the default eventually consistent read will yield the best performance.
  • allows specifying consistency settings for each individual read request, so the same application could have disparate parts following different consistency settings.
  • currently enables domains to grow up to 10 GB each
  • initial allocation of domains is limited to 250


API Summary:

  • CreateDomain — Create a domain that contains your dataset.
  • DeleteDomain — Delete a domain.
  • ListDomains — List all domains.
  • DomainMetadata — Retrieve information about creation time for the domain, storage information both as counts of item names and attributes, as well as total size in bytes.
  • PutAttributes — Add or update an item and its attributes, or add attribute-value pairs to items that exist already. Items are automatically indexed as they are received.
  • BatchPutAttributes — For greater overall throughput of bulk writes, perform up to 25 PutAttributes operations in a single call.
  • DeleteAttributes — Delete an item, an attribute, or an attribute value.
  • BatchDeleteAttributes — For greater overall throughput of bulk deletes, perform up to 25 DeleteAttributes operations in a single call.
  • GetAttributes — Retrieve an item and all or a subset of its attributes and values.
  • Select — Query the data set in the familiar, “select target from domain_name where query_expression” syntax. Supported value tests are: =, !=, <, >, <=, >=, like, not like, between, is null, is not null, and every (). Example: select * from mydomain where every(keyword) = ‘Book’. Order results using the SORT operator, and count items that meet the condition(s) specified by the predicate(s) in a query using the Count operator.
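
To make the API summary above more concrete, here is a minimal sketch using the AWS SDK for Java (v1); the domain name, item name, and attribute values are made up for the example, and credentials are assumed to be configured in the environment:

import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.AmazonSimpleDBClient;
import com.amazonaws.services.simpledb.model.*;

import java.util.Arrays;

public class SimpleDbExample {
    public static void main(String[] args) {
        AmazonSimpleDB sdb = new AmazonSimpleDBClient();

        // CreateDomain -- a domain plays the role of a table
        sdb.createDomain(new CreateDomainRequest("mydomain"));

        // PutAttributes -- add an item with two attribute-value pairs (indexed automatically)
        sdb.putAttributes(new PutAttributesRequest("mydomain", "item0001",
                Arrays.asList(
                        new ReplaceableAttribute("keyword", "Book", true),
                        new ReplaceableAttribute("title", "Redis in Action", true))));

        // Select -- query using the familiar SQL-like syntax
        SelectResult result = sdb.select(
                new SelectRequest("select * from mydomain where every(keyword) = 'Book'"));
        for (Item item : result.getItems()) {
            System.out.println(item.getName() + " -> " + item.getAttributes());
        }
    }
}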


Consistency Options:

  • Eventually Consistent Reads (Default) — the eventual consistency option maximizes read performance (in terms of low latency and high throughput). However, an eventually consistent read (using Select or GetAttributes) might not reflect the results of a recently completed write (using PutAttributes, BatchPutAttributes, DeleteAttributes, BatchDeleteAttributes). Consistency across all copies of data is usually reached within a second; repeating a read after a short time should return the updated data.
  • Consistent Reads — in addition to eventual consistency, SimpleDB also gives flexibility to request a consistent read if your application, or an element of your application, requires it. A consistent read (using Select or GetAttributes with ConsistentRead=true) returns a result that reflects all writes that received a successful response prior to the read.



  • Conditional Puts/Deletes — enable you to insert, replace, or delete values for one or more attributes of an item if the existing value of an attribute matches the value specified. If the value does not match or is not present, the update is rejected. Conditional Puts/Deletes are useful for preventing lost updates when different sources write concurrently to the same item.
    • Conditional puts and deletes are exposed via the PutAttributes and DeleteAttributes APIs by specifying an optional condition with an expected value.
    • For example, if the application is reserving seats or selling tickets to an event, you might allow a purchase (i.e., write update) only if the specified seat was still available (the optional condition). These semantics can also be used to implement functionality such as counters, inserting an item only if it does not already exist, and optimistic concurrency control (OCC). An application can implement OCC by maintaining a version number (or a timestamp) attribute as part of an item and by performing a conditional put/delete based on the value of this version number.
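
A hedged sketch of the optimistic concurrency control pattern described above, again with the AWS SDK for Java (v1); the domain, item, and attribute names are illustrative only:

import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.AmazonSimpleDBClient;
import com.amazonaws.services.simpledb.model.*;

import java.util.Arrays;

public class ConditionalPutExample {
    public static void main(String[] args) {
        AmazonSimpleDB sdb = new AmazonSimpleDBClient();

        // Conditional put: reserve the seat only if it still carries the version we read earlier;
        // the update is rejected if the expected value no longer matches.
        sdb.putAttributes(new PutAttributesRequest(
                "tickets", "seat-12A",
                Arrays.asList(
                        new ReplaceableAttribute("status", "reserved", true),
                        new ReplaceableAttribute("version", "2", true)),
                new UpdateCondition("version", "1", true)));

        // Consistent read: reflects all writes that completed before this call
        GetAttributesResult seat = sdb.getAttributes(
                new GetAttributesRequest("tickets", "seat-12A").withConsistentRead(true));
        System.out.println(seat.getAttributes());
    }
}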



  • Domain size: 10 GB per domain, 1 billion attributes per domain
  • Domain name: 3-255 characters (a-z, A-Z, 0-9, ‘_’, ‘-‘, and ‘.’)
  • Domains per account: 250
  • Attribute name-value pairs per item: 256
  • Attribute name length: 1024 bytes
  • Attribute value length: 1024 bytes
  • Item name length: 1024 bytes
  • Attribute name, attribute value, and item name allowed characters: All UTF-8 characters that are valid in XML documents. Control characters and any sequences that are not valid in XML are returned Base64-encoded. For more information, see Working with XML-Restricted Characters.
  • Attributes per PutAttributes operation: 256
  • Attributes requested per Select operation: 256
  • Items per BatchDeleteAttributes operation: 25
  • Items per BatchPutAttributes operation: 25
  • Maximum items in Select response: 2500
  • Maximum query execution time: 5 seconds
  • Maximum number of unique attributes per Select expression: 20
  • Maximum number of comparisons per Select expression: 20
  • Maximum response size for Select: 1MB





Amazon AWS – DynamoDB – advanced NoSQL

Basic information about Amazon DynamoDB Service:


AWS Free Tier availability:

  • 100MB of Storage,
  • 5 Units of Write Capacity,
  • 10 Units of Read Capacity
  • up to 40 million free operations each month with eventually consistent reads, or 25 million operations each month with strictly consistent reads


Developer Resources:



  • Local secondary indexes
  • SSD-storage
  • automatic 3-way replication
  • you pay a flat, hourly rate based on the capacity you reserve
  • when creating or updating tables, you specify how much capacity you wish to reserve for reads and writes



  • Table name: allowed characters are a-z, A-Z, 0-9, ‘_’ (underscore), ‘-‘ (dash), and ‘.’ (dot). Names can be between 3 and 255 characters long.
  • Local secondary index name: allowed characters are a-z, A-Z, 0-9, ‘_’ (underscore), ‘-‘ (dash), and ‘.’ (dot). Names can be between 3 and 255 characters long.
  • Table size: No practical limit in number of bytes or items
  • Tables per account: 256 per region (default). Increase can be requested.
  • Hash or hash-and-range primary key: No practical limit.
  • Number of hash key values: No practical limit.
  • Hash-and-range primary key: No practical limit for non-indexed tables (otherwise total sizes of all table and index items cannot exceed 10 GB)
  • Number of range keys per hash value: No practical limit for non-indexed tables (otherwise total sizes of all table and index items cannot exceed 10 GB)
  • Provisioned throughput capacity unit sizes: One read capacity unit = one strongly consistent read per second, or two eventually consistent reads per second, for items up to 4 KB in size. One write capacity unit = one write per second, for items up to 1 KB in size.
  • Provisioned throughput minimum per table: 1 read capacity unit and 1 write capacity unit
  • Provisioned throughput limits: can request an increase but default values are:
    • US East (Northern Virginia) Region:
      • Per table – 40,000 read capacity units or 40,000 write capacity units
      • Per account – 80,000 read capacity units or 80,000 write capacity units
    • All Other Regions:
      • Per table – 10,000 read capacity units or 10,000 write capacity units
      • Per account – 20,000 read capacity units or 20,000 write capacity units
  • UpdateTable: Limits when increasing provisioned throughput: You can call UpdateTable as often as necessary to increase provisioned throughput. You can increase ReadCapacityUnits or WriteCapacityUnits for a table, subject to these conditions:
    • You can call the UpdateTable API to increase ReadCapacityUnits or WriteCapacityUnits (or both), up to twice their current values.
    • The new provisioned throughput settings do not take effect until the UpdateTable operation is complete.
    • You can call UpdateTable multiple times, until you reach the desired throughput capacity for your table.
  • UpdateTable: Limits when decreasing provisioned throughput: You can reduce the provisioned throughput on a table no more than four times in a single UTC calendar day. These reductions can be any of the following operations:
    • Decrease ReadCapacityUnits.
    • Decrease WriteCapacityUnits.
    • Decrease both ReadCapacityUnits and WriteCapacityUnits in a single request. This counts as one of your allowed reductions for the day.
  • Maximum concurrent Control Plane API requests (includes the cumulative number of tables in the CREATING, UPDATING, or DELETING state): In general, you can have up to 10 of these requests running concurrently. The only exception is when you are CREATING a table and you have defined a local secondary index on that table. You can only have one such request running at a time.
  • Maximum number of local secondary indexes per table: You can define up to 5 local secondary indexes per table.
  • Maximum number of projected attributes per table (local secondary indexes only): You can project a total of up to 20 attributes into all of a table’s local secondary indexes. This only applies to user-specified projected attributes.

    In a CreateTable operation, if you specify a ProjectionType of INCLUDE, the total count of attributes specified in NonKeyAttributes, summed across all of the local secondary indexes, must not exceed 20. If you project the same attribute name into two different indexes, this counts as two distinct attributes when determining the total.

    This limit does not apply for indexes with a ProjectionType of KEYS_ONLY or ALL.

  • Attribute name lengths: The following attribute names are length-restricted:
    • Primary key attribute names.
    • The names of any user-specified projected attributes (applicable only to local secondary indexes). In a CreateTable operation, if you specify a ProjectionType of INCLUDE, then the names of the attributes in the NonKeyAttributes parameter are length-restricted. The KEYS_ONLY and ALL projection types are not affected.
    • For any of the attribute names listed above, the name must be between 1 and 255 characters long, inclusive. The name can be any UTF-8 encodable character, but the total size of the UTF-8 string after encoding cannot exceed 255 bytes.
  • Item size: Cannot exceed 64 KB which includes both attribute name binary length (UTF-8 length) and attribute value lengths (again binary length). The attribute name counts towards the size limit. For example, consider an item with two attributes: one attribute named “shirt-color” with value “R” and another attribute named “shirt-size” with value “M”. The total size of that item is 23 bytes. These limits apply to items stored in tables, and also to items in local secondary indexes.
  • Attribute values: cannot be null or empty
  • Attribute name-value pairs per item: The cumulative size of attributes per item must be under 64 KB
  • Hash primary key attribute value: 2048 bytes
  • Range primary key attribute value: 1024 bytes
  • String: All strings must conform to the UTF-8 encoding. Since UTF-8 is a variable width encoding, string sizes are determined using the UTF-8 bytes.
  • Number: A number can have up to 38 digits of precision and can be between 10^-128 and 10^+126.
  • Maximum number of values in an attribute set: No practical limit on the quantity of values, as long as the item containing the values fits within the 64 KB item limit.
  • BatchGetItem item maximum per operation: Up to 100 items retrieved, with the request size not exceeding 1 MB.
  • BatchWriteItem item maximum per operation: Up to 25 put or delete operations, with the request size not exceeding 1 MB.
  • Query: Result set limited to 1 MB per API call. You can use the LastEvaluatedKey from the query response to retrieve more results.
  • Scan: Scanned data set size maximum is 1 MB per API call. You can use the LastEvaluatedKey from the scan response to retrieve more results
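
Tying this back to the UpdateTable limits listed above, here is a hedged sketch of raising a table’s provisioned throughput with the AWS SDK for Java (v1); the values are arbitrary, each call can at most double the current capacity, and the new settings only take effect once the update completes:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.UpdateTableRequest;

public class UpdateThroughputExample {
    public static void main(String[] args) {
        AmazonDynamoDB dynamo = new AmazonDynamoDBClient();

        // raise the "users" table from 10/5 to 20/10 read/write capacity units
        dynamo.updateTable(new UpdateTableRequest()
                .withTableName("users")
                .withProvisionedThroughput(new ProvisionedThroughput(20L, 10L)));
    }
}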





Redis Replication

Continuing my series of introductory posts on Redis, today I’ll address the subject of replication.



  • Replication is a method by which other servers receive a continuously updated copy of the data as it’s being written, so that the replicas can service read queries.


Basic info:

  • Redis uses asynchronous replication. Starting with Redis 2.8 there is, however, a periodic (once every second) acknowledgement of the replication stream processed by slaves.
  • A master can have multiple slaves.
  • Slaves are able to accept other slaves connections. Aside from connecting a number of slaves to the same master, slaves can also be connected to other slaves in a graph-like structure.
  • Redis replication is non-blocking on the master side: this means that the master will continue to serve queries while one or more slaves perform the first synchronization.
  • Replication is non-blocking on the slave side: while the slave is performing the first synchronization it can reply to queries using the old version of the data set, assuming you configured Redis to do so in redis.conf. Otherwise you can configure Redis slaves to return an error to clients if the link with the master is down. However, there is a moment when the old dataset must be deleted and the new one loaded by the slave, during which it will block incoming connections.
  • Replication can be used both for scalability, in order to have multiple slaves for read-only queries (for example, heavy SORT operations can be offloaded to slaves), and simply for data redundancy.
  • It is possible to use replication to avoid the saving process on the master side: just configure your master redis.conf to avoid saving (just comment all the “save” directives), then connect a slave configured to save from time to time.


How Redis replication works:

  • If you set up a slave, upon connection it sends a SYNC command. And it doesn’t matter if it’s the first time it has connected or if it’s a re-connection.
  • The master then starts background saving, and collects all new commands received that will modify the dataset. When the background saving is complete, the master transfers the database file to the slave, which saves it on disk, and then loads it into memory. The master will then send to the slave all accumulated commands, and all new commands received from clients that will modify the dataset. This is done as a stream of commands and is in the same format of the Redis protocol itself.
  • You can try it yourself via telnet. Connect to the Redis port while the server is doing some work and issue the SYNC command. You’ll see a bulk transfer and then every command received by the master will be re-issued in the telnet session.
  • Slaves are able to automatically reconnect when the master <-> slave link goes down for some reason. If the master receives multiple concurrent slave synchronization requests, it performs a single background save in order to serve all of them.
  • When a master and a slave reconnect after the link goes down, a full re-sync is always performed. However, starting with Redis 2.8, a partial re-synchronization is also possible.


In order to configure replication, all you have to do is add the line below to the slave’s redis.conf file (or issue the same as a command from the slave’s redis-cli).

  • SLAVEOF <master_ip> <master_port>             (e.g. SLAVEOF 192.168.1.10 6379)
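
Once the slave is pointed at its master, you can verify the link from the slave’s redis-cli (the IP below is just the example address used above; the INFO output is abbreviated):

    > SLAVEOF 192.168.1.10 6379
    OK
    > INFO replication
    # Replication
    role:slave
    master_host:192.168.1.10
    master_port:6379
    master_link_status:up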


To tune the replication process you can play with the following options in the redis.conf file:

  • requirepass <password> – Require clients to issue AUTH <PASSWORD> before processing any other commands. This might be useful in environments in which you do not trust others with access to the host running redis-server (e.g. when you don’t run your own servers).
  • masterauth <master-password> – If the master is password protected (using the “requirepass” configuration directive above) it is possible to tell the slave to authenticate before starting the replication synchronization process, otherwise the master will refuse the slave request
  • slave-serve-stale-data <yes|no> – When a slave loses its connection with the master, or when the replication is still in progress, the slave can act in two different ways:
    • still reply to client requests, possibly with out-of-date data (the default behavior if the switch is set to “yes”)
    • or reply with a “SYNC with master in progress” error to all kinds of commands, except INFO and SLAVEOF (if the switch is set to “no”)
  • slave-read-only <yes|no> – You can configure a slave instance to accept writes or not. Writing against a slave instance may be useful to store some ephemeral data (because data written on a slave will be easily deleted after re-sync with the master anyway), but may also cause problems if clients are writing to it because of a misconfiguration
  • repl-ping-slave-period <seconds> – Slaves send PINGs to the master at a predefined interval; this option changes that interval. The default value is 10 seconds.
  • repl-timeout <seconds> – This option sets a timeout for both Bulk transfer I/O timeout and master data or ping response timeout. The default value is 60 seconds. It is important to make sure that this value is greater than the value specified for repl-ping-slave-period otherwise a timeout will be detected every time there is low traffic between the master and the slave.
  • repl-disable-tcp-nodelay <yes|no> – Controls whether to disable TCP_NODELAY on the slave socket after SYNC. If you select “yes” Redis will use a smaller number of TCP packets and less bandwidth to send data to slaves. But this can add a delay for the data to appear on the slave side, up to 40 milliseconds with Linux kernels using a default configuration. If you select “no” the delay for data to appear on the slave side will be reduced but more bandwidth will be used for replication. Default value of “no” is an optimization for low latency, but in very high traffic conditions or when the master and slaves are many hops away, turning this to “yes” may be a good idea.
  • slave-priority <integer> – The slave priority is an integer number published by Redis in the INFO output. It is used by Redis Sentinel in order to select a slave to promote into a master if the master is no longer working correctly. A slave with a low priority number is considered better for promotion, so for instance if there are three slaves with priority 10, 100, 25 Sentinel will pick the one with priority 10, that is the lowest. However a special priority of 0 marks the slave as not able to perform the role of master, so a slave with priority of 0 will never be selected by Redis Sentinel for promotion.
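
Put together, a slave’s redis.conf using the directives above might contain something like the following (the values are illustrative):

    slaveof 192.168.1.10 6379
    masterauth s3cr3tpassword
    slave-serve-stale-data yes
    slave-read-only yes
    repl-ping-slave-period 10
    repl-timeout 60
    repl-disable-tcp-nodelay no
    slave-priority 100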


Allowing writes only with N attached replicas:

  • Starting with Redis 2.8 it is possible to configure a Redis master in order to accept write queries only if at least N slaves are currently connected to the master, in order to improve data safety.
  • However, because Redis uses asynchronous replication it is not possible to ensure a slave actually received a given write, so there is always a window for data loss.
  • This is how the feature works:
    • Redis slaves ping the master every second, acknowledging the amount of replication stream processed.
    • The Redis master remembers the last time it received a ping from every slave.
    • The user can configure a minimum number of slaves that have a lag not greater than a maximum number of seconds.
    • If there are at least N slaves, with a lag less than M seconds, then the write will be accepted.
  • There are two configuration parameters for this feature:
    • min-slaves-to-write <number of slaves>
    • min-slaves-max-lag <number of seconds>
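
For example, to have the master accept writes only when at least 3 slaves are connected with a lag of at most 10 seconds, the master’s redis.conf would contain:

    min-slaves-to-write 3
    min-slaves-max-lag 10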


Have a nice weekend!

Redis Persistence

In today’s post I’d like to touch on Redis persistence mechanisms.


What we can choose from are basically two options (or a combination of the two):

  • The RDB persistence – which performs point-in-time snapshots of your dataset at specified intervals.
  • The AOF (append-only file) persistence – which logs every write operation received by the server, that can later be “played” again at server startup, reconstructing the original dataset (commands are logged using the same format as the Redis protocol itself).


Both of those options are controlled by two different groups of configuration settings in the redis.conf file:

  • RDB persistence:
    • save <seconds> <changes> – saving the DB on disk – the command will save the DB if both the given number of seconds and the given number of write operations against the DB occurred. You can have multiple save configurations “stacked” one after another, handling saves in different “seconds/changes” scenarios, or you can disable saving entirely by commenting out all the “save” lines.
    • stop-writes-on-bgsave-error <yes|no> – by default Redis will stop accepting writes if RDB snapshots are enabled (at least one save point) and the latest background save failed. This makes the user aware (in a hard way) that data is not persisting to disk properly. If the background saving process starts working again, Redis will automatically allow writes again.
    • rdbcompression <yes|no> – compression of string objects using LZF when dumping .rdb databases.
    • rdbchecksum <yes|no> – since version 5 of RDB a CRC64 checksum is placed at the end of the file which makes the format more resistant to corruption but there is a performance hit to pay (around 10%) when saving and loading RDB files.
    • dbfilename <name> – The filename (default dump.rdb) where to dump the DB.
    • dir <path> – The working directory (default value is ./) where the DB will be written. The Append Only File will also be created inside this directory
  • AOF persistence:
    • appendonly <yes|no> – controls whether AOF mode should be turned on. By default Redis asynchronously dumps the dataset on disk (RDB persistence), which is a mode good enough in many applications, but an issue with the Redis process or a power outage may result in a few minutes of writes lost (depending on the configured save points). AOF provides much better durability. Using the default fsync policy, Redis can lose just one second of writes in a dramatic event like a server power outage, or a single write if something goes wrong with the Redis process itself while the operating system is still running correctly. AOF and RDB persistence can be enabled at the same time and they play very nicely together. If the AOF is enabled, on startup Redis will load the AOF, that is, the file with the better durability guarantees.

    • appendfilename <name> – The name of the append only file (default: “appendonly.aof”)
    • appendfsync <mode> – the mode in which fsync should operate. The fsync() call tells the operating system to actually write data to disk instead of waiting for more data in the output buffer. Some OSes will really flush data to disk, some others will just try to do it ASAP. Redis supports three different modes:
      • <no>: don’t fsync, just let the OS flush the data when it wants. Faster.
      • <always>: fsync after every write to the append only log. Slow, safest.
      • <everysec>: fsync only one time every second. A compromise. (default)
    • no-appendfsync-on-rewrite <yes|no> – when the AOF fsync policy is set to always or everysec, and a background saving process (a background save or AOF log background rewriting) is performing a lot of I/O against the disk, in some Linux configurations Redis may block too long on the fsync() call. In order to mitigate this problem it’s possible to use this option, which will prevent fsync() from being called in the main process while a BGSAVE or BGREWRITEAOF is in progress. In practical terms, this means that it is possible to lose up to 30 seconds of log in the worst scenario (with the default Linux settings).

    • auto-aof-rewrite-percentage <percentage> and auto-aof-rewrite-min-size <size> – are both related to automatic rewrite of the append only file. Redis is able to automatically rewrite the log file (implicitly calling BGREWRITEAOF) when the AOF log size grows by the specified percentage. This is how it works: Redis remembers the size of the AOF file after the latest rewrite (if no rewrite has happened since the restart, the size of the AOF at startup is used). This base size is compared to the current size. If the current size exceeds the base size by more than the specified percentage, the rewrite is triggered. You also need to specify a minimal size for the AOF file to be rewritten; this is useful to avoid rewriting the AOF file even if the percentage increase is reached but the file is still pretty small. Specify a percentage of zero in order to disable the automatic AOF rewrite feature.
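
For reference, a redis.conf persistence section combining the settings above might look like the following (the save points and rewrite thresholds shown are the common defaults, and the dir value is just an example):

    # RDB: save after 900s if at least 1 key changed, 300s if 10 changed, 60s if 10000 changed
    save 900 1
    save 300 10
    save 60 10000
    stop-writes-on-bgsave-error yes
    rdbcompression yes
    rdbchecksum yes
    dbfilename dump.rdb
    dir /var/redis/6379

    # AOF
    appendonly yes
    appendfilename "appendonly.aof"
    appendfsync everysec
    no-appendfsync-on-rewrite no
    auto-aof-rewrite-percentage 100
    auto-aof-rewrite-min-size 64mb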


Advantages and disadvantages of both methods:

  • RDB advantages
    • RDB is a very compact single-file point-in-time representation of your Redis data. RDB files are perfect for backups. For instance you may want to archive your RDB files every hour for the latest 24 hours, and to save an RDB snapshot every day for 30 days. This allows you to easily restore different versions of the data set in case of disasters.
    • RDB is very good for disaster recovery, being a single compact file that can be transferred to far data centers, or to Amazon S3 (possibly encrypted).
    • RDB maximizes Redis performance since the only work the Redis parent process needs to do in order to persist is forking a child that will do all the rest. The parent instance will never perform disk I/O or the like.
    • RDB allows faster restarts with big datasets compared to AOF.
  • RDB disadvantages
    • RDB is NOT good if you need to minimize the chance of data loss in case Redis stops working (for example after a power outage). You can configure different save points where an RDB is produced (for instance after at least five minutes and 100 writes against the data set, but you can have multiple save points). However you’ll usually create an RDB snapshot every five minutes or more, so in case of Redis stopping working without a correct shutdown for any reason you should be prepared to lose the latest minutes of data.
    • RDB needs to fork() often in order to persist on disk using a child process. Fork() can be time consuming if the dataset is big, and may cause Redis to stop serving clients for some milliseconds or even for one second if the dataset is very big and the CPU performance is not great. AOF also needs to fork() but you can tune how often you want to rewrite your logs without any trade-off on durability.
  • AOF advantages
    • Using AOF, Redis is much more durable: you can have different fsync policies: no fsync at all, fsync every second, fsync at every query. With the default policy of fsync every second, write performance is still great (fsync is performed using a background thread and the main thread will try hard to perform writes when no fsync is in progress) but you can only lose one second worth of writes.
    • The AOF log is an append-only log, so there are no seeks, nor corruption problems if there is a power outage. Even if the log ends with a half-written command for some reason (disk full or other reasons) the redis-check-aof tool is able to fix it easily.
    • Redis is able to automatically rewrite the AOF in background when it gets too big. The rewrite is completely safe as while Redis continues appending to the old file, a completely new one is produced with the minimal set of operations needed to create the current data set, and once this second file is ready Redis switches the two and starts appending to the new one.
    • AOF contains a log of all the operations one after the other in an easy to understand and parse format. You can even easily export an AOF file. For instance, even if you flushed everything by mistake using a FLUSHALL command, as long as no rewrite of the log was performed in the meantime, you can still save your data set by stopping the server, removing the latest command, and restarting Redis again.
  • AOF disadvantages
    • AOF files are usually bigger than the equivalent RDB files for the same dataset.
    • AOF can be slower than RDB depending on the exact fsync policy. In general with fsync set to every second performance is still very high, and with fsync disabled it should be exactly as fast as RDB even under high load. Still, RDB is able to provide more guarantees about the maximum latency even in the case of a huge write load.
    • Redis AOF works by incrementally updating an existing state, like MySQL or MongoDB do, while RDB snapshotting creates everything from scratch again and again, which is conceptually more robust.


The general advice from Redis team is that you should use both persistence methods if you want a degree of data safety comparable to what PostgreSQL can provide you.



Take care!



Redis data sharding – part 2 – hash-based keys

In my previous post on Redis data sharding I introduced the concept of data sharding/partitioning and provided a small Java code example to illustrate the idea. As you noticed, I was creating fixed-size “emailbuckets”, containing 1024 emails each. An email address was the key of my hash, while the user id was the value. For a shard identifier I used a simple integer value (shardKey) obtained as a result of an “i / shardSize” (integer division) operation.

While such an approach illustrates the concept well, it’s impractical in “real life” applications. This is due to a simple reason – knowing the email address alone (which may often be the only thing you know at some point of the app execution flow; for example when you’re using Spring Security and requesting emails “as usernames” during the sign-in process), you wouldn’t be able to retrieve the corresponding userId. If you knew the algorithm by which the shardKey was generated, then – yes – you would be able to traverse the email buckets one by one looking for the appropriate email, but without that knowledge you wouldn’t be able to tell which shard the email address you’re looking for ended up in.

One solution to that problem is to use a different key for sharding data; something that is computed directly from the data you’re interested in partitioning. An obvious candidate here is the email address itself. If you were able to generate the shardKey based on the email address, you could reproduce the same computation every time a user provides his email during sign-in and retrieve his userId (which you could use later on, for example, to query another hash in Redis – “users:id” – that stores the complete user profile).


This seems like an ideal task for a hash function… Let’s start first with some background on hashing. According to Neil Coffey’s Javamex article Introduction to hashing:

  • Hashing means using some function or algorithm to map object data (eg. content of a String object) to some representative integer value. This so-called hash code (or simply hash) can then be used as a way to narrow down our search when looking for the item…


Also, when you search Wikipedia for the Java hashCode() function, you’ll find the following definition:

  • In the Java programming language, every class must provide a hashCode() method which digests the data stored in an instance of the class into a single hash value (a 32-bit signed integer). This hash is used by other code when storing or manipulating the instance – the values are intended to be evenly distributed for varied inputs in order to use in clustering. This property is important to the performance of hash tables and other data structures that store objects in groups (“buckets”) based on their computed hash values.


Looks like this is exactly what we’re interested in – …evenly distributed values for varied inputs…, which …is important to the performance of data structures that store objects in groups (shards in our case) based on their computed hash values.

Conclusion: the Java hashCode() function is what we’ll proceed with.


More from Wikipedia on Java hashCode():

  • Starting with Java version 1.2, the java.lang.String class implements its hashCode() using a product sum algorithm over the entire text of the string.
  • An instance s of the java.lang.String class would have a hash code h(s) defined by:
    h(s) = s[0]·31^(n-1) + s[1]·31^(n-2) + … + s[n-1]


  • where terms are summed using Java 32-bit int addition, s[i] denotes the i-th character of the string, and n is the length of s.


Now, looking at Java docs on String.hashCode() function we can read:

  • The hash code for a String object is computed as
    s[0]*31^(n-1) + s[1]*31^(n-2) + … + s[n-1]


  • using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)


Finally, let’s take a look at the Java source of the String class showing how hashCode() is actually implemented:

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}


An alternative (faster) implementation may look like this (from Apache Harmony JDK):

public int hashCode() {
    if (hashCode == 0) {
        int hash = 0, multiplier = 1;
        for (int i = offset + count - 1; i >= offset; i--) {
            hash += value[i] * multiplier;
            int shifted = multiplier << 5;
            multiplier = shifted - multiplier;
        }
        hashCode = hash;
    }
    return hashCode;
}

What’s the difference between the two code snippets above? As you can see, multiplication can be replaced by a bitwise shift operation and a subtraction for better performance. “(multiplier << 5) – multiplier” is just 31*multiplier after all (however, VMs nowadays do this optimization automatically). If you’re interested in good reading on the subject of binary numbers I strongly recommend Neil Coffey’s Javamex article: Introduction to binary numbers.


OK, applying all this knowledge to our sharding code example results in the following implementation (the email address pattern below is just a placeholder):

int i = 0;
while (i < 1000000) {
    String userId = String.valueOf(i++);
    String emailAddress = String.format("user%s@example.com", userId); // placeholder address pattern
    int shardKey = emailAddress.hashCode();
    redisTemplate.opsForHash().put(String.format("emailsbucket:%s", shardKey), emailAddress, userId);
}
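
And here is the payoff: given only an email address (for example at sign-in time), the same hash computation tells us which bucket to query. A minimal sketch using the same redisTemplate and the placeholder address pattern from above:

String emailAddress = "user42@example.com";   // what the user typed at sign-in
int shardKey = emailAddress.hashCode();       // same computation as when storing
Object userId = redisTemplate.opsForHash()
        .get(String.format("emailsbucket:%s", shardKey), emailAddress);
// userId is "42" if the address was stored by the loop above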


Happy coding! 🙂




Redis data sharding – part 1

In one of my previous posts on Redis I provided a definition of data sharding, quoting a great book, “Redis in Action”, authored by Dr. Josiah L. Carlson:

  • “Sharding is a method by which you partition your data into different pieces. In this case, you partition your data based on IDs embedded in the keys, based on the hash of keys, or some combination of the two. Through partitioning your data, you can store and fetch the data from multiple machines, which can allow a linear scaling in performance for certain problem domains.”


Today I’d like to elaborate some more on data sharding based on IDs embedded in the keys.


Let’s start with an example of hypothetical data stored in a Redis instance:

redis> keys *
(empty list or set)
redis> set emails:1 "john@example.com"
OK
redis> get emails:1
"john@example.com"

What I did here is use the basic String data type to store the email of a user (the address shown is just an example). As you can see, I embedded the user id within the key (’emails:1′). Now, if a front-end application asks for the email address of the user with id=1, on the back-end side I concatenate the keyword I usually use to denote keys I store emails under (i.e. ’emails’) with the id of the user (‘1’), with a colon (‘:’) in between, and this way I get the resulting key (’emails:1′) I should look up when making a call to the Redis instance.
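
In Spring Data Redis terms (using the same redisTemplate as in the snippets below), that back-end lookup boils down to:

String key = String.format("emails:%s", 1);           // 'emails' + ':' + userId
Object email = redisTemplate.opsForValue().get(key);  // -> the address stored under emails:1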


This solution is nice, but if I have 1 million users registered in my system and use Redis as the data store for keeping mappings between a user’s identifier and his email, I will end up with 1 million keys (’emails:1′, ’emails:2′, ’emails:3′, etc.). This is a volume my Redis instance will easily handle (see my previous post on Redis performance basics) and it will use a little more than 190 MB to store everything in memory (so much due to a lot of overhead when storing small keys and values; the ratio is much better with large keys/values), but this is only one attribute we’re talking about – what about firstName, lastName, etc.? Obviously, if my system had millions of registered users and I used Redis as my primary data store for user-related info, I would already be running multiple instances of Redis and, based on the id of a user, route queries to a specific instance, but there’s still a lot we can do to optimize costs before thinking about scaling.


Here is a small code snippet to generate 1M emails stored in Redis using the String data structure (and Spring Data Redis, mentioned in my other post); the email value pattern is just a placeholder:

int i = 0;
while (i < 1000000) {
    // the value pattern is a placeholder standing in for a real email address
    redisTemplate.opsForValue().set(String.format("emails:%s", i), String.format("user%s@example.com", i++));
}

The loop above executes in 2 minutes on my Windows 8 64-bit i7 laptop and the ‘redis-server’ process allocates ca. 190 MB of memory.


Now, what will happen if we change the data structure to, let’s say, a Redis Hash?

The next code snippet gives us just that:

int i = 0;
while (i < 1000000) {
    String userId = String.valueOf(i++);
    String emailAddress = String.format("user%s@example.com", userId); // placeholder address pattern
    redisTemplate.opsForHash().put("emails", emailAddress, userId);
}

2 mins and 165 MB of memory allocated – a 15 % gain absolutely for free.


Let’s try data sharding/partitioning. Another code snippet using the Redis Hash data structure and there you go:

int shardSize = 1024;
int i = 0;
while (i < 1000000) {
    int shardKey = i / shardSize;
    String userId = String.valueOf(i++);
    String emailAddress = String.format("user%s@example.com", userId); // placeholder address pattern
    redisTemplate.opsForHash().put(String.format("emailsbucket:%s", shardKey), emailAddress, userId);
}

2 mins later and… only 30 MB allocated – now you’re talking Mariusz!

A staggering 530% increase in memory allocation efficiency!


Hope you enjoyed the first part of this brief tutorial.