Category Archives: Spring

Redis data sharding – part 1

In one of my previous posts on Redis i provided a definition of data sharding, quoting a great book “Redis in Action” authored by Dr. Josiah L Carlson:

  • “Sharding is a method by which you partition your data into different pieces. In this case, you partition your data based on IDs embedded in the keys, based on the hash of keys, or some combination of the two. Through partitioning your data, you can store and fetch the data from multiple machines, which can allow a linear scaling in performance for certain problem domains.”

 

Today i’d like to elaborate some more on data sharding based on IDs embedded in the keys.

 

Let’s start with an example of a hypothetical data stored in an Redis instance:

redis 127.0.0.1:6379>keys *
(empty list or set)
redis 127.0.0.1:6379>set emails:1 me@mariuszprzydatek.com
OK
redis 127.0.0.1:6379>get emails:1
"me@mariuszprzydatek.com"

what i did here is to use the basic String data type to store an email of a user. As you can see, i embedded user id within the key (’emails:1′). Now, if a front-end application would ask for an email address of a user with id=1, on the back-end side i would concatenate the keyword which i usually use to denote keys i store emails under (ie. ’emails’) with the id of a user (‘1’), add a colon (‘:’) in between, and this way i’ll get the resulting key (’emails:1′) i should look after while making a call to the Redis instance.

 

This solution is nice but if i’ll have 1 million of users registered in my system and use Redis as the data store for keeping mappings between identifier of a user and his email, i will end up with 1 million keys (’emails:1′, ’emails:2′, ’emails:3′, etc.). This is a volume my Redis instance will easily handle (see my previous post on Redis Performance Basics) and it will use little more than 190MB to store everything in the memory (so much due to a lot of overhead when storing small keys and values; the ratio is much better with large keys/values), but this is only one attribute we’re talking about – and what about firstName, lastName, etc.?. Obviously, if my system will have millions of registered users and i’d use Redis as my primary data store for users-related info, i would be running multiple instances of Redis already and based on the id of a user, route the queries to a specific instance, but there’s still a lot we can do to optimize costs prior to thinking about scaling.

 

Small code snippet to generate 1M of emails stored in Redis using String data structure (and Spring Data Redis mentioned in my other post).

int i = 0;
while(i<1000000) {
    redisTemplate.opsForValue().set(String.format("emails:%s", i++), "me@mariuszprzydatek.com");
}

the loop above executes in 2 mins on my Windows 8 64bit i7 laptop and the ‘redis-server’ process allocates ca 190 MB of memory.

 

Now, what will happen if we change the data structure let’s say to a Redis Hash?

Next code snippet and we’re getting just that:

int i = 0;
while(i<1000000) {
    String userId = String.valueOf(i++);
    String emailAddress = String.format("user_%s@mariuszprzydatek.com", userId);
    redisTemplate.opsForHash().put("emails", emailAddress, userId)
}

2 mins and 165 MB of memory allocated – a 15 % gain absolutely for free.

 

Let’s try with data sharding/partitioning. Another code snippet using Redis Hash data structure and there you go:

int shardSize = 1024;
int i = 0;
while(i<1000000) {
    int shardKey = i/shardSize;
    String userId = String.valueOf(i++);
    String emailAddress = String.format("user_%s@mariuszprzydatek.com", userId);
    redisTemplate.opsForHash().put(String.format("emailsbucket:%s", shardKey), emailAddress, userId);
}

2 mins later and… only 30 MB allocated – now you’re talking Mariusz!

Staggering 530 % increase in memory allocation efficiency!

 

Hope you enjoyed the first part of this brief tutorial.

 

Cheers!

 

 

Resources:

Advertisement

Spring Data Redis overview

If you are, like me, a great fan of the Spring Framework, you probably know already the Spring Data product and corresponding spring-data-redis module. If not, let me introduce this wonderful tool in this brief post.

 

Spring Data Redis offers the following features (copied from the product homepage):

  • Connection package as low-level abstraction across multiple Redis drivers/connectors (Jedis,  JRedisLettuceSRP and RJC)
  • Exception translation to Spring’s portable Data Access exception hierarchy for Redis driver exceptions
  • RedisTemplate that provides a high level abstraction for performing various redis operations, exception translation and serialization support
  • Pubsub support (such as a MessageListenerContainer for message-driven POJOs)
  • JDK, String, JSON and Spring Object/XML mapping serializers
  • JDK Collection implementations on top of Redis
  • Atomic counter support classes
  • Sorting and Pipelining functionality
  • Dedicated support for SORT, SORT/GET pattern and returned bulk values
  • Redis implementation for Spring 3.1 cache abstraction

 

As of the time of writing this post, the latest product release is labeled ‘1.0.6.RELEASE’, and available as a Maven dependency:

<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-redis</artifactId>
    <version>1.0.6.RELEASE</version>
</dependency>

 

Using Spring Data Redis in your project is as easy as defining the above dependency in your master pom.xml file, and configuring the RedisTemplate bean in either xml context file (example below) or using Java configuration:

    <context:property-placeholder location="classpath:redis.properties"/>

    <bean id="connectionFactory"
          class="org.springframework.data.redis.connection.jedis.JedisConnectionFactory"
          p:hostName="${redis.host}"
          p:port="${redis.port}"
          p:password="${redis.pass}"
          p:usePool="${redis.pool}" />

    <bean id="stringRedisSerializer" class="org.springframework.data.redis.serializer.StringRedisSerializer" />

    <bean id="redisTemplate" class="org.springframework.data.redis.core.RedisTemplate"
          p:connectionFactory-ref="connectionFactory"
          p:defaultSerializer-ref="stringRedisSerializer" />

 

and the corresponding redis.config file:

# Redis settings
redis.host=localhost
redis.port=6379
redis.pass=
redis.pool=true

 

…in code you’re using the RedisTemplate like this:

@Autowired
private RedisTemplate redisTemplate;

    public void saveEmail(String email, long userId) {
        redisTemplate.opsForHash().put("emails", String.valueOf(userId), email);
    }

 

I did also i quick overview of the extend to which Redis native API commands, related to performing operations on 5 basic Redis data types, have been implemented in the product. Below you’ll find a short visual summary:

 

  • Strings

Spring Data Redis String

 

 

  • Lists

Spring Data Redis List

 

 

  • Sets

Spring Data Redis Set

 

 

  • Hashes

Spring Data Redis Hash

 

 

  • ZSets

Spring Data Redis ZSet

 

 

Cheers!

 

 

 

Resources:

Tomcat, JNDI and Spring bean application configuration

While setting up a Continuous Integration environment recently i faced an issue related to application (REST API in this case) configuration not being deployment-environment independent. Namely as the code pushed to Git repository and picked up by Jenkins build server was later on automatically deployed across several server environments (DEV, INT, STAGING, PROD) it turned out that in each of those environments the API application (war archive deployed in Tomcat container) requires to be fed with a specific/different configuration (environment-specific settings).

 

This is how i managed to solve this issue:

 

1. I created the following Tomcat context entry in “conf/context.xml” file:

<Context>
    <Resource name="config/Api"
        auth="Container"
        type="com.mycompany.model.ApiConfigBean"
        factory="com.mycompany.api.jndi.CustomApiJNDIFactory"
        scheme="https"
        server="api.mycompany.com"
        port="8443"
        version="1.0"
        sso="sso.mycompany.com" />
</Context>

 

2. Created the “CustomApiJNDIFactory” class:

public class CustomApiJNDIFactory implements ObjectFactory {

    @Override
    public Object getObjectInstance(Object obj, Name name, Context nameCtx, Hashtable<?,?> environment) throws Exception {

        validateProperty(obj, "Invalid JNDI object reference");

        String scheme = null;
        String host = null;
        String port = null;
        String version = null;
        String sso = null;

        Reference ref = (Reference) obj;

        Enumeration props = ref.getAll();

        while (props.hasMoreElements()) {

            RefAddr addr = props.nextElement();
            String propName = addr.getType();
            String propValue = (String) addr.getContent();

            switch (propName) {
                case "scheme":
                    scheme = propValue;
                    break;
                case "host":
                    host = propValue;
                    break;
                case "port":
                    port = propValue;
                    break;
                case "version":
                    version = propValue;
                    break;
                case "sso":
                    sso = propValue;
                    break;
            }

        }

        // validate properties
        validateProperty(scheme, "Invalid or empty scheme type");
        validateProperty(host, "Invalid or empty server host name");
        validateProperty(port, "Invalid or empty port number");
        validateProperty(version, "Invalid or empty API version number");
        validateProperty(sso, "Invalid or empty SSO server name");

        //create API Configuration Bean
        return new ApiConfigBean(scheme, host, port, version, sso);

    }

    /**
     * Validate internal String properties
     *
     * @param property the property
     * @param errorMessage the error message
     * @throws javax.naming.NamingException
     */
    private void validateProperty(String property, String errorMessage) throws NamingException {

        if (property == null || property.trim().equals("")) {
            throw new NamingException(errorMessage);
        }

    }

}

 

3. Defined an jndi-lookup entry in my “spring-api-context.xml” file that will read Tomcat JNDI configuration entry and expose it as a Spring bean of name jndiApi:

<jee:jndi-lookup id="jndiApi" jndi-name="java:/comp/env/config/Api" expected-type="com.mycompany.model.ApiConfigBean" />

 

4. Created the “jndiApi” Spring bean backing pojo

public class ApiConfigBean {

    private String scheme;
    private String host;
    private String port;
    private String version;
    private String sso;

    public ApiConfigBean(String scheme, String host, String port, String version, String sso) {
        this.scheme = scheme;
        this.host = host;
        this.port = port;
        this.version = version;
        this.sso = sso;
    }

    // getters and setters ommited.

}

 

5. and finally wired-in the bean to my classes where i needed to make use of the “externalized” configuration:

@Autowired
@Qualifier("jndiApi")
private ApiConfigBean apiConfigBean;

    public void foo() {

        String host = apiConfigBean.getHost();
        ...

    }

 

 

That’s it! Have a wonderful day! 🙂