Redis performance basics

I wish I had known the things below when starting my adventure with Redis…

 

General notes:

  • an advantage of an in-memory database like Redis is that the in-memory representation of a complex data structure is much simpler to manipulate than the same data structure on disk.
  • At the same time Redis’ on-disk data format does not need to be suitable for random access and therefore is compact and always generated in an append-only fashion
  • 1 million keys, with the keys being the natural numbers from 0 to 999999 and the string “Hello World” as the value, use 100MB on a 32-bit computer. The same data stored linearly in a single string takes something like 16MB; this is because with small keys and values in Redis there is a lot of overhead. With large keys/values the ratio is much better.
  • 64-bit systems will use considerably more memory than 32-bit systems to store the same keys, especially if the keys and values are small; this is because pointers take 8 bytes on 64-bit systems. But of course the advantage is that 64-bit systems can address a lot more memory, so in order to run large Redis servers a 64-bit system is more or less required
  • because Redis stores everything in memory, you obviously cannot have a dataset larger than the memory of your server. But Redis users like Craigslist and Groupon distribute their data among multiple Redis nodes using client-side hashing, which can be an effective solution.
  • client libraries such as redis-rb (the Ruby client) and Predis (one of the most used PHP clients) are able to handle multiple Redis servers automatically using consistent hashing.
  • if you don’t want to use consistent hashing or data distribution across different nodes, consider using Redis as a secondary data store (metadata, small but often-written info, and all the other things that get accessed very frequently: user auth tokens, Redis lists with chronologically ordered IDs of the last N comments, N posts, etc.).
  • Write scripts that monitor your Redis servers (checking for critical conditions) using the INFO command, which reports memory utilization
  • Redis on-disk snapshots are atomic – the Redis background saving process is always fork(2)ed when the server is outside of the execution of a command, so every command reported to be atomic in RAM is also atomic from the point of view of the disk snapshot.
  • Although Redis is single threaded, it’s very unlikely that the CPU becomes a bottleneck, because Redis is usually either memory or network bound. For example, using pipelining, Redis running on an average Linux system can deliver as many as 500k requests per second (see the sketch after this list), so if an application mainly uses O(N) or O(log(N)) commands it is hardly going to use too much CPU. However, you can maximize CPU usage by starting multiple instances of Redis on the same box and treating them as different servers.
  • the maximum number of keys a single Redis instance can hold is in theory up to 2^32. Redis was tested in practice to handle at least 250 million keys per instance.
  • every list, set, and sorted set can hold in theory 2^32 − 1 (4294967295, more than 4 billion) elements as well.
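
To make the pipelining and INFO points above concrete, below is a minimal sketch using the Jedis client (Jedis is my assumption here; any client with pipelining support will do, and host, port and key names are illustrative):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;

public class RedisPipeliningAndInfoDemo {

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {

            // pipelining: queue many commands client-side and send them in one
            // network round trip; this is what makes figures like 500k req/s possible
            Pipeline pipeline = jedis.pipelined();
            for (int i = 0; i < 100000; i++) {
                pipeline.set("key:" + i, "value:" + i);
            }
            pipeline.sync(); // flush the queue and wait for all the replies

            // INFO-based monitoring: the "memory" section contains used_memory
            // and friends, which a watchdog script can parse and alert on
            System.out.println(jedis.info("memory"));
        }
    }
}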

 

Keys:

  • are binary safe (i.e. you can use any binary sequence as a key, including the empty string)
  • very long keys (e.g. 1kB) are a bad idea, both memory-wise and because looking up the key in the dataset may require several costly key comparisons
  • the shorter the key, the less memory it consumes; however, the saving is negligible compared to the space used by the key object itself and the value object. Therefore go with human-readable keys (eg. password:789 instead of p:789)
  • stick with a naming convention, eg. object-type:id:field or comment:1234:reply.to, etc. (see the short example after this list)
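
For illustration, a key built along those lines might look like this (a tiny sketch with the Jedis client; all identifiers and values are made up):

import redis.clients.jedis.Jedis;

public class KeyNamingDemo {

    // object-type:id:field – short enough to be cheap, long enough to stay readable
    static String authTokenKey(long userId) {
        return "user:" + userId + ":auth.token"; // e.g. "user:789:auth.token"
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // SETEX stores the token and lets it expire after an hour
            jedis.setex(authTokenKey(789), 3600, "secret-token");
        }
    }
}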

 

Data structures:

  • Strings:
    • values can’t be bigger than 512 MB.
    • values can be any kind of binary data, e.g. the contents of an MP3 file
    • the INCR command parses the string value as an integer, increments it by one, and stores the result as the new value
    • other related commands (INCRBY, DECR, DECRBY) internally behave the same way
    • INCR is atomic – multiple clients issuing INCR against the same key will never run into a race condition
  • Hashes:
    • maps between string fields and string values (therefore a perfect data type to represent objects)
    • Hashes are encoded using a memory-efficient data structure when they have a small number of entries and the biggest entry does not exceed a given threshold. These thresholds can be configured in the redis.conf file with the following directives:
      • hash-max-ziplist-entries (default value 512)
      • hash-max-ziplist-value (default value 64)
  • Lists:
    • are implemented via Linked Lists (not via Arrays)
    • in case of a list containing millions of elements, adding a new element in the head or in the tail of the list is still performed in constant time O(1)
    • the main feature of lists from the point of view of time complexity is the support for constant-time insertion and deletion of elements near the head and tail, even with many millions of inserted elements. Accessing elements is very fast near the extremes of the list but slow if you try to access the middle of a very big list, as that is an O(N) operation.
    • Accessing an element by index is not so fast in lists implemented by linked lists (Arrays are better in this case)
    • with LRANGE you can easily paginate results (fetching a constant-length page near the head or tail takes constant time; see the sketch after this list)
    • putting objects you need random access to inside lists isn’t a good idea: given that Redis uses linked lists as the underlying implementation, accessing those objects by index is not efficient.
    • Lists are sortable
    • Similarly to hashes, small lists are also encoded in a special way in order to save a lot of space. The special representation is only used when you are under the following limits:
      • list-max-ziplist-entries 512
      • list-max-ziplist-value 64
  • Sets:
    • unordered collection of binary-safe strings
    • adding the same element multiple times will result in a set having a single copy of this element
    • are very good for expressing relations between objects (eg. to implement “tags”)
    • Sets are sortable
    • Sets have a special encoding in just one case: when a set is composed of strings that happen to be radix-10 integers in the range of 64-bit signed integers. The following redis.conf setting sets the limit on the size of the set for this special memory-saving encoding to be used:
      • set-max-intset-entries 512
  • ZSets
    • sorted sets are similar to sets (collections of binary-safe strings), but this time with an associated score, which is what gives you the “sorting capabilities”
    • you can think of a ZSet as the equivalent of an index in the SQL world
    • they are implemented via a dual-ported data structure containing both a skip list and a hash table, so every time you add an element Redis performs an O(log(N)) operation
    • ZSets have a “default” ordering, but you are still free to call the SORT command against sorted sets to get a different ordering (at the cost of server CPU). A solution for having multiple orderings is to add every element to multiple sorted sets simultaneously.
    • calling ZADD again against an element already included in the sorted set will update its score (and position) in O(log(N)), so sorted sets are suitable even when there are tons of updates
    • Similarly to hashes and lists, sorted sets are also specially encoded in order to save a lot of space. This encoding is only used when the length and elements of a sorted set are below the following limits:
      • zset-max-ziplist-entries 128
      • zset-max-ziplist-value 64
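
The sketch below exercises each of the data types discussed above, again with the Jedis client (my assumption as before; key names and values are illustrative):

import java.util.List;
import java.util.Map;
import redis.clients.jedis.Jedis;

public class RedisDataTypesDemo {

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {

            // Strings: INCR parses the value as an integer and increments it atomically
            jedis.set("counter", "10");
            long counter = jedis.incr("counter"); // 11, no race conditions

            // Hashes: field -> value maps, a natural fit for objects
            jedis.hset("user:1000", "name", "John");
            jedis.hset("user:1000", "email", "john@example.com");
            Map<String, String> user = jedis.hgetAll("user:1000");

            // Lists: O(1) push at either end, LRANGE for pagination
            for (int i = 0; i < 100; i++) {
                jedis.lpush("posts", "post:" + i);
            }
            List<String> newestTen = jedis.lrange("posts", 0, 9);

            // Sets: duplicates collapse into a single copy – good for tags
            jedis.sadd("post:42:tags", "redis", "performance", "redis"); // stored once

            // Sorted sets: score-ordered; ZADD on an existing member updates its score
            jedis.zadd("leaderboard", 250, "alice");
            jedis.zadd("leaderboard", 180, "bob");
            jedis.zadd("leaderboard", 300, "alice"); // alice's score is now 300
            System.out.println(jedis.zrevrange("leaderboard", 0, 1)); // top two members
        }
    }
}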

 

 

Custom Validation Annotations

Let’s say we want to task the back-end REST API application with validating the syntax of an email address provided by a new user during registration. Even though the “first level” of validation should always happen on the front-end (using, for example, a JavaScript solution), it is still a good idea to do a “second level” of validation on the back-end: a user can have JavaScript support disabled in his/her browser, or try some dirty hacks on our webpage and send invalid data to the server, causing inconsistencies at best.

What would be a nice approach to handle server-side validation? A custom annotation – something like @ValidEmailSyntax.

Here’s an idea on how to implement it:

 

1) Create an interface for the Validation Annotation:

@Target({ElementType.FIELD, ElementType.METHOD, ElementType.ANNOTATION_TYPE})
@Retention(RetentionPolicy.RUNTIME)
@Constraint(validatedBy = EmailSyntaxValidator.class)
@Documented
public @interface ValidEmailSyntax {

    String message() default "Invalid email syntax";
    Class<?>[] groups() default { };
    Class<? extends Payload>[] payload() default { };

}
  • As you can see from the code above, our @ValidEmailSyntax annotation can be used to annotate a field, method or another annotation
  • “Invalid email syntax” is the message returned by the server in case of failed validation

 

2) Create an EmailSyntaxValidator class that will be responsible for deciding whether the subject of validation (field, method, etc.) is valid or not:

@Component
public class EmailSyntaxValidator implements ConstraintValidator<ValidEmailSyntax, EmailAddressType> {

    // compile the pattern once instead of on every isValid() call
    private static final Pattern EMAIL_PATTERN =
            Pattern.compile(EmailStructure.PATTERN, Pattern.CASE_INSENSITIVE);

    protected final Logger logger = LoggerFactory.getLogger(getClass());

    @Override
    public void initialize(ValidEmailSyntax constraintAnnotation) {
        logger.debug("EmailSyntaxValidator initialization successful");
    }

    @Override
    public boolean isValid(EmailAddressType value, ConstraintValidatorContext context) {
        // a null value is considered valid here; enforce presence separately with @NotNull
        return value == null || EMAIL_PATTERN.matcher(value.getEmail()).matches();
    }

}

 

3) The validator class above will validate the email represented by the following EmailAddressType object:

@JsonSerialize(using = ToStringSerializer.class)
public class EmailAddressType implements Serializable {

    private static final long serialVersionUID = 1L;

    private String email;

    public EmailAddressType(String email) {
        this.email = email;
    }

    public String getEmail() {
        return email;
    }

    public void setEmail(String email) {
        this.email = email;
    }

    @Override
    public String toString() {
        return email;
    }

}
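
One caveat worth noting: ToStringSerializer writes the object out as a plain JSON string, so for the reverse direction Jackson needs a hint on how to rebuild an EmailAddressType from that string. A minimal sketch (my addition, not part of the original class) is to mark the one-argument constructor as a delegating creator:

    // com.fasterxml.jackson.annotation.JsonCreator: a single-argument creator in
    // delegation mode maps a plain JSON string (e.g. "john@example.com") onto the constructor
    @JsonCreator
    public EmailAddressType(String email) {
        this.email = email;
    }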

 

4) …against the following email structure pattern (represented by an EmailStructure interface):

public interface EmailStructure {

    // interface fields are implicitly public static final
    String ATOM = "[a-z0-9!#$%&'*+/=?^_`{|}~-]";
    // note: DOMAIN deliberately opens a group "(" that is only closed in PATTERN,
    // right after the IP_DOMAIN alternative
    String DOMAIN = "(" + ATOM + "+(\\." + ATOM + "+)+";
    String IP_DOMAIN = "\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\]";

    // matches local-part@(domain | [IPv4 literal]); case-insensitivity is applied by the validator
    String PATTERN = "^" + ATOM + "+(\\." + ATOM + "+)*@" + DOMAIN + "|" + IP_DOMAIN + ")$";

}

 

5) Finally, this is how we will use the validation annotation in our code:

public class User {

    @ValidEmailSyntax
    private EmailAddressType email;
    ...
    // getters and setters omitted

}
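
To see the constraint in action outside of a full Spring request cycle, validation can also be triggered by hand with the standard Bean Validation API (a small sketch: it assumes a provider such as Hibernate Validator is on the classpath, and uses the setters omitted from User above):

import java.util.Set;
import javax.validation.ConstraintViolation;
import javax.validation.Validation;
import javax.validation.Validator;

public class UserValidationDemo {

    public static void main(String[] args) {
        Validator validator = Validation.buildDefaultValidatorFactory().getValidator();

        User user = new User();
        user.setEmail(new EmailAddressType("clearly-not-an-email"));

        // one ConstraintViolation per failed constraint
        Set<ConstraintViolation<User>> violations = validator.validate(user);
        for (ConstraintViolation<User> violation : violations) {
            // prints: email: Invalid email syntax
            System.out.println(violation.getPropertyPath() + ": " + violation.getMessage());
        }
    }
}

In a Spring MVC controller the same validation fires automatically when the incoming request body parameter is annotated with @Valid.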

 

 

Happy coding! 🙂

Tomcat, JNDI and Spring bean application configuration

While setting up a Continuous Integration environment recently, I faced an issue: the configuration of an application (a REST API in this case) was not deployment-environment independent. As the code pushed to the Git repository and picked up by the Jenkins build server was later automatically deployed across several server environments (DEV, INT, STAGING, PROD), it turned out that in each of those environments the API application (a war archive deployed in a Tomcat container) needs to be fed a specific, different configuration (environment-specific settings).

 

This is how I managed to solve this issue:

 

1. I created the following Tomcat context entry in the “conf/context.xml” file:

<Context>
    <Resource name="config/Api"
        auth="Container"
        type="com.mycompany.model.ApiConfigBean"
        factory="com.mycompany.api.jndi.CustomApiJNDIFactory"
        scheme="https"
        host="api.mycompany.com"
        port="8443"
        version="1.0"
        sso="sso.mycompany.com" />
</Context>
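
The whole point is that this file lives in the Tomcat installation, not in the war archive, so every environment can carry its own values. For example, on the STAGING box the very same entry might read (host names below are made up for illustration):

<Context>
    <Resource name="config/Api"
        auth="Container"
        type="com.mycompany.model.ApiConfigBean"
        factory="com.mycompany.api.jndi.CustomApiJNDIFactory"
        scheme="https"
        host="api-staging.mycompany.com"
        port="8443"
        version="1.0"
        sso="sso-staging.mycompany.com" />
</Context>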

 

2. Created the “CustomApiJNDIFactory” class:

public class CustomApiJNDIFactory implements ObjectFactory {

    @Override
    public Object getObjectInstance(Object obj, Name name, Context nameCtx, Hashtable<?,?> environment) throws Exception {

        // the container passes the <Resource> entry as a javax.naming.Reference
        if (!(obj instanceof Reference)) {
            throw new NamingException("Invalid JNDI object reference");
        }

        String scheme = null;
        String host = null;
        String port = null;
        String version = null;
        String sso = null;

        Reference ref = (Reference) obj;

        Enumeration<RefAddr> props = ref.getAll();

        while (props.hasMoreElements()) {

            RefAddr addr = props.nextElement();
            String propName = addr.getType();
            String propValue = (String) addr.getContent();

            switch (propName) {
                case "scheme":
                    scheme = propValue;
                    break;
                case "host":
                    host = propValue;
                    break;
                case "port":
                    port = propValue;
                    break;
                case "version":
                    version = propValue;
                    break;
                case "sso":
                    sso = propValue;
                    break;
            }

        }

        // validate properties
        validateProperty(scheme, "Invalid or empty scheme type");
        validateProperty(host, "Invalid or empty server host name");
        validateProperty(port, "Invalid or empty port number");
        validateProperty(version, "Invalid or empty API version number");
        validateProperty(sso, "Invalid or empty SSO server name");

        //create API Configuration Bean
        return new ApiConfigBean(scheme, host, port, version, sso);

    }

    /**
     * Validate internal String properties
     *
     * @param property the property
     * @param errorMessage the error message
     * @throws javax.naming.NamingException
     */
    private void validateProperty(String property, String errorMessage) throws NamingException {

        if (property == null || property.trim().equals("")) {
            throw new NamingException(errorMessage);
        }

    }

}

 

3. Defined a jndi-lookup entry in my “spring-api-context.xml” file that reads the Tomcat JNDI configuration entry and exposes it as a Spring bean named jndiApi:

<jee:jndi-lookup id="jndiApi" jndi-name="java:/comp/env/config/Api" expected-type="com.mycompany.model.ApiConfigBean" />
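
If you prefer Java-based Spring configuration over XML, the equivalent lookup can be done with Spring’s JndiTemplate (a sketch following the names used above):

import javax.naming.NamingException;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jndi.JndiTemplate;

@Configuration
public class ApiJndiConfig {

    // equivalent of the <jee:jndi-lookup> entry above
    @Bean(name = "jndiApi")
    public ApiConfigBean jndiApi() throws NamingException {
        return new JndiTemplate().lookup("java:/comp/env/config/Api", ApiConfigBean.class);
    }

}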

 

4. Created the POJO backing the “jndiApi” Spring bean:

public class ApiConfigBean {

    private String scheme;
    private String host;
    private String port;
    private String version;
    private String sso;

    public ApiConfigBean(String scheme, String host, String port, String version, String sso) {
        this.scheme = scheme;
        this.host = host;
        this.port = port;
        this.version = version;
        this.sso = sso;
    }

    // getters and setters omitted.

}

 

5. And finally wired the bean into the classes where I needed to make use of the “externalized” configuration:

@Autowired
@Qualifier("jndiApi")
private ApiConfigBean apiConfigBean;

public void foo() {

    String host = apiConfigBean.getHost();
    ...

}

 

 

That’s it! Have a wonderful day! 🙂