Amazon AWS – SQS – Simple Queue Service

Basic information about Amazon SQS Service:

 

AWS Free Tier availability:

  • 1 million requests per month

 


Functionality:

  • no limits on the number of queues or the number of messages.
  • queues can be created in any region.
  • message payload can contain up to 256KB of text in any format. Each 64KB ‘chunk’ of payload is billed as 1 request. For example, a single API call with a 256KB payload will be billed as four requests.
  • messages can be sent, received or deleted in batches of up to 10 messages or 256KB. Batches cost the same amount as single messages, meaning SQS can be even more cost-effective for customers that use batching.
  • long polling reduces extraneous polling to help you minimize cost while receiving new messages as quickly as possible. When your queue is empty, long-poll requests wait up to 20 seconds for the next message to arrive. Long poll requests cost the same amount as regular requests.
  • messages can be retained in queues for up to 14 days.
  • messages can be sent and read simultaneously.
  • when a message is received, it becomes “locked” while being processed. This keeps other computers from processing the message simultaneously. If the message processing fails, the lock will expire and the message will be available again. In the case where the application needs more time for processing, the “lock” timeout can be changed dynamically via the ChangeMessageVisibility operation (see the sketch after this list).
  • developers can securely share Amazon SQS queues with others. Queues can be shared with other AWS accounts and anonymously. Queue sharing can also be restricted by IP address and time-of-day.
  • when combined with Amazon SNS, developers can ‘fanout’ identical messages to multiple SQS queues in parallel. When developers want to process the messages in multiple passes, fanout helps complete this more quickly, and with fewer delays due to bottlenecks at any one stage. Fanout also makes it easier to record duplicate copies of your messages, for example in different databases.
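
As an illustration of long polling, batching, and the visibility-timeout “lock”, here is a minimal sketch using the AWS SDK for Java; the queue name and the timeout values are placeholders:

import java.util.List;

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClient;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class SqsLongPollingSketch {

    public static void main(String[] args) {
        // Credentials and region are resolved from the environment
        AmazonSQS sqs = new AmazonSQSClient();
        String queueUrl = sqs.getQueueUrl("my-queue").getQueueUrl();

        // Long poll: wait up to 20 seconds for a message instead of returning
        // immediately, and fetch up to 10 messages per request (a batch costs
        // the same as a single request)
        ReceiveMessageRequest request = new ReceiveMessageRequest(queueUrl)
                .withWaitTimeSeconds(20)
                .withMaxNumberOfMessages(10);

        List<Message> messages = sqs.receiveMessage(request).getMessages();
        for (Message message : messages) {
            // Processing will take longer than expected: extend the "lock"
            // to 120 seconds dynamically via ChangeMessageVisibility
            sqs.changeMessageVisibility(queueUrl, message.getReceiptHandle(), 120);
        }
    }
}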

 

Common design patterns with SQS and other AWS components:

  • Work Queues: Decoupling components of a distributed application that may not all process the same amount of work simultaneously.
  • Buffer and Batch Operations: Adding scalability and reliability to the architecture, and smoothing out temporary volume spikes without losing messages or increasing latency.
  • Request Offloading: Moving slow operations off of interactive request paths by enqueuing the request.
  • Fanout: Combined with SNS to send identical copies of a message to multiple queues in parallel for simultaneous processing.

 

Service interface:

  • CreateQueue: Create queues for use with your AWS account.
  • ListQueues: List your existing queues.
  • DeleteQueue: Delete one of your queues.
  • SendMessage: Add messages to a specified queue.
  • SendMessageBatch: Add multiple messages to a specified queue.
  • ReceiveMessage: Return one or more messages from a specified queue.
  • ChangeMessageVisibility: Change the visibility timeout of a previously received message.
  • ChangeMessageVisibilityBatch: Change the visibility timeout of multiple previously received messages.
  • DeleteMessage: Remove a previously received message from a specified queue.
  • DeleteMessageBatch: Remove multiple previously received messages from a specified queue.
  • SetQueueAttributes: Control queue settings like the amount of time that messages are locked after being read so they cannot be read again.
  • GetQueueAttributes: Get information about a queue like the number of messages in it.
  • GetQueueUrl: Get the queue URL.
  • AddPermission: Add queue sharing for another AWS account for a specified queue.
  • RemovePermission: Remove an AWS account from queue sharing for a specified queue.

 

Message Lifecycle:

  • A system that needs to send a message selects an Amazon SQS queue and uses SendMessage to add a new message to it.
  • A different system that processes messages needs more messages to process, so it calls ReceiveMessage, and this message is returned.
  • Once a message has been returned by ReceiveMessage, it will not be returned by any other ReceiveMessage until the visibility timeout has passed. This keeps multiple computers from processing the same message at once.
  • If the system that processes messages successfully finishes working with this message, it calls DeleteMessage, which removes the message from the queue so no one else will ever process it. If this system fails to process the message, then it will be read by another ReceiveMessage call as soon as the visibility timeout passes.
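
A minimal sketch of this lifecycle, again with the AWS SDK for Java (imports as in the earlier sketch; the queue name and the process() helper are placeholders):

// Producer side: add a new message to the queue
AmazonSQS sqs = new AmazonSQSClient();
String queueUrl = sqs.getQueueUrl("my-queue").getQueueUrl();
sqs.sendMessage(queueUrl, "hello");

// Consumer side: receive, process, delete
for (Message message : sqs.receiveMessage(queueUrl).getMessages()) {
    // While this consumer holds the message, it is invisible to other consumers
    process(message.getBody());  // hypothetical processing step
    // Success: delete the message so no one else will ever process it.
    // On failure we simply would not delete it, and it becomes visible
    // again once the visibility timeout passes.
    sqs.deleteMessage(queueUrl, message.getReceiptHandle());
}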

 

 

 

Displaying GIT build number (hash) in your REST API

The product I’m currently working on (a PaaS cloud offering) had a requirement to provide an API resource (a GET call) through which a user could obtain basic details about the actual version of the exposed API: the API version, build time, corresponding Git repository build number (hash ID), and the JVM version used to compile the API. Except for the Git hash part, everything else seemed quite easy to obtain. Below you’ll find the solution (a step-by-step guide) I came up with.

 

End result (goal):

> curl http://api.my-system.company.com/1.0/
{
  "Implementation-Build" : "2951e7e",
  "Implementation-Build-Time" : "2013/09/17 12:40:02 AM,UTC",
  "Implementation-Jdk" : "1.7.0_15",
  "Implementation-Version" : "1.0-SNAPSHOT",
  "Implementation-Vendor" : "My Company, Inc.",
  "Implementation-Title" : "My System"
}

 


Steps required: 

1. First, let’s add the <scm> configuration tag to your master pom.xml file. The connection string points at the repository from which the buildnumber-maven-plugin will obtain the Git hash ID.

<scm>
    <!-- Replace the connection below with your project connection -->
    <connection>scm:git:git://github.com/mariuszprzydatek/hyde-park.git</connection>
</scm>

 

2. Configure the maven-war-plugin to generate the project’s MANIFEST.MF file, where the Git hash ID will be stored. Later, the Spring MVC controller will read this file and return its content as the result of a GET call.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-war-plugin</artifactId>
    <version>2.3</version>
    <configuration>
        <archive>
            <addMavenDescriptor>false</addMavenDescriptor>
            <manifest>
                <addDefaultImplementationEntries>true</addDefaultImplementationEntries>
            </manifest>
        </archive>
        <warName>1.0</warName>
    </configuration>
</plugin>

 

3. In the <properties> section of the pom we can define the format of the timestamp that will be returned as the value of the “Implementation-Build-Time” attribute.

<properties>
    <maven.build.timestamp.format>yyyy/MM/dd hh:mm:ss a,z</maven.build.timestamp.format>
</properties>

 

4. Next, let’s add the remaining pom sections whose values we’ll store in the MANIFEST.MF file for later reading:

    <version>1.0-SNAPSHOT</version>
    <organization>
        <name>My Company, Inc.</name>
    </organization>
    <name>My System</name>

 

5. Within the <archive> key of the maven-war-plugin’s <configuration> section, we need to add additional manifest entries, including the one (<Implementation-Build>) that will be generated by the buildnumber-maven-plugin:

<archive>
    ...
    <manifestEntries>
        <Implementation-Build>${buildNumber}</Implementation-Build>
        <Implementation-Build-Time>${maven.build.timestamp}</Implementation-Build-Time>
    </manifestEntries>
</archive>

 

6. Add the buildnumber-maven-plugin itself, which will do the hard work:

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>buildnumber-maven-plugin</artifactId>
    <version>1.1</version>
    <executions>
        <execution>
            <phase>validate</phase>
            <goals>
                <goal>create</goal>
            </goals>
        </execution>
    </executions>
</plugin>

 

7. Finally, add a <configuration> section to the buildnumber-maven-plugin with the <shortRevisionLength> key, which controls the length of the Git hash ID we want to export:

<configuration>
    <shortRevisionLength>7</shortRevisionLength>
</configuration>

 

 

Now, let’s create the Spring MVC controller that will handle reading the MANIFEST.MF file and return its content to the presentation layer.

import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.jar.Manifest;

import javax.servlet.ServletContext;
import javax.servlet.http.HttpServletRequest;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
@RequestMapping
public class ApiController {

    /**
     * Handling GET request to retrieve details from MANIFEST.MF file
     * @return implementation details
     */
    @RequestMapping(method = RequestMethod.GET)
    public @ResponseBody Map<String, String> getBuildNumber(HttpServletRequest request) throws IOException {

        // Read the MANIFEST.MF generated by the maven-war-plugin
        ServletContext context = request.getSession().getServletContext();
        InputStream manifestStream = context.getResourceAsStream("/META-INF/MANIFEST.MF");
        Manifest manifest = new Manifest(manifestStream);

        Map<String, String> response = new HashMap<>();
        response.put("Implementation-Vendor", manifest.getMainAttributes().getValue("Implementation-Vendor"));
        response.put("Implementation-Title", manifest.getMainAttributes().getValue("Implementation-Title"));
        response.put("Implementation-Version", manifest.getMainAttributes().getValue("Implementation-Version"));
        // Maven Archiver writes the compiling JDK version as "Build-Jdk"
        response.put("Implementation-Jdk", manifest.getMainAttributes().getValue("Build-Jdk"));
        response.put("Implementation-Build", manifest.getMainAttributes().getValue("Implementation-Build"));
        response.put("Implementation-Build-Time", manifest.getMainAttributes().getValue("Implementation-Build-Time"));

        return response;
    }

}

 

 

Hope you enjoyed this post.

Take care!

Redis Replication

Continuing my series of introductory posts on Redis DB, today I’ll address the subject of replication.

 

Definition:

  • Replication is a method by which other servers receive a continuously updated copy of the data as it’s being written, so that the replicas can service read queries.

 

Basic info (redis.io):

  • Redis uses asynchronous replication. Starting with Redis 2.8, however, slaves periodically (once every second) acknowledge the amount of replication stream they have processed.
  • A master can have multiple slaves.
  • Slaves are able to accept other slaves’ connections. Aside from connecting a number of slaves to the same master, slaves can also be connected to other slaves in a graph-like structure.
  • Redis replication is non-blocking on the master side: the master will continue to serve queries while one or more slaves perform the first synchronization.
  • Replication is also non-blocking on the slave side: while the slave is performing the first synchronization, it can reply to queries using the old version of the data set, assuming you configured Redis to do so in redis.conf. Otherwise, you can configure Redis slaves to send clients an error if the link with the master is down. However, there is a moment when the old dataset must be deleted and the new one loaded, during which the slave will block incoming connections.
  • Replication can be used both for scalability, in order to have multiple slaves for read-only queries (for example, heavy SORT operations can be offloaded to slaves), and simply for data redundancy.
  • It is possible to use replication to avoid the saving process on the master side: configure your master’s redis.conf to avoid saving (comment out all the “save” directives), then connect a slave configured to save from time to time.

 

How Redis replication works (redis.io):

  • When you set up a slave, it sends a SYNC command upon connection, whether it’s the first time it has connected or a re-connection.
  • The master then starts background saving, and collects all new commands received that will modify the dataset. When the background saving is complete, the master transfers the database file to the slave, which saves it on disk, and then loads it into memory. The master will then send to the slave all accumulated commands, and all new commands received from clients that will modify the dataset. This is done as a stream of commands and is in the same format of the Redis protocol itself.
  • You can try it yourself via telnet. Connect to the Redis port while the server is doing some work and issue the SYNC command. You’ll see a bulk transfer and then every command received by the master will be re-issued in the telnet session.
  • Slaves are able to automatically reconnect when the master <-> slave link goes down for some reason. If the master receives multiple concurrent slave synchronization requests, it performs a single background save in order to serve all of them.
  • When a master and a slave reconnect after the link went down, a full re-sync is always performed. However, starting with Redis 2.8, a partial re-synchronization is also possible.

 

To configure replication, all you have to do is add the line below to the slave’s redis.conf file (or issue the same as a command from the slave’s CLI, as shown after the list).

  • SLAVEOF <master_ip> <master_port>             (ex. SLAVEOF 127.0.0.1 6379)
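
For example, to attach a slave running on the same host (on port 6380, a placeholder) to a master at runtime via redis-cli:

> redis-cli -p 6380 SLAVEOF 127.0.0.1 6379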

 

To tune the replication process, you can play with the following options in the redis.conf file (a sample slave configuration follows the list):

  • requirepass <password> – Require clients to issue AUTH <PASSWORD> before processing any other commands. This might be useful in environments in which you do not trust others with access to the host running redis-server (e.g. when you don’t run your own servers).
  • masterauth <master-password> – If the master is password protected (using the “requirepass” configuration directive above) it is possible to tell the slave to authenticate before starting the replication synchronization process, otherwise the master will refuse the slave request
  • slave-serve-stale-data <yes|no> – When a slave loses its connection with the master, or when the replication is still in progress, the slave can act in two different ways:
    • still reply to client requests, possibly with out-of-date data (the default behavior if the switch is set to “yes”)
    • or reply with a “SYNC with master in progress” error to all kinds of commands except INFO and SLAVEOF (if the switch is set to “no”)
  • slave-read-only <yes|no> – You can configure a slave instance to accept writes or not. Writing against a slave instance may be useful to store some ephemeral data (because data written on a slave will be easily deleted after re-sync with the master anyway), but may also cause problems if clients are writing to it because of a misconfiguration
  • repl-ping-slave-period <seconds> – Slaves send PINGs to the master at a predefined interval; this option changes that interval. The default value is 10 seconds.
  • repl-timeout <seconds> – This option sets a timeout for both bulk transfer I/O and master data or ping responses. The default value is 60 seconds. It is important to make sure that this value is greater than the value specified for repl-ping-slave-period, otherwise a timeout will be detected every time there is low traffic between the master and the slave.
  • repl-disable-tcp-nodelay <yes|no> – Controls whether to disable TCP_NODELAY on the slave socket after SYNC. If you select “yes” Redis will use a smaller number of TCP packets and less bandwidth to send data to slaves. But this can add a delay for the data to appear on the slave side, up to 40 milliseconds with Linux kernels using a default configuration. If you select “no” the delay for data to appear on the slave side will be reduced but more bandwidth will be used for replication. Default value of “no” is an optimization for low latency, but in very high traffic conditions or when the master and slaves are many hops away, turning this to “yes” may be a good idea.
  • slave-priority <integer> – The slave priority is an integer number published by Redis in the INFO output. It is used by Redis Sentinel in order to select a slave to promote into a master if the master is no longer working correctly. A slave with a low priority number is considered better for promotion, so for instance if there are three slaves with priority 10, 100, 25 Sentinel will pick the one with priority 10, that is the lowest. However a special priority of 0 marks the slave as not able to perform the role of master, so a slave with priority of 0 will never be selected by Redis Sentinel for promotion.
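
Putting several of these options together, a minimal slave-side redis.conf might look like this (the address, port and password are placeholders):

# redis.conf on the slave
slaveof 192.168.1.100 6379
masterauth s3cret              # only needed if the master sets requirepass
slave-serve-stale-data yes     # keep answering reads while (re-)syncing
slave-read-only yes            # refuse writes on this replica
repl-ping-slave-period 10      # seconds between PINGs to the master
repl-timeout 60                # must stay greater than repl-ping-slave-period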

 

Allowing writes only with N attached replicas (redis.io):

  • Starting with Redis 2.8 it is possible to configure a Redis master in order to accept write queries only if at least N slaves are currently connected to the master, in order to improve data safety.
  • However, because Redis uses asynchronous replication, it is not possible to ensure the slave actually received a given write, so there is always a window for data loss.
  • This is how the feature works:
    • Redis slaves ping the master every second, acknowledging the amount of replication stream processed.
    • The Redis master remembers the last time it received a ping from every slave.
    • The user can configure a minimum number of slaves that have a lag not greater than a maximum number of seconds.
    • If there are at least N slaves, with a lag less than M seconds, then the write will be accepted.
  • There are two configuration parameters for this feature (an example follows this list):
    • min-slaves-to-write <number of slaves>
    • min-slaves-max-lag <number of seconds>
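
For example, to have the master accept writes only when at least 3 slaves are connected with a replication lag of at most 10 seconds, the master’s redis.conf would contain:

min-slaves-to-write 3
min-slaves-max-lag 10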

 

Have a nice weekend!