aws devops auto_scaling

Amazon Web Services (AWS) Auto-Scaling

Auto-scaling has been a feature of Amazon Web Services (AWS) for a long time. Until now, it could be done in two ways:

  • Using the command line tools (see the Resources section for the link)
  • Using Elastic Beanstalk to deploy your application

Yesterday (10/12/2013) AWS announced Auto Scaling support in the AWS Management Console. I was planning to set up auto-scaling for my blog anyway, so I can't think of a better time to try it.

Auto-scaling using AWS Management Console

Step 01: Launch Configuration

First we tell AWS what we want to launch. This step is a lot like creating a new EC2 instance. First you select an AMI, so before I started I created an AMI of my current blog server and selected it for the launch configuration. Then we select the instance properties. The wizard also offers the option of using spot instances, but spot instances can be terminated at any time when the spot price rises above your bid, which makes them a poor fit for an Internet-facing site like this one, so I'll skip that part.

Step 02: Auto Scaling Group

At the end of the Launch Configuration wizard we can choose to create an Auto Scaling group with that launch configuration and jump straight into Step 2. First we specify the name and the initial instance count for the group. We also need to choose at least one availability zone. I always select all of them; I'm not sure whether there is any trade-off in narrowing down the selection.

An important point here is to expand the Advanced Details section, because it contains the load balancer selection. For web applications, auto-scaling only makes sense when the instances are behind a load balancer; otherwise the new instances could not be reached anyway. Once you create the Auto Scaling group you cannot associate it with an ELB, so make sure you select your load balancer at this step.

Create Auto Scaling Group

Next comes another important step: specifying scaling policies, that is, telling AWS what action to take when it needs to scale up or down, and when to do it. The “when” is defined by CloudWatch alarms. For scaling up, I added an alarm for average CPU utilization over 80% for 5 minutes, and for scaling down, an alarm for CPU utilization under 20% for 5 minutes. When the high CPU alarm goes off, Auto Scaling takes the action we selected, which in my case is adding one more instance. Scaling down is just the opposite: remove one instance from the existing machine farm.

Create Auto Scaling Group
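As a rough illustration of how the two alarms drive the two scaling policies, here is a minimal Python sketch. It is not how CloudWatch is implemented: it assumes one group-wide average CPU sample per minute, averages the last five samples, and uses a hypothetical maximum group size of 4. Real policies also have cooldown periods, which are left out here.

```python
from statistics import mean
from typing import Sequence

HIGH_CPU_THRESHOLD = 80.0  # percent: scale-up alarm
LOW_CPU_THRESHOLD = 20.0   # percent: scale-down alarm
WINDOW_MINUTES = 5         # evaluation window used by both alarms


def scaling_decision(cpu_per_minute: Sequence[float], current: int,
                     min_size: int = 1, max_size: int = 4) -> int:
    """Return the desired instance count after evaluating both alarms.

    cpu_per_minute holds one average-CPU sample per minute (an assumption;
    max_size=4 is a hypothetical limit, not a value from the post).
    """
    if len(cpu_per_minute) < WINDOW_MINUTES:
        # Not enough datapoints yet, so neither alarm can fire.
        return current
    window_avg = mean(cpu_per_minute[-WINDOW_MINUTES:])
    if window_avg > HIGH_CPU_THRESHOLD:
        return min(current + 1, max_size)  # high CPU: add one instance
    if window_avg < LOW_CPU_THRESHOLD:
        return max(current - 1, min_size)  # low CPU: remove one instance
    return current
```

For example, five minutes at 90% CPU with one instance running gives a desired count of 2, while five quiet minutes at 10% with two instances brings it back to 1; the min/max bounds keep the group from shrinking to zero or growing without limit.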

In the next step we define the notifications we want to receive when an Auto Scaling event is triggered. I definitely want to know everything that happens to my machines, so I requested an email for all events.

Create Auto Scaling Group

That’s all it takes to create an AS group using the wizard.

Testing the scaling

The easiest way to test an Auto Scaling group is to terminate the instance it just launched. Once I killed the instance, the group immediately launched another one to satisfy its minimum instance count. So the Auto Scaling group is working, but how can I be sure that it will launch a new instance when I need it most? Time to make it sweat a little! But first we have to set up an environment to create load on the system.
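The self-healing behaviour seen in that test can be modelled as a simple reconciliation loop: whenever the number of running instances drops below the group's minimum, the group launches replacements. A toy Python sketch, where the class and the instance IDs are hypothetical and the real service does this asynchronously based on health checks:

```python
import itertools


class AutoScalingGroupSketch:
    """Toy model: the group replaces terminated instances to keep its minimum size."""

    def __init__(self, min_size: int):
        self.min_size = min_size
        self._next_id = itertools.count(1)
        self.instances: set = set()
        self.reconcile()

    def reconcile(self) -> None:
        # Launch new (hypothetical) instance IDs until the minimum is met again.
        while len(self.instances) < self.min_size:
            self.instances.add(f"i-{next(self._next_id):08x}")

    def terminate(self, instance_id: str) -> None:
        # Terminating an instance by hand; the group notices and replaces it.
        self.instances.discard(instance_id)
        self.reconcile()
```

With min_size=1, terminating the only instance immediately yields a new one with a different ID, which is exactly what the console showed.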

Installing Siege

The easiest and simplest load testing tool I know is a Linux-based one called Siege. To prepare my simple load testing environment I quickly downloaded siege:

wget http://www.joedog.org/pub/siege/siege-latest.tar.gz

tar -xzvf siege-latest.tar.gz

cd siege-*/

It requires a C compiler which doesn’t come out-of-the-box with an Amazon Linux AMI. So first we need to install that:

sudo yum install -y gcc make

Then configure the build:

./configure

At the end of the configuration it instructs us to build and install the tool with the following commands:

make

sudo make install

Siege configuration

After running make, Siege is ready to go. We can check its configuration with:

/usr/local/bin/siege -C

It should display the current version and other details about the tool.

Siege Configuration

Ready to go

Now, we have a micro instance running Siege and a small instance launched by auto-scaling.

AWS Instances

The Auto Scaling group is supposed to launch another instance and add it to the load balancer if CPU usage is too high on the existing one. Let's see if it really works.

Under Siege!

First I created a URL file from my sitemap so that the load would be more realistic. Then I fired up 20 concurrent users and Siege started to bombard my site:

Siege in Action
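Siege's URL file is plain text with one URL per line. A minimal Python sketch for extracting those URLs from a sitemap, assuming the sitemap uses the standard sitemaps.org namespace and is a single urlset rather than a sitemap index:

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_to_urls(sitemap_xml: str) -> List[str]:
    """Return every <loc> entry of a <urlset> sitemap, whitespace stripped."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]
```

Writing the result to a file with "\n".join(urls) gives a file Siege can read; a command such as `siege -c 20 -i -f urls.txt` would then run 20 concurrent users, each picking random URLs from that file.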

When I tried to load my site, it was incredibly slow. CPU usage kept rising on the single instance until the CloudWatch alarm went off, which triggered Auto Scaling to launch a new instance.

AWS Instances

Now I had two instances to share the load, but that could only happen if the new instance was added to the Elastic Load Balancer (ELB) automatically. After a few minutes it passed the health checks and went InService.

Auto-scaling using AWS Management Console - Elastic Load Balancer Overview
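The ELB only starts routing traffic to an instance after it has passed a number of consecutive health checks, and takes it out of service after a number of consecutive failures. The Python sketch below models that rule; the thresholds of 3 and 2 are hypothetical values rather than ELB defaults, and the real checks also involve an interval and a timeout.

```python
class HealthCheckSketch:
    """Toy model of threshold-based health checks for one instance."""

    def __init__(self, healthy_threshold: int = 3, unhealthy_threshold: int = 2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.in_service = False  # a newly registered instance starts out of service
        self._streak = 0         # consecutive results that disagree with the current state

    def record(self, check_passed: bool) -> bool:
        """Feed one health check result; return whether the instance is in service."""
        if check_passed == self.in_service:
            # Result agrees with the current state, so any streak is broken.
            self._streak = 0
            return self.in_service
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.in_service = check_passed
            self._streak = 0
        return self.in_service
```

With these thresholds a new instance goes in service on its third successful check in a row, and drops out again after two consecutive failures.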

At this point I had two instances, and when I tried to load posts from my blog I noticed it was quite fast again. The CPU usage graph below shows how it all went down:

Auto-scaling using AWS Management Console - CPU utilization

My first instance (orange) was running quietly and peacefully until it was attacked by Siege. After a few minutes of hard times, the cavalry (blue instance) came to the rescue and started getting its fair share of the load. ELB then distributed the load as evenly as possible, and the system ran smoothly again. OK, so the system can withstand a spike and scale itself, but that costs money. What happens after the storm? I stopped Siege and, as expected, after a few minutes the low CPU alarm went off and Auto Scaling set the instance count back to 1 by terminating one of the instances.

AWS Instances

I was also notified at every step of this process, so I could keep track of my instances at all times.

Auto-scaling using AWS Management Console - Notifications

Architecture of the system

So at this point the architecture of the system looks like this:

Auto-scaling using AWS Management Console - System Architecture

I’m planning to cover some basics (EC2, RDS, S3) in more detail in a later post. Also I’ll try to add more AWS services and enhance this architecture as I go along.

Final Words

  • If you are planning to use auto-scaling in a production environment, make sure to back up all your data externally, since Auto Scaling can terminate any instance when it scales in. Also create snapshots of all the volumes.
  • Even though network traffic is cheap, it still costs money, so for extended tests I suggest you keep an eye on your billing statement.
  • On the Amazon Linux AMI, Apache and MySQL don't start automatically at boot, so you may need to update your configuration like I did. I used the script I found here.

Resources

devops

DevOps (Development + Operations) is one of the most popular terms in the IT world recently. From what I've read and listened to so far, my understanding is that it is all about continuous deployment (or delivery). Basically, to practice DevOps you have to automate everything from development to deployment.

Current problem

Traditionally, deploying successfully is a huge challenge. It is mostly a manual and cumbersome process, and because of its sensitive nature, system admins are not huge fans of deployments. Another challenge is the miscommunication (or, in some cases, no communication) between the system admin and development teams. They are generally run by different high-level executives, and their priorities conflict most of the time.

Solution

On the philosophical side, DevOps is about bringing these teams together to work in harmony. Social events attended by both teams are key to building confidence among team members. As Richard Campbell (from the RunAsRadio and .NET Rocks podcasts) says, “Pizza and beer is a global lubricant”.

Dev…

On the development side, the key requirement is continuous integration. You have to be able to run unit tests and acceptance tests automatically on build servers. This means development has to be done in short, agile sprints with frequent check-ins. The next step beyond this stage is continuous deployment.
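The gating idea behind continuous integration and deployment fits in a few lines of Python: stages run in order, and nothing after a failing stage (in particular, deployment) is executed. The stage names and the lambda stand-ins below are hypothetical; a real build server would run the actual build and test commands.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a function that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order, stop at the first failure, return the names that passed."""
    completed: List[str] = []
    for name, step in stages:
        if not step():
            print(f"{name} failed; stopping before any later stage runs")
            break
        completed.append(name)
    return completed
```

If the acceptance tests fail, the pipeline reports build and unit tests as completed and never reaches the deploy stage, which is what makes frequent, automatic deployments safe to attempt.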

…Ops

This is where the IT team comes into play. When the whole system is automated, deploying to production frequently and without much headache becomes possible. Cloud computing is one of the core technologies that make DevOps possible: the ability to manage virtual machines programmatically (e.g. AWS, OpenStack) opens up a whole range of possibilities.

This is a fairly complex topic encompassing many disciplines and technologies. Also it’s quite dynamic and open to innovation. Definitely worth keeping an eye on.

Resources

security ssl

I used to wonder what the different key sizes meant when dealing with SSL. Also, I noticed that the SSL certificate I had purchased said “128/256 bit encryption” in its feature list, which only made me more confused. What does it actually mean, and why would a connection use 128-bit if it supports 256-bit anyway? I checked one of my websites, running on a Linux machine, and saw that it used 256-bit encryption, whereas another website of mine was running with 128-bit encryption. I bought both certificates from the same vendor, so the difference had to have something to do with the server.

What’s with the naming?

For the uninitiated, TLS is the new name for the protocol. The SSL name was discontinued after version 3.0, and the next version was released as TLS 1.0. As of this writing the latest version is TLS 1.2, which was released in 2008. So technically the name of the protocol is Transport Layer Security (TLS), but many people, including me, still refer to it as SSL.

Key Sizes

SSL Key Sizes

The key size (2048 bits in the image) is the size of the public/private key pair. It is determined when the key pair is generated, which is usually when the CSR for the certificate is created. This is what determines how hard the key is to break: for an RSA key, an attacker would have to factor the 2048-bit modulus. Currently 2048-bit keys are considered strong.

128/256-bit is the length of the session key, which is established during the handshake. With RSA key exchange, the client generates random secret data and encrypts it using the server's public key. The server decrypts the message with its private key, and both sides derive the session key from that shared secret. Afterwards, the server and client use this session key for symmetric encryption (for example AES-128 or AES-256). The RSA keys are only used at the beginning of the communication.
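To make that flow concrete, here is a toy Python sketch of RSA key transport. It uses textbook RSA with a deliberately tiny key built from two known Mersenne primes, no padding, and SHA-256 as a stand-in for the real TLS key-derivation function, so it only illustrates the idea and must never be used for actual encryption. It needs Python 3.8+ for `pow(e, -1, m)`.

```python
import hashlib
import secrets

# Toy RSA key pair from two known Mersenne primes. Far too small and too
# structured for real use; it only illustrates the message flow.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus (about 216 bits)
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, kept by the server
N_BYTES = (N.bit_length() + 7) // 8


def client_make_secret() -> tuple:
    """Client: pick random secret data and encrypt it with the server's public key (E, N)."""
    secret = secrets.randbelow(N - 2) + 2
    return secret, pow(secret, E, N)


def server_recover_secret(ciphertext: int) -> int:
    """Server: decrypt the secret with the private exponent D."""
    return pow(ciphertext, D, N)


def derive_session_key(secret: int, bits: int) -> bytes:
    """Both sides: turn the shared secret into a 128- or 256-bit symmetric key.

    SHA-256 here is a stand-in for the TLS key-derivation function (an assumption).
    """
    digest = hashlib.sha256(secret.to_bytes(N_BYTES, "big")).digest()
    return digest[: bits // 8]
```

After `secret, ct = client_make_secret()`, the client computes `derive_session_key(secret, 256)` and the server computes `derive_session_key(server_recover_secret(ct), 256)`; both end up with the same 32-byte key without it ever crossing the wire, and from then on only the fast symmetric cipher is needed.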

Let’s see it in action

I might have had a better understanding after the research, but I still had to resolve my issue: I needed to see 256-bit encryption. Since this is a rather sensitive operation, I wanted to test it on completely expendable machines, so I created two new small instances running Windows Server 2008 and Windows Server 2012. I quickly installed IIS on both instances and checked what they looked like. As I suspected, they were using 128-bit encryption out of the box.

SSL_Key_Sizes_Win2008_Before

SSL_Key_Sizes_Win2012_Before

The problem is that the AES-256 cipher suites are not high in the server's list of supported cipher suites, so a suite with a 128-bit key gets picked first. Changing that order requires some registry updates and group policy changes, which normally have to be done manually. You can find a resource below that explains how to do it (I haven't tested it myself). Instead, I decided to use a tool which makes the whole process a lot easier and less error-prone. It's called IIS Crypto.

IIS Crypto
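Why the order matters can be shown with a small Python sketch, assuming the server applies its own preference order: it walks its list from the top and picks the first suite the client also offers, so a browser that supports AES-256 can still end up on AES-128. The suite names are real TLS cipher suites, but the three lists are hypothetical examples, not the actual Windows defaults.

```python
from typing import Optional, Sequence


def choose_cipher_suite(server_order: Sequence[str],
                        client_offer: Sequence[str]) -> Optional[str]:
    """Server-preference selection: first suite in the server's list that the client offers."""
    offered = set(client_offer)
    for suite in server_order:
        if suite in offered:
            return suite
    return None  # no common suite: the handshake would fail


# Hypothetical lists for illustration only.
BROWSER_OFFER = ["TLS_RSA_WITH_AES_256_CBC_SHA",
                 "TLS_RSA_WITH_AES_128_CBC_SHA",
                 "TLS_RSA_WITH_RC4_128_SHA"]
SERVER_BEFORE = ["TLS_RSA_WITH_AES_128_CBC_SHA",
                 "TLS_RSA_WITH_AES_256_CBC_SHA",
                 "TLS_RSA_WITH_RC4_128_SHA"]
SERVER_AFTER = ["TLS_RSA_WITH_AES_256_CBC_SHA",
                "TLS_RSA_WITH_AES_128_CBC_SHA"]
```

With SERVER_BEFORE the browser gets TLS_RSA_WITH_AES_128_CBC_SHA even though it listed AES-256 first; moving AES-256 to the top, as in SERVER_AFTER, is enough to get a 256-bit session key.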

I downloaded the tool, ran the Best Practices option and restarted the server. Here are the results:

SSL_Key_Sizes_Win2008_After

SSL_Key_Sizes_Win2012_After

The Windows Server 2012 instance prioritizes TLS 1.2 over TLS 1.0, so it uses the newer version of the protocol even though I used the same browser for both tests.
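Protocol version negotiation works along similar lines: the client announces the highest version it supports, and the server answers with the highest version it has enabled that does not exceed the client's. A Python sketch, where the two server configurations are hypothetical stand-ins for the two test machines:

```python
from typing import Optional, Sequence

# On-the-wire (major, minor) version numbers from the SSL/TLS specifications.
WIRE_VERSION = {
    "SSL 3.0": (3, 0),
    "TLS 1.0": (3, 1),
    "TLS 1.1": (3, 2),
    "TLS 1.2": (3, 3),
}


def negotiate_version(client_max: str, server_enabled: Sequence[str]) -> Optional[str]:
    """Return the highest version the server has enabled that the client can speak."""
    limit = WIRE_VERSION[client_max]
    candidates = [v for v in server_enabled if WIRE_VERSION[v] <= limit]
    return max(candidates, key=WIRE_VERSION.__getitem__) if candidates else None


# Hypothetical configurations for the two test servers.
SERVER_2008 = ["SSL 3.0", "TLS 1.0"]
SERVER_2012 = ["SSL 3.0", "TLS 1.0", "TLS 1.1", "TLS 1.2"]
```

The same browser offering TLS 1.2 ends up on TLS 1.0 against the first configuration and TLS 1.2 against the second, so the difference comes entirely from what each server has enabled.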

Resources