Development

If you have multiple accounts with git providers (e.g. multiple accounts on GitHub and/or Bitbucket), you need to update your SSH configuration to be able to access all your repositories seamlessly. Of course you can use HTTPS instead, but then you’d have to enter your username and password every time.

If you don’t specify which key to use for each account, the SSH agent will try the default key if there is one (id_rsa) and will most likely fail if you didn’t grant access to that key in your git provider settings.

SSH Access Denied

To resolve the issue you need to create a config file under the .ssh folder that looks like this:

SSH Config
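The screenshot isn’t reproduced here, but a minimal sketch of such a config looks something like the following (the host aliases match the ones used later in this post; the key file names are just examples):

    # ~/.ssh/config

    # Personal GitHub account - uses the default key
    Host github.com
        HostName github.com
        User git
        IdentityFile ~/.ssh/id_rsa

    # Corporate GitHub account - uses a separate key
    Host github-corporate
        HostName github.com
        User git
        IdentityFile ~/.ssh/id_rsa_corporate

    # Bitbucket account - a separate key as well
    Host bitbucket.org
        HostName bitbucket.org
        User git
        IdentityFile ~/.ssh/id_rsa_bitbucket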

If you are just using one account per provider you don’t need to create multiple keys; you can use id_rsa for both accounts. But if you have multiple accounts with a provider you need a key for each account. In the example configuration above I used a new key for Bitbucket anyway. After creating the config file and adding the keys to your accounts you can start cloning repositories from various sources.

The final step is to use the host name you set in the config file when cloning the repository. For example, when you copy the SSH clone URL it looks something like this: git@github.com:{account name}/{repository name}.git. So if it’s a repository from the corporate account in my sample config file, I have to modify the URL as follows: git@github-corporate:{account name}/{repository name}.git so that the correct host name and key file can be used.

SSH Success

Resources

Development, Lego Mindstorms EV3

I had my eye on the new Lego Mindstorms set for a while. Finally I decided to order it from Amazon. It’s still a bit pricey but I think it’s worth it. I read a nice review of EV3 here which also includes comparisons to the previous generation of the Mindstorms kit.

Lego Mindstorms EV3

Programming EV3: The Official Way

Programming the kit is very easy using Lego’s official graphical programming tool [6]. You just have to drag and drop the components and fiddle with the parameters. Check out the following very basic application:

Lego Mindstorms EV3

It powers the motors connected to ports B and C. It keeps doing that in a loop as long as the value read from the Infrared Sensor is larger than 20. If there is an object closer than 20 centimetres it breaks out of the loop and ends the program. And here’s the output in action:

Programming EV3: The .NET Way

The .NET API is an open-source project and the code can be found on CodePlex. I recommend watching the introductory video, which shows the basics: how to move the robot by sending direct commands to turn the motors and how to read values from the sensors. The following part of the test application shows the event handlers for the direction buttons and the setup code to connect to the brick.

Lego Mindstorms EV3

I also added a similar implementation of the NXT-G program above. It’s a while loop which breaks when the value from the IR sensor is lower than 10.

Lego Mindstorms EV3
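The full project is on GitHub (see below), but a rough sketch of that loop with the Lego.Ev3 .NET API looks something like this. The method and property names are from memory and may differ slightly between versions of the API; the IR sensor is assumed to be on port 4, the motors on ports B and C, and the Bluetooth port is just an example:

    using System.Threading;
    using System.Threading.Tasks;
    using Lego.Ev3.Core;
    using Lego.Ev3.Desktop;

    class AvoidCollision
    {
        static async Task Run()
        {
            // Connect to the brick over Bluetooth ("COM3" is just an example port)
            var brick = new Brick(new BluetoothCommunication("COM3"));
            await brick.ConnectAsync();

            // Drive forward while the IR sensor (assumed to be on port 4) reads above 10
            while (brick.Ports[InputPort.Four].SIValue > 10)
            {
                await brick.DirectCommand.TurnMotorAtPowerAsync(OutputPort.B | OutputPort.C, 50);
                Thread.Sleep(10); // workaround for sensor values not refreshing (see Tips & Tricks)
            }

            // Something is too close - stop both motors
            await brick.DirectCommand.StopMotorAsync(OutputPort.B | OutputPort.C, false);
        }
    }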

And voila! Here is a clever robot that senses the object in front of it and avoids the collision by stopping!

You can find the source code for the test application on GitHub.

Tips & Tricks

  • When I first implemented the NXT-G equivalent version the sensor value wasn’t updating properly. I checked the discussion forums in the CodePlex project page and found out that other people were having a similar issue. A workaround was adding the Thread.Sleep(10) line. After that I could read the updated sensor values without any problems. Although it doesn’t feel like the right solution it works fine as a temporary workaround.

  • During the testing and debugging I managed to crash the Lego brick a few times. At first I feared I had actually “bricked” the brick, but luckily a reset resolved it. Resetting the brick is not obvious though; I had to check the manual for that. So in case you need to reset it, you have to hold down the Back, Center and Left buttons, then release the Back button when the screen goes blank and release the other two when the screen says “Starting”, as shown in the EV3 User Guide.

Resources

  1. Lego EV3 review and comparison to NXT 2.0
  2. API code on CodePlex
  3. Channel9 video on programming EV3 with .NET API
  4. Sample Track3r robot project and building instructions
  5. EV3 User Guide
  6. Lego software
  7. Codeplex discussion on IR sensor not updating values
  8. Source code for the test application

Development

My previous post was about grid-based clustering. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) takes another approach called density-based clustering. It grows regions with high density (above a provided threshold) into clusters and can discover clusters of arbitrary shape.

DBSCAN Implementation

First off, the data is loaded and a distance matrix is calculated from the data points.

DBSCAN 1

The algorithm visits every data point and finds its neighbours. If the neighbours are dense enough then the cluster is expanded to include those points as well.

DBSCAN 3

So data points that are close enough to each other are included in the same cluster.

DBSCAN 2

Joining dense regions is a similar approach to the one taken in grid-based clustering. The difference is that this way the clusters can have arbitrary shapes.
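The screenshots above come from the original project. As a rough illustration of the core idea rather than the exact code, a simplified DBSCAN pass over a precomputed distance matrix could look like this:

    using System.Collections.Generic;
    using System.Linq;

    class Dbscan
    {
        // dist: precomputed distance matrix, eps: neighbourhood radius, minPts: density threshold.
        // Returns a cluster id per point (-1 means noise).
        public static int[] Run(double[,] dist, double eps, int minPts)
        {
            int n = dist.GetLength(0);
            var cluster = Enumerable.Repeat(-1, n).ToArray();
            var visited = new bool[n];
            int current = 0;

            for (int p = 0; p < n; p++)
            {
                if (visited[p]) continue;
                visited[p] = true;

                var neighbours = Neighbours(dist, p, eps);
                if (neighbours.Count < minPts) continue;   // not dense enough, leave as noise for now

                // Grow a new cluster from this dense point
                cluster[p] = current;
                var queue = new Queue<int>(neighbours);
                while (queue.Count > 0)
                {
                    int q = queue.Dequeue();
                    if (cluster[q] == -1) cluster[q] = current;   // claim unassigned/noise points
                    if (visited[q]) continue;
                    visited[q] = true;

                    var qNeighbours = Neighbours(dist, q, eps);
                    if (qNeighbours.Count >= minPts)              // q is dense too, keep expanding through it
                        foreach (int r in qNeighbours) queue.Enqueue(r);
                }
                current++;
            }
            return cluster;
        }

        static List<int> Neighbours(double[,] dist, int p, double eps)
        {
            var result = new List<int>();
            for (int i = 0; i < dist.GetLength(0); i++)
                if (i != p && dist[p, i] <= eps) result.Add(i);
            return result;
        }
    }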

Resources

Development

The CLIQUE (CLustering In QUEst) algorithm is a grid-based clustering algorithm that partitions each dimension of the dataset into a grid.

CLIQUE Implementation

The algorithm starts by assigning each data point to the closest cell in the grid.

CLIQUE 1

After loading the data it looks something like this:

CLIQUE 2

In order for a grid cell to be considered “dense” it has to contain a number of data points greater than or equal to the threshold. In this sample the threshold was 2, so after the adjacent dense cells are joined together we get 3 clusters:

CLIQUE 3
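As a simplified sketch of the same idea (2-D only, and not the original code): count the points per grid cell, mark the cells that meet the threshold as dense, and merge adjacent dense cells into clusters.

    using System;
    using System.Collections.Generic;

    class Clique2D
    {
        // points: 2-D data, cellSize: grid resolution, threshold: minimum points for a dense cell
        public static Dictionary<(int, int), int> Cluster((double X, double Y)[] points, double cellSize, int threshold)
        {
            // 1. Assign every point to a grid cell and count the points per cell
            var counts = new Dictionary<(int, int), int>();
            foreach (var p in points)
            {
                var cell = ((int)Math.Floor(p.X / cellSize), (int)Math.Floor(p.Y / cellSize));
                counts[cell] = counts.TryGetValue(cell, out int c) ? c + 1 : 1;
            }

            // 2. Keep only the dense cells
            var dense = new HashSet<(int, int)>();
            foreach (var kv in counts)
                if (kv.Value >= threshold) dense.Add(kv.Key);

            // 3. Join adjacent dense cells into clusters (simple flood fill)
            var clusterOf = new Dictionary<(int, int), int>();
            int cluster = 0;
            foreach (var start in dense)
            {
                if (clusterOf.ContainsKey(start)) continue;
                var stack = new Stack<(int, int)>();
                stack.Push(start);
                while (stack.Count > 0)
                {
                    var (x, y) = stack.Pop();
                    if (clusterOf.ContainsKey((x, y))) continue;
                    clusterOf[(x, y)] = cluster;
                    // visit the 4 neighbouring cells
                    foreach (var next in new[] { (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1) })
                        if (dense.Contains(next) && !clusterOf.ContainsKey(next)) stack.Push(next);
                }
                cluster++;
            }
            return clusterOf; // dense cell -> cluster id
        }
    }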

Resources

Development

A rough set is an approximation of a set that gives the lower and upper borders of the set.

Rough set implementation

The sample implementation starts off by reading the data set and the configuration file. The object and attribute index lists are acquired from the configuration. The items whose indices are specified in the object list are considered to be the set of objects, and the rest of the items form the complement of the set.

RoughSet 1

For each object in the set, its values at the indices in the attribute list are compared against the objects in the complementary list. If an object in the complement has the same attribute values (i.e. the two cannot be distinguished by those attributes), the object is considered to belong to the negative border; otherwise it belongs to the positive border.
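A compact sketch of that check (not the original code; attribute values are treated as strings here):

    using System.Collections.Generic;
    using System.Linq;

    class RoughSetBorders
    {
        // objects / complement: rows of the data set, attributeIndices: columns to compare on
        public static (List<string[]> positive, List<string[]> negative) Split(
            List<string[]> objects, List<string[]> complement, int[] attributeIndices)
        {
            var positive = new List<string[]>();
            var negative = new List<string[]>();

            foreach (var obj in objects)
            {
                // Does any object outside the set have the same values on the chosen attributes?
                bool indistinguishable = complement.Any(other =>
                    attributeIndices.All(i => obj[i] == other[i]));

                if (indistinguishable) negative.Add(obj);  // negative (boundary) border
                else positive.Add(obj);                    // positive border
            }
            return (positive, negative);
        }
    }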

Sample Output

RoughSet Sample Output

Resources

Development

The genetic algorithm sounds fascinating to me, as in a way it mimics evolution. It’s quite easy to implement and understand.

Genetic Algorithm

The algorithm has 3 stages:

  • Selection: Every generation is subjected to a selection process which eliminates the unfit ones.
  • Cross-over: This happens randomly among the selected generation and some genomes swap genes.
  • Mutation: Some genes are mutated randomly based on a given threshold value for mutation.

Implementation

The implementation starts with generating random bit sequences to represent genomes. The length of the genome and the number of genomes in a generation are specified in the configuration.

Genetic Algorithm 1

The first generation is generated randomly and its fitness is calculated based on the fitness function the system uses. After the groundwork has been done we have a genome pool, each genome with its own fitness value and selection range. The width of the selection range is determined by the fitness of the genome, so the larger the range, the more likely the genome is to be selected for the next generation.

Genetic Algorithm 2

In the selection process a random value is generated and the genome whose selection range includes that value is selected. The same genome can be selected more than once.

Genetic Algorithm 3

After the next generation is selected and the new fitness values are calculated, cross-over starts. In cross-over two random genomes are picked and they swap genes. The crossover point in the genome string is specified in the configuration (3 in this example).

Genetic Algorithm 4

The final phase of the algorithm is mutation. In this step a number of genomes are picked and some of their bits are flipped randomly.

Genetic Algorithm 5
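Putting the three stages together, a stripped-down sketch of one generation could look roughly like this. The fitness function, the crossover point of 3 and the mutation rate are placeholders matching the example above rather than the original code, and the crossover here pairs consecutive genomes for brevity while the post picks the pairs randomly:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class GeneticAlgorithm
    {
        static readonly Random Rng = new Random();

        // One generation: roulette-wheel selection, single-point crossover, bit-flip mutation
        public static List<bool[]> NextGeneration(List<bool[]> genomes, Func<bool[], double> fitness,
                                                  int crossoverPoint = 3, double mutationRate = 0.01)
        {
            // Selection: the fitter the genome, the wider its slice of the roulette wheel
            double[] scores = genomes.Select(fitness).ToArray();
            double total = scores.Sum();
            var selected = new List<bool[]>();
            for (int i = 0; i < genomes.Count; i++)
            {
                double spin = Rng.NextDouble() * total, cumulative = 0;
                for (int j = 0; j < genomes.Count; j++)
                {
                    cumulative += scores[j];
                    if (spin <= cumulative) { selected.Add((bool[])genomes[j].Clone()); break; }
                }
            }

            // Cross-over: pairs swap their genes after the crossover point
            for (int i = 0; i + 1 < selected.Count; i += 2)
            {
                for (int g = crossoverPoint; g < selected[i].Length; g++)
                {
                    bool tmp = selected[i][g];
                    selected[i][g] = selected[i + 1][g];
                    selected[i + 1][g] = tmp;
                }
            }

            // Mutation: flip random bits based on the mutation rate
            foreach (var genome in selected)
                for (int g = 0; g < genome.Length; g++)
                    if (Rng.NextDouble() < mutationRate) genome[g] = !genome[g];

            return selected;
        }
    }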

The most important thing about this algorithm is having a good fitness function that can accurately compute the fitness value of a given genome.

Resources

Development

Another data mining algorithm: AGNES (Agglomerative Nesting)

AGNES Algorithm

AGNES takes a different approach than k-means, which was the subject of my previous post. Initially it considers every data point as a separate cluster.

AGNES 1

Then it finds the minimum distance between clusters and merges the closest clusters:

AGNES 2

The resulting cluster is added to the cluster list and the merged clusters are removed, as they are no longer valid.
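A bare-bones sketch of that merge loop (single-linkage distance between clusters, merging until k clusters remain; a simplification rather than the original code):

    using System.Collections.Generic;
    using System.Linq;

    class Agnes
    {
        // Merge clusters until only k remain. Each cluster is a list of point indices,
        // dist is the precomputed distance matrix between the points.
        public static List<List<int>> Cluster(double[,] dist, int k)
        {
            int n = dist.GetLength(0);
            // Initially every data point is its own cluster
            var clusters = Enumerable.Range(0, n).Select(i => new List<int> { i }).ToList();

            while (clusters.Count > k)
            {
                int bestA = 0, bestB = 1;
                double best = double.MaxValue;

                // Find the two closest clusters (single linkage: closest pair of points)
                for (int a = 0; a < clusters.Count; a++)
                    for (int b = a + 1; b < clusters.Count; b++)
                    {
                        double d = clusters[a].SelectMany(i => clusters[b], (i, j) => dist[i, j]).Min();
                        if (d < best) { best = d; bestA = a; bestB = b; }
                    }

                // Merge: add the combined cluster, remove the two originals
                var merged = clusters[bestA].Concat(clusters[bestB]).ToList();
                clusters.RemoveAt(bestB);   // remove the higher index first
                clusters.RemoveAt(bestA);
                clusters.Add(merged);
            }
            return clusters;
        }
    }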

Resources

Development

I keep reviving my old projects. This is the second data mining algorithm implementation: another clustering algorithm called k-means.

k-means Algorithm

The algorithm groups n data points into k clusters. First the cluster centres are picked randomly from the data points. Then the entire dataset is iterated and every point is assigned to its closest cluster, which is determined by measuring the distance of the data point to the centroids of the clusters. This process is repeated until the cluster assignments no longer change.

K-means Results
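The actual implementation lives in the libraries listed below, but the core loop boils down to something like this (2-D points and Euclidean distance; a simplification rather than the VP.KMeans.Core code):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class KMeans
    {
        public static int[] Cluster((double X, double Y)[] points, int k, int maxIterations = 100)
        {
            var rng = new Random();
            // Pick the initial centroids randomly from the data points
            var centroids = points.OrderBy(_ => rng.Next()).Take(k).ToArray();
            var assignment = new int[points.Length];

            for (int iter = 0; iter < maxIterations; iter++)
            {
                bool changed = false;

                // Assign every point to its closest centroid
                for (int i = 0; i < points.Length; i++)
                {
                    int best = 0;
                    for (int c = 1; c < k; c++)
                        if (Distance(points[i], centroids[c]) < Distance(points[i], centroids[best]))
                            best = c;
                    if (assignment[i] != best) { assignment[i] = best; changed = true; }
                }

                if (!changed) break;   // no point moved, so the clustering has converged

                // Recompute each centroid as the mean of its assigned points
                for (int c = 0; c < k; c++)
                {
                    var members = points.Where((_, i) => assignment[i] == c).ToArray();
                    if (members.Length > 0)
                        centroids[c] = (members.Average(p => p.X), members.Average(p => p.Y));
                }
            }
            return assignment;
        }

        static double Distance((double X, double Y) a, (double X, double Y) b) =>
            Math.Sqrt((a.X - b.X) * (a.X - b.X) + (a.Y - b.Y) * (a.Y - b.Y));
    }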

Implementation

The project contains 6 libraries:

  • VP.KMeans.Core: Core library including the algorithm implementation
  • VP.KMeansClient.GUI: User interface for entering the parameters and plotting the clusters
  • VP.KMeansClient.Console: Console user interface. No fancy plots, just an output file is generated
  • VP.KMeans.DataGenerator.Core: Library to generate test data
  • VP.KMeans.DataGenerator.Console: Console application to feed the core library to generate test data
  • CPI.Plot3D: External library to plot the results

Resources

Development

I was digging through some old projects and found the Data Mining and Machine Learning projects I had implemented. Instead of letting them gather dust(!) on the hard disk I decided to review and publish the source code. This will also give me a chance to revise my data mining knowledge. Let’s start with the Apriori algorithm.

Apriori Algorithm

It is an algorithm to determine the most frequent itemsets in a given collection. The term “frequent” is defined by a given threshold or “minimum support count”.

Implementation

The project is implemented in Java. It uses a Hashtable to keep count of the itemsets in the database. The algorithm starts by finding the most frequent 1-itemsets. Then, using the frequent itemset list of size k, it generates the candidate itemset list of size k+1.

For example, for the sample data below and a threshold of 4, the items 1, 2, 3, 4 and 5 are all frequent 1-itemsets. From this list we generate a 2-item candidate list (all 10 combinations) and check whether the subsets are also frequent. As all the 1-itemsets are frequent, they all pass pruning. Then we count the occurrences of these candidates; only 7 of them are equal to or greater than the threshold. From this list we generate our 3-item candidates: for example, 1,2 and 1,4 are combined to generate 1,2,4. Then we count the occurrences of the 3-itemsets and prune the results by checking all of their subsets.

Apriori Check Subsets

The idea of pruning is that if an itemset contains an infrequent subset, the larger set cannot be frequent, so it is removed from the candidate list (1,2,3 is a candidate, but as 2,3 is not a frequent 2-itemset it is removed from the candidate list). This process helps improve the performance of the algorithm as it reduces the number of iterations.
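The original project is in Java; here is a small C# sketch of just the pruning step described above, with a candidate represented as a sorted integer array and the frequent k-itemsets stored as comma-separated strings:

    using System.Collections.Generic;
    using System.Linq;

    class AprioriPruning
    {
        // candidate: a (k+1)-itemset, frequentK: the frequent k-itemsets found in the previous pass.
        // If any k-sized subset of the candidate is not frequent, the candidate cannot be frequent.
        public static bool HasInfrequentSubset(int[] candidate, HashSet<string> frequentK)
        {
            for (int skip = 0; skip < candidate.Length; skip++)
            {
                // Build the subset that leaves out one item, e.g. 1,2,3 -> 2,3 / 1,3 / 1,2
                var subset = candidate.Where((_, i) => i != skip);
                if (!frequentK.Contains(string.Join(",", subset)))
                    return true;   // e.g. 2,3 is not frequent, so 1,2,3 gets pruned
            }
            return false;
        }
    }

Candidates for which this check returns true are dropped before their occurrences are even counted, which is where the performance gain comes from.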

Output

1,3,4,5

1,2,4,5

1,2,4

1,3

3

1,5

1

3

1,3,5

4

1,2,4

2

1,2

3,4

3,5

1,3,4

2

3,5

1,2,3,4

Results:

Apriori Results

Source Code

I created a public repository for the source code. You can check it out here if it tickles your fancy!

Resources

Review, Gadget

Fitbit Flex

I bought this about 2 months ago. I have worn it every single day since then and I just love it! It is basically a motion sensor that detects and keeps track of your daily movements. You can set your own daily goals for steps walked, distance covered or calories burnt.

I can keep track of distance using my Garmin ForeRunner 10 (which I reviewed here), but this one is easier to use because it does everything in the background. The Garmin takes a few minutes to start because it needs to find your GPS coordinates, but that’s not the case for the Flex.

Flex lets you keep track of distance, active minutes, calories and steps.

Fitbit Goals

Also you can log your weight, other exercises and food intake so that you can calculate the net calories throughout a time period.

Another great feature about it is tracking your sleep quality. You can use this data in conjunction with your daily activities.

Fitbit Sleep

And here is my favourite feature: the alarm! It turns out that if something on your wrist starts to vibrate, you wake up. Instantly! Of course I still keep my phone’s alarm running as a fallback, but this one works pretty well.

Conclusion

I bought it for £68 and as of now it is listed at £83 on Amazon. Apparently the price fluctuates a bit, but I think the £70 – 80 price range is good for this product. I charge it once every 3 days or so. Other than that I completely forget about it while it does its job in the background. It motivates you to reach your goals and be more active in general, and the silent alarm is absolutely fantastic. I’d recommend this to anyone who would like to get more exercise.

Resources

Review, Book

Framework Design Guidelines


Having a common framework is quite important to reduce code duplication. But designing that framework properly and consistently is a huge challenge. Even though we are living in a RESTful world now, I think having a framework or a set of common libraries for personal or commercial projects is still relevant. A well-designed, well-tested framework would significantly improve any application built on top of it.

I had referred to this book partially before, but this time I decided to read it from cover to cover and make sure I “digest” it all. It contains countless gems that every developer should know. Anyone developing even small libraries can benefit from this book a lot; you don’t need to be designing the .NET Framework (like the authors). Also, it comes with a DVD full of presentations by the authors.

Companion DVD

Unfortunately I lost my DVD. It’s probably inside one of the many CD cake boxes. I was hoping to check it out as I went along with the book, but luckily I found out that it is freely available from the publisher. Check out the download link in the resources section. One thing to beware of is that you may come across another download link on Brad Abrams’s blog here. That download works fine, but one of the presentations inside it is corrupted, so I suggest you download each section separately from the site in the resources.

Some notes

The book is full of gems and very useful tips. Here are just a few:

  • Keep it simple: “You can always add, you cannot ever remove”
  • There is no perfect design: You always have to make some sacrifices and consider the trade-offs
  • Well-designed frameworks are consistent
  • Scenario-driven design: Imagine scenarios to visualize the real-world use of the API. When in doubt, leave the feature out and add it later. Conduct usability studies to get developers’ opinions.
  • Keep type initialization as simple as possible. It lowers the barrier of entry.
  • Throw exceptions to communicate the proper usage of API. It makes the API self-documenting and supports the learning-by-doing approach.
  • Going overboard with abstractions may deteriorate the performance and usability of the framework.

Even though it’s been a few years since this book was released, it is still a very helpful resource.

Resources

Amazon S3

I have two AWS accounts and I made the mistake of mixing the usage of services across them. More specifically, I hosted an application on one account but used S3 on the other. So I perpetually had to switch back and forth between accounts to access all the services I used. At first I thought fixing it would be a non-issue, but it proved to be a rather daunting task.

Bucket naming in S3

In S3, all buckets must have unique names. You cannot use a name if it’s already taken (much like domain names). Since I was using the bucket already, creating the same bucket in the other account and copying its contents was not an option. The second idea was to create the target bucket with a temporary name, copy the contents, delete the first one and rename the target bucket. Well, guess what? You cannot rename a bucket either! Another problem is that when you delete a bucket you cannot create a new one with the same name right away. I’m guessing this is because of the redundancy S3 provides: it takes time to propagate the operation to all the nodes. My tests showed that I could re-create the bucket in the other account only after 45 – 50 minutes.
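If you did want to script the copy yourself, a rough sketch with the AWS SDK for .NET would look something like the following. The bucket names are placeholders, the exact request/response types depend on the SDK version, and the credentials used must be able to read the source bucket and write to the destination (paging beyond 1,000 objects is also ignored here):

    using Amazon;
    using Amazon.S3;
    using Amazon.S3.Model;

    class BucketCopy
    {
        public static void CopyAll(string sourceBucket, string destinationBucket)
        {
            var client = new AmazonS3Client(RegionEndpoint.EUWest1);

            // List every object in the source bucket and copy it across one by one
            var objects = client.ListObjects(new ListObjectsRequest { BucketName = sourceBucket });
            foreach (var s3Object in objects.S3Objects)
            {
                client.CopyObject(new CopyObjectRequest
                {
                    SourceBucket = sourceBucket,
                    SourceKey = s3Object.Key,
                    DestinationBucket = destinationBucket,
                    DestinationKey = s3Object.Key
                });
            }
        }
    }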

To develop or not to develop

My initial instinct was to develop a tool to handle this operation, but I decided to check out what’s already available. I was occasionally using Cloudberry but wanted to check its competitors, hoping one of the tools would support the functionality I needed.

Cloudberry Explorer for Amazon S3

I find this tool quite handy. It has lots of functions and a nice, intuitive interface. It comes in two flavours: a Free and a Pro version. I have only used the free version so far and unless you are a big enterprise it seems sufficient. It allows you to manage multiple AWS accounts. It allows copying objects among accounts but not moving a bucket (though after my findings above I wasn’t very hopeful anyway).

AmazonS3 CloudBerry Main

As you can see in the menu bar, it supports lots of features.

S3 Browser

This one comes with a free version too as well as a paid version. The free version is limited to 2 accounts and you can only see one account at a time.

S3 Browser

I tried to copy a file and paste it into the other account, but I got an Access Denied error. I could do the same thing with Cloudberry in seconds by simply dragging and dropping into the target folder.

Bucket Explorer

The third candidate only has a 30-day trial version as opposed to a free one. The second I installed it I knew it was a loser for me because it doesn’t support multiple accounts. Also, as you can see below, the UI is hideous, so this is not a tool for me.

Bucket explorer

..and the winner is

Cloudberry won by a landslide! It is far superior to both of the other tools combined.

Operation Bucket Migration

So I backed up everything locally and deleted the source bucket so that I could create the same one in the new account. After periodically checking for 45 minutes I finally created the bucket and uploaded the files. I set the permissions and the operation was completed without any casualties... Well, at least I thought that was the case...

Nobody is perfect!

After I uploaded the images I reloaded my blog. The first image re-appeared and I was ready for the celebrations, which were abruptly interrupted by the missing images in the second post. The images were nowhere to be found in either of the two local backups I had taken. I think Cloudberry has a bug when handling filenames with hyphens. I’m still not certain that is the case, but that’s the only characteristic that differs from the other files. Anyway, the moral of the story is to triple-check everything before initiating a destructive process and not to trust external tools blindly.

Resources

Amazon Web Services (AWS) Auto-Scaling

Auto-scaling has always been a feature of Amazon Web Services (AWS). Until today, it could be done in 2 ways:

  • Using the command line tools (see the resources section for the link)
  • Using Elastic BeanStalk to deploy your application

Yesterday (10/12/2013) they announced that they had added Auto-Scaling support to the AWS console. I was planning to set up auto-scaling for my blog anyway, so I cannot think of a better time to apply this.

Auto-scaling using AWS Management Console

Step 01: Launch Configuration

First we tell AWS what we want to launch. This step is a lot like creating a new EC2 instance. First you select an AMI, so before I started I created an AMI of my current blog and selected it for the launch configuration. Then we select the instance properties. In this wizard we have the option of using spot instances; they are not suitable for Internet-facing applications so I’ll skip that part.

Step 02: Auto Scaling Group

At the end of the Launch Configuration wizard we can choose to create an auto-scaling group with that launch configuration and jump right into Step 2. First we specify the name and the initial instance count for the group. We also need to choose at least 1 availability zone. I always select all of them; I’m not sure if there is any trade-off in narrowing down your selection.

An important point to pay attention to here is to expand the Advanced Details section, because it contains the load balancer selection. For web applications auto-scaling makes sense when the instances are behind a load balancer; otherwise the new instances cannot be reached anyway. Once the auto-scaling group is created you cannot associate it with an ELB afterwards, so make sure you select your load balancer at this step.

Create Auto Scaling Group

Next comes another important step: specifying scaling policies. Basically, we tell AWS the action to take when it needs to scale up or down, and when to do it. “When” is defined by CloudWatch alarms. For scaling up, I added an alarm for average CPU utilization over 80% for 5 minutes, and for scaling down, average CPU utilization under 20% for 5 minutes. When the high CPU alarm goes off it will take the action we select, which in my case is adding 1 more instance. And scaling down is just the opposite: remove 1 instance from the existing machine farm.

Create Auto Scaling Group

In the next step we define the notifications we want to receive when an AS event is triggered. I would definitely like to know everything that happens to my machines, so I requested an email for all events.

Create Auto Scaling Group

That’s all it takes to create an AS group using the wizard.

Testing the scaling

The easiest way to test the auto-scaling group is to terminate the instance it just launched. As you can see below, once I killed the instance it immediately launched another one to match the minimum instance count of the AS group. So the auto-scaling group is working, but how can I be sure that it will launch a new instance when I need it most? Time to make it sweat a little! But first we have to set up an environment to create load on the system:

Installing Siege

The easiest and simplest load testing tool I know is a Linux-based one called Siege. To prepare my simple load testing environment I quickly downloaded Siege:

wget http://www.joedog.org/pub/siege/siege-latest.tar.gz

tar -xzvf siege-latest.tar.gz

It requires a C compiler, which doesn’t come out of the box with an Amazon Linux AMI, so first we need to install that:

yum install gcc*

And configure it with:

./configure

At the end of the configuration it instructs us to run the following commands:

Siege configuration

So after running make, Siege is ready to go. We can check the configuration with:

/usr/local/bin/siege -C

It should display the current version and other details about the tool.

Siege Configuration

Ready to go

Now, we have a micro instance running Siege and a small instance launched by auto-scaling.

AWS Instances

The auto-scaling group is supposed to launch another instance and add it to the load balancer if the CPU usage is too high on the existing one. Let’s see if it really works.

Under Siege!

I first created a URL file from my sitemap so that the load would be more realistic. I fired up 20 threads and it started to bombard my site:
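The exact command isn’t shown in the screenshot, but a typical Siege invocation with a URL file and 20 concurrent users looks something like this (urls.txt is just an example file name):

    /usr/local/bin/siege -c 20 -i -f urls.txt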

Siege in Action

When I tried to load my site it was incredibly slow. The CPU usage kept rising on the single instance until the CloudWatch alarm went off and triggered auto-scaling to launch a new instance.

AWS Instances

Now I had 2 instances to share the load, but that could only happen if the new instance was added to the Elastic Load Balancer (ELB) automatically. After a few minutes it passed the health checks and went into service.

Auto-scaling using AWS Management Console - Elastic Load Balancer Overview

At this point I had 2 instances and when I tried to load posts from my blog I noticed it was quite fast again. The CPU usage graph below tells how it all went down:

Auto-scaling using AWS Management Console - CPU utilization

My first instance (orange) was running silently and peacefully until it was attacked by Siege. After a few minutes of hard times the cavalry came to the rescue (the blue instance) and started getting its fair share of the load. The ELB distributed the load as evenly as possible, making the system run smoothly again. OK, so the system can withstand a spike and scale itself, but it costs money. What’s going to happen after the storm? I stopped Siege and sure enough, as we’d expect, after a few minutes the Low CPU alarm went off and set the instance count back to 1 by terminating one of the instances.

AWS Instances

Also, I was notified at every step of this process, so I was able to keep track of my instances at all times.

Auto-scaling using AWS Management Console - Notifications

Architecture of the system

So at this point the architecture of the system looks like this:

Auto-scaling using AWS Management Console - System Architecture

I’m planning to cover some basics (EC2, RDS, S3) in more detail in a later post. Also I’ll try to add more AWS services and enhance this architecture as I go along.

Final Words

  • If you are planning to use auto-scaling in a production environment, make sure to back up all your stuff externally. Also create snapshots for all the volumes.
  • Even though network traffic is cheap, it still costs money. So for extended tests I suggest you keep an eye on your billing statement.
  • On the Amazon Linux AMI, Apache and MySQL don’t start automatically, so you may need to update your configuration like I did. I used the script I found here (a minimal version is sketched below).
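The script itself isn’t reproduced here, but on the Amazon Linux AMI making Apache and MySQL start at boot typically comes down to something like:

    sudo chkconfig httpd on
    sudo chkconfig mysqld on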

Resources

DevOps (Development + Operations) has been one of the most popular terms in the IT world recently. From what I’ve read and listened to so far, my understanding is that it is all about continuous deployment (or delivery). Basically, you have to automate everything from development to deployment to practice DevOps.

Current problem

Traditionally, successful deployment is a huge challenge. It is mostly a manual and cumbersome process, and because of its sensitive nature system admins are not huge fans of deployments. Another challenge is the miscommunication (or in some cases no communication) between the system admin and development teams. They are generally run by different high-level executives and their priorities conflict most of the time.

Solution

On the philosophical side, DevOps is about bringing these teams together to work in harmony. Having social events that both teams attend is key to building confidence among team members. As Richard Campbell (from the RunAsRadio and .NET Rocks podcasts) says, “Pizza and beer is a global lubricant”.

Dev…

On the development side, the key requirement is continuous integration. You have to be able to run unit tests and acceptance tests automatically on build servers. This means development has to be done in short sprints, in an agile way, with frequent check-ins. One step further is continuous deployment.

…Ops

This is where the IT team comes into play. When the whole system is automated, deploying to production frequently and without much headache becomes possible. Cloud computing is one of the core technologies that make DevOps possible. The ability to manage virtual machines programmatically (e.g. AWS, OpenStack) opens up a whole range of possibilities.

This is a fairly complex topic encompassing many disciplines and technologies. Also it’s quite dynamic and open to innovation. Definitely worth keeping an eye on.

Resources

Encryption, Security

I used to wonder what the different key sizes meant when dealing with SSL. I also noticed that the SSL certificate I had purchased said “128/256 bit encryption” in its feature list, which only made me more confused. What does it actually mean, and why would it use 128-bit if it supports 256 anyway? I checked my website that runs on a Linux machine and saw that it used 256-bit encryption, whereas another website of mine was running with 128-bit encryption. I bought both certificates from the same vendor, so it had to have something to do with the server.

What’s with the naming?

For the uninitiated, TLS is the new name for the protocol. The SSL name was discontinued after version 3, after which TLS 1.0 was released. As of this writing the latest version is TLS 1.2, which was released in 2008. So technically the name of the protocol is Transport Layer Security (TLS), but many people, including me, still refer to it as SSL.

Key Sizes

SSL Key Sizes

Basically, the key size (2048 bit in the image) is the size of the public/private key pair. This size is determined when the CSR is created for the certificate, and it determines how vulnerable the key is to brute-force attacks. Currently 2048-bit is considered to be very strong.

128/256-bit is the length of the session key. A session key is generated during the handshake: random data (128 or 256 bits long) is generated by the client and encrypted using the server’s public key, and the server decrypts the message with its private key. Afterwards, the server and client use this session key with symmetric encryption. The RSA keys are only used at the beginning of the communication.
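As a toy illustration of that idea in .NET (this is not how TLS actually negotiates keys, just the shape of it): generate a random session key, encrypt it with the 2048-bit RSA public key, and then use it for symmetric AES encryption.

    using System.Security.Cryptography;
    using System.Text;

    class HandshakeToy
    {
        static void Main()
        {
            using (var rsa = new RSACryptoServiceProvider(2048))   // the certificate's key pair
            using (var aes = Aes.Create())
            {
                aes.KeySize = 256;                                 // the 128/256-bit session key
                aes.GenerateKey();

                // "Client": encrypt the random session key with the server's public key
                byte[] wrappedKey = rsa.Encrypt(aes.Key, true);

                // "Server": recover the session key with the private key
                byte[] sessionKey = rsa.Decrypt(wrappedKey, true);

                // From here on both sides use fast symmetric encryption with the session key
                using (var encryptor = aes.CreateEncryptor(sessionKey, aes.IV))
                {
                    byte[] message = Encoding.UTF8.GetBytes("hello over a TLS-ish channel");
                    byte[] cipherText = encryptor.TransformFinalBlock(message, 0, message.Length);
                }
            }
        }
    }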

Let’s see it in action

I might have had a better understanding after the research, but I still had to resolve my issue: I needed to see 256-bit encryption. Since this is a rather sensitive operation I wanted to test it on a completely expendable machine, so I created two new small instances running Windows 2008 and Windows 2012. I quickly installed IIS on both instances and checked what they looked like. As I suspected, they were using 128-bit encryption out of the box.

SSL_Key_Sizes_Win2008_Before

SSL_Key_Sizes_Win2012_Before

The problem is that the AES-256 options are not high on the list of cipher suites that the server supports. Fixing this requires some registry updates and group policy changes, and normally all of it has to be done manually. You can find a resource below that explains how to do it (I haven’t tested it myself). Instead, I decided to use a tool which makes the whole process a lot easier and less error-prone. It’s called IIS Crypto.

IIS Crypto

I just downloaded the tool and ran the best practices option. Restarted the server and here are the results:

SSL_Key_Sizes_Win2008_After

SSL_Key_Sizes_Win2012_After

The Windows 2012 version prioritizes TLS 1.2 over TLS 1.0, so it uses the newer version of the protocol even though the browser I used was the same for both tests.

Resources

Development, NoSQL, Programming

I updated my toy project. You can find the source code and live demo for the final version below:

Source Code: https://github.com/volkanx/BeerExplorer

If you don’t want to bother deploying it without first seeing what it looks like, here’s a screenshot:

Beer Explorer

It’s just a simple exercise to browse Couchbase repositories. It was helpful for me and I hope you find it helpful too.

Cloud Computing

It’s been a while since I started using Amazon Web Services (AWS) to host my sites. I think it’s a great platform as you only pay for what you use and there are lots of options. And the best part is that anything you can do via the user interface (and more) can be done programmatically via the API. I’m extremely happy using AWS, but I still wanted to see what its competitors are doing.

Enter RackSpace

RackSpace

So I decided to test RackSpace first. One reason for selecting it is that it has a data centre in London (the closest AWS data centre to the UK is in Dublin). Also, it is based on the OpenStack platform, which I have wanted to play with for some time. I created my free account, but it had to be activated after a call from a staff member. He just asked basic questions like my username and the reason I created the account. After the call the account was activated and I was ready to explore this new land.

Servers

First Impressions

This is still a work in progress actually; I cannot say I have fully covered everything about it yet. Here are just my first impressions and comparisons with AWS:

Pricing & Billing

Maybe I’m cheap, but my first order of business was to compare the prices! The cheapest Linux configuration starts from £0.030/hr. You can find the entire list here. As the site I’m planning to migrate didn’t need many resources I decided to go with the cheapest one: 1GB RAM, 1 vCPU, 20GB SSD. After the migration I’m quite happy with its performance.

One interesting thing I noticed is that, unlike AWS, you pay for the machine even if you stop it. An excerpt from the documentation says: “Shutting down a server will NOT stop billing, since the virtual hard drives are persistent, server resources are always in use whether the servers is powered on or not.” Now that’s not cool! Admittedly, if you are running web-based systems you never stop the machines anyway. But there have been many times I preferred to keep the old machine stopped for a period until the new machine proved to be functioning fully, for example. It’s nice to have the chance to roll back easily if need be. Of course you can do that here too, but you just have to pay twice as much during that period.

Features

When trying to configure the machine I noticed there isn’t a feature like AWS Security Groups. I had to update the iptables configuration on the machine, which would make it hard to manage firewall rules in a multi-machine environment. In AWS you just add the new machine to an existing security group and forget about it, because all the existing rules are applied to the new machine automatically.

Programmability and API

OpenStack

Even though I haven’t developed anything for it yet, I just wanted to see what the capabilities are and how I would develop something when I needed to. All I needed to do was get the NuGet package and I was ready to get the list of my machines in a few minutes. Basically you can manage machines, images and volumes pretty much like in AWS. I’ll put a pin in it for now and develop some tools for myself later.

Program
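From memory, listing the servers with the openstack.net SDK looked roughly like this; the type and method names may not be exact, and the credentials are obviously placeholders:

    using net.openstack.Core.Domain;
    using net.openstack.Providers.Rackspace;

    class ListMyServers
    {
        static void Main()
        {
            // Authenticate with the Rackspace username and API key (placeholders here)
            var identity = new CloudIdentity { Username = "my-username", APIKey = "my-api-key" };
            var provider = new CloudServersProvider(identity);

            // List the cloud servers in the account, much like DescribeInstances in AWS
            foreach (var server in provider.ListServers())
                System.Console.WriteLine(server.Name);
        }
    }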

Conclusion

I think the best thing about RackSpace is that it is built on top of OpenStack. This means that if you move your system to another vendor, your applications using the API can remain intact. Also, as it is open source software, you can build your own data centre if you want to. Of course that sounds good to geek ears, but I guess in the real world it doesn’t have much value, as such system migrations don’t happen that often. Other than that I didn’t see any advantages over AWS, but I’ll keep the machine running for a while and see how it goes.

Resources

Site news

I decided to switch to FeedBurner to keep better track of my RSS feed. The new address is http://feeds.feedburner.com/PlaygroundForTheMind.

Hopefully the current link will be redirected automatically. (Well, not exactly automatically; I installed the FD FeedBurner plugin to take care of that.)

If it doesn’t work, it’s likely that current subscribers will not receive this update via RSS, but I thought a notification post wouldn’t hurt anyway.

Big Data, Certification, NoSQL

Online education sites have been around for some time now. One of my favourites, Udacity, has recently started a new series of courses: the Data Science and Big Data track. Big Data is a fascinating subject and I’ve been wanting to learn more about it, but so far my attempts were generally short-lived. This time I intend to finish all these courses and have at least a guided tour of the subject. The first course in this track is Introduction to Hadoop and MapReduce.

Hadoop

Hadoop Logo

Named after the main developer’s child’s toy elephant, Hadoop is an open-source framework based on MapReduce that can run distributed data-intensive tasks. It has its own file system called the Hadoop Distributed File System (HDFS), which handles data redundancy by dividing the data into 64MB chunks and storing several copies of them (3 copies by default).

MapReduce

MapReduce is a programming model first developed at Google. It consists of 2 steps: Map and Reduce. The Map function takes the input data and divides it into smaller datasets, and the Reduce function takes the sub-problems as input and calculates the final output.
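The classic example is word counting. The sketch below is plain C# rather than actual Hadoop code, but it shows the shape of the model: map emits key/value pairs, the framework groups them by key, and reduce folds each group into a final value.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class WordCount
    {
        // Map: one input line -> a (word, 1) pair per word
        static IEnumerable<KeyValuePair<string, int>> Map(string line) =>
            line.Split(' ').Select(word => new KeyValuePair<string, int>(word.ToLower(), 1));

        // Reduce: one word and all of its counts -> the total
        static KeyValuePair<string, int> Reduce(string word, IEnumerable<int> counts) =>
            new KeyValuePair<string, int>(word, counts.Sum());

        static void Main()
        {
            string[] input = { "the quick brown fox", "the lazy dog" };

            var results = input
                .SelectMany(Map)                                     // map phase
                .GroupBy(pair => pair.Key)                           // shuffle/sort: group by key
                .Select(g => Reduce(g.Key, g.Select(p => p.Value))); // reduce phase

            foreach (var pair in results)
                Console.WriteLine("{0}\t{1}", pair.Key, pair.Value);
        }
    }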

Udacity Course

The course they are offering is very concise and to the point, and it doesn’t take too long to finish. Its instructors are employees of Cloudera and they do a very good job of explaining the basic concepts in simple terms. Also, the course provides a downloadable virtual machine fully loaded with Hadoop and tools. It also contains the example datasets and code they use throughout the course, which makes it quite easy to practice on your own.

Final Project

The final project was fun to implement. It’s based on the examples, so you can build on top of the code shown in the class. I submitted my answers to GitHub Gist; if you’re interested they’re available here. Files are named with an “_xy” prefix where x is the project number (there are two parts to the final project) and y is the question number.

Udacity Certification

I’m also curious about their new certification model. I haven’t enrolled in any of their paid programmes. Basically the courses are still free to enrol in, but with the paid programme you have a dedicated tutor who reviews your code and gives you feedback. There is also an exit interview, and if you pass you get a verified certificate. I’m not sure how that interview is going to be conducted though. It’s not cheap ($150/month). You still work at your own pace, but since you’re paying for it you’d probably want to finish it as soon as possible.

Resources

Development, Gadget, Leap Motion

TicTacToe

Like most people I got my hopes up when ordering this gizmo, and again like most people I was disappointed by it. It’s not quite the mouse replacement I hoped it would be. Anyway, I mostly bought it to develop applications with, and it comes with an SDK and libraries for .NET, so I cannot complain much about that. I wanted to develop something simple just to get the grasp of it. Recently PluralSight published a course on Leap Motion development and I thought it was a great chance to start my own little app: Tic-Tac-Toe. The course was very helpful and I’d recommend it as a starting point for Leap Motion development.

There is still work needed on my TicTacToe, but below is a sneak preview of the current version.

Basically it does what it’s supposed to do at the moment: draw things on screen using your finger! So I think I accomplished what I set out to do. What I want to add is a custom gesture for X. The Circle gesture is built into the SDK so drawing circles is easy, but I implemented the ScreenTap gesture for playing Xs, which is obviously not intuitive. It also requires precision because it’s not easy to target a cell while tapping. If you watched the video you may have noticed that I missed the cell on X’s second move, for example. So that would be the biggest improvement I could make, apart from basic things like player info, statistics, undo moves etc. But as they are not directly related to Leap Motion development they are not very important in this context.
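For reference, the gesture-related part of the Leap .NET API boils down to something like this. It’s a sketch rather than the TicTacToe code, with names from the SDK version I had, and a custom listener is registered on a Controller with AddListener:

    using Leap;

    class GestureListener : Listener
    {
        public override void OnConnect(Controller controller)
        {
            // Circle is used for Os, ScreenTap for Xs
            controller.EnableGesture(Gesture.GestureType.TYPE_CIRCLE);
            controller.EnableGesture(Gesture.GestureType.TYPE_SCREEN_TAP);
        }

        public override void OnFrame(Controller controller)
        {
            Frame frame = controller.Frame();
            foreach (Gesture gesture in frame.Gestures())
            {
                if (gesture.Type == Gesture.GestureType.TYPE_CIRCLE)
                {
                    var circle = new CircleGesture(gesture);
                    // e.g. map circle.Center to a board cell and play an O there
                }
                else if (gesture.Type == Gesture.GestureType.TYPE_SCREEN_TAP)
                {
                    var tap = new ScreenTapGesture(gesture);
                    // e.g. map tap.Position to a board cell and play an X there
                }
            }
        }
    }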

Resources