csharp, development, javascript, aws, route53

Part 1: Getting supported TLD List

Around this time last year Amazon announced the Route 53 update which allowed domain registrations.

Domain search in Route 53

Unfortunately, the AWS Management Console UI doesn't allow searching all TLDs at once for a given domain. So I set out to write a console application in C# that retrieved the TLDs from the AWS API in order to "enhance" AWS a bit. As with all of my projects, the scope got out of hand very quickly, so I decided to break this adventure into smaller parts. In this post I'll talk about how to get the supported TLD list.

AWS: Everything is API-based, right? Not quite!

The first step for a domain checker application is to acquire a list of TLDs to check. So I started exploring the AWS API to get the supported TLD list. To my surprise there wasn't any! I asked the question in the AWS support forums and the response was that it's not supported. There is a webpage that has the list, but that's it! So until they make it available through the API, I decided to do a little scraping to generate the list myself. Obviously this is not the proper way of doing it and it's prone to breaking very easily, but they gave me no choice!

First method: C# Library

It’s just one method and it was easy to implement (apart from fiddling with regular expressions!)

public List<Tld> GetSupportedTldList()
{
    Uri supportedTldPage = new Uri("http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/registrar-tld-list.html");
    string pageHtml = string.Empty;
    using (var webclient = new WebClient())
    {
        pageHtml = webclient.DownloadString(supportedTldPage);
    }

    string pattern = "<a class=\"xref\" href=\"registrar-tld-list.html#.+?\">[.](.+?)</a>";
    var matches = Regex.Matches(pageHtml, pattern);

    var result = new List<Tld>();
    foreach (Match m in matches)
    {
        result.Add(new Tld() { Name = m.Groups[1].Value });
    }

    return result;
}

It first downloads the raw HTML from the AWS documentation page. The TLDs on the page have a link in this format:

<a class="xref" href="registrar-tld-list.html#cab">.cab</a> 

The regular expression pattern retrieves all anchors in this format. Only the ones that contain a dot in the value are matched, which eliminates irrelevant links. To extract the actual TLD from the matching string I used parentheses to create a group. By default the whole matched string is the first group (Groups[0]); additional groups can be created with parentheses so that we can get only the values we are interested in. For example, the group values for a match look like this:

m.Groups[0].Value: "<a class=\"xref\" href=\"registrar-tld-list.html#academy\">.academy</a>"
m.Groups[1].Value: "academy"

The method then adds these extracted values to a list (for now the Tld class only contains a Name property; more info like the type of TLD or a description can be added in the future).
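
For reference, the Tld class mentioned above is trivial at this point; a minimal sketch based on that description:

public class Tld
{
    // Only the TLD name for now (e.g. "academy"); type or description could be added later
    public string Name { get; set; }
}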

Second method: JavaScript

It's not very complicated to accomplish this task with C#, so I thought it would be even easier with JavaScript as I had already tackled the regular expression bit. I couldn't have been more wrong!

I created a simple AJAX call with jQuery and got the following error:

Apparently you cannot get external resources willy-nilly using jQuery! It took some time, but I found a neat workaround here. It's essentially a jQuery plugin that sends the request to query.yahooapis.com instead of the URL we request, and the Yahoo API returns the results to a callback method we provide. I hadn't heard about YQL before. The pitch on their site is:

"The YQL (Yahoo! Query Language) platform enables you to query, filter, and combine data across the web through a single interface." 

which explains a bit why they allow cross-domain queries.

In the test page I added the plugin script

<script src="jquery.xdomainajax.js"></script>

Then the main function using jQuery worked just fine:

var supportedTldPage = "http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/registrar-tld-list.html";

function getTldList(callback) {
    $.ajax({
        url: supportedTldPage,
        type: 'GET',
        success: function (res) {
            var pageHtml = res.responseText;
            var jsonResult = parseHtml(pageHtml);
            callback(jsonResult);
        }
    });
}

Another thing I tried was using pure JavaScript:

function getTldList(callback) {
    var request = new XMLHttpRequest();
    request.open("GET", supportedTldPage, true);
    request.onreadystatechange = function () {
        if (request.readyState != 4 || request.status != 200) return;

        var pageHtml = request.responseText;
        var jsonResult = parseHtml(pageHtml);
        callback(jsonResult);
    };
    request.send();
}

This version doesn't work by default because of the same cross-origin (CORS) restrictions. But apparently there's a browser extension for that. I installed it and it worked like a charm. Wondering what was happening behind the scenes, I captured the request with Fiddler and it looked like this:

Host: docs.aws.amazon.com
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36
Origin: http://evil.com/
Accept: */*
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,tr;q=0.6
If-None-Match: "19ce8-51b1a64cfe880-gzip"
If-Modified-Since: Fri, 17 Jul 2015 23:17:38 GMT

Origin was "http://evil.com"! It works great as long as it's not null! I'll leave both versions in the sample code. Of course, all this hassle is just to bypass browser restrictions; depending on your use case, you can always fire the HTTP request above in Fiddler and get the results. I guess it's a nice mechanism if you need to implement a closed system and want to ensure that resources are not consumed from the outside. Of course, for public web pages like this one, which are meant to be accessed from anywhere in the world, there are no restrictions on the server side.

Now that we can get the raw HTML from the AWS documentation, the rest is smooth sailing. The part that handles finding the TLDs in the HTML is very similar:

function parseHtml(pageHtml) {
    var pattern = /<a class="xref" href="registrar-tld-list.html#.+?">[.](.+?)<\/a>/g;
    var regEx = new RegExp(pattern);

    var result = {};
    result.url = supportedTldPage;
    result.date = new Date().toUTCString();
    result.tldList = [];

    var match;
    while ((match = regEx.exec(pageHtml)) !== null) {
        result.tldList.push(match[1]);
    }

    var myString = JSON.stringify(result);
    return myString;
}

The regular expression pattern is the same. The only difference is that the .exec method doesn't return multiple matches, so you have to loop until there are no matches left.

So finally the result in JSON format is:

{
  "url": "http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/registrar-tld-list.html",
  "date": "Tue, 29 Jul 2015 20:26:02 GMT",
  "tldList": [
    "academy",
    "agency",
    "bargains",
    "bike",
    "biz",
    "blue",
    "boutique",
    "-- removed for brevity --",
    "eu",
    "fi",
    "fr",
    "it",
    "me",
    "me.uk",
    "nl",
    "org.uk",
    "ruhr",
    "se",
    "wien"
  ]
}

What’s next?

Now I have the supported TLD list. I could easily use it in my application, but where's the fun in that? I'd rather convert this code into an API so that we can get the new TLDs whenever they are added to that page. I just hope they don't change their HTML markup anytime soon! :-)

So in the next post I’ll work on the API…

Resources

dotnet, configuration, github

I hate configuration files! They contain highly sensitive information and there is always a risk of them leaking out. Especially when I read horror stories like this I remind myself to be more careful about it. It is particularly important when starting off with a private repository and converting it to open-source.

There are a few approaches and I will try to make pros and cons list for each of them which hopefully would make it easier for me to make a decision.

Method 1: Ignore config files completely

Simply add the config files to .gitignore before checking them in. This is not really a viable option, but to cover all bases I wanted to include it as well.

Pros

  • Very easy to implement
  • Easy to be consistent (use the same gitignore everywhere)

Cons

  • When someone checks out the project initially it wouldn’t compile because of the missing config file.
  • For a new user there is no way to figure out what the actual config should look like
  • For internal applications, you have to maintain local copies and handle distribution among developers manually.

Method 2: Check in configuration with placeholders

Check in the initial file then run the following command:

git update-index --assume-unchanged <file>

After this you can put the actual values in and use the config as usual; git will ignore the changes to the file. This way, when someone checks out the project they can at least compile it.

Pros

  • Less disruptive as the project can be compiled

Cons

  • When you make a change to the config file (e.g. add/rename keys) you can't check in those changes either

Method 3: Ignore config file, check in a sample file instead

This is essentially a merger of Methods 1 and 2. Maintain two files (e.g. App.config and App.config.sample). Ignore App.config from the get-go and only check in the .sample file. Structurally it will be exactly the same as App.config, just without the confidential values.

Pros

  • A new user can see what the config should look like from the sample file; the only cost is that it won't compile by default, a small extra step for people who check out

Cons

  • Both files need to be kept in sync manually
  • Still no online copy of the actual config file available

Method 4: Reference to another configuration file

A slight variation of the previous method. .NET allows cascading and linking with configuration files. For example say this is my app.config file:

<configuration>
	<appSettings file="Actual.config">
		<add key="Twilio.AccountSID" value="SAMPLE SID" />
		<add key="Twilio.AuthToken" value="SAMPLE TOKEN" />
		<add key="Non-sensitive-info" value="some value" />
	</appSettings>
</configuration>

I can link the appSettings section to another file named actual.config which would look like this:

<appSettings>
  <add key="Twilio.AccountSID" value="REAL 123456" />
  <add key="Twilio.AuthToken" value="REAL Token" />
  <add key="Non-sensitive-info" value="some value" />
</appSettings>

Actual.config never goes into source control. When the application cannot find the actual.config it just uses the values in the app.config. When the actual.config is present, those values override the sample values.
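
For completeness, here's a minimal sketch of reading one of these keys in code; the key names follow the sample above, and the class and method are just placeholders:

using System;
using System.Configuration; // requires a reference to System.Configuration

class TwilioSettingsDemo
{
    static void Main()
    {
        // Reads from the merged appSettings: when Actual.config is present,
        // its values override the placeholders defined in app.config.
        string accountSid = ConfigurationManager.AppSettings["Twilio.AccountSID"];
        string authToken = ConfigurationManager.AppSettings["Twilio.AuthToken"];

        Console.WriteLine("Using SID: " + accountSid);
        Console.WriteLine("Token present: " + !string.IsNullOrEmpty(authToken));
    }
}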

Pros

  • Only keys with sensitive values can be hidden

Cons

  • Doesn’t work with custom config sections
  • Still no online copy of the actual config file available

Method 5: Keep the configuration in a private Git submodule

The idea is to maintain a private repository that contains the config files and add it as a submodule:

git submodule add git@github.com:volkanx/private-config.git

And link to that configuration file in the app.config as in Method 4.

Pros

  • Actual values are online and accessible

Cons

  • Requires a paid account for private repositories
  • More complicated

Verdict

As I already have a paid GitHub account the con for Method 5 is not quite valid for me so I think I will go with that one. Otherwise the actual values will be stored locally on the disk only and will eventually get lost. Looks like there is no convenient and easy way of doing it after all. If you have any suggestions I’m all ears.

As I said, I hate configuration files!

Resources

csharp, development

Generally we don't need to understand how every single line of code is compiled, but sometimes it helps to have a look at what's going on under the hood. The code we write doesn't always translate directly to IL (Intermediate Language), especially when compiler optimizations come into play. It also helps to see what really happens when we use the language's shorthand notations. So I decided to explore the IL and see how simple constructs are compiled. Some of them are very simple and commonly known, but some of them were new to me, so I compiled all of them in this post.

Toolkit

I used a disassembler and decompilers to analyze the output. When we compile a .NET program, it's converted into IL code; using a disassembler we can view that IL. Decompiling is the process of regenerating C# code from the assembly. IL is hard to read, so getting the C# code back makes the analysis easier. I used the following tools to view the IL and the C# code:

  • ILDASM: The main tool I used is ILDASM (IL Disassembler). It's installed with Visual Studio, so it doesn't require anything special. IL is hard to make sense of, but it's generally helpful for getting an idea about what's going on.
  • Telerik JustDecompile: A decompiler takes an assembly and re-generates .NET code from it. It may not fully recover the original source code and sometimes it optimizes things, so it may skew the results a little bit, but it's generally helpful when you need a cleaner representation.
  • ILSpy: Same functionality as JustDecompile. I used it to compare the decompiled results.

So without further ado, here’s how some language basics we use are compiled:

01. Constructor

When you create a new class the template in Visual Studio creates a simple blank class like this:

namespace DecompileWorkout
{
    class TestClass
    {
    }
}

and we can create instances of this class as usual:

var newInstance = new TestClass();

It works even though we don't have a constructor in our source code. The reason is that the compiler adds a default parameterless constructor for us, which we can see when we decompile the executable.

But when we add a constructor that accepts parameters, the default constructor is no longer generated.

It makes sense: otherwise all classes would have a parameterless constructor no matter what you do.
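
Since the original screenshots aren't reproduced here, a minimal sketch of the behaviour (the parameterized constructor is a hypothetical addition):

namespace DecompileWorkout
{
    class TestClass
    {
        private readonly string _value;

        // Once this constructor exists, the compiler no longer emits
        // a parameterless default constructor for the class.
        public TestClass(string value)
        {
            _value = value;
        }
    }

    class Program
    {
        static void Main()
        {
            var withValue = new TestClass("some value");  // compiles fine
            // var noValue = new TestClass();             // compile error: no parameterless constructor
        }
    }
}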

02. Properties

Properties have been supported since forever and they are extremely handy. In the olden days we used to declare a field along with getter and setter methods that access it, such as:

private string _name;
public void SetName(string name)
{
    _name = name;
}
public string GetName()
{
    return _name;
}

Now we can get the same results by

public string Name { get; set; }

What happens under the hood is that the compiler generates the backing field and the getter/setter methods for us.

Apart from the weird naming of the backing field (I guess it's to avoid naming collisions), the generated code is effectively the same as the hand-written version.
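
Roughly, the decompiled property is equivalent to the hand-written version below. The real backing field gets a compiler-generated name along the lines of <Name>k__BackingField, which isn't valid C#, so the field name here is only illustrative:

public class NameHolder
{
    // Stands in for the compiler-generated backing field
    private string _nameBackingField;

    public string Name
    {
        get { return _nameBackingField; }
        set { _nameBackingField = value; }
    }
}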

The backing field is only generated when this shorthand form is used. If we implement the getter/setter like this

public string Name 
{
    get
    {
        throw new NotImplementedException();
    }
    set
    {
        throw new NotImplementedException();
    }
}

Then the generated IL only contains the two methods and not the backing field.

One last thing to note is that we cannot have fields on interfaces, but we are allowed to have properties. That's because for a property declared on an interface the compiler only generates the getter and setter method declarations; no backing field is involved.

03. Using statement

A handy shortcut when dealing with IDisposable objects is the using statement. To see what happens when we use it, I came up with a simple snippet that wraps a disposable object in a using block and decompiled the result.

After decompiling, it turns out the compiler just surrounds the code with a try-finally block and calls the Dispose method of the object if it's not null. Instead of writing that block of code yourself it's very handy to use the using statement, but in the end there is nothing magical about it; it's just a shortcut for disposing objects safely.
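
A sketch of that expansion, using WebClient as a stand-in for any IDisposable (the variable names are illustrative):

// What we write:
// using (var webClient = new WebClient())
// {
//     webClient.DownloadString("http://example.com");
// }

// Roughly what the compiler generates:
var webClient = new WebClient();
try
{
    webClient.DownloadString("http://example.com");
}
finally
{
    if (webClient != null)
    {
        ((IDisposable)webClient).Dispose();
    }
}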

04. var keyword and anonymous types

The var keyword is used to declare an implicitly typed variable. The compiler infers the type from the context and generates the code accordingly, so variables defined with var are still strongly typed. Initially I resisted using it as a shortcut, because its main purpose is not to spare us from writing out the type name: it was introduced at the same time as anonymous types, which makes sense, because without such a keyword you couldn't store anonymous objects in a variable. But I find myself using it more and more just for implicitly specifying types. Looking around, I see that I'm not the only one, so if it's just laziness at least I'm not the only one to blame!

So let's check out how this simple code with one var and one anonymous type is compiled:

public class UsingVarAndAnonymousTypes
{
    public UsingVarAndAnonymousTypes()
    {
        var someType = new SomeType();
        var someAnonymousType = new { id = 1, name = "whatevs" };
    }
}

Decompiling the code doesn't show the anonymous type side of the story, but we can see that the var keyword is simply replaced with the actual class type.

To see what happens with anonymous types let’s have a look at the IL:

For the anonymous type the compiler created a class for us with a very user-friendly name: <>f__AnonymousType0`2. It also generated readonly private fields and getters only, which means instances are immutable and we cannot set those values once the object is initialized.
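
A quick way to observe that immutability from user code (a minimal sketch):

var someAnonymousType = new { id = 1, name = "whatevs" };

int id = someAnonymousType.id;           // reading a property is fine
// someAnonymousType.name = "new name";  // compile error: the property has no setter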

06. Async and await

Async/await makes asynchronous programming much easier for us. It hides a ton of complexity, so I think it makes sense to invest some time into investigating how it works in order to use it properly. First of all, marking a method async doesn't make everything asynchronous automagically; if you don't use await inside the method it will run synchronously (the compiler will generate a warning about it). When you call an async method the execution continues without blocking the calling thread. If the call is not awaited it returns a Task instead of the actual expected result, which represents work running in the background without blocking the main flow. This is great because you can fire up a bunch of tasks and then wait for the results only when you need all of them. For example:

public async void DownloadImageAsync(string imageUrl, string localPath)
{
    using (var webClient = new WebClient())
    {
        Task task = webClient.DownloadFileTaskAsync(imageUrl, localPath);

        var hashCalculator = new SynchronousHashCalculator();

        // Do some more work while download is running

        await task;
        using (var fs = new FileStream(localPath, FileMode.Open, FileAccess.Read))
        {
            byte[] testData = new byte[new FileInfo(localPath).Length];
            int bytesRead = await fs.ReadAsync(testData, 0, testData.Length);
            
            // Do something with testData
        }
    }
}

In this example, a file is downloaded asynchronously. The execution continues after the call has been made, so we can do other, unrelated work while the download is in progress. Only when we need the actual result, to open the file in this instance, do we await the task. If we didn't await it, we would try to access the file before it's completely written and closed, resulting in an exception.

One confusing aspect may be calling await on the same line, as with fs.ReadAsync in the example above. It looks synchronous because we are still waiting on that line, but the difference is that the calling thread is not blocked; if this were a GUI application, for example, it would still be responsive.
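
As a side note on the "fire up a bunch of tasks and wait for all of them" point above, here's a minimal sketch (the URLs and file paths are placeholders):

public async Task DownloadAllAsync()
{
    using (var webClient1 = new WebClient())
    using (var webClient2 = new WebClient())
    {
        // Start both downloads without awaiting them individually
        Task first = webClient1.DownloadFileTaskAsync("http://example.com/a.jpg", @"C:\temp\a.jpg");
        Task second = webClient2.DownloadFileTaskAsync("http://example.com/b.jpg", @"C:\temp\b.jpg");

        // Do other work here if needed, then await both together
        await Task.WhenAll(first, second);
    }
}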

So let's decompile the assembly to see how it works behind the scenes. This is one of the times I like having redundancy, because JustDecompile failed to generate C# code for the class using async/await, so I had to switch to ILSpy.

It generates a struct implementing the IAsyncStateMachine interface. The "meaty" part is the MoveNext method.

It’s very hard to read compiler-generated code but I think this part is interesting:

switch (this.<>1__state)
{
case 0:
	taskAwaiter = this.<>u__$awaiter7;
	this.<>u__$awaiter7 = default(TaskAwaiter);
	this.<>1__state = -1;
	break;
case 1:
	goto IL_102;
default:
	this.<task>5__2 = this.<webClient>5__1.DownloadFileTaskAsync(this.imageUrl, this.localPath);
	this.<hashCalculator>5__3 = new SynchronousHashCalculator();
	taskAwaiter = this.<task>5__2.GetAwaiter();
	if (!taskAwaiter.IsCompleted)
	{
		this.<>1__state = 0;
		this.<>u__$awaiter7 = taskAwaiter;
		this.<>t__builder.AwaitUnsafeOnCompleted<TaskAwaiter, AsyncMethods.<DownloadImageAsync>d__0>(ref taskAwaiter, ref this);
		flag = false;
		return;
	}
	break;
}
taskAwaiter.GetResult();
taskAwaiter = default(TaskAwaiter);

If the task has not completed, the state is set to 0 and a continuation is registered via AwaitUnsafeOnCompleted; when the task finishes, MoveNext runs again, case 0 resets the state and execution falls through to taskAwaiter.GetResult().

07. New C# 6.0 Features

C# 6.0 was released just a few days ago. There aren't many groundbreaking features, as the team at Microsoft stated; mostly the new features are shorthand notations to make the code easier to read and write. I decided to have a look at the decompiled code of some of the new constructs.

Null-conditional operators

This is a great remedy for the never-ending null checks. For example, in the code below:

var people = new List<Person>();
var name = people.FirstOrDefault()?.FullName;

If the list is empty, name will be null. If we didn't have this operator, calling FullName on the null result of FirstOrDefault would throw a NullReferenceException, so we would have to do the null checks on our own. The decompiled code for the above block looks like this:

private static void Main(string[] args)
{
    string fullName;
    Program.Person person = (new List<Program.Person>()).FirstOrDefault<Program.Person>();
    if (person != null)
    {
        fullName = person.FullName;
    }
    else
    {
        fullName = null;
    }
}

As we can see, the generated code is graciously carrying out the null check for us!

String interpolation

This is one of my favourites: now we can simply put the parameters directly inside a string instead of using placeholders. Expanding on the example above, imagine the Person class is like this:

class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string FullName => $"{FirstName} {LastName}";
}

The decompiled code looks familiar though:

public string FullName
{
    get
    {
        return string.Format("{0} {1}", this.FirstName, this.LastName);
    }
}

This is exactly how I would do it if we didn’t have this new feature.

Auto-property initializers

We can set the default value for a property on the declaration line now, like this:

public string FirstName { get; set; } = "Volkan";

And the IL generated for the code is like this:

The decompiled code varies: JustDecompile just displays the exact same statement as above, while ILSpy is closer to the IL:

public Person()
{
	this.<FirstName>k__BackingField = "Volkan";
	this.<BirthDate>k__BackingField = new DateTime(1966, 6, 6);
	base..ctor();
}

So what it does is set the private backing field to the default value we assigned, from within the constructor.

Expression bodied members

We can now provide the body of the function as an expression following the declaration like this:

public int GetAge() => (int)((DateTime.Now - BirthDate).TotalDays / 365);

The decompiled code is not very fancy:

public int GetAge()
{
    TimeSpan now = DateTime.Now - this.BirthDate;
    return (int)(now.TotalDays / 365);
}

It simply puts the expression back in the body.

Index initializers

It’s a shorthand for index initialization such as

Dictionary<int, string> dict = new Dictionary<int, string>
{
    [1] = "string1",
    [2] = "string2",
    [3] = "string3" 
};

And the decompiled code is no different from what we would write by hand today:

Dictionary<int, string> dictionary = new Dictionary<int, string>();
dictionary[1] = "string1";
dictionary[2] = "string2";
dictionary[3] = "string3";

As I mentioned there is nothing fancy about the new features in terms of the generated IL but some of them will help save a lot of time for sure.

08. Compiler code optimization

It is interesting to see how decompilers use optimization on their own. Before I went into this I thought they would just reverse engineer whatever is in the assembly, but it turns out that's not the case.

First, let's have a look at the C# compiler's behaviour. In Visual Studio, the Optimize code flag can be found on the Build tab under Project Properties.

In order to test code optimization I created a dummy method like this:

public class UnusedVariables
{
    public void VeryImportantMethod()
    {
        var testClass = new TestClass("some value");
		TestClass testClass2 = null;
        var x = "xyz";
        var n = 123;
    }
}

First I compiled it with the default settings:

Then disassembled the output:

I'm no expert in reading IL code, but it's obvious that in the locals section there are two TestClass instances, a string and an integer.

When I compiled it with the optimization option (/o):

I got the following when disassembled:

There’s no trace of the value types (string and the int). Also the TestClass instance with null value is gone. But the TestClass instance, which is created in the heap rather than the stack, is still there even though there is nothing referencing it. It’s interesting to see that difference.

When I decompiled both versions with ILSpy and JustDecompile unused local variables were removed all the time. Apparently they do their own optimization.

Conclusion

It was a fun exercise to investigate and see the actual generated code. It helped me understand complex constructs like async/await better, and it was also good to see there's nothing to fear about the new features of C# 6.0!

Resources

csharp, development, aws, s3, rareburg, rss

Recently I got burned by another Windows update and my favorite desktop podcatcher, Miro, stopped functioning. I was already planning to develop something of my own so that I wouldn't have to manually back up OPMLs (I'm sure there are neat solutions out there already, but again I couldn't resist the temptation of DIY!). So I started exploring RSS feeds and the ways to produce and consume them. My first toy project is a feed generator for the Knowledge Base of Rareburg, one of my favourite sites. Word on the street is that an official feed for articles is on its way; until it's out I decided to play around with their API and generate my own feed.

Implementation

I developed a console application in C# that can be scheduled to generate the feed and upload it to AWS S3. Source code is here

Apparently the .NET Framework has had a System.ServiceModel.Syndication namespace since version 3.5 that contains all the tools needed to consume and create an RSS feed with a few lines of code. The core part of the application is the part that generates the actual feed:

public SyndicationFeed GetFeed(List<Article> articles)
{
    SyndicationFeed feed = new SyndicationFeed(_feedServiceSettings.Title, _feedServiceSettings.Description, new Uri(_feedServiceSettings.BaseUri));
    feed.Title = new TextSyndicationContent(_feedServiceSettings.Title);
    feed.Description = new TextSyndicationContent(_feedServiceSettings.Description);
    feed.BaseUri = new Uri(_feedServiceSettings.BaseUri);
    feed.Categories.Add(new SyndicationCategory(_feedServiceSettings.Category));

    var items = new List<SyndicationItem>();
    foreach (var article in articles
        .Where(a => a.ispublished)
        .Where(a => a.ispubliclyvisible)
        .OrderByDescending(a => a.publisheddate))
    {
        var item = new SyndicationItem(article.title, 
            article.bodysnippet,
            new Uri (string.Format(_feedServiceSettings.ArticleUrl, article.slug)),
            article.articleid.ToString(),
            article.publisheddate);
        
        item.Authors.Add(new SyndicationPerson("", article.authorname, string.Format(_feedServiceSettings.UserUrl, article.user.username)));
        items.Add(item);
    }
    
    feed.Items = items;
    return feed;
}

The feed itself is independent from the format (RSS or Atom). The classes that do the actual formatting are derived from the abstract SyndicationFeedFormatter class: Rss20FeedFormatter and Atom10FeedFormatter. The format is read from the config file so the application supports both formats.

public SyndicationFeedFormatter CreateFeedFormatter(SyndicationFeed feed)
{
    string feedFormat = _feedSettings.FeedFormat;
    switch (feedFormat.ToLower())
    {
        case "atom": return new Atom10FeedFormatter(feed);
        case "rss": return new Rss20FeedFormatter(feed);
        default: throw new ArgumentException("Unknown feed format");
    }
}

and the publisher service writes the formatted feed out to a stream:

var memStream = new MemoryStream();
var settings = new XmlWriterSettings(){ Encoding = Encoding.UTF8 };
using (var writer = XmlWriter.Create(memStream, settings))
{
    feedFormatter.WriteTo(writer);
}
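
The upload step itself isn't shown in the post; purely as an illustration, a minimal sketch of what it could look like with the AWS SDK for .NET (the key name and region are placeholders; the bucket name matches the IAM policy shown further down):

// Requires the AWS SDK for .NET (Amazon, Amazon.S3 and Amazon.S3.Model namespaces).
// memStream holds the serialized feed written by the XmlWriter block above.
memStream.Position = 0;
using (var s3Client = new AmazonS3Client(RegionEndpoint.EUWest1))
{
    var putRequest = new PutObjectRequest
    {
        BucketName = "vp-public-temp",         // bucket from the IAM policy below
        Key = "rareburg-articles.xml",         // placeholder key
        InputStream = memStream,
        ContentType = "application/rss+xml",
        CannedACL = S3CannedACL.PublicRead     // feed readers need public read access
    };
    s3Client.PutObject(putRequest);
}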

I added the output of the API call as a JSON file under samples. I also implemented a fake API client called OfflineRareburgClient, which reads the response from a file instead of actually making an API call. It comes in handy if you don't have an Internet connection or a valid API key. To use it in offline mode you have to change the line that creates the client from this

var rareburgClient = new RareburgClient(configFactory.GetApiSettings());

to this

var rareburgClient = new OfflineRareburgClient(configFactory.GetOfflineClientSettings());

Lessons learned / Implementation Notes

Since the motivation behind the project was to learn more about working with RSS myself, here are a few things I learned and used along the way:

  • I created an IAM account that has write-only access to the bucket the feed is stored in. It works with the default permissions, which means private access, but since the RSS reader service needs access to the feed I had to upload the file with public access. Apparently changing the ACL requires a different permission, namely s3:PutObjectAcl. The weird thing is that just replacing s3:PutObject with s3:PutObjectAcl didn't work either; they both had to be allowed. So after a few retries the final policy shaped up like this:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1418647210000",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::vp-public-temp/*"
            ]
        }
    ]
}
  • In this implementation I used Visual Studio’s neat feature Paste JSON As Classes.

Edit -> Paste Special -> Paste JSON As Classes

First I captured the API response with Fiddler, then created a blank .cs file and used this option to generate the classes to deserialize the response into. Creating strongly typed objects can easily become a daunting task if you are wrapping a whole API, so in such cases I'd prefer to use dynamic objects like this:

var response = client.Execute<dynamic>(request);
var latestArticles = ((IEnumerable) response.Data.payload.articles).Cast<dynamic>()
                        .OrderByDescending(a => a.publisheddate)
                        .Select(a => new 
                        { 
                            slug = a.slug,
                            id = a.articleid,
                            title = a.title,
                            publishedDate = a.publisheddate,
                            ispublished = a.ispublished,
                            isvisible = a.ispubliclyvisible
                        });

This works fine, but the problem in this case was the hyphens in some of the JSON property names, which are not valid in C# identifiers. I can get around that if I use strongly typed objects and specify the property name explicitly, such as:

[JsonProperty("is-published?")]
public bool ispublished { get; set; }

But I cannot do that in the dynamic version. I'll put a pin in it and move on for now, but I have a feeling it will come back to haunt me in the future!

  • The default output of the RSS feed passes validation but gets 3 warnings. I'm sure they can be safely ignored, but just out of curiosity I researched a little to see if I could pass with flying colors. Two of the three warnings were:

    line 1, column 39: Avoid Namespace Prefix: a10 [help] <rss xmlns:a10=”http://www.w3.org/200 …

    line 12, column 5302: Missing atom:link with rel=”self” [help] … encompassing the Ch</description></item></channel></rss>

I found the solution on StackOverflow (not surprisingly!)

I made a few changes in the formatter factory

case "rss":
{
    var formatter = new Rss20FeedFormatter(feed);
    formatter.SerializeExtensionsAsAtom = false;
    XNamespace atom = "http://www.w3.org/2005/Atom";
    feed.AttributeExtensions.Add(new XmlQualifiedName("atom", XNamespace.Xmlns.NamespaceName), atom.NamespaceName);
    feed.ElementExtensions.Add(new XElement(atom + "link", new XAttribute("href", _feedSettings.FeedUrl), new XAttribute("rel", "self"), new XAttribute("type", "application/rss+xml")));
    return formatter;
}

I liked the fact that I only had to make changes in one place, so the factory could return a customized formatter instead of the default one and the rest of the application didn't care at all. But unfortunately the fix required the publish URL of the feed. I got around it by adding it to FeedSettings in the config, but now the S3 settings and the feed settings need to be changed at the same time.

My idea was to make it like a pipeline, so that the feed generator didn't have to care how and where the feed is published, but this fix contradicts that approach a little bit. Unfortunately it doesn't look possible to use variables in the config files, which would have let me generate the feed URL from the other settings.

  • The 3rd warning was encoding-related. If you don't explicitly specify it, the API uses the ISO-8859-1 charset. I tried playing around with a few headers to get the response in UTF-8, but the solution came from a friend: the Accept-Charset header. Adding that header fixed the issue as well:
request.AddHeader("Accept-Charset", "UTF-8");

Conclusion

The generated Atom feed doesn't pass validation yet, but I will handle that later on. Since Atom is the newer format I think I'll go with it in the future. So far it's good to know that it's fairly easy to work with RSS/Atom feeds in C#, so it was a fun experiment after all…

Resources

personal, leisure, comedy, travel

I may not go out too often but I couldn’t miss this one for the world! Jim Jefferies is my favourite comedian and I finally had a chance to see a live show.

Jim Jefferies Ticket

When I saw the terrace option I went for it, hoping that paying extra would mean better seats. I couldn't have been more wrong! Even the MC made a lot of jokes about rich people sitting miles away from the stage while regular ticket holders had seats close to it.

There were 3 support acts which were about 10-15 minutes each. Jim took the stage around 20:40 and stayed until 22:00.

The first half was mostly about his newborn son and his relationship with his girlfriend. The second half was mostly about religion, guns and sex. Overall it was hilarious, as I anticipated. I especially loved how he encouraged people to overpower the security and have free beers :-)

I've seen all his previous shows, so seeing all-new material was quite refreshing. Even though the stage was in a galaxy far, far away and the sound was not too loud, it was still very nice to watch Jim Jefferies in the open air.

Resources

xml, programming, csharp

Even though XML is a noisy data format, it's still commonly used, and I happen to end up in situations where I need to use .NET to serialize and deserialize to and from XML documents. Here are a few problems I had to tackle in the past. All the sample source code can be found in a GitHub repository.

01. Skip serializing unassigned values

Let's assume we have a hypothetical Player class with a few members that looks like this:

public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
    public Team Team { get; set; }
}

public class Team
{
    public string Name { get; set; }
    public int YearEstablished { get; set; }
}

So it contains both value-type and reference-type properties.

Now let’s create two random players and serialize them:

XmlSerializer serializer = new XmlSerializer(typeof(List<Player>));
Player player1 = new Player() { Id = 1, FirstName = "John", LastName = "Smith", TotalGoalsScored = 50, AverageGoalsPerGame = 0.7, Team = new Team() { Name = "Arsenal" } };
Player player2 = new Player() { Id = 2, FirstName = "Jack" };
using (StringWriter writer = new StringWriter())
{
    serializer.Serialize(writer, new List<Player>() { player1, player2 });
    Console.WriteLine(writer.ToString());
}

This code yields the following XML:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfPlayer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="
http://www.w3.org/2001/XMLSchema">
  <Player>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
    <Team>
      <Name>Arsenal</Name>
      <YearEstablished>0</YearEstablished>
    </Team>
  </Player>
  <Player>
    <Id>2</Id>
    <FirstName>Jack</FirstName>
    <TotalGoalsScored>0</TotalGoalsScored>
    <AverageGoalsPerGame>0</AverageGoalsPerGame>
  </Player>
</ArrayOfPlayer>

The thing to note here is that reference types were not serialized when they were unassigned, i.e. the Team object and the LastName property of player2. But the same didn't go for the value-type properties: TotalGoalsScored, AverageGoalsPerGame and YearEstablished were serialized as 0. Of course this might be the desired outcome depending on your business requirements, but in my case I didn't want it because it might mislead the client consuming this data. At the very least I find it inconsistent that some unassigned values are serialized and some aren't.

So to change the behaviour all we have to do is add the DefaultValue attribute to the numeric properties like this:

public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }

    [DefaultValue(0)]
    public int TotalGoalsScored { get; set; }

    [DefaultValue(0)]
    public double AverageGoalsPerGame { get; set; }
    
    public Team Team { get; set; }
}

public class Team
{
    public string Name { get; set; }

    [DefaultValue(0)]
    public int YearEstablished { get; set; }
}

With this change the output becomes:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfPlayer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="
http://www.w3.org/2001/XMLSchema">
  <Player>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
    <Team>
      <Name>Arsenal</Name>
    </Team>
  </Player>
  <Player>
    <Id>2</Id>
    <FirstName>Jack</FirstName>
  </Player>
</ArrayOfPlayer>

So since the int and double values default to 0 and we explicitly declared 0 as their default value, they won't be serialized unless they are assigned a value other than zero.

In case you are wondering, making the int and double properties nullable doesn't produce the same result; in that case they are serialized as null values:

<TotalGoalsScored xsi:nil="true" />
<AverageGoalsPerGame xsi:nil="true" />
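
For reference, the nullable variant referred to above would be declared like this (a sketch; the class name is only for illustration):

public class PlayerWithNullables
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }

    // Nullable value types: when left null they are written out as
    // elements with xsi:nil="true" instead of being skipped.
    public int? TotalGoalsScored { get; set; }
    public double? AverageGoalsPerGame { get; set; }
}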

I think this if-you-set-it-you-get-it-back approach is consistent and makes most sense to me. I created a test project to fiddle with these classes. It has all 3 versions of the classes under different names and a console application displaying the outputs. If you want to play around you can get the source code here

02. Deserialize straight to list

Sometimes what you get in an XML document is just a list of items, and the list element is just a container so that the XML is well-formed. For example, in the following document we have a list of players:

<PlayerList>
  <Player>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
    <Team>
      <Name>Arsenal</Name>
      <YearEstablished>0</YearEstablished>
    </Team>
  </Player>
  <Player>
    <Id>2</Id>
    <FirstName>Jack</FirstName>
    <TotalGoalsScored>0</TotalGoalsScored>
    <AverageGoalsPerGame>0</AverageGoalsPerGame>
  </Player>
</PlayerList>

The sole purpose of the PlayerList tag is to act as the root element and contain multiple Player elements; other than that it has no function. When we deserialize this to C# objects we would normally need two classes like this:

public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
}

and

[XmlRoot]
public class PlayerList
{
    [XmlElement("Player")]
    public List<Player> Players { get; set; }
}

and we can get the list of objects by:

string inputXmlPath1 = @".\InputXml.xml";
using (StreamReader reader = new StreamReader(inputXmlPath1))
{
    XmlSerializer playerListSerializer = new XmlSerializer(typeof(PlayerList));
    PlayerList playerList = (PlayerList)playerListSerializer.Deserialize(reader);
}

In such cases I generally tend to eliminate the "middle man". I don't want a container class whose only job is to hold a list, so I'd like to deserialize this XML directly into a List<Player>.

What I want to do is actually this:

using (StreamReader reader = new StreamReader(inputXmlPath1))
{
    XmlSerializer playerListSerializer = new XmlSerializer(typeof(List<FinalPlayer>));
    List<FinalPlayer> playerList = (List<FinalPlayer>)playerListSerializer.Deserialize(reader);
}

But without any modifications to our classes it throws an exception:

By eliminating the PlayerList class we stopped providing the XmlRoot info to the serializer, but that can quickly be remedied by using a constructor overload of XmlSerializer:

using (StreamReader reader = new StreamReader(inputXmlPath1))
{
    XmlSerializer playerListSerializer = new XmlSerializer(typeof(List<FinalPlayer>), new XmlRootAttribute("PlayerList"));
    List<FinalPlayer> playerList = (List<FinalPlayer>)playerListSerializer.Deserialize(reader);
}     

This works when the class name matches the element name. If your class name differs (like FinalPlayer in my example), you need to decorate the class with XmlType and supply the element name so that the serializer can do the mapping:

[XmlType("Player")]
public class FinalPlayer
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
}

So now we can map any class name to the corresponding elements and deserialize straight to a List<FinalPlayer>.

03. Remove namespaces

I know using namespaces is considered good practice as it helps avoid name conflicts, but honestly I've never suffered from such a problem, so I don't mind removing them from my XML documents to clean up the clutter. XML is already a noisy data format; no need to bloat it any further. Namespaces might help when you are working with data from different sources, but if you are only working with your own classes and data structures, name conflicts are generally not something to worry about (assuming you name your objects properly).

So say you have a simple Player class and you serialize it with an out-of-the-box XmlSerializer:

Player player = new Player() { Id = 102, FirstName = "Danny", LastName = "TopScorer", AverageGoalsPerGame = 3.5, TotalGoalsScored = 150 };
XmlSerializer serializer = new XmlSerializer(typeof(Player));
XmlWriterSettings settings = new XmlWriterSettings() { OmitXmlDeclaration = true, Indent = true, Encoding = Encoding.UTF8 };
StringBuilder output = new StringBuilder();
XmlWriter writer = XmlWriter.Create(output, settings);
serializer.Serialize(writer, player);
Console.WriteLine(output.ToString());

This yields the following output:

<Player xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://
www.w3.org/2001/XMLSchema">
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

So in order to get rid of the namespaces we have to define our own namespace collection, containing a single empty namespace in this case, and pass it to the overloaded Serialize method:

XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
xns.Add(string.Empty, string.Empty);
serializer.Serialize(writer, player, xns);

to get the XML without any namespaces:

<Player>
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

04. Change output encoding

When you serialize with the default options and include the XML declaration, the encoding comes out as UTF-16. Oddly enough, there is no direct option on the serializer to specify the output encoding, yet sometimes you need to change it; for example, in my case a 3rd party was expecting UTF-8, so the default didn't cut it for me.

So, using the same Player class from the last example, the following code produces output declared as UTF-16:

Player player = new Player() { Id = 102, FirstName = "Danny", LastName = "TopScorer", AverageGoalsPerGame = 3.5, TotalGoalsScored = 150 };
XmlSerializer serializer = new XmlSerializer(typeof(Player));
XmlWriterSettings settings = new XmlWriterSettings() { OmitXmlDeclaration = false, Indent = true, Encoding = Encoding.UTF8 };
StringBuilder output = new StringBuilder();
XmlWriter writer = XmlWriter.Create(output, settings);
XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
xns.Add(string.Empty, string.Empty);
serializer.Serialize(writer, player, xns);
Console.WriteLine(output.ToString());

Output:

<?xml version="1.0" encoding="utf-16"?>
<Player>
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

I found the solution here on StackOverflow.

So the solution is to derive from StringWriter and add a constructor that accepts an Encoding. StringWriter already has an Encoding property, as shown below, but unfortunately it doesn't have a public setter, so we need a subclass to override it.

StringWriterWithEncoding simply overrides the Encoding property:

public sealed class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding(Encoding encoding)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}

By using the new class the following code produces the desired outcome:

StringWriterWithEncoding utf8StringWriter = new StringWriterWithEncoding(Encoding.UTF8);
Player player = new Player() { Id = 102, FirstName = "Danny", LastName = "TopScorer", AverageGoalsPerGame = 3.5, TotalGoalsScored = 150 };
XmlSerializer serializer = new XmlSerializer(typeof(Player));
XmlWriterSettings settings = new XmlWriterSettings() { OmitXmlDeclaration = false, Indent = true, Encoding = Encoding.UTF8 };
XmlWriter writer = XmlWriter.Create(utf8StringWriter, settings);
XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
xns.Add(string.Empty, string.Empty);
serializer.Serialize(writer, player, xns);
Console.WriteLine(utf8StringWriter.ToString());
Console.ReadLine();

Output:

<?xml version="1.0" encoding="utf-8"?>
<Player>
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

05. Fail deserialization when unexpected elements are encountered

By default XmlSerializer is very fault-tolerant: it just salvages whatever it can and ignores the values it cannot match. Sometimes you may need to be stricter. For example, I had a case where the external source returned a completely different XML document when there was an error on its end. When that happened I wanted to be notified about it instead of quietly getting empty objects.

For example, assume we have the usual PlayerList that we are deserializing to a List<Player>. If for some reason we get a weird player list like this:

<PlayerList>
  <Customer>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
  </Customer>
</PlayerList>

When we deserialize it with the following code block

try
{
    using (MemoryStream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(rawXml)))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(List<Player>), new XmlRootAttribute("PlayerList"));
        var result = (List<Player>)serializer.Deserialize(memoryStream);
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    if (ex.InnerException != null)
    {
        Console.WriteLine(ex.InnerException.Message);    
    }
}

XmlSerializer doesn't complain at all. Instead it just returns an empty list because it cannot find any Player elements. In order to change this behaviour we can use the events it exposes, such as UnknownNode, UnknownElement and UnknownAttribute (UnknownNode is essentially the combination of the other two). In my case I didn't want to be too strict, so rather than also throwing on unknown attributes I only hooked into the UnknownElement event:

try
{
    using (MemoryStream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(rawXml)))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(List<Player>), new XmlRootAttribute("PlayerList"));
        serializer.UnknownElement += serializer_UnknownElement;
        var result = (List<Player>)serializer.Deserialize(memoryStream);
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    if (ex.InnerException != null)
    {
        Console.WriteLine(ex.InnerException.Message);
    }
}

and added the event handler:

void serializer_UnknownElement(object sender, XmlElementEventArgs e)
{
    throw new ArgumentException(string.Format("Unknown element: {0}", e.Element.LocalName));
}

So now at least I can distinguish a weird list from a really empty list.

Resources

programming, debug

Even though most APIs are RESTful these days, sometimes you might still need to interact with a SOAP-based web service.

I had to consume a SOAP XML web service the other day and encountered a strange problem while debugging my test application. The application was working fine, but when I tried to debug it, it was closing silently. So as a first step I opened Debug -> Exceptions and checked everything to make the application break on any type of exception it gets.

Break when any exception is thrown

After running the application with this setting at least I was able to see what the fuss was all about:

SOAP BindingFailure exception

There are various approaches to resolving this issue, and I finally found the right one in a StackOverflow answer: go to Project Properties and, in the Build tab, turn on the Generate Serialization Assembly setting.

Turn on Generate Serialization Assembly setting

When this setting is turned on, the build generates an assembly named {Your Application Name}.XmlSerializers.dll.

Out of curiosity I peeked into the assembly with ILSpy and it looks like this:

XmlSerializers.dll in ILSpy

Basically it generates an abstract class deriving from XmlSerializer (named XmlSerializer1) and a bunch of sealed child classes deriving from that class.

I've never been a fan of auto-generated code anyway, and it looks like I'm not going to use it directly in my own code; it's used in the background by the framework. I added links to a few StackOverflow answers related to that assembly, though. I gather the advantage of turning it on is reduced startup time, plus being able to debug in this case; the disadvantage is increased deployment size, which I think is negligible in this day and age, so I'll just keep it on and debug happily ever after!

Resources

events

Yesterday I attended the AWS Summit and wanted to post my impressions and notes from the event. As you can see in the image below, there were quite a few people there:

Overview during lunch break

Keynote

The keynote was given by Amazon.com's CTO Werner Vogels. Some notes from the keynote:

  • It was quite a big event: 3000+ attendees
  • Intel was introduced as a platinum partner. Last month I attended AWSome Day, and they mentioned Intel has manufactured a special chip just for EC2 (C4-type instances), designed specifically to address heat considerations.
  • They now have more than 1 million active customers
  • According to Gartner, AWS has 5 times the compute capacity of its 14 closest competitors combined (which also explains the specific requirements to reduce heat in the chips)

  • There were guest speakers who emphasized different features of AWS
    • GoSquared: Experimentation, how it is easier to try out new technologies in a cost-effective way
    • iTv: Microservices as opposed to monolithic applications
    • OmniFone: Scalability
    • Just Giving: Distributed regions and data analysis capabilities
    • AWS VP of public sector: Usage in public sector. It started after 12 and everyone started leaving.

  • New features introduced

    • Elastic File System: SSD-based, automatically replicated, auto-resized file system. Currently in preview mode, will be released this summer
    • Machine Learning: Makes it easier to analyze big data for search intent, demand estimation etc.
    • Lambda: Event-driven, fully-managed compute service
    • EC2 Container Service: There is a big shift towards Docker and containers. He said: “The container is now the output of your development process”

  • Generally the microservices approach is favored: building large solutions from smaller blocks allows faster, more cost-effective solutions which can be adapted to change more easily
  • Security and their compliance with major certification requirements were emphasized, but he didn't mention the shared-responsibility principle that AWS adopts: just because you use AWS doesn't mean you're automatically compliant with all the regulations yourself.
  • They have support for hybrid solutions but according to AWS’s vision it’s not the destination, just a transition
  • He made an analogy that fighting the cloud is like “fighting the gravity”: It’s a fight you cannot win!

Track sessions

After the lunch break there were a lot of sessions about various AWS features. I picked Technical Track 1, which included EC2 Container Service, Lambda, CloudFormation and CodeDeploy.

EC2 Container Service

I know containers are a big deal nowadays, but I still haven't had the chance to try them out myself. I was hoping this session would help me find out more, but I didn't benefit much from it as it didn't cover the very basics. Still, in light of the keynote it's obvious there's huge demand for containers, so this will be the first service I try next.

Lambda

This is a very cool new service. Instead of running every small job as a small service on a small EC2 instance, we can now get rid of all the maintenance and costs and just let AWS run our code whenever an event is triggered.

  • It currently supports a wide variety of event sources such as objects put to S3 buckets, DynamoDB table updates, SNS messages etc.
  • SQS support is coming soon
  • Currently it runs Node.js; it can be used to launch a Java application, and native Java support is coming soon so it will be able to execute Java code directly.
  • It even allows access to underlying processes, threads, file system and sockets
  • It’s charged per invocation so you don’t pay anything for idle times.

Infrastructure as code

The focus of the session was CloudFormation, but a client of AWS also showed how they used Eclipse to deploy with a single click, so it can be done in several ways (that's why the title of the talk wasn't simply CloudFormation).

CloudFormation is also a great tool for automatically launching a whole infrastructure from template files. I don't have much experience with it, but it was a nice session to see its capabilities in action.

CodeDeploy

This is yet another cool new feature that has just come out of preview mode. You can automatically deploy new code based on your own rules. For example, you can deploy one instance at a time: it verifies each deployment and moves on to the next one. Or you can deploy the new version to half of the instances, meaning half of your system stays available even if the deployment fails. Or, if you like an adrenaline rush, you can deploy to all instances at once :-)

You can specify pre and post scripts that handle the clean-up tasks and verifying the deployment.

CodeDeploy is already GA (generally available), but 2 more services were introduced yesterday: CodePipeline and CodeCommit

The idea is to fully automate the whole process from source code check-in to deployment, which according to the speaker is a very time-consuming task in their own applications.

Conclusion

It was a nice big event and I’m glad I had the chance to attend it. The content was rich enough to cover every aspect of AWS. I decided to attend the sessions instead of the labs as I can do them online here. It’s also a bit overwhelming to see there’s so much to learn, but that’s always a challenge in this industry anyway. As Werner Vogels said in his keynote: “Innovation is continuous!”

Resources

gadget, raspberry pi, aws, route53, development comments edit

Everything is *-as-a-service nowadays and books are no exception. I have a Safari Books Online subscription which I enjoy a lot. It is extremely convenient to have thousands of books at your fingertips. But… DIY still rules! There are times you may still want to have your own collection, and it doesn’t just have to be an e-book collection. And on top of it all, it’s purely fun and educational.

Ingredients

Basically all you need is an up-and-running Raspberry Pi. If you have one, you can skip this section. These are just the components I used in my implementation:

Keyboard and display are needed for the initial setup. Once it’s connected to network you can do everything over SSH.

Calibre, my old friend!

I’ve been a fan of Calibre for many years now. With Calibre you can manage any document collection you want. I love its user interface which allows me to easily tag and categorize my collections. Also it can convert between a wide range of formats. And when I plug in my Kindle it automatically recognizes the device and I can right-click on a book and send to device very easily. Check out this page for a full list of features.

My favorite feature is that it can act as a server. I mainly use Stanza on my iPad and connect to my Calibre server to download comic books over WiFi. The downside of running it locally on my computer is that the machine needs to be on and I have to enable the content server on Calibre manually before connecting from the iPad.

Here comes the project

Instead, what I’d like to have is

  • An online server available all the time: The Raspberry Pi is a very power-efficient little monster so I can keep it running

  • Isolated from my main machine: For security reasons I don’t want to open a port on my desktop computer

  • Accessible both from inside and outside: Of course I could just launch a cheap AWS instance and use it as the content server but

    • It’s not as fun!
    • If I need to move around GBs of data local network rocks!

Also, as I said it’s mostly for fun so I don’t have to justify it completely to myself :-)

Roll up your sleeves!

Step 0: Setup Raspberry Pi

If you haven’t done it already you can easily set it up by following the Quick Start Guide on raspberrypi.org

Step 1: Install Calibre on Raspberry Pi

This one was a bit harder than I expected. The official site says “Please do not use your distribution provided calibre package, as those are often buggy/outdated. Instead use the Binary install described below.”

and the suggested command is

sudo -v && wget -nv -O- https://raw.githubusercontent.com/kovidgoyal/calibre/master/setup/linux-installer.py | sudo python -c "import sys; main=lambda:sys.stderr.write('Download failed\n'); exec(sys.stdin.read()); main()"

Alas, after running the command I got the following error:

Calibre installation error

I asked about the error in the Calibre forums and was advised to build it from source code, because the compiled version is for Intel processors and doesn’t work on the ARM processor the Raspberry Pi has. The instructions for building from source are on the same page, but I haven’t tried it myself.

As a fallback method I simply used apt-get to install it:

sudo apt-get update && sudo apt-get install calibre

It worked fine but the version is 0.8.51 (the latest release at the time of this writing is 2.20.0, so you can see it’s a little bit outdated!). The content server was implemented a long time ago though, so for all intents and purposes it’s good enough for this project.

Step 2: Run it as server

Now that we have Calibre installed we can run the content server from command line:

calibre-server --with-library=/home/pi/calibre/myLibrary --daemonize

This will run the process in the background (because of the --daemonize flag) but if the Pi restarts it will not run automatically. To fix that I added the command to crontab by first entering the following command

crontab -e

and adding the following line after the comments

@reboot calibre-server --with-library=/home/pi/calibre/myLibrary --daemonize

so that the same command is run after every reboot.

Crontab on Raspberry Pi

Now let’s test if we’re online. By default, Calibre starts serving on port 8080 with no authentication required. So just find the IP address of the Raspberry Pi and try to connect it over HTTP from your machine such as http://{Local IP}:8080

and voila!

Now we can add some books and start using it from any machine on the network.

Step 3: Add some books

First I uploaded some files to a different folder using WinSCP. If you are not on Windows I’m sure you can find a similar tool to transfer files to Raspberry Pi.

We can add books by using calibredb command like this:

calibredb add Raspberry_Pi_Education_Manual.pdf --with-library=/home/pi/calibre/myLibrary

Please note if you try to use calibre instead of calibredb you’d get the following error:

calibre: cannot connect to X server 

Because we are not using a GUI we cannot use calibre directly; instead we add books using calibredb.

Calibre always copies the files to its own library so once the books are added you can delete the original ones.

After the files are added refresh the page and you should get something like this:

At this point we can download the books on any machine on the local network.

Step 4: Connect from clients

  • Kindle

Kindle has an experimental browser (I currently have a Paperwhite, I don’t know about the newer versions). So to download books, I simply go to Settings -> Experimental Browser and enter the URL of my content server (http://{Local IP}:8080):

And after you download the file you can go to home and enjoy the book on your Kindle.

Please note that the Kindle cannot download PDFs. When I tried to download the Raspberry Pi manual I got the following error:

Only files with the extension .AZW, .PRC, .MOBI or .TXT can be downloaded to your Kindle.

So make sure you upload the right file formats.

  • iPad / Stanza

This is my favorite app on iPad. It’s not even on AppStore anymore but I still use it to download books from Calibre.

All I had to do was click Get Books and it found the OPDS server on the network automatically so I could browse and download books right away.

Stanza

  • iPad / Web

Alternatively you can just browse to server and open it with any eBook reader app available on your iPad.

Calibre UI on iPad

[Optional] Step 5: Setup Port Forwarding

For internal usage we are done! If you want to access your library over the Internet you have to define a port forwarding rule. The way to do it is completely dependent on your router, so you have to fiddle with your router’s administration interface.

Basically you map an external port to an internal IP and port.

For example I mapped port 7373 to local 192.168.175:8080 so whenever I connect to my home network’s external IP on port 7373 I get my Calibre user interface.

I recommend running the server with the --username and --password flags so that only authenticated users can browse your library.
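
A command along these lines should do it (a sketch; myuser and mypassword are placeholders you’d replace with your own values):

calibre-server --with-library=/home/pi/calibre/myLibrary --username=myuser --password=mypassword --daemonize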

[Optional] Step 6: Setup Dynamic DNS

If you have a static IP you don’t need this step at all, but personal broadband connections generally don’t come with static IPs. My solution for this was using AWS Route 53 and updating the DNS using the AWS Python SDK (Boto).

First I had to install pip to be able to install boto

sudo apt-get install python3-pip

Then boto itself

sudo pip install boto

I created an IAM user that only has access to a single domain which I use for this kind of stuff on Route 53 and added its credentials to the AWS credentials file as explained in the nice and brief tutorial here

The script calls AWS’s external IP checker and stores it in currentIP. Then gets the hosted zone and loops through all the record sets. When it finds the subdomain I’m going to use for Calibre (‘calibre.volki.info.’) it updates the IP address with the currentIP and commits the changes. Thanks to AWS that’s all it takes to create a Dynamic DNS application.

import boto.route53
import urllib2

# Ask AWS's IP checker for the current external IP (strip the trailing newline)
currentIP = urllib2.urlopen("http://checkip.amazonaws.com/").read().strip()

conn = boto.connect_route53()
zone = conn.get_zone("volki.info.")
change_set = boto.route53.record.ResourceRecordSets(conn, '{HOSTED_ZONE_ID}')

# Find the Calibre subdomain and upsert its record with the current IP
for rrset in conn.get_all_rrsets(zone.id):
    if rrset.name == 'calibre.volki.info.':
        u = change_set.add_change("UPSERT", rrset.name, rrset.type, ttl=60)
        u.add_value(currentIP)
        results = change_set.commit()

Of course this script needs to be added to crontab and should be run every 5-10 minutes. If the external IP changes there might be some disturbance to the service, but it should only take a few minutes before it’s resolved.
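
A crontab entry along these lines would run it every 10 minutes (a sketch; the script path is a placeholder for wherever you saved it):

*/10 * * * * python /home/pi/scripts/update_dns.py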

With this script now we can access our library over the Internet and we don’t have to worry about changes in the IP address.

Resources

development, neo4j, graph databases, cypher comments edit

Visual tools are nice and all but they are not as fun as playing with a query language. When you write your own queries the possibilities are endless! In this post I’ll cover the basics of Cypher query language. Cypher is a declarative, SQL-like, pattern-matching query language for Neo4J graph databases.

Basic CRUD Operations

MATCH

The MATCH clause allows you to define patterns to search the database for. It is similar to SELECT in the RDBMS world.

MATCH (n) RETURN n

The query above returns all nodes in the database. In this example n is a temporary variable to store the result. The results can be filtered based on labels such as:

MATCH (n:Person) RETURN n

Property-based filters can be used in both MATCH and WHERE clauses. For example the two queries below return the same results:

MATCH (n:Movie {title: "Ninja Assassin"})
RETURN n
MATCH (n:Movie)
WHERE n.title = "Ninja Assassin"
RETURN n

Instead of returning the entire node we can just select some specific properties. If a node doesn’t have the property it simply returns null.

MATCH (a:Person)
RETURN a.name, a.born

Query returning all actors' names and years they were born in

Results can be filtered by relationships as well. For example the query below returns the movies Tom Hanks acted in

MATCH (p {name:"Tom Hanks"})-[:ACTED_IN]->(m)
RETURN p, m

Query results for movies Tom Hanks acted in

Sometimes we might need to learn the relationship type from the query. In that case we can use the TYPE function to get the name of the relationship:

MATCH (a)-[r1]->(m)<-[r2]-(a)
RETURN a.name, m.title, type(r1), type(r2)

Actors having multiple relationships with the same movie

Relationship type and direction can be omitted in the queries. The following query returns the names of the movies that have “some sort of relationship” with Tom Hanks:
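
A sketch of such a query, leaving out both the relationship type and the direction:

MATCH (p {name: "Tom Hanks"})--(m:Movie)
RETURN m.title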

Movie with Tom Hanks

OPTIONAL MATCH: The OPTIONAL keyword fills in the missing parts of the results with nulls. For example the query below returns one row with a null value because there is no outgoing relationship from the movie The Matrix. If we didn’t use OPTIONAL we would have an empty result set.

MATCH (a:Movie {title: 'The Matrix'})
OPTIONAL MATCH (a)-[]->(d)
RETURN d  

CREATE

To add new data, the CREATE clause is used. In its simplest form the following query creates a (rather useless) node:

CREATE ()

Labels and properties can be set while creating new nodes such as:

CREATE (a:Actor {name: "Leonard Nimoy", born: 1931, died: 2015})

Relationships are created with CREATE as well. For example the following query creates 2 nodes with Person label and creates a MARRIED_TO relationship between them:

CREATE (NedStark:Person {firstname: "Eddard", lastname: "Stark", alias: "Ned", gender: "male"}),
	   (CatelynStark:Person {firstname: "Catelyn", lastname: "Stark", maidenName: "Tully", gender: "female"})
CREATE (NedStark)-[:MARRIED_TO]->(CatelynStark)

Relationships can have properties too:

CREATE
	(WaymarRoyce)-[:MEMBER_OF {order:"Ranger"}]->(NightsWatch)

Properties can be added or updated by using the SET keyword, such as:

MATCH (p:Person {firstname: "Eddard"})
SET p.title = "Lord of Winterfell"
RETURN p

The above query adds a “title” property to the nodes with label Person and with firstname “Eddard”.

An existing property can be deleted with the REMOVE keyword

MATCH (p:Person {firstname: "Eddard"})
REMOVE p.aliasList

MERGE

MERGE can be used to create new nodes/relationships or update them if they already exist. For MERGE to match an existing node, all the properties in the pattern must match. For example, after creating the House Stark of Winterfell node with the first query below, the second one adds a new property to it:

CREATE (houseStark:House {name: "House Stark of Winterfell", words: "Winter is coming"})

MERGE (houseStark {name: "House Stark of Winterfell", words: "Winter is coming"})
SET houseStark.founded = "Age of Heroes"

The following one, on the other hand, creates a new node with the same name (because the words properties don’t match):

MERGE (houseStark:House {name: "House Stark of Winterfell", words: "Winter is coming!!!"})
SET houseStark.founded = "Age of Heroes"
RETURN houseStark

I find this helpful when you have a long Cypher query that you might run multiple times. If you just use CREATE, every time you run the query you will end up with new nodes. If you use MERGE it will not give any errors and will not create new nodes.

Another way to prevent this is unique constraints. For example the following query will enforce uniqueness on the _id property for nodes labelled Book.

CREATE CONSTRAINT ON (book:Book) ASSERT book._id IS UNIQUE

Now we can run this query and create the books:

CREATE 
	(AGameOfThrones:Book {_id: "7abe0a1c-b9bd-4f00-b094-a82dfb32b053", title: "A Game of Thrones", orderInSeries: 1, released:"August 1996"}),
 	(AClashOfKings:Book {_id: "051dae64-dfdb-4134-bc43-f6d2b4b57d37", title: "A Clash of Kings", orderInSeries: 2, released:"November 1998"}),
 	(AStormOfSwords:Book {_id: "e9baa042-2fc8-49a6-adcc-6dd455f0ba12", title: "A Storm of Swords", orderInSeries: 3, released:"August 2000"}),
 	(AFeastOfCrows:Book {_id: "edffaa47-0110-455a-9390-ad8becf5c549", title: "A Feast for Crows", orderInSeries: 4, released:"October 2005"}),
 	(ADanceWithDragons:Book {_id: "5a21b80e-f4c4-4c15-bfa9-1a3d7a7992e3", title: "A Dance with Dragons", orderInSeries: 5, released:"July 2011"}),
 	(TheWindsOfWinter:Book {_id: "77144f63-46fa-49ef-8acf-350cdc20bf07", title: "The Winds of Winter", orderInSeries: 6 })

If we try to run it again we get the following error:

Cypher constraint error

A unique constraint requires the existing data to be unique as well. So if we run the book creation query twice before adding the constraint and then try to create the constraint, we get an error:

To delete the constraint we use the DROP keyword, such as

DROP CONSTRAINT ON (book:Book) ASSERT book._id IS UNIQUE

DELETE

To delete nodes, first we find them with MATCH. Then, instead of RETURN as in the examples above, we use DELETE to remove them.

MATCH (b:Book {_id: "7abe0a1c-b9bd-4f00-b094-a82dfb32b053"})
DELETE b

The above query would only work if the node doesn’t have any relationships. To delete a node as well as its relationships we can use the following:

MATCH (b {_id: "7abe0a1c-b9bd-4f00-b094-a82dfb32b053"})-[r]-()
DELETE b, r

We can generalize the above query to clean the entire database:

MATCH (n)
OPTIONAL MATCH (n)-[r]-() 
DELETE n, r

I find this quite useful during the test and development phase as I need to start over quite often.

Useful keywords

  • ORDER BY

It is almost mandatory in all SQL variants: eventually you have to sort the data. For example the following returns all actors ordered by name in alphabetical order:
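
A minimal sketch of such a query:

MATCH (a:Person)
RETURN a.name
ORDER BY a.name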

  • LIMIT

Sometimes your query may return a lot of values that you’re not interested in. For example you may want to get only the top n results. In this case you can restrict the number of rows returned with the LIMIT keyword, such as:
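
A sketch along those lines, combining ORDER BY and LIMIT:

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN a.name, count(m) AS movies
ORDER BY movies DESC
LIMIT 3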

The above query returns the top 3 actors who have the most ACTED_IN relationships with movies in descending order.

  • UNION

Another familiar keyword from the RDBMS world is UNION. As in standard SQL, the returned columns must have the same number and names. For example the following will fail:

MATCH (p:Person)
RETURN p.name
UNION
MATCH (m:Movie)
RETURN m.title

But by using an alias it can be fixed like this:

MATCH (p:Person)
RETURN p.name AS name
UNION
MATCH (m:Movie)
RETURN m.title AS name

Measuring Query Performance

The PROFILE keyword is very helpful as it lets you see the query plan and optimize your queries. It is very simple to use: you just put it before the MATCH clause, such as
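
A minimal sketch, profiling a simple lookup:

PROFILE MATCH (p:Person {name: "Tom Hanks"})
RETURN p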

This is obviously a very simplistic example, but I strongly recommend the GraphAware article on Labels vs Indexed Properties, where PROFILE is heavily used to identify which option performs better in a given scenario. It’s also a great read to learn more about modelling a Neo4J database.

Conclusion

Cypher is quite powerful and can be very expressive in the right hands. It is too comprehensive to cover fully in a blog post; I just wanted this post to contain samples of the basic queries. I hope it can be of some help to someone new to Cypher, in conjunction with the references I listed below.

Resources

aws comments edit

AWS team is certainly keeping themselves very busy. If you turn your head away for a while you end up falling behind the new features. In this post I’ll cover some of the fairly new features and improvements I’ve recently discovered and found useful and interesting.

Reserved Instance pricing model improvements

In the new simplified model they introduced 3 options: No Upfront, Partial Upfront and Full Upfront. No Upfront is a great option as you don’t need to pay a big amount up front to start paying less. Reserved instances are generally great for servers that run 24/7. For example, currently if you launch a t2.micro Linux instance in the EU (Ireland) region you pay $0.014/hour on demand. If you switch to the No Upfront reserved option the hourly rate falls to $0.010. (This assumes the server runs continuously; you pay the $7.30/month regardless of the instance’s state.)
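
To put those figures into monthly terms (roughly, assuming a 730-hour month): on demand you pay 730 × $0.014 ≈ $10.22, while the No Upfront reserved rate comes to 730 × $0.010 = $7.30, a saving of just under 30% for an always-on instance.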

AWS Reserved Instance Pricing

Auto recovery

This is a very cool feature that helps a great deal with system maintenance. It’s not available in all regions yet (just N. Virginia as of this writing) but I’m guessing it will be rolled out to other regions soon. With this feature we can create an alarm for the StatusCheckFailed_System metric and select “Recover this instance” as an action

AWS Instance Recovery

So when a machine is unresponsive it can be restarted automatically. Considering how many issues a simple restart can solve I think it’s a great feature :-)

Resource Groups

I’ve always found AWS Management Console a very dangerous place as all machines (dev, test prod) are all pooled and listed in the same place. In a careless moment you can easily terminate a production machine instead of a temporary test machine (that’s why I always turn on termination protection).

Resource groups, as the name implies, are meant to group related items. Their primary goal is not to solve the problem I mentioned above (maybe it’s just me who sees it as a problem), but that’s what I will initially use them for.

AWS Resource Groups Overview

Using resource groups is dead simple: just add a tag to your resources (I added one called “type”), create a resource group and use that tag as a filter. Now, instead of listing all EC2 instances, I can just click on the Dev group and play around with my development servers without even seeing the production machines.

AWS Resource Group Details

Lambda

AWS Lambda is a compute service that allows you to execute your code without launching an EC2 instance. It’s in preview mode at the moment. It’s not meant to run full-blown applications and you have no control over the actual machines running the code. It’s great for simple event handlers, such as triggering a function when an image is uploaded. Currently I’m working on an application that’s going to use this service, so more on that later.

Aurora

I haven’t tried this one myself yet and currently it’s in preview stage. The premise is it’s fully-compatible with MySQL and costs 1/10th of a regular MySQL instance. It’s not quite clear whether it’s a fork of MySQL like MariaDB or just an independent RDBMS but either way being able to use it directly as a MySQL replacement sounds good. As an AWS fan I’m sure performance-wise it would be everything they said it would so no doubts there. I signed up for the preview and waiting for approval. When I get to try it out I will post my impressions as a separate blog post.

Conclusion

There is a big competition going on between the cloud computing platforms these days. Prices are constantly going down and new features are being added so it’s quite a lot of fun to follow the news in this market. I’ll try to keep an eye on new AWS features more closely from now on.

Resources

security, gadget comments edit

Wifi Pineapple has been around for quite some time now. Almost two years ago I posted a short review here but with the latest version (Mark V) it’s got badder than ever and they keep adding more features so it’s a nice time to catch up.

What is it?

Basically the Wifi Pineapple is a WiFi honeypot that allows users to carry out man-in-the-middle attacks. Connected clients’ traffic goes through the attacker, which makes the attacker capable of pulling a number of tricks.

Mark IV was based on Alfa AP121U. Instead of buying a pineapple you could just buy an AP121U and create your own DIY pineapple by installing the firmware. Mark V, on the other hand, is a whole new animal.

Wifi Pineapple Mark V

Equipped with 2 radios it can work in client mode meaning it can piggyback on a nearby WiFi network and bridge the victim’s connections (In Mark IV the only way to provide internet access was 3G which is also still supported).

How does it work?

At the heart of the pineapple lies an attack method called Karma. It works by exploiting how much our devices trust probe requests and responses. Our wireless devices, by default, constantly try to connect to the last networks they were on. To accomplish this they actively scan their neighbourhood by sending out probe requests. (A probe request is a special frame sent by a client station requesting information from either a specific access point, specified by SSID, or all access points in the area, specified with the broadcast SSID.)

Normally, access points (APs) that don’t broadcast the requested SSID just ignore the probe request. The correct AP responds with a probe response and the client initiates association with the AP again. That’s how we connect to our home or work network as soon as we arrive. That is convenient and user-friendly, but malicious devices running the Karma attack can break this “honor-code” based system. The pineapple responds as whatever AP the device is asking for, deceiving it into believing it is home.

What’s the risk

Obviously these honeypots are set up for a reason: To get your traffic. Once you start sending your data through the attacker’s system you are opening yourself to all sorts of attacks. One of the most important ones is called sslstrip. (These “attacks” are called infusions in Pineapple parlance and they can be downloaded and installed in seconds to enhance its capabilities which makes the pineapple an even more powerful weapon)

sslstrip redirects your HTTPS traffic to its HTTP equivalent (i.e. you land on http://www.google.com even though you requested https://www.google.com). So when you log in to your favorite social network you essentially hand over your credentials to the attacker in plain text.

You can find a full list of available infusions here which can give you an idea what types of attacks are possible.

What’s New

As of version 2 of the Pineapple firmware a new feature was introduced: PineAP. They define it as “the next-gen rogue AP”.

As the Karma attack became more popular, vendors started to increase security a little bit. Instead of sending out all the SSIDs the device has connected to in the past, the device simply sends a broadcast probe request and the APs respond with a beacon containing their information and the SSID(s) they are broadcasting. Then the client can decide which one to connect to. This method mitigates the regular Karma attack. To mitigate this mitigation (!) the Pineapple team came up with PineAP.

PineAP has several modules:

  • Beacon Response
    Similar to probe responses, this module sends beacons. For example, if the client is looking for the “Home Sweet Home” network, Karma sends a probe response with this SSID and the Beacon Response module sends a beacon with the same SSID, making the pineapple look more legitimate.

  • Dogma This module sends out beacon frames of SSIDs selected by the attacker (defined in the PineAP SSID Pool). It also allows targeted attacks so you can set the target MAC address.

  • Auto Harvester Since SSID names are now more likely to be kept secret, it helps to collect this information. The Harvester collects leaking SSIDs from potential clients. They are added to the SSID pool to be used by Dogma.

Further Improvements and new features

  • Setup screen

A neat feature: as an extra security step, when you first set up and connect to your pineapple you have to enter a random LED pattern.

LED Pattern

As a fan of multi-factor authentication of all sorts I loved the idea of using the physical hardware to improve security.

  • Management network

The new Pineapple can set up a secure WiFi AP just for the owner, so you don’t have to connect via Ethernet. You can use your phone to check who is connected to the network and manage the pineapple completely using the responsive UI.

  • Recon mode

This is another neat feature that allows you to scan all the APs and clients around. You can even carry out a deauth attack by just clicking on a client.

Mitigations

  • Never connect to open networks, NEVER!: If you are in the habit of using open WiFi networks, one day you might come across one of these pineapples in your coffee shop and hand over your data unknowingly to a guy sitting at the table next to you! Even without this risk you should never use networks that you have no control over, but this kind of risk makes it even more important.

  • Verify SSL: SSL/TLS is still the most important security measure we have when connecting to websites. Always ensure that you are connected to the right website by checking the certificate. An attacker may not be able to break the SSL certificate, but they can simply bypass it completely with tools like sslstrip. This is especially important on mobile devices, since they tend to hide the address bar to make more room for the content, so you generally never know the exact URL you are sending your requests to.

  • VPN: If you are using a VPN your traffic is encrypted and sent through a secure channel. In this case, even if an attacker is able to get your traffic they will not be able to make any sense of it.

  • HSTS: This mechanism allows web servers to declare that web browsers (or other complying user agents) should only interact with them using secure HTTPS connections and never via insecure HTTP. It helps to mitigate the risks of tools like sslstrip. The downside is that it’s not yet adopted by all browsers.
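
For example, a site can opt in by sending a response header like the following, which asks browsers to stick to HTTPS for the next year (a minimal sketch):

Strict-Transport-Security: max-age=31536000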

Conclusion

Wireless networks provide great convenience, but they come with risks and vulnerabilities (as all conveniences in IT do). The hardware is getting smaller and more powerful every day, so tools like the Wifi Pineapple are getting more threatening. It’s important to keep an eye on what kind of risks are out there and learn how to avoid them.

Resources

graph databases, neo4j comments edit

It always helps to know the ecosystem to get the maximum performance out of any development platform. In this post I will cover some of the tools that can be used to manage a Neo4J database.

Neo4J Browser

This tool comes out of the box and is very handy for running Cypher queries and getting visual results. I covered some of the stuff you can do with this tool in an earlier post, but it’s capable of much more, so it’s a very nice tool to have in your toolbelt. I think the only shortcoming is that you cannot edit data visually. It would be very helpful to have the ability to manually add nodes or create new relationships in a drag & drop fashion. It would save having to write queries from scratch every time, but maybe that feature will come in a future release.

Linkurious

This is a paid online service. Unfortunately they don’t have a trial option. You can sign up for an online demo though.

Linkurious tutorial

It has a nice intuitive user interface that allows you to search and edit data via the UI.

Linkurious tutorial

When I tried to edit data or try the “Save as a Neo4J Database” feature I got errors.

Linkurious server error

I don’t know if it’s a limitation of the demo version or their system was having a bad day but I’m not convinced to fork over €249 for this tool yet.

UPDATE: After I published this post I’ve been informed that adding/editing feature is disabled in the demo. So the error messages were intentional and not because of a system failure.

Neoclipse

This is a free and open-source desktop application written in Java. It has some flaws (e.g. sometimes you have to reconnect to the server to see the effects of your changes) but in general it’s a nice tool for visual editing. You can also run Cypher queries.

Another neat feature to further embellish the visualization is assigning icons to nodes.

Neoclipse preferences

You can specify a folder that contains your images. The image name must match the property specified in the “Node icon filename properties” textbox.

For example, when I ran my sample Simpsons Cypher script I got the following graph:

Neoclipse visualisation with icons

Note that the values that should match the filenames are case-sensitive and you may need to reconnect to the server to see the changes (refresh doesn’t cut it). I learned that the hard way :-)

Managing the properties of a node is very easy. You can edit the current values in-place on the grid, and add new values by right-clicking on the Properties grid, selecting New and choosing the type of the property.

Neoclipse - Adding new property

It’s a nice tool for quickly editing data but it can easily be a memory-hog too! Once I noticed it was using 1.5GB RAM and the graph only around 100 nodes and relationships so I have some performance concerns about it with large datasets.

Gephi

Gephi is a general visualisation tool and, thanks to its plugin support, it can be extended to support Neo4J databases. (You need at least JDK 7 to install the plugin.)

Gephi plugins

You can download and install the Neo4J plugin manually or better yet you can just select Tools -> Plugins -> Available Plugins and search Neo4J.

Once installed you can then import a Neo4J database by selecting File -> Neo4J Database -> Full Import

Make sure to shut down the Neo4J server before the import or you will get this very informative error message from Gephi:

Gephi database in use error

Apparently it locks the database as well so if you try to run Neo4J again while Gephi is still running you get this error:

Neo4J lock error

Clearly they don’t play well together so it’s best to run them separately.

After the import you can view the graph visually like this

Gephi imported Neo4J database

Doesn’t look as impressive as Neoclipse IMHO!

To view the data you can switch to Data Laboratory tab:

Gephi Nodes

Gephi Edges

In theory you can add nodes and export the database but that wasn’t a very successful endeavour for me. When you add a new node you can set the label but you cannot edit any properties.

Gephi new node

So that wasn’t quite helpful. I don’t know what you can do with a graph without any properties.

Gephi is still in beta. Also, as plugins are developed by third parties, there can be some inconsistencies. I’ll leave this tool for the moment, but it looks promising so I’ll put a pin in it for the time being.

Tom Sawyer Perspectives

This is also a generic visualisation tool that can work with multiple data sources. It can integrate with Neo4J as well as InfiniteGraph, a distributed graph database.

Downloading the trial software is a bit tricky. First you apply for an account. Your application is processed manually. After you are accepted, you first apply for a code to evaluate the product. Luckily it’s handled automatically and you receive the code right away. You then enter the code to have the privilege(!) to submit another form that details what type of project you’re planning to develop, what programming language you are using etc. That application is also processed manually. Currently I’m still waiting to be granted a trial license so I will not review the software for the time being. If I get to try it someday I will update this post.

Comparison

| Tool | Price | Requirements | Pros | Cons |
|------|-------|--------------|------|------|
| Neo4J Browser | Free | Web browser | Comes with the server; rich feature set | No visual editing |
| Linkurious | €249 | Web browser | Visual editing | Expensive |
| Neoclipse | Free / Open Source | Java 1.6 | Easy to use and edit data; nice decoration options | High memory usage; glitches may cause disruption |

Prototyping tools

My main goal in this quest was to find a tool that would allow me to edit data visually to speed up data entry. Apart from tools that can directly manipulate data there are also a bunch of modelling tools. I won’t cover them in depth in this post, but it might be helpful to have a short list of them at least, to give some pointers.

OmniGraffle

OmniGraffle is a general-purpose diagramming tool. It’s neither free nor cheap, but it has an iPad version so you can keep modelling on the go!

Arrow Tool

Arrow Tool is an open-source project developed by a Neo4J developer. It’s as simple as it gets and helps you to quickly create a model.

Conclusion

This is by no means an exhaustive list of the tools in the market. As graph databases gain more traction the number of such tools will exponentially increase.

Visual tools help a great deal sometimes to make sense of and see how the data is connected. But you need to have good Cypher skills to be able to run complex queries. In the next post I’ll go over Cypher and cover the basics.

Resources

development, neo4j, graph databases comments edit

I’m currently in search of a good tool to manage my graph database data. As they say a picture is worth a thousand words so visualization is very important to understand the nature of the data and make some sense of it. If nothing, it’s simply more fun!

Covering a user interface may seem unnecessary since they are generally pretty straightforward, and the Neo4J browser is no exception. But after using it for some time I discovered some neat features that are not immediately obvious, so I’ll go over them in this post. I will be using the beta version (v2.2 M04), so if you are using an older version you can compare the upcoming features with your current one.

You gotta know your tools to get the maximum benefit out of them so let’s dig in!

Accessing the Data

A major change in this version is that authentication is now turned on by default.

Neo4J Authentication

So if you install it somewhere that’s accessible from the Internet you have a layer of protection.

If you close that login window by accident you can run the following command to bring it back:

:server connect

Similarly, if you want to terminate the connection you can run

:server disconnect

Until v2.2 M03 you got an authorization token back after the connection was established, but since v2.2 M04 you don’t get a token. This is because the authorization header is now calculated by base64-encoding the username:password pair.

Welcome message

If you need to connect using other tools you need to send the authorization header with every request. This is the Base64-encoded value of the username and password, separated by a colon.

So in my case I changed the password to pass (I know it’s a horrible password, this is just a test installation :-))

So my authorization header value becomes:

Username:Password --> neo4j:pass
Base64-encoded value --> bmVvNGo6cGFzcw==

And with the correct authorization header I can get an HTTP 200 for the GET request for all the labels:
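
If you’re scripting against the REST API, the same header can be built and sent like this (a minimal sketch in Python 2, assuming a local server on the default port 7474 and the label-listing endpoint):

import base64
import urllib2

# Base64-encode the username:password pair for the Basic auth header
token = base64.b64encode("neo4j:pass")  # bmVvNGo6cGFzcw==

request = urllib2.Request("http://localhost:7474/db/data/labels")
request.add_header("Authorization", "Basic " + token)

# Prints the JSON list of labels if authentication succeeds (HTTP 200)
print urllib2.urlopen(request).read()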

Results for all labels

Helpful commands and shortcuts

  • First things first: Help!

To discover all the commands and links to references you can simply run the help command:

:help

Help output

  • Clean your mess with Clear

The browser runs each query in a separate window, and after using it for a while you can have a long history of old queries. You can get rid of them easily, once and for all, with this command:

:clear

  • Use Escape to switch to full-screen query view

You can use the Escape key to toggle a full-screen query window. Just press the Esc key and you’ll hide the previous query windows.

Toggle full screen

This is especially handy if you are dealing with long Cypher queries and can use some space. You can toggle back to the old view (query and output windows below) by pressing the Esc key again. Alternatively it will toggle back if you just run the query.

  • Use Shift+Enter for multi-line mode

Normally the Enter key runs the queries, but most of the time you’ll need to work with queries spanning multiple lines. To accomplish that you can simply use Shift+Enter to start a new line and switch to multi-line mode. Once you switch to multi-line mode, line numbers will appear and Enter will no longer run your query but will start a new line. To run queries in multi-line mode, use the Ctrl+Enter key combination.

  • Name your saved queries in favorites

There are some queries you run quite often. For example in my development environment I tend to delete everything and start from scratch. So I saved my query to the favorites by clicking the star button in the query window. Nice thing about it is if you add a comment at the beginning of your query the browser is smart enough to use it as the name of your query so you don’t have to guess which one is which when you have a lot of saved queries.

Named favorite

  • Play around with the look & feel

With the latest version you can mess around with the stylesheet that is used to visualize the results. In the favorites tab there is a section called “Styling / Graph Style Sheet”. You can see the styles used by clicking the Graph Style Sheet button. It’s not editable in the editor but you can export it to a .grass file by clicking on the Export to File icon.

Graph Style Sheet

After you make your changes you can import it back by dropping it to the narrow band at the bottom of the dialog window.

Import style sheet

You can still get the original styles back by clicking on the icon next to export (that looks like a fire extinguisher!) so feel free to mess with it.

  • Use Ctrl+Up/Down arrow to navigate through old queries

You can browse through your query history using Ctrl+Up/Down arrow. I find this shortcut especially helpful when you quickly need to go back to the previous query.

  • Click on a query to get it back

Once you run a query, the query itself and its output are encapsulated in a separate window so you can browse through old queries. If you need to run a query again you can simply click on it. As you can see in the image below, a dashed line appears under the query when you hover over it, indicating it’s a link. When you click the query it populates the query window so you don’t need to copy/paste it.

Link to query

Conclusion

Currently the browser doesn’t allow editing data visually but for running Cypher queries it’s a great tool. If you have any tips and tricks to suggest or corrections to make, please leave a comment below.

UPDATE: I published this post using v2.2 M03 but as Michael Hunger kindly pointed out there were some changes in v2.2 M04 that made my post outdated from the get-go so I updated it accordingly. If you find any inconsistencies please let me know using the comments below.

development, neo4j, graph database comments edit

Graph databases are getting more popular every day. I played around with them in the past but never covered them extensively. My goal now is to first cover the basics of graph databases (Neo4J in particular), then cover Cypher (a SQL-like query language for Neo4J) and finally build a full-blown project using them. So this will be the first post in a multi-part series.

Neo4J provides nice training materials. Also, I currently have active Safari Books Online and Pluralsight subscriptions, so I thought it might be a good time to conduct comprehensive research and go through all of these resources. So without further ado, here’s what I’ve gathered on graph databases:

Why Graph Databases

The main focus of graph databases is the relationships between objects. In a graph database, every object is represented by a node (aka a vertex in graph theory) and nodes are connected to each other with relationships (aka edges).

Graph databases are especially powerful tools for heavily connected data such as social networks. When you try to model a complex real-world system you end up having a lot of entities and connections among them. At this point a traditional relational model starts to become sluggish and hard to maintain, and this is where graph databases come to the rescue.

Players

It turns out there are many implementations, and the concept is broader than it first appears as they have different attributes. You can check this Wikipedia article to see what I mean. As of this writing there were 41 different systems mentioned in the article. A few of the players in the field are:

  • Neo4J: Most popular and one of the oldest in the field. My main focus will be on Neo4J throughout my research
  • FlockDB: An open-source distributed graph database developed by Twitter. It’s much simpler than Neo4J as it focuses on specific problems only.
  • Trinity: A research project from Microsoft. I wish it had been released, because it would probably come with native .NET clients and integration, but it looks like it’s dead already as there has been no activity on its page since late 2012.

There are three dominant graph models in the industry:

  • The property graph
  • Resource Description Framework (RDF) triples
  • Hypergraphs

The Property Graph Model

  • It contains nodes and relationships
  • Nodes contain properties (key-value pairs)
  • Relationships are named and directed, and always have a start and end node
  • Relationships can also contain properties
  • No prior modelling is needed but it helps to understand the domain. The advice is start with no schema requirements and enforce a schema as you get closer to production.

Basic concepts

As I will be using Neo4J, I decided to focus on the basic concepts of Neo4J databases (the current versions I’m using are 2.1 and 2.2 M03, which is still in beta)

  1. Nodes - Graph data records
  2. Relationships - Connect nodes. They must have a name and a direction. Sometimes the direction has semantic value and sometimes the connection goes both ways, like a MARRIED_TO relationship: it doesn’t matter which way you define it, both nodes are “married to” each other. But a LOVES relationship, for example, doesn’t have to go both ways, so the direction matters.
  3. Properties - Named data values
  4. Labels - Introduced in v2.0. They are used to tag items like Book, Person etc. A node can have multiple labels.
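
As a tiny illustration of these concepts (a sketch in Cypher, which I’ll cover in the next post; the names are made up), the following creates two labelled nodes with properties and a named, directed relationship between them:

CREATE (homer:Person {name: "Homer"})-[:MARRIED_TO]->(marge:Person {name: "Marge"})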

Conclusion

Graph databases are on the rise, as can be seen clearly from the chart below (taken from db-engines.com):

db-engines popularity chart

It feels very natural to model a database as a graph: graphs handle relationships very well, and in real life there are many complex relationships in semi-structured data. Especially at the beginning, starting without a schema and letting your model and data mature over time makes perfect sense. So it is understandable why graph databases are gaining traction every day.

In the next post I will delve into Cypher - the query language of Neo4J. What good is a database if you can’t run queries on it anyway, right? :-)

Resources

review, event, place, personal comments edit

I wish I had hobbies! In the past I tried drawing comic books and drumming. They are nice pastime activities but they require a lot of effort and dedication. I just couldn’t find it in me to spend so much time on these activities.

I realized the only activities I don’t mind spending too much time on are developing software and playing with gadgets. One way to improve on these is to join a group. I tried OpenHack meetup group for a while but there was not much going on. Then I discovered this gem: London Hackspace!

Home, sweet home!

I’m writing this post on a comfy couch in their HQ on Hackney Road. This is my first time here but I already fell in love. It’s a great place full of gadgets, laser cutter, 3D-printers, wood and metal workshops, robotics corner and much more! Basically there are no rules. It’s open 24 hours and you can come in anytime you want and can work on any project you want. They also have a very comprehensive library full of tech books.

Everything is DIY. Even the membership card is not given to you automatically but you add your Oyster or any other RFID card to the system by yourself.

They have a very busy events calendar as well. Almost every day there is some event going on, from robotics to 3D printers or lock-picking. There is no way you won’t find something you’re interested in. Check out their Flickr photostream; you can see how diverse the activities are. There is even a bad TV night on Saturday nights!

Long story short, these are just my first impressions but I’m planning to spend a WHOLE LOT of time in this place. I’m sure I will learn a ton of new stuff and have a lot of fun.

Resources

review, book, productivity, career comments edit

One of the challenges of being a software developer is keeping current. Technology moves so quickly that it’s very easy to feel overwhelmed. At times like this you can definitely use some productivity tips. This is how I landed on this book. It has productivity tips alright, but it has so much more to offer.

Soft Skills by John Sonmez

The software developer’s life manual

The subtitle explains it clearly in a few words. The book is not only about productivity but it encompasses all aspects of life. From money management to fitness, from career tips to handling failures it indeed proves to be a life manual.

This is a unique book in the sense that it has sections about fitness and spirit. To be honest, when I first started reading the book I was planning to skip those sections. But then it hit me: most of the challenges I have to face are not technical. Technical problems are easy to solve, especially in the age of Google, StackOverflow and all sorts of online resources. The real challenge is keeping the motivation up, staying healthy and fit (the standard beer & pizza based diet of a developer doesn’t cut it) to increase productivity and overall success, and managing your money wisely (which unfortunately means I shouldn’t buy every gadget I see!)

Structure of the book

The book is 504 pages and divided into 7 main sections:

  1. Career
  2. Marketing yourself
  3. Learning
  4. Productivity
  5. Financial
  6. Fitness
  7. Spirit

In the first 2 sections, you can find many helpful tips & tricks regardless of whether you are an employee, a freelancer or an entrepreneur. Apparently John Sonmez went through all these stages in his career, so he knows what he is talking about.

As I mentioned keeping current is key in our industry and Section 3 addresses this need very effectively.

Since there is so much to learn time management becomes an issue on its own and that’s when you need to improve your productivity. You might not like some of the advice he gives (like quitting watching TV!) but deep inside you’d know he is right.

Money management is especially crucial if you are trying to run your own business. I don’t think finance is any developer’s strongest suit so this section is something to read twice for people like me!

I recently lost a lot of weight and now feel much better about myself. And better yet I feel a lot more productive. Physical fitness is very important. Since our job doesn’t force us to be fit it’s very easy and tempting to let it go. But the consequences are dire! So incorporating a healthy diet and exercise is extremely important to be successful. This section comes with a lot of great advice and techniques to address this.

There’s no way you can achieve any significant level of success if you are not mentally prepared for it. Final section of the book covers mind and soul. From handling relationships to getting back on horse after a failure, it comes with a lot of useful advice and references to good resources.

These 7 sections are divided into 71 chapters in total and they are very easy to read. Since it takes minutes to read each chapter I never had to put it away in the middle of a chapter.

About the author

I like and respect the author, John Sonmez, as he is very prolific and successful. So from the first page I knew for sure that the information in the book is not theoretical or made up in any way. Even Uncle Bob’s prologue is evidence that this guy is methodical, persistent and follows the plan until he achieves success. This is exactly the kind of person you would like to get career advice from!

Conclusion

Generally I find such books fluffy and full of the obvious, but not this one, because I’ve been following the author’s work for quite some time now. I know he is very candid and sincere, so I know every word in the book is written by a developer who has already faced similar challenges and tackled them. Overall, this is definitely a great book to read and keep around for future reference.

Resources

review, gadget comments edit

I already had 3 Raspberry Pis so you might think it’s ridiculous to get excited for a 4th one but you would be wrong! Because this time they revised the hardware and made major upgrades.

Raspberry Pi 2

The new Pi is rocking a quad-core Cortex-A7 processor and comes with 1GB of memory. The rest of the specs remained the same (including the price!) but the processor and memory make a huge impact.

For more detailed comparison check the benchmark results on the official website

I tried using a web browser in the old versions and it was painfully slow. So I decided to use the previous versions for background tasks like a security camera or network scanning. The B+ is powerful enough to run XBMC and play videos but even that is slow when it transitions between different screens. So seeing the new Pi can be a full desktop replacement is really exciting.

What else is new

Pi 2 comes with a few applications installed like Mathematica and Wolfram, utilities like a PDF reader and text editor and even Minecraft!

Minecraft on Raspberry Pi 2

What’s next

I don’t have a project in mind at the moment for this one yet. Since the architecture has breaking changes a lot of existing OS versions don’t work on it yet like Raspbmc and Kali Linux. Maybe I can wait and install Kali Linux on it and practice some security features. Or hopefully Microsoft releases Windows 10 for IoT soon enough so I can play with it. I’ll keep posting as I find more about my new toy! In the meantime I’m open to all project ideas so please leave a comment if you anything in mind!

Resources

development, fsharp comments edit

I finally finished the Pluralsight course. I’m still studying F# 2 pomodoros a day, but lately I’ve lost the motivation to publish the notes. In this post I’ll tidy up the remaining notes. In order to maintain my cadence I think I had better develop more stuff instead of covering theoretical subjects.

More Notes

  • Upcasting / Downcasting: To upcast, the :> operator is used. Alternatively the upcast keyword can be used to achieve the same result. It performs a static cast, and the cast is determined at compile time.

Downcasting is performed by the :?> operator or downcast keyword.

  • Abstract classes: Abstract classes are decorated with the [<AbstractClass>] attribute. To mark members, the abstract keyword is used:
abstract Area : float with get
abstract Perimeter : float  with get
abstract Name : string with get
  • obj is a shortcut for System.Object
  • do bindings perform actions when the object is constructed. do bindings are usually put after let bindings so that code in the do binding can execute with a fully initialized object.
  • unit is the equivalent of void
  • tabs are not allowed because indentation is significant and a tab could stand for an unknown number of space characters
  • ‘a means generic. For example a function f inferred as ‘a -> bool takes a generic parameter and returns a boolean
  • Providing an incomplete list of arguments results in a new function (partial application, thanks to currying)
  • Getting values from tuples: fst gets the first value, snd gets the second value
  • You can attach elements to a list by using the :: (cons) operator
  • @ operator Concatenates two lists.
  • Exceptions can be thrown using the raise function. The reraise function rethrows an exception
  • BigInt (C# and VB) don’t have support for arbitrarily long integers

Resources

events comments edit

I love podcasts as they help me to learn and keep current with the news during idle times. My favourite network is TWiT. They provide all sorts of shows from general tech news to specialized shows like Windows Weekly which focuses on Microsoft world.

A few days ago I received an email from the Windows Apps London meetup group (which made me feel good about my decision to join) telling me that Mary Jo Foley would be in town and there would be a Windows Weekly live show. Needless to say, I signed up instantly!

I’ve been watching this show for more than 3 years so I exactly knew what to expect. But it’s still very exciting to see Mary Jo in person and being a part of the show. That pixelated being in the back is me:

Me in Windows Weekly

I’ve always considered Mary Jo to be a very kind and polite person and I still maintain that position. Even stronger after seeing how she treats everyone. She was always cheerful, up-beat and friendly all along.

Windows Weekly in London with Mary Jo Foley 1

Windows Weekly in London with Mary Jo Foley 2

A few notes

  • I always download the on-demand version and never watch live. I learned that apparently Leo is always late to the shows! The show started 15 minutes late but Mary Jo was not surprised one bit!
  • There were Surface Pro 3s all around. I have never seen so many of them in one place at the same time actually. As a proud owner myself I felt validated :-)

Resources