devaws csharp, javascript, route53, lambda, amazon_api_gateway, nodejs

Part 2: Converting TLD Provider into an API

In all fairness, this part is not entirely necessary for the project. I already found a way to get the list of TLDs from the Amazon documentation, so I could easily integrate it with my client applications. But I recently heard about Amazon API Gateway, which was introduced a few weeks ago, so I thought it would be cool to access my JavaScript method via a web API hosted on AWS. After all, the point of many projects I develop is to learn new stuff! In light of that, I’ll try something new with the C# version as well and use a self-hosted service so that I can use a Windows Service instead of IIS.

API #1: Amazon API Gateway

There are a couple of new technologies here that I haven’t used before so this was a good chance for me to play with them.

Amazon API Gateway looks like a great service for creating hosted APIs. It’s also integrated with AWS Lambda: you can bind an endpoint directly to a Lambda function. There are other binding options, but in this project I will use Lambda as it was also on my to-learn list.

Setup Amazon API Gateway

The API Gateway interface is very intuitive. Within minutes I was able to create a GET method calling my Lambda function. First you create the API by clicking the “Create API” button!

Then you add your resource by clicking “Create Resource”. By default it comes with the root resource (“/”) so you can just use that one as well to add methods.

I created a resource called tldlist. All I needed was a GET method so I created it by “Create Method”.

You select the region and enter the full ARN of your Lambda function. In the UI it just says “function name” but it requires the full ARN (e.g. arn:aws:lambda:eu-west-1:1234567890:function:getTldList).
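To make the difference between a function name and a full ARN a bit more concrete, here is a minimal sketch that splits a Lambda function ARN into its parts. The helper name parseLambdaArn is my own (hypothetical, not part of any AWS SDK), and it only assumes the colon-separated layout visible in the example above.

```javascript
// Hypothetical helper, not part of any AWS SDK.
// Assumed layout: arn:partition:lambda:region:account-id:function:function-name
function parseLambdaArn(arn) {
    var parts = arn.split(':');
    var looksLikeLambdaArn =
        parts.length >= 7 &&
        parts[0] === 'arn' &&
        parts[2] === 'lambda' &&
        parts[5] === 'function';

    if (!looksLikeLambdaArn) {
        return null; // a bare function name such as "getTldList" ends up here
    }

    return {
        region: parts[3],
        accountId: parts[4],
        functionName: parts[6]
    };
}

var parsed = parseLambdaArn('arn:aws:lambda:eu-west-1:1234567890:function:getTldList');
console.log(parsed.region, parsed.functionName);
console.log(parseLambdaArn('getTldList'));
```

A bare function name returns null here, which mirrors why the console would not accept it in my case.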

…and Lambda

The function is a bit different from the previous version. In Node.js I wasn’t able to use the XMLHttpRequest object; it turns out you use the http module to make web requests, so I modified the code a bit. Here’s the final version of my Lambda function:

console.log('Loading function');

var docHost = 'docs.aws.amazon.com';
var docPath = '/Route53/latest/DeveloperGuide/registrar-tld-list.html';
var supportedTldPage = docHost + docPath;
var http = require('http');

exports.handler = function(event, context) {
    var options = {
      host: docHost,
      path: docPath,
      port: 80
    };
    
    http.get(options, function(res) {
        var body = '';
        res.on('data', function(chunk) {
            body += chunk;
        });
        
        res.on('end', function() {
            console.log(body);
            var result = parseHtml(body);
            context.succeed(result);
        });
    }).on('error', function(e) {
        console.log("Got error: " + e.message);
        // Report the failure; otherwise the function never returns a result
        context.fail(e);
    });

};

function parseHtml(pageHtml) {
    var pattern = /<a class="xref" href="registrar-tld-list.html#.+?">[.](.+?)<\/a>/g;
    var regEx = new RegExp(pattern);

    var result = {};
    result.url = 'http://' + supportedTldPage;
    result.date = new Date().toUTCString();
    result.tldList = [];

    var match; // declared locally to avoid leaking an implicit global
    while ((match = regEx.exec(pageHtml)) !== null) {
        result.tldList.push(match[1]);
    }

    return result;
}

In order to return a value from a Lambda function you have to call succeed, fail or done:

context.succeed (Object result);
context.fail (Error error);
context.done (Error error, Object result);

succeed and fail are self-explanatory. done is like a combination of both: if error is non-null, it is treated as a failure. Even if you call fail or done with an error, the HTTP response code was always 200 in my setup. What changes is the message body. I played around with a few possibilities to test the various results:

Method call: context.succeed(result);  
Output: Full JSON results

Method call: context.done(null, result);  
Output: Full JSON results

Method call: context.fail("FAIL!!!");  
Output: {"errorMessage":"FAIL!!!"}

Method call: context.done("FAIL!!!", result);  
Output: {"errorMessage":"FAIL!!!"}

As you can see, if the error parameter is not null, the actual results are ignored. I also removed the JSON.stringify call from the parseHtml method because API Gateway automatically converts the result to JSON.
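If you want to try these four combinations without deploying anything, a mock can stand in for the context object. This is only a sketch: createMockContext is a hypothetical helper of mine, not the Lambda runtime, and it merely mimics the outputs listed above.

```javascript
// Hypothetical stand-in for the Lambda context object, for local experiments only.
// It mimics the outputs observed above; it is not how the Lambda runtime is implemented.
function createMockContext(onFinish) {
    var context = {
        succeed: function (result) {
            onFinish(result);
        },
        fail: function (error) {
            onFinish({ errorMessage: String(error) });
        },
        // done delegates to fail when an error is given, otherwise to succeed
        done: function (error, result) {
            if (error !== null && error !== undefined) {
                context.fail(error);
            } else {
                context.succeed(result);
            }
        }
    };
    return context;
}

var outputs = [];
var mockContext = createMockContext(function (output) {
    outputs.push(output);
});
var sampleResult = { tldList: ['academy', 'agency'] };

mockContext.succeed(sampleResult);
mockContext.done(null, sampleResult);
mockContext.fail('FAIL!!!');
mockContext.done('FAIL!!!', sampleResult);

console.log(JSON.stringify(outputs));
```

The first two calls pass the result through untouched, while the last two produce only the error message, regardless of the result passed to done.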

Tying things together

Deployment is also a breeze. Just like creating the API, the resource and the methods, all it takes is a few button clicks. You click Deploy API and create an environment such as staging or prod. And that’s it! You’re good to go!

Since this will be a public-facing API with no authentication I also added a CloudWatch alarm:

This way, if someone decides to abuse the API, I will be aware of it. The good thing is it’s very cheap: it costs $3.50 per million API calls, which is about £2.25. I don’t think it will break the bank, but for serious applications authorization is a must, so I will need to investigate that feature in the future anyway.
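For a rough feel of what abuse could cost, here is a back-of-the-envelope sketch. It uses only the $3.50 per million calls figure quoted above; the function name is made up, and other charges such as data transfer are deliberately left out.

```javascript
// Rate quoted above: USD 3.50 per million API calls.
// Other charges (e.g. data transfer) are ignored in this rough estimate.
var PRICE_PER_MILLION_CALLS_USD = 3.5;

// Hypothetical helper: estimated request cost for a given number of calls.
function estimateCostUsd(callCount) {
    return (callCount / 1000000) * PRICE_PER_MILLION_CALLS_USD;
}

console.log(estimateCostUsd(1000000));
console.log(estimateCostUsd(2000000));
```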

At this point, I have a Lambda function called from the API hosted by AWS. I don’t have to worry about anything regarding the maintenance and scaling which feels great!

API #2: C# & Self-hosting on a Windows Service

Speaking of maintenance, here comes the Windows version! I don’t intend to deploy it to production, but I had been meaning to learn how to self-host APIs with Web API to avoid IIS, and here’s my chance to do so.

I found a nice, concise article showing how to run a Web API inside a console application. I tweaked the code a little bit to suit my needs. Basically it takes 4 simple steps:

Step 01. Install OWIN Self-Host NuGet package:

Install-Package Microsoft.AspNet.WebApi.OwinSelfHost

Step 02. Setup routing

public class Startup
{
    public void Configuration(IAppBuilder appBuilder)
    {
        HttpConfiguration config = new HttpConfiguration();
        config.Routes.MapHttpRoute(
            name: "DefaultApi",
            routeTemplate: "api/{controller}/{id}",
            defaults: new { id = RouteParameter.Optional }
        );

        appBuilder.UseWebApi(config);
    }
}

Step 03. Add the controller

public class TldController : ApiController
{
    public HttpResponseMessage Get()
    {
        string supportedTLDPage = ConfigurationManager.AppSettings["AWS-URL"];
        var tldProvider = new TldListProvider();
        var tldList = tldProvider.GetSupportedTldList(supportedTLDPage);
        var output = new {
            url = supportedTLDPage, 
            date = DateTime.UtcNow.ToString(),
            tldList = tldList.Select(tld => tld.Name)
        };
        return this.Request.CreateResponse(HttpStatusCode.OK, output);
    }
}

Step 04. Start OWIN WebApp

public void Start()
{
    string baseAddress = ConfigurationManager.AppSettings["BaseAddress"];
    WebApp.Start<Startup>(url: baseAddress);
    Console.WriteLine("Service started");
}

The final step is installation. As I used Topshelf, all I had to do was open a command prompt with administrator privileges and run this command:

TldProvider.Service.exe install

Now that my service is running in the background and accepting HTTP requests let’s take it out for a spin:

Brilliant!

What’s next?

So now I have an API that returns the supported TLD list. In the next post I’ll work on a basic client that consumes that API and AWS Route 53 to finally get the availability results.


devaws csharp, javascript, route53

Part 1: Getting supported TLD List

Around this time last year Amazon announced the Route 53 update which allowed domain registrations.

Domain search in Route 53

Unfortunately, the AWS Management Console UI doesn’t allow searching all TLDs at once for a given domain. So I set out to write a console application in C# that retrieves the TLDs from the AWS API in order to “enhance” AWS a bit. As with all of my projects, the scope got out of hand very quickly, so I decided to break this adventure into smaller parts. In this post I’ll talk about how to get the supported TLD list.

AWS: Everything is API-based, right? Not quite!

The first step towards a domain checker application is to acquire a list of TLDs to check. So I started exploring the AWS API to get the supported TLD list. To my surprise there wasn’t any such call! I asked the question in the AWS support forums and the response was that it’s not supported. There is a webpage that has the list, but that’s it! So until they make it available I decided to do a little scraping to generate the list myself. Obviously this is not the proper way of doing it and it’s prone to breaking very easily, but they gave me no choice!

First method: C# Library

It’s just one method and it was easy to implement (apart from fiddling with regular expressions!)

public List<Tld> GetSupportedTldList()
{
    Uri supportedTldPage = new Uri("http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/registrar-tld-list.html");
    string pageHtml = string.Empty;
    using (var webclient = new WebClient())
    {
        pageHtml = webclient.DownloadString(supportedTldPage);
    }

    string pattern = "<a class=\"xref\" href=\"registrar-tld-list.html#.+?\">[.](.+?)</a>";
    var matches = Regex.Matches(pageHtml, pattern);

    var result = new List<Tld>();
    foreach (Match m in matches)
    {
        result.Add(new Tld() { Name = m.Groups[1].Value });
    }

    return result;
}

It first downloads the raw HTML from the AWS documentation page. The TLDs on the page have a link in this format:

<a class="xref" href="registrar-tld-list.html#cab">.cab</a> 

The regular expression pattern retrieves all anchors in this format. Only the ones whose text starts with a dot are retrieved, to eliminate irrelevant links. To extract the actual TLD from the matching string I used parentheses to create a group. By default the whole matched string is the first group (index 0). Additional groups can be created with parentheses so that we get only the values we are interested in. For example, the group values for a match look like this:

m.Groups[0].Value: "<a class=\"xref\" href=\"registrar-tld-list.html#academy\">.academy</a>"
m.Groups[1].Value: "academy"
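The same grouping can be tried out in JavaScript. This is just a sketch to make the group indices tangible; the sample anchor is the one shown above, and index 0 and 1 play the role of m.Groups[0] and m.Groups[1].

```javascript
// Sample anchor taken from the example above
var sampleAnchor = '<a class="xref" href="registrar-tld-list.html#academy">.academy</a>';

// Same pattern as the C# version; the parentheses create capture group 1
var anchorPattern = /<a class="xref" href="registrar-tld-list.html#.+?">[.](.+?)<\/a>/;

var groups = anchorPattern.exec(sampleAnchor);
console.log(groups[0]); // the whole matched anchor
console.log(groups[1]); // only the TLD without the leading dot
```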

Finally, the method adds these extracted values to a list. For now the Tld class only contains a Name property; more info like the type of TLD or a description can be added in the future.

Second method: JavaScript

It’s not very complicated to accomplish this task with C#, so I thought it would be even easier with JavaScript as I had already tackled the regular expression bit. I couldn’t have been more wrong!

I created a simple AJAX call with jQuery and got the following error:

Apparently you cannot get external resources willy-nilly using jQuery! It took some time but I found a neat workaround. It’s essentially a jQuery plugin that sends the request to query.yahooapis.com instead of the URL we request. The Yahoo API then returns the results to a callback method we provide. I hadn’t heard about YQL before. The pitch on their site is:

"The YQL (Yahoo! Query Language) platform enables you to query, filter, and combine data across the web through a single interface." 

which goes some way to explaining why they allow cross-domain queries.

In the test page I added the plugin script

<script src="jquery.xdomainajax.js"></script>

Then the main function using jQuery worked just fine:

var supportedTldPage = "http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/registrar-tld-list.html";

function getTldList(callback) {
    $.ajax({
        url: supportedTldPage,
        type: 'GET',
        success: function (res) {
            var pageHtml = res.responseText;
            var jsonResult = parseHtml(pageHtml);
            callback(jsonResult);
        }
    });
}

Another thing I tried was using pure JavaScript:

function getTldList(callback) {
    var request = new XMLHttpRequest();
    request.open("GET", supportedTldPage, true);
    request.onreadystatechange = function () {
        if (request.readyState != 4 || request.status != 200) return;

        var pageHtml = request.responseText;
        var jsonResult = parseHtml(pageHtml);
        callback(jsonResult);
    };
    request.send();
}

This version doesn’t work by default because of the same CORS restrictions. But apparently there’s an extension for that. I installed it and it worked like a charm. Wondering what was happening behind the scenes I captured the request with Fiddler and it looked like this:

Host: docs.aws.amazon.com
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36
Origin: http://evil.com/
Accept: */*
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,tr;q=0.6
If-None-Match: "19ce8-51b1a64cfe880-gzip"
If-Modified-Since: Fri, 17 Jul 2015 23:17:38 GMT

Origin was “http://evil.com”! Works great as long as it’s not null! In the sample code I’ll leave both versions. Of course all this hassle is to bypass browser restrictions. Depending on your use case, you can always fire the HTTP request above in Fiddler and get the results. I guess it’s a nice mechanism if you need to implement a closed system and want to ensure that resources are not consumed from the outside. Of course for public web pages like this which are meant to be accessed from anywhere in the world there are no restrictions on the server side.

Now that we can get the raw HTML from the AWS documentation, the rest is smooth sailing. The part that handles finding the TLDs in the HTML is very similar:

function parseHtml(pageHtml) {
    var pattern = /<a class="xref" href="registrar-tld-list.html#.+?">[.](.+?)<\/a>/g;
    var regEx = new RegExp(pattern);

    var result = {};
    result.url = supportedTldPage;
    result.date = new Date().toUTCString();
    result.tldList = [];

    var match; // declared locally to avoid leaking an implicit global
    while ((match = regEx.exec(pageHtml)) !== null) {
        result.tldList.push(match[1]);
    }

    var myString = JSON.stringify(result);
    return myString;
}

The regular expression pattern is the same. The only difference is that the .exec method doesn’t return multiple matches at once, so you have to keep calling it in a loop until there are no more matches.
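Here is a small sketch of that looping behaviour on a made-up two-anchor input (the sample HTML is my own, shaped like the documentation markup). It also shows why I didn’t simply use String.prototype.match: with the g flag it returns all full matches in one go, but without the capture groups.

```javascript
// Made-up sample input shaped like the anchors on the documentation page
var sampleHtml =
    '<a class="xref" href="registrar-tld-list.html#academy">.academy</a>' +
    '<a class="xref" href="registrar-tld-list.html#agency">.agency</a>';

var tldRegEx = /<a class="xref" href="registrar-tld-list.html#.+?">[.](.+?)<\/a>/g;

// With the g flag, each exec call continues from where the previous match ended
var tlds = [];
var current;
while ((current = tldRegEx.exec(sampleHtml)) !== null) {
    tlds.push(current[1]); // capture group of the current match
}
console.log(tlds);

// match with the g flag returns every full match at once, but drops the groups
var fullMatches = sampleHtml.match(tldRegEx);
console.log(fullMatches.length);
```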

So finally the result in JSON format is:

{
  "url": "http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/registrar-tld-list.html",
  "date": "Tue, 29 Jul 2015 20:26:02 GMT",
  "tldList": [
    "academy",
    "agency",
    "bargains",
    "bike",
    "biz",
    "blue",
    "boutique",
    "-- removed for brevity --",
    "eu",
    "fi",
    "fr",
    "it",
    "me",
    "me.uk",
    "nl",
    "org.uk",
    "ruhr",
    "se",
    "wien"
  ]
}

What’s next?

Now I have the supported TLD list. I can easily use it in my application, but where’s the fun in that? I’d rather convert this code into an API so that we can get the new TLDs whenever they are added to that page. I just hope they don’t change their HTML markup anytime soon! :-)

So in the next post I’ll work on the API…


dev dotnet, configuration, github

I hate configuration files! They contain highly sensitive information and there is always a risk of them leaking out. Especially when I read horror stories like this I remind myself to be more careful about it. It is particularly important when starting off with a private repository and converting it to open-source.

There are a few approaches, and I will try to make a list of pros and cons for each of them, which will hopefully make it easier for me to make a decision.

Method 1: Ignore config files completely

Simply add the config files to .gitignore before checking them in. This is not really a viable option, but to cover all bases I wanted to add this one as well.

Pros

  • Very easy to implement
  • Easy to be consistent (use the same gitignore everywhere)

Cons

  • When someone checks out the project initially it wouldn’t compile because of the missing config file.
  • For a new user there is no way to figure out what the actual config should look like
  • For internal applications, you have to maintain local copies and handle distribution among developers manually.

Method 2: Check in configuration with placeholders

Check in the initial file then run the following command:

git update-index --assume-unchanged <file>

After this you can add the actual values and use the config as usual. But those values will be ignored. This way when someone checks out they can at least compile the project.

Pros

  • Less disruptive as the project can be compiled

Cons

  • When you make a change to the config file (e.g. add/rename keys) you can’t check in those changes either

Method 3: Ignore config file, check in a sample file instead

This is essentially a merger of Methods 1 and 2. Maintain two files (e.g. App.config and App.config.sample). Ignore App.config from the get-go and only check in the .sample file. Structurally it will be exactly the same as App.config, without the confidential values.

Pros

  • A new user can see what the config should look like from the .sample file

Cons

  • Won’t compile by default, extra step for the people who check out (small one but still)
  • Both files need to be kept in sync manually
  • Still no online copy of the actual config file available

Method 4: Reference to another configuration file

A slight variation of the previous method. .NET allows cascading and linking with configuration files. For example say this is my app.config file:

<configuration>
	<appSettings file="Actual.config">
		<add key="Twilio.AccountSID" value="SAMPLE SID" />
		<add key="Twilio.AuthToken" value="SAMPLE TOKEN" />
		<add key="Non-sensitive-info" value="some value" />
	</appSettings>
</configuration>

I can link the appSettings section to another file named actual.config which would look like this:

<appSettings>
  <add key="Twilio.AccountSID" value="REAL 123456" />
  <add key="Twilio.AuthToken" value="REAL Token" />
  <add key="Non-sensitive-info" value="some value" />
</appSettings>

Actual.config never goes into source control. When the application cannot find Actual.config it just uses the values in app.config. When Actual.config is present, its values override the sample values.
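The override behaviour is easier to picture as a merge of two key/value sets. The sketch below is only an illustration in JavaScript with a hypothetical mergeSettings helper; it is not how .NET implements the file attribute, it just mirrors the outcome described above.

```javascript
// Illustration only: mirrors the described outcome, not the .NET implementation.
// Hypothetical helper: keys from the linked file win over the checked-in defaults.
function mergeSettings(defaults, overrides) {
    var merged = {};
    Object.keys(defaults).forEach(function (key) {
        merged[key] = defaults[key];
    });
    // A missing linked file is modelled as null: the defaults are used as-is
    Object.keys(overrides || {}).forEach(function (key) {
        merged[key] = overrides[key];
    });
    return merged;
}

var appConfig = {
    'Twilio.AccountSID': 'SAMPLE SID',
    'Twilio.AuthToken': 'SAMPLE TOKEN',
    'Non-sensitive-info': 'some value'
};
var actualConfig = {
    'Twilio.AccountSID': 'REAL 123456',
    'Twilio.AuthToken': 'REAL Token'
};

var withActual = mergeSettings(appConfig, actualConfig);
var withoutActual = mergeSettings(appConfig, null);
console.log(withActual['Twilio.AccountSID'], withoutActual['Twilio.AccountSID']);
```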

Pros

  • Only keys with sensitive values can be hidden

Cons

  • Doesn’t work with custom config sections
  • Still no online copy of the actual config file available

Method 5: Keep the config files in a private repository

The idea is to maintain a private repository that contains the config files and add it as a submodule:

git submodule add git@github.com:volkanpaksoy/private-config.git

And link to that configuration file in the app.config as in Method 4.

Pros

  • Actual values are online and accessible

Cons

  • Requires a paid account for private repositories
  • More complicated

Verdict

As I already have a paid GitHub account, the first con for Method 5 doesn’t really apply to me, so I think I will go with that one. Otherwise the actual values will be stored only locally on disk and will eventually get lost. It looks like there is no convenient and easy way of doing this after all. If you have any suggestions, I’m all ears.

As I said, I hate configuration files!
