hobby productivity, apple watch, alexa, philips hue, iot comments

My #1 rule for productivity is “No Snoozing!”. If you snooze, it means you are late for everything you planned to do and that is a terrible way to start your day. This post is about a few tools, tips and techniques I use to prevent snoozing.

Tip #1: Sleep well

This is generally easier said than done but there’s no way around it. You MUST get enough sleep. Otherwise, sooner or later your willpower will get weaker and weaker. Eventually you’ll succumb to the sweet temptation of more sleep.

Tip #2: Place the alarm away from your bed

This way when your alarm (most likely your phone) goes off you have to make a deliberate effort to get up and turn it off. Once you have had to get out of bed, you’re less likely to get back into it straight away.

Tip #3: Use multiple alarms with different sounds

It gets easier to wake up if you can surprise yourself! Human beings are so good at adapting to every condition that we very easily get used to, and start ignoring, the same alarm sound going off at the exact same time every day. I find it useful to change the alarm times and sounds every now and then.

Tip #4: Use Apple Watch

After Apple Watch Series 4 was released I got myself one.

I’m not sure if it’s worth the cost but when it comes to waking up a little vibration on your wrist can do miracles apparently!

When you have it paired with your iPhone, by default you can stop the alarms from your watch. This may be a nice convenience feature in some cases, but when it comes to waking up we are trying to make it as hard as possible for ourselves to turn the alarms off.

My trick is:

First I disable “Push alerts from iPhone” in the Watch app.

Then I create a separate alarm on the watch for the same time.

This way I get 2 alarms at the same time. It’s easy to stop the watch as it’s within my arm’s reach. While the haptic feedback of the watch wakes me up the alarm on the phone also goes off. Now I have to physically get out of the bed to stop that one as well.

Tip #5: Use Alexa

Another gizmo to set an alarm is Alexa but you can do much more than just that with Routines.

Tip #5.1: Play a playlist

This tip requires a Spotify Premium subscription.

First, create yourself a nice, loud and heavy playlist of “waking up” music. I prefer energetic Heavy Metal songs from Lamb of God and Slayer. The trick here is to play a random song every morning. Similar to Tip #3, the same song every morning becomes very boring very quickly. But having a random one keeps you surprised every morning. I use this command to play my playlist in shuffle mode:

Shuffle Playlist '{Playlist name}'

Tip #5.2: Turn the lights on

A good sleep tip is to keep your bedroom as dark as possible. That’s why I have all black curtains in my room and it’s quite dark. The downside is it’s so good for sleep it makes waking up even harder!

That’s why I bought myself a Philips Hue smart bulb and as part of my waking up routine Alexa turns it on along with playing the Spotify playlist.

This is what my routine looks like:

Conclusion

For me snoozing is a cardinal sin so I’m always on the lookout for ways to improve my arsenal against it. I hope you find something useful in this post too. If you have tips of your own feel free to leave a comment.

docker devops, github, backup comments

A while back I created a PowerShell script to back up my GitHub account and blogged about it here. It’s working fine but it requires some manual installation and setup, and I didn’t want to do that every time I needed to deploy it to a new machine. That’s why I decided to Dockerize my script so that everything it needs comes pre-installed in an image.

TL;DR:

  • There’s a PowerShell script that allows you to back up your GitHub account including private repositories here: Source Code
  • There’s a Docker image that encapsulates the same script which can be found here: Docker Image
  • Below are the notes I’ve been taking while working on Docker-related projects. Proceed if you are interested in some random tidbits about Docker.

Lessons Learned on Docker

  • Shortcut to leave container without stopping it: Ctrl P + Ctrl Q

  • Build a new image from the contents of the current folder

      docker image build -t {image_name} .
    
  • Every RUN command creates a new layer in the image. Image layers are read-only.

  • Connect to an already running container:

      docker attach {container id or name}
    
  • Save a running container as an image:

      docker commit -p 78727078a04b container1
    
  • Remove image

      docker rmi {image name}
    

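    This requires that no containers have been created from the image. To delete multiple images based on name (the awk step picks the IMAGE ID column, since docker rmi expects IDs or names rather than whole rows of output):

      docker rmi $(docker images | grep 'imagename' | awk '{print $3}')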
    
  • List running containers

      docker container ls
    
  • To list all containers including the stopped ones:

      docker ps -a
    
  • Delete all stopped containers

      docker container prune
    
  • To delete all unused containers, images, volumes, networks:

      docker system prune
    
  • Copy files to and from a container

      docker cp foo.txt mycontainer:/foo.txt
      docker cp mycontainer:/foo.txt foo.txt
    
  • To overwrite the entrypoint and get an interactive shell

      docker run -it --entrypoint "/bin/sh" {image name}
    
  • Tip to quickly operate on images/containers: Just enter the first few letters of the image/container. For example if your docker ps -a returns something like this

      1184d20ee824        b2789ef1b26a                  "/bin/sh -c 'ssh-k..."   18 hours ago        Exited (1) 46 seconds ago                         happy_saha
      7823f76352e3        github-backup-04              "/bin/sh"                18 hours ago        Exited (255) 21 minutes ago                       objective_thompson
    

    you can start the first container by entering

      docker start 11
    

    This is of course provided that there aren’t any other containers whose IDs start with 11. So there is no need to enter the full ID as long as the beginning is unique.

  • To get detailed info about an object

      docker inspect {object id}
    

    This returns all the details. If you are interested in specific details you can tailor the output by using the --format option. For example the following only returns the LogPath for the container:

      docker inspect --format '{{.LogPath}}' {container id}
    
  • To get the logs of the container:

      docker logs --details --timestamps  {container id}
    
  • Docker for Mac actually runs inside a Linux VM. All docker data is stored inside a file called Docker.qcow2. The paths that are returned are relative paths in this VM. For instance if you inspect the LogPath of a container it would look something like

      /var/lib/docker/containers/{container-id}/{container-id}-json.log
    

    But if you check your host machine, there is no /var/lib/docker folder.

    In Docker preferences it shows where the disk image is located:

    This command let me get into the VM:

      screen  ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty
    

    Then I was able to navigate to /var/lib/docker and peek inside the volumes where the data is persisted. This virtualization does not exist in Linux and you can view everything on the host machine straight away.

  • Enter a running container (Docker 1.3+):

      docker exec -it {container-id} bash
    
  • Copy an image from one host to another: Export vs Save

    Save, saves a non-running container image to a file:

      docker save -o <save image to path> <image name>
      docker load -i <path to image tar file>
    

    Export, saves a container’s running or paused instance to a file

      docker export {container-id} | gzip > {tar file path}
      docker import {tar file path}
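
The ID-prefix shortcut mentioned in the list above can be illustrated with a few lines of Python. This is only a sketch of the idea (a prefix works as long as exactly one ID starts with it), not Docker’s actual implementation; the function name resolve_prefix is made up for this example.

```python
def resolve_prefix(prefix, container_ids):
    """Return the single ID that starts with `prefix`, or raise if none or several match."""
    matches = [cid for cid in container_ids if cid.startswith(prefix)]
    if not matches:
        raise LookupError(f"no container ID starts with {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"prefix {prefix!r} is ambiguous: {matches}")
    return matches[0]


# The two container IDs from the docker ps -a output shown earlier.
ids = ["1184d20ee824", "7823f76352e3"]
print(resolve_prefix("11", ids))
```

If a second container ID also started with 11, the call would raise instead of guessing, which matches the "as long as the beginning is unique" caveat.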
    

linux ec2, ebs comments

I recently decided to purchase a reserved t3.nano instance to run some Docker containers and for general testing purposes. In addition to the default volume I decided to add a new one to separate my files from the OS. It required a few steps to get everything in place so I decided to post this mostly for future reference!

Attach a volume during creation

First I added a new volume to the instance while creating it.

Connecting to instance

Now we have to connect to the instance to format the new volume. To do that we need the private key we generated while creating the instance. So to SSH into the machine we run this command:

ssh -i {/Path/To/Key/file_name.pem} ec2-user@{public DNS name of the instance}

Format the volume

I found some AWS documentation to achieve this which was very useful: Making an Amazon EBS Volume Available for Use on Linux

No need to repeat every command in that documentation. It’s a simple step-by-step guide. Just follow it and you have a volume in use which is also mounted at start up.

Install and configure Docker

Installing Docker is as simple as running this:

sudo yum update -y
sudo yum install -y docker

To be able to use Docker without sudoing everything, add ec2-user to the docker group:

sudo usermod -aG docker ec2-user

We need to make sure that Docker daemon starts on reboot too. To achieve this run this:

sudo systemctl enable docker

Copy files to the instance

To copy some files to the new instance I used SCP command:

sudo scp -i {/Path/To/Key/file_name.pem} -r {/Path/To/Local/Folder/} ec2-user@{public DNS name of the instance}:/Remote/Folder

The issue was ec2-user didn’t initially have access to write on the remote folder. In that case you can run the following command to have access:

setfacl -m u:ec2-user:rwx /Remote/Folder

aws cloudtrail, security, audit comments

Another important service under Management & Governance category is CloudTrail.

A nice thing about this service is that it’s already enabled by default with a limited capacity. You can view all AWS API calls made in your account within the last 90 days. This is completely free and enabled out of the box.

To get more out of CloudTrail we need to create trails

Inside Trails

A couple of options about trails:

  • It can apply to all regions
  • It can apply to all accounts in an organization
  • It can record all management events
  • It can record data events from S3 and Lambda functions

Just be mindful about the possible extra charges when you log every event for all organization accounts:

Testing the logs

I created a trail that logs all events for all organization accounts. I created an IAM user in another account in the organization. In the event history of the local account events look like this:

These events now can be tracked from master account as well. In the S3 bucket these events are organized by account id and date and they are stored in JSON format:
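
As a rough idea of how the raw data can be put to use, here is a minimal Python sketch that counts events by name in one log file. It assumes the usual CloudTrail layout of a gzip-compressed JSON document with a top-level Records array; the sample payload is hand-made for illustration and is not taken from my bucket.

```python
import gzip
import json
from collections import Counter


def count_events(raw_bytes):
    """Count events by eventName in one gzip-compressed CloudTrail log file."""
    payload = json.loads(gzip.decompress(raw_bytes))
    return Counter(record["eventName"] for record in payload.get("Records", []))


# Tiny hand-made sample standing in for a log file downloaded from the bucket.
sample = gzip.compress(json.dumps({
    "Records": [
        {"eventName": "CreateUser", "eventSource": "iam.amazonaws.com"},
        {"eventName": "ConsoleLogin", "eventSource": "signin.amazonaws.com"},
        {"eventName": "CreateUser", "eventSource": "iam.amazonaws.com"},
    ]
}).encode("utf-8"))

print(count_events(sample).most_common())
```

For a real file you would read the downloaded object’s bytes from disk and pass them to count_events instead of the sample.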

Conclusion

Having central storage for all events across all regions and accounts is a great tool to have. Having the raw data is a good start but making sense of that data is even more important. I’ll keep on exploring CloudTrail and getting more out of it to harden my accounts.

aws security, aws config, audit comments

In my previous blog post I talked about creating an IAM admin user and using that instead of the root user all the time. Applying such best practices is a good idea, which also raises the question: How can I enforce these rules?

AWS Config

The official description of the service is: “AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources”

What this means is you select some pre-defined rules or implement your own custom rules, and AWS Config constantly runs checks against your resources and notifies you if you have non-compliant resources.

Since currently I’m interested in hardening IAM users in the next example I’m going to use an IAM check

Use case: Enforcing MFA

As of this writing there are 80 managed config rules. To enforce MFA, I simply searched for MFA in the “Add rule” view and got 5 matches, of which I selected only 3:

After I accepted the default settings it was able to identify my IAM user without MFA:

And it comes with a nice little dashboard that shows all your non-compliant resources:

It also supports notifications via SNS. It creates a topic and all you have to do is subscribe to that via an email address and after confirming your address you can start receiving emails.

I was only expecting to get emails about non-compliant resources but it’s a bit noisy, as it also sends emails with subjects such as “Configuration History Delivery Completed” or “Configuration Snapshot Delivery Started” which didn’t mean much to me.

Pricing

I think the price is extremely high. The details can be found on their pricing page but in a nutshell a single rule costs $2/month. So for the above example with 3 rules I paid $6, which is a lot of money in terms of resources used.

Conclusion

I like the idea of having an auditing system with notifications but for this price I don’t think I will use it.

I will keep on exploring though as I’m keen on implementing my custom rules with AWS config and also implementing them without AWS config and see if this service adds any benefit over having scheduled Lambda functions.

aws iam, security, best practices comments

When you create a new AWS account you are the root user who has unlimited access to everything. Using such a powerful user as root on a day-to-day basis is not such a good idea because if it gets compromised you may not have a way to override and/or undo the changes done by the hacker.

Using IAM user instead of root account

Instead, the suggested best practice is to create an admin-level IAM user and use it for normal operations. At first I was hesitant to adopt this practice. I didn’t see the point and thought attaching the AdministratorAccess policy would make the user as powerful as root. But there’s a whole list of things that even the most powerful IAM user cannot do. Here’s the list: AWS Tasks That Require AWS Account Root User Credentials

So as you can see the root user has important privileges such as closing the account and changing billing information. Even if your account gets compromised and some malicious person gains access using an IAM user, you can still log in as root and take the necessary action.

In a nutshell, based on AWS documentation the following practices are recommended:

  • Use the root user only to create your first IAM user
  • Create an IAM user for yourself as well, give that user administrative permissions, and use that IAM user for all your work.

In addition, Eric Hammond suggests in his blog deleting the root account password as well and using the Forgot Password option to create a new one when needed. I keep my passwords in a password manager, so if that application is compromised the hacker can reset my password anyway. That’s why I don’t follow this practice, but it might come in handy if you have to write your password down somewhere.

Templated IAM user creation

It’s a good practice to create an IAM user right after you create your AWS account. It’s even a better practice to automate this process. To achieve this I created a CloudFormation template. The YAML template below does the following:

  • Creates an IAM group named administrators
  • Creates a user named admin
  • Attaches AdministratorAccess policy to the group
  • Forces the user to change their password first time they log in (by attaching IAMUserChangePassword policy to the user)
Resources:
  AdministratorsGroup:
    Type: AWS::IAM::Group
    Properties:
      GroupName: "administrators"
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AdministratorAccess
      Path: /

  AdminUser:
    Type: AWS::IAM::User
    Properties: 
      Groups:
        - !Ref AdministratorsGroup
      LoginProfile:
        Password: "CHANGE_THIS"
        PasswordResetRequired: true
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/IAMUserChangePassword
      Path: /
      UserName: "admin"

aws cloudformation comments

If you have a habit of creating your AWS resources manually, things can get very messy very quickly. At some point you realize that you have no idea if a resource is still in use and just to be safe you leave it alone.

I found myself in this situation and decided to take advantage of the Infrastructure as Code paradigm using AWS CloudFormation. To start simple I decided to migrate my automated CV response application to a CloudFormation stack. Going forward this is a much more efficient way to write blog posts too. Instead of writing step-by-step instructions I can simply post the CloudFormation template in JSON or YAML.

Infrastructure as Code in a nutshell

Basically this approach allows you to define, manage and provision all the resources that define your system.

Advantages:

  • Changes can be source-controlled
  • Entire provisioning process can be automated
  • Entire infrastructure can be easily recreated in a different account.
  • Resources can easily be identified. Tags can be used to identify which stack the resources belong to.
  • Resources can easily be cleaned up. Deleting a stack will delete all the resources it created.

Basic Terminology

  • Stack: All the resources used to create an infrastructure.
  • StackSet: A StackSet is a container for AWS CloudFormation stacks that lets you provision stacks across AWS accounts and regions by using a single AWS CloudFormation template.
  • Design template: This is the file in YAML or JSON format that defines all the resources that will be created in AWS

Where to start…

Even once you are familiar with the concepts, finding where to start can be intimidating. When it comes to CloudFormation, there are a lot of sample templates that you can start with and build upon.

So here’s an easy way to get started:

Step 1: Save the following snippet to a local file such as cloudformation.sample.yaml:

Resources:
  Ec2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.micro
      ImageId: ami-0aff88ed576e63e90

In the example above I’m using a stock AWS Linux AMI in London region.

Step 2: Go to Stack and click Create Stack (Make sure you’re in EU London region otherwise AWS won’t be able to find the AMI specified in our template)

Step 3: In specify template section, select Upload a template file

Step 4: Click Choose file to locate your file and upload your template.

Step 5: Click next and specify a stack name something like FirstStackForEC2 and click Next

Step 6: Click Next on Configure Stack Options view and in the final review step of the wizard click Create Stack.

At this point you can observe the progress of your stack being created.

Now if you go to EC2 service in the same region you should be able to see the new instance:

If you delete the stack, it will in turn delete everything it created which is the EC2 instance in this example.

Launch Stack URLs

I always like the launch stack buttons that I see every now and then. I think there’s something magical about clicking a button and watching an entire infrastructure being created right before your eyes!

Basically clicking the Launch Stack URL opens the wizard we just used with the fields populated with the values in the template.

Step 1: Upload the template to S3 and make it publicly accessible.

Step 2: Use the following naming convention and replace the values:

https://console.aws.amazon.com/cloudformation/home?region=region#/stacks/new?stackName=stack_name&templateURL=template_location
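
To avoid hand-editing the URL, the naming convention above can be wrapped in a small helper. This is a minimal Python sketch; the function name and the example bucket URL are hypothetical, and it only percent-encodes characters that would otherwise break the link.

```python
from urllib.parse import quote


def launch_stack_url(region, stack_name, template_url):
    """Fill in the Launch Stack URL convention with the given values."""
    return (
        "https://console.aws.amazon.com/cloudformation/home"
        f"?region={quote(region)}#/stacks/new"
        f"?stackName={quote(stack_name)}"
        # Keep ':' and '/' readable so the template location looks like the convention above.
        f"&templateURL={quote(template_url, safe=':/')}"
    )


url = launch_stack_url(
    "eu-west-2",  # London, the region used in the walkthrough above
    "FirstStackForEC2",
    "https://example-bucket.s3.amazonaws.com/cloudformation.sample.yaml",  # hypothetical bucket
)
print(url)
```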

In this example I uploaded the Launch Stack button image to my GitHub repository so that I can link to it.

aws aurora, rds, serverless comments

A few months ago AWS announced a serverless model for their Aurora databases. Compared to traditional DB approach this is brand new.

I’ve been trying it out for a pilot application and it works well in general. You pay for what you use just like any other serverless resource.

The only problem I’ve been having is DB startup time after pause. Meaning after 5 minutes the resources are released and the first request that comes after that suffers a performance penalty. My application was getting an error when this happened and it was showing an error screen. Obviously from a user standpoint it’s not a great experience.

So to remedy this issue I’ve updated the DB connection timeout

Connection Timeout=120

By default it’s 15 seconds which is not enough for the new server to respond. But after increasing the timeout at least I could prevent the application from failing. Of course this doesn’t speed up the response time of the DB server.
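
If the connection string is assembled in code, the timeout can be set in one place. The following Python sketch assumes a semicolon-separated key=value connection string like the one my application uses; the server, database and user names are placeholders, and values that themselves contain semicolons are not handled.

```python
def with_connection_timeout(connection_string, seconds):
    """Return the connection string with its Connection Timeout set to `seconds`."""
    parts = [p for p in connection_string.split(";") if p.strip()]
    # Drop any existing timeout entry (keys are compared case-insensitively).
    kept = [
        p for p in parts
        if p.split("=", 1)[0].strip().lower() != "connection timeout"
    ]
    kept.append(f"Connection Timeout={seconds}")
    return ";".join(kept)


base = "Server=example-cluster.local;Database=pilot;Uid=app;Connection Timeout=15"
print(with_connection_timeout(base, 120))
```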

They recently announced additional regions that support serverless Aurora.

For cost-cutting reasons this can be a great option. Especially if your system is idle for extended periods of time you don’t need to pay anything. Also it scales up so you don’t have to worry about the database bottlenecks under heavy traffic.

aws cloudwatch, custom metric, devops comments

I had an issue recently with an EC2 instance running out of disk space. Unfortunately free disk space is not a metric that comes out of the box with AWS CloudWatch. This post is about implementing a custom metric and getting notifications via AWS CloudWatch based on that metric.

Steps to monitor disk space with CloudWatch

Step 1: Download sample config file

AWS provides a sample JSON file at this location: https://s3.amazonaws.com/ec2-downloads-windows/CloudWatchConfig/AWS.EC2.Windows.CloudWatch.json

Download a copy of this file.

Step 2: Set IsEnabled to true

By default it comes disabled so set the value as shown below:

"IsEnabled": true

Step 3: Add the custom metric for disk usage

Add the custom metric to monitor disk space:

{
    "Id": "PerformanceCounterDisk",
    "FullName": "AWS.EC2.Windows.CloudWatch.PerformanceCounterComponent.PerformanceCounterInputComponent,AWS.EC2.Windows.CloudWatch",
    "Parameters": {
        "CategoryName": "LogicalDisk",
        "CounterName": "% Free Space",
        "InstanceName": "C:",
        "MetricName": "FreeDiskPercentage",
        "Unit": "Percent",
        "DimensionName": "InstanceId",
        "DimensionValue": "{instance_id}"
    }
}

Step 4: Add the new metric to flows

After defining the metric we need to add it to the flows so that it can be sent to CloudWatch. To achieve this update the flows section as shown below:

"Flows": {
    "Flows": 
    [
        "(ApplicationEventLog,SystemEventLog),CloudWatchLogs",
        "(PerformanceCounter,PerformanceCounterDisk),CloudWatch"
    ]
}
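
Steps 2 to 4 can also be applied with a short script instead of editing the file by hand. This is a sketch under assumptions: it takes the component list to live under EngineConfiguration -> Components and the flows under EngineConfiguration -> Flows -> Flows, which is how I remember the sample file being laid out, so check your copy before relying on it. The in-memory config below is a stripped-down stand-in for the downloaded file.

```python
import copy

DISK_COMPONENT = {
    "Id": "PerformanceCounterDisk",
    "FullName": "AWS.EC2.Windows.CloudWatch.PerformanceCounterComponent."
                "PerformanceCounterInputComponent,AWS.EC2.Windows.CloudWatch",
    "Parameters": {
        "CategoryName": "LogicalDisk",
        "CounterName": "% Free Space",
        "InstanceName": "C:",
        "MetricName": "FreeDiskPercentage",
        "Unit": "Percent",
        "DimensionName": "InstanceId",
        "DimensionValue": "{instance_id}",
    },
}
DISK_FLOW = "(PerformanceCounter,PerformanceCounterDisk),CloudWatch"


def add_disk_metric(config):
    """Enable the config, register the disk counter and route it to CloudWatch."""
    engine = config["EngineConfiguration"]  # assumed layout, see the note above
    if not any(c.get("Id") == DISK_COMPONENT["Id"] for c in engine["Components"]):
        engine["Components"].append(copy.deepcopy(DISK_COMPONENT))
    flows = engine["Flows"]["Flows"]
    # Replace whatever flow currently targets CloudWatch metrics; keep the logs flow.
    flows[:] = [f for f in flows if not f.endswith(",CloudWatch")] + [DISK_FLOW]
    config["IsEnabled"] = True
    return config


config = {
    "IsEnabled": False,
    "EngineConfiguration": {
        "Components": [],
        "Flows": {"Flows": [
            "(ApplicationEventLog,SystemEventLog),CloudWatchLogs",
            "PerformanceCounter,CloudWatch",
        ]},
    },
}
add_disk_metric(config)
print(config["EngineConfiguration"]["Flows"]["Flows"])
```

With the real file you would json.load it, call add_disk_metric, and json.dump the result back before restarting the agent.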

Step 5: Add IAM role to server

It’s a good practice to manage permissions of EC2 instances via IAM roles assigned to the machine. To enable sending logs to CloudWatch add AmazonEC2RoleForSSM policy to the machine’s role

Without this role SSM agent service gets an access denied error.

Step 6: Restart Amazon SSM Agent service

Either by using Windows Services Manager or running the following command:

Restart-Service AmazonSSMAgent

Once this is all done wait a few minutes and check CloudWatch metrics. Under All -> Windows/Default you should be able to see new metric under InstanceId group (as that’s what we are using to group the logs). And when you click the metric you should be able to see a nice time-based graph of free disk space on the server:

Notes

  • It’s useful to know where SSM Agent’s logs are stored. They can be found in this path:

    %PROGRAMDATA%\Amazon\SSM\Logs\

  • The service reports every 5 minutes. The PollInterval in the JSON file is in seconds and is different than service report interval.

awssecurity organizations, iam comments

I have never been a huge fan of AWS Management Console. Some reasons for that being:

  • Inconsistencies: In some services you can search by anything (such as tag value in EC2 dashboard) whereas in others you have to put in the exact start of the object (such as CloudWatch)
  • Regional separation: Some might like it but I find it confusing and error-prone. If you need to work in multiple regions you have to constantly change the region from the dropdown menu. If you accidentally create a resource in another region you wouldn’t see it’s still running until you accidentally switch back to that region again. But S3 seems to be an exception to this as you can select the region while creating the bucket and you can see all in the same list (speaking of inconsistencies…)
  • Flat resource structure: Every resource is mixed together in an account. If you have multiple projects or teams in your company, you would see all the resources they created among yours. Also there is no environment concept. Your test and production resources live side by side.

This post is about AWS Organizations which addresses the 3rd point in the list above.

What is AWS Organizations?

It is a way to centrally manage multiple accounts inside an organization by creating a hierarchy between accounts.

Benefits of having an account structure

  • No need to label everything with project/team/environment name
  • Production and non-production resources don’t live side by side
  • Better access controls: No need to grant access at the resource level. It can be done much more easily at the account level

Also from a cost point of view it has the following benefits (taken from AWS Account Structure Considerations)

  • Grouping resources that require different payment instruments
  • Providing groups with different levels of administrative control over AWS resources
  • Better controlling Reserved Instances for specific workloads
  • Identifying untaggable costs such as data transfer
  • Using accounts associated with different business units or functional teams

Key Concepts

Account: Your regular AWS account. The first account you create is called a Master Account, the rest are Member Accounts.

Organization: A group of related accounts. The account creating the organization becomes the master account.

The star next to the account indicates it is the master account.

Organizational Unit: You can use organizational units (OUs) to group accounts together to administer as a single unit. This can be any logical grouping such as team, project, environment etc.

Service Control Policies (SCPs): Enables you to restrict, at the account level of granularity, what services and actions the users, groups, and roles in those accounts can do

Managing Projects and Environments

First I was tempted to separate projects as well but I’d end up with too many accounts so abandoned that idea and adopted an environment-based organizational structure. I ended up having these AWS accounts in my organization:

  • Dev
  • Integration
  • UAT (User Acceptance Test)
  • Sandbox
  • Production

Then I created an organizational unit named Stages and moved all these accounts under that OU. This is just one way of structuring projects. Based on organizational needs it can be customized. In my case I decided to keep all shared services (logging, auditing, source code) in the master account.

Logging into accounts

This baffled me at first. Initially I created a test account which I wanted to delete later on. But I wasn’t able to do that until I completed the sign up steps which in turn I wasn’t able to because I didn’t have the credentials to log in!

As stated in this document:

When you create a new account, AWS Organizations initially assigns a password 
to the root user that is a minimum of 64 characters long. All characters 
are randomly generated with no guarantees on the appearance of certain 
character sets. You can't retrieve this initial password. To access the
account as the root user for the first time, you must go through the
process for password recovery.

So when you follow the sign-in link it redirects you to IAM login page. I needed to switch to root account login and recover my password by using the Forgot My Password link. On that note: Don’t use fake email addresses as you will need the confirmation email to recover your password.

Removing account issues

This one was tricky. In order to leave an organization first you need to enter a payment method and select a support plan. This way the account becomes eligible to be a standalone account. Only after that, you can Leave organization. But not right away!

After I entered all the data and completed the setup steps, I clicked Leave organization and I got this error:

I waited almost a full day after getting this error but to no avail. I kept getting the same error: “This operation requires a wait period. Try again later.”

I had a chat with a support engineer and created a case for this. Nothing helped at first but after a few days I tried again and it worked! So either the waiting period was very long or they fixed something in my account unbeknownst to me.

Deleting the account without removing

Another issue I had was deleting an account before removing from the organization. I was assuming that if the account was closed permanently it would be removed from the organization as well. This was not the case. It remains listed as Suspended

Unfortunately, once this happens there is no way of resolving it using the tools at our disposal. The only solution is to contact AWS support, reactivate the suspended account, leave the organization and close it again!

But you have to do it from the suspended account, not from the master account. Since technically you’re requesting support for another account, they won’t do it (as they told me in their response). The good news is that we still have access to support even though the account is suspended. So I went to the support page and created a support request to reinstate my account (so that I could close it again shortly after!)

Another option might be just to wait. I haven’t tried it myself but the account closure email states “After 90 days, you will not be able to reopen your account, and any remaining content in your closed account will be deleted.” So I’m guessing it would be gone completely if I could just wait for 90 days.

Managing Accounts Programmatically

The coolest thing about AWS Organizations is that accounts can be created via the command line. It’s as easy as this:

JOTUNHEIM:~ volkan$ aws organizations create-account --account-name {NAME}  --email {EMAIL}
{
    "CreateAccountStatus": {
        "Id": "xyz-abcabcabcabcabcabcabcabcabcabcab",
        "AccountName": "{NAME}",
        "State": "IN_PROGRESS",
        "RequestedTimestamp": 1532494396.633
    }
}
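
Since the response comes back with State set to IN_PROGRESS, a script usually has to poll until the request settles. Below is a minimal Python sketch that shells out to aws organizations describe-create-account-status, passing the Id from the output above as the request id. The helper names, delay and attempt count are my own choices, and the demo at the bottom uses canned responses rather than a real AWS call.

```python
import json
import subprocess
import time


def describe_with_cli(request_id):
    """Ask the AWS CLI for the current status of a create-account request."""
    out = subprocess.run(
        ["aws", "organizations", "describe-create-account-status",
         "--create-account-request-id", request_id],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["CreateAccountStatus"]


def wait_for_account(request_id, describe=describe_with_cli, delay=5.0, attempts=60):
    """Poll until the request leaves IN_PROGRESS, then return the final status."""
    for _ in range(attempts):
        status = describe(request_id)
        if status["State"] != "IN_PROGRESS":
            return status
        time.sleep(delay)
    raise TimeoutError(f"request {request_id} still in progress after {attempts} checks")


# Demo with canned responses (hypothetical request id), so nothing is sent to AWS.
canned = iter([
    {"Id": "car-example", "State": "IN_PROGRESS"},
    {"Id": "car-example", "State": "SUCCEEDED"},
])
final = wait_for_account("car-example", describe=lambda _id: next(canned), delay=0)
print(final["State"])
```

Passing the describe function in as a parameter keeps the polling loop easy to try out without credentials; leaving it at the default uses the CLI.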

Notes

  • Free tier is shared among all accounts in the organization: “If your company creates your AWS account through AWS Organizations, free tier eligibility for all member accounts begins on the day the organization is created.”

Conclusion

This post is meant to be an introduction to AWS Organizations rather than a complete guide. I will post similar ones as I use this as basis of my infrastructure and build on this.

linuxsysops raid comments

AWS and cloud computing are awesome but I still enjoy having a server at home. I’ve decided to reinstate my old desktop. Replaced the disks with shiny new ones (A 500GB SSD and 2 4TB drives for data) and installed Ubuntu 18.04.

My primary goals were:

  • Partition SSD drive and use it for data that needs performance
  • Set up RAID1 on 2 4TB disks so that 1 disk failure wouldn’t result in data loss
  • Set up some sort of notifications to monitor SSD disk health
  • Set up some sort of notifications to monitor RAID disk health

Having all these in place was important for me to have a solid, reliable system before I started building stuff on top of it.

Partitioning

By default Ubuntu only adds a 512MB /boot/efi partition and leaves the rest for root (/). But since I recently had some free space issues on the boot drive, I decided to create a boot partition as well. I also added a 32GB swap partition. Swap is virtual memory, and a common rule of thumb is to make it the same size as the computer’s memory.

So I ended up allocating 32GB for home as well and left the rest for the root. Going forward now I have more than 300GB to use for application data, Docker images etc.

RAID1

Now it’s time to set up a RAID array to have some redundancy. They are shiny new disks but you can never fully rely on them and they will eventually fail. It’s probably not the best idea to install 2 identical disks at the same time, as they are likely to reach the end of their life at similar times. So I’ll need to keep an eye on them and set up some monitoring and notifications (more on that later).

As my guide I used the article Setting up RAID 1 (Mirroring) using ‘Two Disks’ in Linux – Part 3 to set up my RAID.

This is a very nice article and explains everything step by step already so I’m not going to duplicate it here. But I bumped into an issue while following the guide: 4TB drives. My disks were MBR which only supported up to 2TB and I had to convert them into GPT disks.

The answer was parted. I followed the steps in this answer and managed to partition the drives to 4TB (3.7 to be exact!)

Testing RAID1

Now that I had a RAID it was time to test it. The first thing I wanted to see was whether it mounted automatically after a reboot. It did mount the drives but there was a problem with the RAID. At boot the device was showing up as md127 instead of md0 and it was failing to sync.

My mdadm.conf file looked like this when this was happening:

ARRAY /dev/md0 level=raid1 num-devices=2 metadata=1.2 name=asgard:0 UUID={device id}
   devices=/dev/sda1,/dev/sdb1

After some Googling I found the answer here.

The solution was simply to remove the name parameter from the file! After that, when I rebooted and entered the following commands I was able to see a healthy RAID:

cat /proc/mdstat

and

mdadm -D /dev/md0
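
If you want to check the same thing from a script, the status line in /proc/mdstat is easy to parse. The following is a minimal Python sketch, not part of mdadm: the function name is hypothetical and the sample text is an assumed example of what a healthy two-disk RAID1 looks like. It treats an array as healthy when the [n/m] counters match and the flags contain no "_" (a missing member).

```python
import re

# Assumed sample of /proc/mdstat for a healthy two-disk RAID1.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      3906885632 blocks super 1.2 [2/2] [UU]
      bitmap: 0/30 pages [0KB], 65536KB chunk

unused devices: <none>
"""


def raid_health(mdstat_text):
    """Map each md device to True if all of its member disks are up."""
    health = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status looks like "[2/2] [UU]"; "_" marks a failed or missing disk.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status:
            expected, active, flags = status.groups()
            health[current] = expected == active and "_" not in flags
            current = None
    return health


if __name__ == "__main__":
    print(raid_health(SAMPLE_MDSTAT))
```

On a real machine you would pass in the contents of /proc/mdstat instead of the sample text.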

Monitoring System Disk with smartmontools

Now that everything looked in place, I needed to ensure it stays that way!

To achieve that goal, I edited this file /etc/smartd.conf and added this line:

DEVICESCAN -o on -H -l error -l selftest -t -m my.email@address.com -M test

Set up sending emails from server

When I first installed smartmontools, it asked for the Postfix configuration, which looks like this:

Since I was interested in getting results fast, I selected No Configuration and moved on. Now it’s time to configure it.

To bring up this screen again I entered the following command:

dpkg-reconfigure postfix

I followed the wizard and entered my email domain. Then I followed this documentation from AWS: Integrating Amazon SES with Postfix

As always, I ended up having some issues :-).

The first problem was that it didn’t work! It wasn’t using the SES SMTP server as the relay server and was always trying to send emails from localhost. The solution was here

As the instructions say, I updated /etc/postfix/main.cf with the values below:

myhostname = localhost
mydestination = $myhostname, localhost.$mydomain, localhost, $mydomain

and I was able to send emails to the SMTP server. But SES didn’t like my IP address and the email bounced. The solution to that was to create an IP filter in SES and allow the traffic from that address.

Then I restarted the service to test:

service smartmontools restart

and received the notification. Actually received 3 emails for some reason.

The service runs at startup so this way I can be notified whenever it reboots too.

Monitoring RAID with mdadm

It took some time to finish synchronizing the 4TB disks but finally I was ready to rock:

Apart from smartmontools, the mdadm application is also capable of sending emails when a disk fails.

The documentation says to add MAILADDR followed by an email address to specify the target email address, but in my tests adding the line didn’t change anything.

In fact, it turns out that it sends the notifications by default. As my server was now set up to send emails, I was able to receive a test email just by entering the following command:

mdadm --monitor --scan --test -1

The problem is that it’s now using root@mydomain.com all the time. I wasn’t able to change it. But as long as that mailbox exists at least I can receive notifications.

Conclusion

All hardware eventually fails. Disk failure is especially annoying because it may cause some precious data loss. Apart from good backup practices, it’s also helpful to have a good monitoring system and redundancy on the disks we use.

It took me some time to set up my system but now at least I have 1 disk redundancy for large amounts of data and the ability to be notified whenever something goes wrong with the disks which gives me some peace of mind (not too much though :-)).

hobby fitness, concept2, rowing comments

I’ve been using a Concept2 Model D rowing machine for some time now and I’m quite enjoying it as a form of workout. (Primarily because I can still watch Netflix or YouTube while rowing!)

Concept2 Model C Rowing Machine

Since I have some data accumulated in it, I decided to look into ways of getting the data out and working on it, hoping that it would give me some insights into possible ways of improving my stats.

Official Tools

To be honest the existing toolset that comes out of the box is quite sufficient.

LogBook

This is the official web application where you can monitor your workouts.

Concept LogBook

This application is really quite good. You can manually enter your workouts and view your existing history. You can also create teams and participate in challenges, so there’s a social aspect to it too.

iOS App: ErgData

The monitor connected to the rowing machine (Performance Monitor - PM5) supports Bluetooth and can easily be paired with an iPhone. If you install the ErgData app on your phone you can sync the monitor with your phone and get your workouts out that way very easily. Better yet, it allows you to upload your workouts to LogBook. After you complete a workout, you can easily upload the results by clicking Sync.

Concept2 ErgData app

Unofficial Tools

RasPiRowing

I found this nice Raspberry Pi based project called RasPiRowing developed by one of the staff members of Concept2.

Since I’m a fan of Raspberry Pi and have a whole bunch of them lying around, it didn’t take me long to install it and use it. It works just fine and comes with a fun fish game too:

FishPi Game

It’s a nice way of interacting with the Concept2. Since it can be accessed by a Python application I can build my own applications as well to get data out of the erg.

Developer Tools

SDK

There is an SDK available to download for both Mac and Windows.

I installed the Mac version which extracts the files under /Users/{username}/C2 PM SDK/

But I couldn’t find much useful stuff in there:

SDK contents

I tried to build the Xcode project but it gave a build error and I just left it at that.

API

They also provide an API which can be used to get the data out. This sounds like the most interesting part to me as I can develop my own custom tools based on this API.

In the documentation, they advise using the dev site first while trying out the API and then requesting access to the live data. Also you need to register your application with Concept2 to be able to use their APIs.

devaws s3, csharp comments

When it comes to transferring files over a network, there’s always a risk of ending up with corrupted files. To prevent this on transfers to and from S3, AWS provides us with some tools we can leverage to verify the integrity of the files.

Verifying files while uploading

In order to verify that the file is uploaded successfully, we need to provide AWS with the MD5 hash value of our file. Once the upload has been completed, AWS calculates the MD5 hash on their end and compares the two values. If they match, it means the upload went through successfully. So our request looks like this:

var request = new PutObjectRequest
{
    MD5Digest = md5,
    BucketName = bucketName,
    Key =  key,
    FilePath = inputPath,
};

where we calculate MD5 hash value like this:

using (var stream = new FileStream(fullPath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    using (var md5 = MD5.Create())
    {
        var hash = md5.ComputeHash(stream);
        return Convert.ToBase64String(hash);
    }
}

In my tests, it looks like if you don’t provide a valid MD5 hash, you get a WinHttpException with the inner exception message “The connection with the server was terminated abnormally”

If you provide a valid but incorrect MD5, the exception thrown is of type AmazonS3Exception with the message “The Content-MD5 you specified did not match what we received”.

The AWS SDK comes with 2 utility methods named GenerateChecksumForContent and GenerateChecksumForStream. At the time of this writing, GenerateChecksumForStream wasn’t available in the AWS SDK for .NET Core. So the only method that worked for me was to calculate the hash as shown above.
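
The detail that is easy to miss is that the value sent with the upload is the Base64 encoding of the raw MD5 digest, not the familiar hex string. To illustrate this outside of .NET, here is a minimal Python sketch using only the standard library. The function names are hypothetical, and it only computes the value; it does not talk to S3.

```python
import base64
import hashlib


def read_chunks(path, chunk_size=1024 * 1024):
    """Yield the file's content piece by piece so large files fit in memory."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def content_md5(chunks):
    """Return the Base64-encoded MD5 digest of the given byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    # Base64 of the 16 raw digest bytes, not of the hex representation.
    return base64.b64encode(digest.digest()).decode("ascii")
```

For a file on disk this would be called as content_md5(read_chunks(path)).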

Verifying files while downloading

When downloading we use EtagToMatch property of GetObjectRequest to have the verification:

var request = new GetObjectRequest
{
    BucketName = bucketName,
    Key = key,
    EtagToMatch = "\"278D8FD9F7516B4CA5D7D291DB04FB20\"".ToLower() // Case-sensitive
};

using (var response = await _s3Client.GetObjectAsync(request))
{
    await response.WriteResponseStreamToFileAsync(outputPath, false, CancellationToken.None);
}

When we request the object this way and the MD5 hash we send doesn’t match the one on the server, we get an exception with the following message: “At least one of the pre-conditions you specified did not hold”

One important point to keep in mind is that AWS keeps the hashes in lower-case and the comparison is case-sensitive, so make sure to convert everything to lower-case before you send it out.
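
The normalisation described above can be sketched in a few lines of Python. The function name is hypothetical. The sketch also assumes the object's ETag is the hex MD5 of its content, which is the case for simple uploads like the one in this post but not, for example, for multipart uploads.

```python
def etag_to_match(md5_hex):
    """Return a quoted, lower-case ETag value built from a hex MD5 string."""
    # Strip any quotes that are already there so the function is idempotent.
    return '"' + md5_hex.strip('"').lower() + '"'
```

With the hash used in the request above, this yields the same quoted, lower-case string that the C# snippet builds with ToLower().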

devaws certification, certified cloud practitioner comments

As I decided to get full AWS certification I started preparing for the exams. I wanted to start with the Cloud Practitioner just to get myself accustomed to the exam procedure in general. Here are my notes:

Exam Objectives

According to Amazon’s official exam description page, this exam validates the following aspects:

  • Define what the AWS Cloud is and the basic global infrastructure
  • Describe basic AWS Cloud architectural principles
  • Describe the AWS Cloud value proposition
  • Describe key services on the AWS platform and their common use cases (for example, compute and analytics)
  • Describe basic security and compliance aspects of the AWS platform and the shared security model
  • Define the billing, account management, and pricing models
  • Identify sources of documentation or technical assistance (for example, whitepapers or support tickets)
  • Describe basic/core characteristics of deploying and operating in the AWS Cloud

Main Subject Areas

  1. Billing and pricing (12%)
  2. Cloud concepts (28%)
  3. Technology (36%)
  4. Security (24%)

Preparation Notes

aws.training Online Training Notes

Cloud Computing

  • On-demand delivery of IT resources. Can scale up and down based on needs.
  • Fosters agility (number one reason why customers switch to cloud computing): Speed (global reach), experimentation (operations as code, templated environments with CloudFormation) and culture of innovation (experiment quickly with low cost)
  • Region vs Availability Zone (AZ): Region is a physical location in the world which contains multiple AZs. AZs contain one or more discrete data centers with independent resources and housed in different facilities.
  • Using Auto Scaling and ELB, scale up and down and only pay for what you use.
  • Ability to deploy systems in multiple regions (lower latency)
  • Ability to choose the region where data is stored
  • AWS is responsible for data center security
  • Security policy can be formalized (as code)
  • Ability to recover from failures

Core Services

  • Global Infrastructure:
    • Regions: Have multiple AZs
    • Availability Zones: Have one or more data centres. They all have different power supplier companies.
    • Edge Locations: Used by CloudFront.
  • Amazon Virtual Private Cloud (VPC)
    • Uses same concepts as on-premise networking
    • VPC can span across multiple AZs
    • Supports multiple subnets (each of which can be deployed in a different AZ)
    • Can create public-facing subnets and private-facing subnets within the same VPC
    • Each account can create multiple VPCs
    • Using fewer VPCs is recommended to avoid complexity
    • Can attach an Internet Gateway to the VPC and route specific subnets through it to allow public access

  • Security Groups
    • Act like a built-in firewall
    • Best practice: Allow what’s required only and block everything else
  • Compute Services
    • Amazon Lightsail: Managed Virtual Private Servers service
      • Fixed price.
      • Includes a static IP, DNS management and storage
      • Fixed configuration
      • Uses t2 class EC2 instances under the hood
    • AWS Elastic Compute Cloud (EC2)
      • Difference between EC2-Classic and EC2-VPC
        • EC2-Classic: Your instances run in a single, flat network that you share with other customers.
        • EC2-VPC: Your instances run in a virtual private cloud (VPC) that’s logically isolated to your AWS account.
    • AWS Lambda
      • No servers to manage
      • Pay as you go: Only pay for the time your code runs
      • Continuous scaling
      • Supports subsecond metering. Charged for every 100 milliseconds of execution time
      • Some limitations apply: AWS Lambda Limits
    • AWS Elastic Beanstalk
      • Platform as a service
      • Allows quick deployments of applications
      • Allows HTTPS on load balancers
      • Supports various platforms (node.js, python etc)
      • Provisions the resources required (EC2, ELB etc) automatically
    • Application Load Balancer
      • 2nd type of load balancer offered by ELB
      • Comes with new features
      • Supports routing to containers
      • Key terms:
        • Listeners: A process that checks for connection requests using the configuration (protocol, port)
        • Target: Destination for traffic
        • Target Group: Each target group routes requests to one or more registered targets
      • Health checks can be performed on a per-target-group basis
      • Integrates with ECS and supports dynamic ports utilized by scheduled containers
      • Need to specify at least 2 AZs when creating an Application Load Balancer
      • Ability to route to different target groups based on port or path
    • Elastic Load Balancer
      • Supports sticky sessions
      • Supports multiple AZs and cross-zone balancing
      • For HTTP/HTTPS it uses “Least Outstanding” method to route the request. For TCP, it uses “Round robin”. The least outstanding routing algorithm is defined as “A ‘least outstanding requests routing algorithm’ is an algorithm that choses which instance receives the next request by selecting the instance that, at that moment, has the lowest number of outstanding (pending, unfinished) requests.”
    • Auto Scaling
      • Adding more instances: Scaling out, terminating instances: Scaling in
      • Launch configuration answers “What” (AMI, Instance type, Security Groups, Roles). Creating an LC is similar to creating a new EC2 instance.
      • Auto Scaling Group answers “Where” (VPC and subnet(s), load balancer, minimum and maximum instances, desired capacity)
      • Auto Scaling Policy answers “When” (Scheduled/on-demand/scale out or in policy)
  • Amazon EBS
    • Allows point-in-time snapshots and creation of a new volume from a snapshot
    • Supports encrypted volumes free of charge
    • EBS volume must be created in the same AZ as the EC2 instance that will use it
  • Amazon S3
    • Objects are stored redundantly across multiple facilities within the same region
    • The bucket names must be globally unique.
    • Can configure cross-region replication for backup and disaster recovery
    • Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket
  • Amazon Glacier
    • Vaults have access and lock policies attached to them
    • Each AWS account can create up to 1000 vaults
    • Can create an S3 lifecycle policy to move to Glacier then delete after a period of time
    • Supports up to 40TB max item size (S3 supports 5TB)
    • It costs more per retrieval
    • Vault Lock allows you to easily deploy and enforce compliance controls for individual Amazon Glacier vaults with a vault lock policy. You can specify controls such as “write once read many” (WORM) in a vault lock policy and lock the policy from future edits. Once locked, the policy can no longer be changed
  • Amazon RDS
    • Can create a standby copy in a different AZ within the same VPC
    • Can create multiple read replicas (in different regions as well)
  • Amazon DynamoDB
    • Always uses SSD for storage
    • Supports auto-scaling. Increases/decreases the throughput based on load
    • Tables are partitioned by primary key
    • Two query methods: Query and Scan
    • Query uses the primary key to find items. Scan can use any attribute.
    • Scan is slower than Query as it needs to look at all items
  • Amazon Redshift
    • Managed data warehouse
    • Supports standard SQL
    • Supports ODBC/JDBC connectors
  • Amazon Aurora
    • Managed MySQL-clone (compatible with MySQL)
    • After a crash it doesn’t need to replay the redo log before restarting. Instead it applies the redo log on every read operation, which reduces the restart time
  • AWS Trusted Advisor
    • Checks all the resources used and gives advice based on best practices
    • 5 categories:
      • Cost optimisation
      • Performance
      • Security
      • Fault tolerance
      • Service limits
    • Upgrading support plan enables all Trusted Advisor recommendations, free plan doesn’t include all
    • Has an API and can be used to automate optimisations
    • Can use it with CloudWatch alarms
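
To make the difference between the two routing methods listed under Elastic Load Balancer concrete, here is a minimal Python sketch. The instance names and request counts are made up, and this only illustrates the two ideas; it is not how ELB is implemented internally.

```python
import itertools


def pick_least_outstanding(outstanding):
    """Pick the instance with the fewest pending (unfinished) requests.

    `outstanding` maps an instance name to its current number of pending requests.
    """
    return min(outstanding, key=outstanding.get)


def round_robin(instances):
    """Return an endless iterator that hands out instances in turn."""
    return itertools.cycle(instances)


if __name__ == "__main__":
    # Hypothetical snapshot of pending requests per instance.
    pending = {"instance-a": 3, "instance-b": 1, "instance-c": 2}
    print(pick_least_outstanding(pending))

    rr = round_robin(["instance-a", "instance-b", "instance-c"])
    print([next(rr) for _ in range(4)])
```

Least outstanding requests adapts to how busy each instance is at that moment, whereas round robin ignores load and simply takes turns.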

Security

  • The AWS Shared Responsibility Model
    • AWS handles infrastructure security
    • AWS provides 3rd party audit reports
    • AWS’s responsibilities include: OS and database patching, firewall configuration and disaster recovery
    • Customer is responsible for putting logical access controls in place and protecting account credentials
    • Customers are responsible for securing everything they put in the cloud
  • AWS Service Catalog
    • Allows you to centrally manage common IT services that are approved for use on AWS
  • AWS IAM
    • Controls access to AWS resources
    • Handles Authentication (who can access resources) and authorization (how they can use resources)
    • Users can have programmatic access and/or console access.
    • Best practices
      • Delete root account keys. Instead use IAM accounts
      • Use MFA
      • Use groups
      • Use roles
      • Rotate credentials
      • Remove unnecessary users
  • AWS Security Compliance Programs
    • Risk Management: Follows these standards:
      • COBIT
      • AICPA
      • NIST
    • Constantly scans service endpoints for vulnerabilities
    • Compliance programs are listed here
  • AWS Security Resources

Architecting

  • Well-architected framework: https://aws.amazon.com/architecture/well-architected/
  • Five pillars of the framework
    • Operational excellence
    • Security
    • Reliability
    • Performance efficiency
    • Cost optimization
  • Fault Tolerance
    • Remain operational even if components fail
    • Built-in redundancy of an application’s components
  • High-Availability
    • A concept for the whole system
    • “Always” functioning and accessible
    • Without human intervention
    • HA Service Tools
      • Elastic Load Balancer
      • Elastic IP Addresses
      • Amazon Route 53
      • Auto Scaling
      • Amazon CloudWatch

Pricing and Support

  • Core concepts in billing
    • Pay as you go: No up front expenses
    • Pay less when you reserve: Reserved instances cost less
    • Pay even less per unit by using more: Tiered pricing for services such as S3, EC2 etc. Data transfer in is always free of charge.
    • Pay even less as AWS grows
  • Amazon RDS Costs
    • Clock hours of server time
    • Database characteristics
    • Database purchase type
    • Number of DB instances
    • Provisioned storage
      • No charge for backup storage of up to 100% of database storage for active databases. After the database is terminated, the backups are charged
    • Additional storage
    • Requests
    • Deployment type
    • Data transfer

General Notes

Exam Centre

The exam centre was very small and there was some sort of music studio next door so there was constant noise. Overall it was a bit disappointing to take the exam in a desolate business centre and in a small room but it’s the same exam regardless so I was able to focus on the questions after I got used to the noise.

Exam Process

  • My exam was scheduled at 3:00. I arrived early and the proctor allowed me to sit at 2:00 as there were empty places in the exam room. It was a nice surprise because I definitely didn’t want to wait for another hour in that heat
  • At one point, the screen froze. I had to call the proctor. He restarted the application. Fortunately it just resumed where it left off.
  • CCP is the easiest AWS exam but even so there were some challenging questions. Mostly non-technical questions were hard for me (like questions related to support plans). I don’t think I’ll ever see those questions in other exams.

Exam Result

… and the result is : Pass

Amazon has an interesting scoring system apparently. Right after you submit the exam, the screen displays Pass or Fail but not the actual score. You receive that in a separate email. They don’t even announce what the passing score is, as they reserve the right to change it when they see fit. It’s also based on other candidates’ results, so it’s almost like a curve. Anyway, it was quite a relief to see the pass result on the screen. I’m still curiously waiting for the actual score though.

My next exam will be AWS Certified Solutions Architect - Associate. I’ll post my exam notes after that exam as well.

devaws ses, lambda comments

A few years ago AWS announced a new SES feature: Incoming Emails. So far I have only used it once to receive domain verification emails to an S3 bucket but haven’t built a meaningful project. In this blog post my goal is to develop a sample project to demonstrate receiving emails with SES and processing those emails automatically by triggering Lambda functions.

As a demo project I will build a system that automatically responds to a sender with my latest CV as shown in the diagram below

Receiving Email with Amazon Simple Email Service

Amazon Simple Email Service (SES) is Amazon’s SMTP server. Its core functionality has been sending emails but Amazon kept adding more features such as using templates and receiving emails.

Step 1: Verify a New Domain

First, we need a verified domain to receive emails. If you already have one you can skip this step.

  • Step 1.1: In the SES console, click Domains –> Verify a New Domain
  • Step 1.2: Enter the domain name to verify and click Verify This Domain

  • Step 1.3: In the new dialog click Use Route 53

(This is assuming your domain is in Route53. If not you have to verify it by other means)

  • Step 1.4: Make sure you check Email Receiving Record checkbox and proceed

  • Step 1.5: Confirm verification status

Go back to Domains page in SES console and make sure the verification has been completed successfully

In my example, it only took about 2 minutes.

Step 2: Create a Lambda function to send the CV

In the next step we will continue to configure SES to specify what to do with the received email. But first we need the actual Lambda function to do the work. Then we will connect this to SES so that it runs every time we receive an email at a specific address.

  • Step 2.1: Create a Lambda function from scratch

  • Step 2.2: Create an SNS topic

SES will publish emails to this topic. We will do the plumbing and give necessary permissions later.

  • Step 2.3: Create subscription for the Lambda function to SNS topic

Now we tie the topic to our Lambda by creating a subscription

  • Step 2.4: Attach necessary permissions to the new role

In my example, I store my CV in an S3 bucket. So the function needs to be able to receive SNS notifications, read from the S3 bucket and send emails.

By default a new Lambda role comes with AWSLambdaBasicExecutionRole attached to it.

First add this to have read-only access to a single bucket:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObjectAcl",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{BUCKET NAME}",
                "arn:aws:s3:::{BUCKET NAME}/*"
            ]
        }
    ]
}

Then this to be able to send emails

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ses:SendEmail",
                "ses:SendTemplatedEmail",
                "ses:SendRawEmail"
            ],
            "Resource": "*"
        }
    ]
}

I like to keep these small, modular policies so that I can reuse them in other projects.

After adding the policies you should be able to see these in your Lambda function’s access list when you refresh the function’s page:

Step 3: Develop the Lambda function

In this example I’m going to use .NET Core 2.0 and C# to create the Lambda function.

  • Step 3.1: Install Lambda templates

On Windows, AWS Lambda function templates come with the AWS Visual Studio extension but on Mac we have to install them via the command line.

dotnet new -i Amazon.Lambda.Templates::*

  • Step 3.2: Create Lambda function

dotnet new lambda.EmptyFunction --name SendEmailWithAttachmentFromS3 --profile default --region eu-west-1

  • Step 3.3: Implement the function

Now it’s time for the actual implementation. I’m not going to paste the whole code here. Best place to get it is its GitHub repository

  • Step 3.4: Deploy the function

Create an IAM user with access to Lambda deployment and create a profile locally named deploy-lambda-profile.

dotnet restore
dotnet lambda deploy-function send_cv
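
The actual function in the repository is written in C#. Purely as an illustration of the first thing such a function has to do, here is a minimal Python sketch that digs the sender’s address out of the event Lambda receives from SNS. The function names are hypothetical, and it assumes the SES notification arrives as a JSON string in the Sns.Message field with the envelope sender under mail.source.

```python
import json


def sender_from_sns_event(event):
    """Return the envelope sender of the email that SES published to SNS."""
    # SNS wraps the SES notification as a JSON string inside the record.
    sns_message = event["Records"][0]["Sns"]["Message"]
    ses_notification = json.loads(sns_message)
    return ses_notification["mail"]["source"]


def handler(event, context):
    """Hypothetical Lambda entry point: work out who to reply to."""
    sender = sender_from_sns_event(event)
    # A real function would now fetch the CV from S3 and email it to `sender`.
    return {"reply_to": sender}
```

Keeping the parsing in its own small function makes it easy to test locally with a hand-written event before deploying anything.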

Step 4: Create a Receipt Rule

Now that we have a verified domain, we need a rule to receive emails.

In my example project, I’m going to use an email address that will send my latest CV to a provided email address.

  • Step 4.1: In the Email Receiving section click on Rule Sets –> Create a Receipt Rule

  • Step 4.2: Add a recipient

  • Step 4.3: Add an Action

Now we choose what to do when an email is received. In this example I want it to be published to an SNS topic that I created earlier. I could invoke the Lambda function directly but leveraging publish/subscribe gives me more flexibility as in I can change the subscriber in the future or add more stuff to do without affecting the rule configuration.

Since it supports multiple actions I could choose to invoke Lambda directly and add more actions here later on if need be but I’d like to use a standard approach which is all events are published to SNS and the interested parties subscribe to the topics.

I chose UTF-8 because I’m not expecting any data in the message body so it doesn’t matter too much in this example.

  • Step 4.4: Give it a name and create the rule.

Step 5: Test end-to-end

Now that it’s all set up, it is time to test.

  • Step 5.1: Send a blank email to cv@vlkn.me (Or any other address if you’re setting up your own)

  • Step 5.2: A few seconds later, receive an email with the attachment:

The second email is optional. Basically, I created an email subscriber too, so that whenever a blank email is received I get notified by SNS directly. This helps me to keep an eye on traffic if there is any.

aws certification comments

I’ve been working with AWS for years. Since I love everything about it and plan to use it for the foreseeable future, I’ve decided to go ahead and get the official certificates. This is to make sure I’ve covered all the important aspects of AWS fully. It also motivates me to develop more projects and blog posts on it.

Overview

There are 2 main categories of tracks:

  • Role-based Certifications
  • Specialty Certifications

The tracks and exam paths to take are shown in the diagram below:

My plan is to start with Cloud Practitioner exam, continue with AWS Solutions Architect track and move on to developer and sysops tracks.

Costs

I think it’s important to analyze costs first to assess whether or not this is a journey you want to start.

Individual Exam Costs

|Exam Name|Cost|Notes|
|AWS Certified Cloud Practitioner|100 USD|Optional|
|Associate-level exams|150 USD| |
|Professional-level exams|300 USD| |
|Specialty exams|300 USD| |
|Recertification exams|75 USD|Recertification is required every two years for all AWS Certifications|
|Associate-level practice exams|20 USD| |
|Professional-level practice exams|40 USD| |

Total Tracks Costs

|Exam Track|Total Cost|With VAT|Notes|
|AWS Solutions Architect|450 USD|540 USD| |
|AWS Certified DevOps Engineer|450 USD|540 USD| |
|All Associate Level Exams|450 USD|540 USD|3 Exams|
|All Professional Level Exams|600 USD|720 USD|2 Exams (There’s no professional level for developer, both associate level exams lead to DevOps engineer)|
|All Exams|1150 USD|1380 USD|Includes the optional Cloud Practitioner exam|
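
The track totals are simple sums of the individual exam prices. As a quick sanity check, here is a small Python sketch. The 20% VAT rate is an assumption inferred from the table (450 USD becoming 540 USD), and the names are hypothetical.

```python
# Individual exam prices in USD, taken from the first table.
EXAM_COST_USD = {"practitioner": 100, "associate": 150, "professional": 300}

# Assumption: 20% VAT, inferred from 450 USD -> 540 USD in the table.
VAT_PERCENT = 20


def track_cost(exams):
    """Sum the prices of the exams that make up a track."""
    return sum(EXAM_COST_USD[exam] for exam in exams)


def with_vat(amount_usd):
    """Add VAT using integer arithmetic to avoid rounding surprises."""
    return amount_usd * (100 + VAT_PERCENT) // 100


if __name__ == "__main__":
    architect_track = track_cost(["associate", "professional"])
    all_exams = track_cost(["practitioner"] + ["associate"] * 3 + ["professional"] * 2)
    print(architect_track, with_vat(architect_track))
    print(all_exams, with_vat(all_exams))
```

The "All Exams" figure counts the optional Cloud Practitioner exam, three associate exams and two professional exams.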

The total cost is not exactly cheap but I think in the end it’s worth it.

Taking the Exams

It all starts with aws.training site. Just sign in with your Amazon account or create a new one. This allows you to take the online free courses. To take the exams you’d need a new account. I think this is because they partnered with a 3rd party to provide this.

Registration is quite similar. Just provide name and address and search for an exam centre.

Online Training

Free Courses

AWS Training

This is the official certification site of AWS. It allows the user to enroll to courses and view their transcript.

It’s a bit hard to find the actual course after you enrol because you can’t jump to contents from search results. What you should do is first go to My Transcript and under current courses you should be able to see the course and a link that says “Open”. Clicking that link takes you to the actual content.

There is more content on the site. I’ll discover more as I go along.

edX

edX have recently launched 3 free AWS courses.

Various Training Resources

Conclusion

As my favourite saying goes: “It’s not about the destination, it’s about the journey”.

AWS certification for me is not a destination. It just plays a role for me to stay on course and stay motivated to create more projects and blog posts in a timely manner.

I’m hoping to see this journey to completion. I’ll be posting more on AWS and my journey on certification soon.

devaws api gateway comments

API Gateway is Amazon’s managed API service. Serverless architecture is growing on me more every day. I think leveraging infinite auto-scaling and only paying for what you use makes perfect sense. But to have an API that will be customer-facing, the first thing that needs to be set up is a custom domain, which might be a bit involved when SSL certificates come into play. In this post I’d like to create an API from scratch and assign a custom domain name to it.

Step 1: Create an API

Creating an API is straightforward: Just assign a meaningful name and description. However, to me it was a bit confusing when it came to choosing the endpoint type.

The two options provided are: Regional and Edge optimized.

  • Edge-optimized API endpoint: The API is deployed to the specified region and a CloudFront distribution is created. API requests are routed to the nearest CloudFront Point of Presence (POP).

  • Regional API endpoint: This type was added in November 2017. The main goal is to prevent a roundtrip for in-region requests. API requests are targeted directly to the region-specific API Gateway without going through any CloudFront distribution.

Custom domain names are supported for both endpoint types.

In this example, I’ll use Regional endpoint type. For further reading, here’s a nice blog post about endpoint types.

Step 2: Create a resource and method

For demonstration purposes I created a resource called customer and a GET method that calls a mock endpoint.

Step 3: Deploy the API

From the Actions menu in Resources tab, I selected Deploy API.

Deployment requires a stage. Since this was the first deployment, I had to create a new stage called test, which can be done while deploying. After the deployment, the test stage looks like this:

At this point API Gateway had already assigned a non-user-friendly URL:

https://81dkdt6q81.execute-api.eu-west-2.amazonaws.com/test

This is the root domain of the API. So I was able to call the endpoint like this:

https://81dkdt6q81.execute-api.eu-west-2.amazonaws.com/test/albums

My goal was to get it working with my own domain such as:

https://hmdb.myvirtualhome.net/albums

Step 4: Generate the certificate in ACM

I’m using Route53 for all my domains and using ACM (AWS Certificate Manager) for generating SSL/TLS certificates. Before creating the custom domain name I needed my certificate available.

The wizard is quite simple: I just added the subdomain for the API and selected DNS validation.

After the review comes the validation process. Since I’m using Route 53 and ACM plays well with it, it simply provided a nice big button that said Create record in Route 53.

After clicking and confirming I got this confirmation message:

After waiting for about 3 minutes, the certificate had already been issued:

Step 5: Create Custom Domain Name in API Gateway

Now that the certificate was ready I had to go back to API Gateway to create the custom domain name and associate it with the newly created cert.

First, I clicked on Custom Domain Names on left menu and filled out the details. Make sure that your subdomain matches the one the certificate was generated for.

I assigned /test path to the test stage I had created earlier. I will use root path for the production stage when I deploy the final version.

After creating the custom domain, take note of Target Domain Name generated by AWS.

Step 6: Create A Record in Route 53

I had to also point DNS to the domain generated by API Gateway.

Since I was using a regional endpoint I had to map the custom domain name to the target domain name mentioned in the previous step.

Now the problem was when I tried to do it via AWS Management Console, it failed as explained in this StackOverflow answer.

So I had to do it via CLI as below:

aws route53 change-resource-record-sets --hosted-zone-id {ZONE_ID_OF_MY_DOMAIN} --change-batch file://changedns.json

where the contents of changedns.json were:

{
  "Changes": [
    {
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "api.hmdb.myvirtualhome.net",
        "Type": "A",
        "AliasTarget": {
          "DNSName": "d-xyz.execute-api.eu-west-2.amazonaws.com",
          "HostedZoneId": "ZJ5UAJN8Y3Z2Q",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}

In the JSON above, DNSName is the Target Domain Name created by AWS in Step 5. The HostedZoneId (ZJ5UAJN8Y3Z2Q), on the other hand, is the zone ID of API Gateway which is listed here.
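If you’d rather generate the change batch than edit it by hand, something like the following Python sketch could work. The function name is made up for illustration; the field names and values simply mirror the JSON shown above.

```python
import json


def build_alias_change_batch(record_name, target_domain_name, target_zone_id):
    """Return a Route 53 change batch that creates an A alias record."""
    return {
        "Changes": [
            {
                "Action": "CREATE",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "AliasTarget": {
                        # Target Domain Name shown by API Gateway in Step 5
                        "DNSName": target_domain_name,
                        # Zone ID of API Gateway itself, not of your own domain
                        "HostedZoneId": target_zone_id,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ]
    }


# Same values as the hand-written changedns.json above
batch = build_alias_change_batch(
    "api.hmdb.myvirtualhome.net",
    "d-xyz.execute-api.eu-west-2.amazonaws.com",
    "ZJ5UAJN8Y3Z2Q",
)
print(json.dumps(batch, indent=2))
```

The printed output can be saved as changedns.json and passed to the CLI command as before.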

UPDATE

If you are having issues running the command above, that might mean you don’t have a default profile set up with permissions to change DNS settings. To fix that:

1. Create a new user with no permissions

Go to IAM console and create a new user. Skip all the steps and download the credentials as .csv in the last step.

2. Assign required permissions

Create a new policy using the JSON template below and attach it to the new user

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "route53:ChangeResourceRecordSets",
            "Resource": "arn:aws:route53:::hostedzone/{ZONE ID OF YOUR DOMAIN}"
        }
    ]
}

3. Create a new profile for the user

aws configure --profile temp-route53-profile

and set the Access/Secret keys along with the region of your hosted zone.

Then run the first CLI command again, this time providing the profile name:

aws route53 change-resource-record-sets --hosted-zone-id {ZONE_ID_OF_MY_DOMAIN} --change-batch file://changedns.json --profile temp-route53-profile

An important point here is to get your own hosted zone ID from Route53. API Gateway also shows a hosted zone ID, but that one is actually the zone ID of API Gateway itself. We use the API Gateway zone ID inside the DNS configuration (the changedns.json file in this example), whereas the hosted zone ID we provide on the command line is our own domain’s zone ID, which can be found in Route53.
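To keep the two IDs apart, it can help to assemble the command in one place. This is only a sketch in Python: the helper name and the placeholder zone ID are hypothetical, and the flags are the same ones used in the CLI command above.

```python
import shlex


def build_route53_command(own_zone_id, change_batch_path, profile=None):
    """Build the change-resource-record-sets command as an argument list.

    own_zone_id is the hosted zone ID of YOUR domain (from Route53),
    not the API Gateway zone ID that goes inside the change batch file.
    """
    cmd = [
        "aws", "route53", "change-resource-record-sets",
        "--hosted-zone-id", own_zone_id,
        "--change-batch", f"file://{change_batch_path}",
    ]
    if profile:
        cmd += ["--profile", profile]
    return cmd


# Z_EXAMPLE_OWN_ZONE is a placeholder, not a real zone ID
print(shlex.join(build_route53_command(
    "Z_EXAMPLE_OWN_ZONE", "changedns.json", "temp-route53-profile")))
```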

Step 7: Test

So after creating the alias for my API I visited the URL on a browser and I was able to get the green padlock indicating that it loaded the correct SSL certificate.

Resources

devaws route53, angular, dotnet core, dynamic dns, csharp comments

A few years back I developed a project called DynDns53. I was fed up with the dynamic DNS tools available and thought I could easily achieve the same functionality since I had already been using AWS Route53.

Fast forward a few years, due to some neglect on my part and technology moving so fast the project started to feel outdated and abandoned. So I decided to revise it.

Key improvements in this version are:

  • Core library is now available in NuGet so anyone can build their own clients around it
  • A new client built with .NET Core so that it runs on all platforms now
  • A Docker version is available that runs the .NET Core client
  • A new client built with Angular 5 to replace the legacy AngularJS
  • CI integration: Travis is running the unit tests of core library
  • Revised WPF and Windows Service clients and fixed bugs
  • Added more detailed documentation on how to set up the environment for various clients

Also kept the old repository but renamed it to dyndns53-legacy. I might archive it at some point as I’m not planning to support it any longer.

Available on NuGet

NuGet is a great way of installing and updating libraries. I thought it would be a good idea to make use of it in this project so that it can be used without cloning the repository.

With DotNetCore it’s quite easy to create a NuGet package. Just navigate to project folder (where .csproj file is located) and run this:

dotnet pack -c Release

The default configuration it uses is Debug so make sure you’re using the correct build and a matching pack command. You should be able to see a screen similar to this

Then push it to NuGet:

dotnet nuget push ./bin/Release/DynDns53.CoreLib.1.0.0.nupkg -k {NUGET.ORG API_KEY} -s https://api.nuget.org/v3/index.json

To double-check you can go to your NuGet account page and under Manage Packages you should be able to see your newly published package:

Now we play the waiting game, because it may take some time for the package to be processed by NuGet. For example, I saw the warning shown in the screenshot 15 minutes after I pushed the package:

Generally this is a quick process but the first time I published my package, I got my confirmation email about 7 hours later so your mileage may vary.

If you need to update your package after it’s been published, make sure to increment the version number before running dotnet pack. In order to do that, you can simply edit the .csproj file and change the Version value:

  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
    <PackageId>DynDns53.CoreLib</PackageId>
    <Version>1.0.1</Version>
    <Authors>Volkan Paksoy</Authors>
    <Company></Company>
  </PropertyGroup>
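If you publish often, bumping the version by hand gets tedious. Here is a rough Python sketch that increments the patch number in the text of a .csproj file. The helper name is hypothetical, and it assumes a plain three-part x.y.z value in a single Version element like the one above.

```python
import re


def bump_patch_version(csproj_text):
    """Increment the patch part of the first <Version>x.y.z</Version> element."""
    def _bump(match):
        major, minor, patch = match.group(1), match.group(2), int(match.group(3))
        return f"<Version>{major}.{minor}.{patch + 1}</Version>"

    new_text, count = re.subn(
        r"<Version>(\d+)\.(\d+)\.(\d+)</Version>", _bump, csproj_text, count=1
    )
    if count == 0:
        raise ValueError("no <Version>x.y.z</Version> element found")
    return new_text


sample = """<PropertyGroup>
  <PackageId>DynDns53.CoreLib</PackageId>
  <Version>1.0.0</Version>
</PropertyGroup>"""
print(bump_patch_version(sample))
```

A text-based replacement keeps the rest of the file untouched, which is why the sketch doesn’t round-trip the file through an XML parser.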

Notes

  • Regarding the NuGet API Key: They recently changed their approach to keys. Now you only have one chance to save your key somewhere else. If you don’t save it, you won’t be able to access it via their UI. You can create a new one of course so no big deal. But to avoid key pollution you might want to save it in a safe place for future reference.

  • If you are publishing packages frequently, you may not get the updates even after they have been published. The reason is that packages are cached locally, so make sure to clean your cache before you try to update them. On Mac, Visual Studio doesn’t have a Clean Cache option as of this writing (unlike Windows) so you have to go to your user folder and remove the packages under the {user}/.nuget/packages folder. After this, update the packages and you should get the latest validated version from NuGet.

.NET Core Client

Prerequisites

First, you’d need an IAM user who has access to Route53. You can use the policy template below to give the minimum possible permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "route53:ListResourceRecordSets",
                "route53:ChangeResourceRecordSets"
            ],
            "Resource": "arn:aws:route53:::hostedzone/{ZONE ID}"
        }
    ]
}

Only 2 actions are performed, so as long as you remember to update the policy with the new zone IDs if you need to manage other domains, this should work fine for you.
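When more than one domain is involved, the policy could be generated instead of edited. This is a minimal Python sketch under the assumption that all zones go into a single statement. The function name and the example zone IDs are placeholders, not part of DynDns53.

```python
import json


def build_route53_policy(zone_ids):
    """Return an IAM policy document allowing the two actions on the given zones."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "route53:ListResourceRecordSets",
                    "route53:ChangeResourceRecordSets",
                ],
                # One ARN per hosted zone the client should be able to update
                "Resource": [
                    f"arn:aws:route53:::hostedzone/{zone_id}" for zone_id in zone_ids
                ],
            }
        ],
    }


# ZONEID1 and ZONEID2 are placeholders for real hosted zone IDs
print(json.dumps(build_route53_policy(["ZONEID1", "ZONEID2"]), indent=4))
```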

Usage

Basic usage is very straightforward. Once compiled you can supply the IAM Access and Secret Keys and the domains to update with their Route53 Zone IDs as shown below:

dotnet DynDns53.Client.DotNetCore.dll --AccessKey {ACCESS KEY} --SecretKey {SECRET KEY} --Domains ZoneId1:Domain1 ZoneId2:Domain2 
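The ZoneId:Domain pairs are easy to get wrong on the command line. The following Python sketch shows one way such arguments could be split and validated. It only illustrates the format and is not how the actual client parses them; the function name is invented for this example.

```python
def parse_domain_args(args):
    """Split 'ZoneId:Domain' strings into (zone_id, domain) tuples."""
    pairs = []
    for arg in args:
        zone_id, sep, domain = arg.partition(":")
        if not sep or not zone_id or not domain:
            raise ValueError(f"expected ZoneId:Domain, got {arg!r}")
        pairs.append((zone_id, domain))
    return pairs


print(parse_domain_args(["ZoneId1:Domain1", "ZoneId2:Domain2"]))
```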

Notes

  • .NET Core Console Application uses the NuGet package. One difference between .NET Core and classic .NET applications is that the packages are no longer stored along with the application. Instead they are downloaded to the .nuget folder under the user’s folder (e.g. on a Mac it’s located at /Users/{USERNAME}/.nuget/packages)

Available on Docker Hub

Even though it’s not a complex application I think it’s easier and hassle-free to run it in a self-contained Docker container. Currently it only supports Linux containers. I might develop a multi-architecture image in the future if need be, but for now Linux only is sufficient for my needs.

Usage

You can get the image from Docker hub with the following command:

docker pull volkanx/dyndns53

and running it is very similar to running the .NET Core Client as that’s what’s running inside the container anyway:

docker run -d volkanx/dyndns53 --AccessKey {ACCESS KEY} --SecretKey {SECRET KEY} --Domains ZoneId1:Domain1 ZoneId2:Domain2 --Interval 300

The command above would run the container in detached (background) mode so that it can keep on updating the DNS every 5 minutes (300 seconds).

Notes

  • I had an older Visual Studio 2017 for Mac installation and it didn’t have Docker support. The setup is not very granular to pick specific features. So my solution was to reinstall the whole thing at which point Docker support was available in my project.

  • After adding Docker support the default build configuration becomes docker-compose. But it doesn’t work straight away as it throws an exception saying

      ERROR: for dyndns53.client.dotnetcore  Cannot start service dyndns53.client.dotnetcore: Mounts denied:
      The path /usr/local/share/dotnet/sdk/NuGetFallbackFolder
      is not shared from OS X and is not known to Docker.
      You can configure shared paths from Docker -> Preferences... -> File Sharing.
      See https://docs.docker.com/docker-for-mac/osxfs/#namespaces for more info.
    

I added the folder it mentions in the error message to shared folders as shown below and it worked fine afterwards:

  • Currently it only works on Linux containers. There’s a nice article here about creating multi-architecture Docker images. I’ll try to make mine multi-arch as well when I revisit the project or there is an actual need for that.

Angular 5 Client

I’ve updated the web-based client using Angular 5 and Bootstrap 4 (Currently in Beta) which now looks like this:

I kept a copy of the old version which was developed with AngularJS. It’s available at this address: http://legacy.dyndns53.myvirtualhome.net/

Notes

  • After I added AWS SDK package I started getting a nasty error:

      ERROR in node_modules/aws-sdk/lib/http_response.d.ts(1,25): error TS2307: Cannot find module 'stream'.
    

    Fortunately the solution is easy as shown in the accepted answer here. Just remove “types: []” line in tsconfig.app.json file. Make sure you’re updating the correct file though as there is similarly named tsconfig.json in the root. What we are after is the tsconfig.app.json under src folder.

  • In this project, I use 3 different IP checkers (AWS, DynDns and a custom one I developed myself a while back and running on Heroku). Calling these from other clients is fine but in the web application I bumped into CORS issues. There are possible solutions for this:

    1. Create your own API to return the IP address: In the previous version, I created an API with AWS API Gateway which uses a very simple Lambda function to return the caller’s IP address

       exports.handler = function(event, context) {
         context.succeed({
           "ip": event.ip
         })
       }
      

      I created a GET method for my API and used the above Lambda function. Now that I had full control over it I was able to enable CORS as shown below:

    2. The other solution is “tricking” the browser by injecting CORS headers using a Chrome extension. There are a number of them but I use the one aptly named “Allow-Control-Allow-Origin: *”

      Once installed, you just enable it and getting the external IP works fine.

      It’s a good practice to filter it for your specific needs so that it doesn’t affect other sites (I had some issues with Google Docs when this is turned on)
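For the first option, the same idea could also be written as a Python Lambda function that sets the CORS header itself. This is only a sketch under a different assumption from the JavaScript version above: it assumes Lambda proxy integration, where API Gateway passes the caller’s address in requestContext.identity.sourceIp and expects a statusCode/headers/body response.

```python
import json


def handler(event, context):
    # Assumption: Lambda proxy integration, so the caller's IP is available
    # under requestContext.identity.sourceIp instead of a mapped event.ip.
    ip = event["requestContext"]["identity"]["sourceIp"]
    return {
        "statusCode": 200,
        # Allow any origin to read the response from a browser
        "headers": {"Access-Control-Allow-Origin": "*"},
        "body": json.dumps({"ip": ip}),
    }


# Local smoke test with a hand-made event (203.0.113.7 is a documentation address)
fake_event = {"requestContext": {"identity": {"sourceIp": "203.0.113.7"}}}
print(handler(fake_event, None))
```

As with the extension, a wildcard origin is the bluntest setting; restricting it to the web client’s own origin would be tighter.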

CI Integration

I created a Travis integration which is free since my project is open-source. It runs the unit tests of the core library automatically. Also added the shiny badge on the project’s readme file that shows the build status.

Resources

devmachine learning google cloud platform, speech-to-text comments

Just out of curiosity I wanted to play around with Google Cloud Platform. They give $300 free credit for a 12 month trial period so I thought this would be a good chance to try it out.

The APIs I wanted to sample were speech recognition and translation.

Setting Up SDK

I followed the quick start guide which is a step-by-step process so it was quite helpful to get acquainted with the basics.

To be able to follow the instructions I downloaded and installed the GCloud SDK. On Mac it’s quite easy:

curl https://sdk.cloud.google.com | bash
exec -l $SHELL
gcloud init

And once it’s complete it requires you to log in to your account and grant access to SDK:

Testing the API

After the initial setup I tried the sample request and it worked just fine:

The example worked but also raised a few questions in my mind:

  1. Sample uses gs protocol. First off, what does it mean?
  2. Can I use good ol’ http instead of it and point to any audio file publicly accessible?
  3. Can I use MP3 as encoding or does it need to be FLAC?

As learned from this SO thread, gs is used for Google Cloud Storage and “https://storage.googleapis.com” translates to “gs://”.
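That mapping is simple enough to express in a few lines. Here is a small Python sketch of the conversion in both directions; the function names are my own and not part of any Google SDK.

```python
GCS_URI_PREFIX = "gs://"
GCS_HTTP_PREFIX = "https://storage.googleapis.com/"


def gs_to_https(uri):
    """Convert gs://bucket/path to its storage.googleapis.com URL."""
    if not uri.startswith(GCS_URI_PREFIX):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return GCS_HTTP_PREFIX + uri[len(GCS_URI_PREFIX):]


def https_to_gs(url):
    """Convert a storage.googleapis.com URL back to gs://bucket/path."""
    if not url.startswith(GCS_HTTP_PREFIX):
        raise ValueError(f"not a storage.googleapis.com URL: {url!r}")
    return GCS_URI_PREFIX + url[len(GCS_HTTP_PREFIX):]


print(gs_to_https("gs://cloud-samples-tests/speech/brooklyn.flac"))
```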

So the http version of the test file is “https://storage.googleapis.com/cloud-samples-tests/speech/brooklyn.flac”. I was able to verify the file actually exists but when I replaced the original value with it I got this error:

{
    "error": {
        "code": 400,
        "message": "Request contains an invalid argument.",
        "status": "INVALID_ARGUMENT"
    }
}

This also answered my second question. According to the documentation it only supports Google Cloud Storage currently:

uri contains a URI pointing to the audio content. 
Currently, this field must contain a Google Cloud Storage URI 
(of format gs://bucket-name/path_to_audio_file). 

The answer to my 3rd question wasn’t very promising either. Apparently only the types listed below are supported:

If the authorization token expires, you can generate a new one by using the following commands:

export GOOGLE_APPLICATION_CREDENTIALS="/Path/To/Credentials/Json/File"

gcloud auth application-default print-access-token

So there’s no way of uploading a random MP3 and getting text out of it. But I’ll of course try anyway :-)

Test Case: Get lyrics for a Rammstein song and translate

OK, now that I have a free trial at my disposal and have everything set up, let’s create some storage, upload some files and put it to a real test.

Step 01: Get some media

My goal is to extract lyrics of a Rammstein song and translate them to English. For that I chose the song Du Hast. Since I couldn’t find a way to download a FLAC version of the song I decided to download the official video from Rammstein’s YouTube channel.

This is just for experimental purposes and I deleted the video after I was done testing, so it should be fine I guess. To download videos from YouTube you can refer to this TechAdvisor article.

I simply used VLC to open the YouTube video. In Window -> Media Information dialog it shows the full path of the raw video file and I copied that path into a browser and downloaded the video.

Step 02: Prepare the media to process

Since all I needed was the audio, I extracted it from the video file using VLC. It can probably be done in a number of ways but VLC makes it quite straightforward:

Click File –> Convert & Stream, drag and drop the video

In the Choose Profile section, select Audio - FLAC.

The important bit here is that by default VLC converts to stereo audio with 2 channels, but Google doesn’t support that, as explained in this documentation:

All encodings support only 1 channel (mono) audio

So make sure to customize it and enter 1 as channel count:

Step 03: Call the API

Now I was ready to call the API with my shiny single-channel FLAC file. I uploaded it to the Google Storage bucket I created, gave public access to it and tried the API.

Apparently, speech:recognize endpoint only supports audio up to a minute. This is the error I got after posting a 03:55 audio.

“Sync input too long. For audio longer than 1 min use LongRunningRecognize with a ‘uri’ parameter.”

The solution is using speech:longrunningrecognize endpoint which only returns a JSON with 1 value: name. This is a unique identifier assigned by Google to the job they created for us.

Once we have this id we can query the result of the process by calling GET operations endpoint.
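The two calls above could be prepared along these lines. This Python sketch only builds the URLs and the JSON body without sending anything. The bucket and file name are placeholders, the helper names are invented, and the language code de-DE is my assumption for German audio.

```python
import json

SPEECH_BASE = "https://speech.googleapis.com/v1"


def build_long_running_request(gcs_uri, language_code="de-DE"):
    """Return (url, body) for a speech:longrunningrecognize POST."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("the uri field must be a Google Cloud Storage URI")
    url = f"{SPEECH_BASE}/speech:longrunningrecognize"
    body = {
        "config": {"encoding": "FLAC", "languageCode": language_code},
        "audio": {"uri": gcs_uri},
    }
    return url, body


def operation_url(name):
    """Return the GET URL used to poll the job identified by 'name'."""
    return f"{SPEECH_BASE}/operations/{name}"


# gs://my-bucket/du-hast.flac is a placeholder, not the bucket used in the post
url, body = build_long_running_request("gs://my-bucket/du-hast.flac")
print(url)
print(json.dumps(body, indent=2))
```

Both requests still need the Authorization header with the access token obtained earlier.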

Fantastic! Some results. It’s utterly disappointing of course as we only got a few words out of it, but still something (I guess!).

Step 04: Compare the results:

Now the following are the actual lyrics of the song:

Du
du hast
du hast mich
du hast mich gefragt
du hast mich gefragt, und ich hab nichts gesagt

Willst du bis der Tod euch scheidet
treu ihr sein für alle Tage

Nein

Willst du bis zum Tod, der scheide
sie lieben auch in schlechten Tagen

Nein

and this is what I got back from Google:

du hast 
du hast recht 
du hast 
du hast mich 
du hast mich 
du du hast 
du hast mich

du hast mich belogen

du hast 
du hast mich blockiert

It missed most of the lyrics. Maybe it was headbanging so hard that it couldn’t catch those parts!

Test Case: Slow German Podcast

Since my idea of translating German industrial metal lyrics on the fly failed miserably I decided to try with cleaner audio where there is no music. Found a nice looking podcast called Slow German. Nice thing about it is that it provides transcripts as well so I can compare the Speech API results with it.

Obtained a random episode from their site and followed the steps above.

The first 4 paragraphs of the actual transcript of the podcast are as follows (the full transcript can be found here):

Denk ich an Deutschland in der Nacht, dann bin ich um den Schlaf gebracht.“ Habt Ihr diesen Satz schon einmal gehört? Er wird immer dann zitiert, wenn es Probleme in Deutschland gibt. Der Satz stammt von Heinrich Heine. Er war einer der wichtigsten deutschen Dichter. Aber keine Angst: Auch wenn er am 13. Dezember 1797 geboren wurde, sind seine Texte sehr aktuell und relativ leicht zu lesen. Ihr werdet ihn mögen!

Harry Heine wuchs in einem jüdischen Haushalt auf. Er war 13 Jahre alt, als Napoleon in Düsseldorf einzog. Schon als Schüler begann er, Gedichte zu schreiben. Beruflich sollte er eigentlich im Bankgeschäft arbeiten, aber dafür hatte er kein Talent. Also versuchte er es erst mit einem eigenen Geschäft für Stoffe, das aber bald pleite war. Dann begann er zu studieren. Er probierte es mit Jura und mit Geschichte, besuchte verschiedene Vorlesungen.

Mit 25 Jahren veröffentlichte er erste Gedichte. Es war eine aufregende Zeit für ihn. Er wechselte die Städte und die Universitäten, er beendete sein Jura- Studium und wurde promoviert. Um seine Chancen als Anwalt zu verbessern, ließ er sich protestantisch taufen, er kehrte also dem Judentum den Rücken und wurde Christ. Daher auch der neue Name: Christian Johann Heinrich Heine. Später hat er die Taufe oft bereut.

Wenn Ihr Heines Werke lest werdet Ihr merken, dass sie etwas Besonderes sind. Sie sind oft kritisch, sehr oft aber auch ironisch und humorvoll. Er spielt mit der Sprache. Er kann aber auch sehr böse sein und herablassend über Menschen schreiben. Seine Kritik auch an politischen Ereignissen und die Zensur, mit der er in Deutschland leben musste, führten Heinrich Heine nach Paris. Er wanderte nach Frankreich aus.

And this is the result I got from Google (Trimmed to match the above):

denk ich an Deutschland in der Nacht dann bin ich um den Schlaf gebracht habt ihr diesen Satz schon einmal gehört er wird immer dann zitiert wenn es Probleme in Deutschland gibt der Satz stammt von Heinrich Heine er war einer der wichtigsten deutschen Dichter aber keine Angst auch wenn er am 13. Dezember 1797 geboren wurde sind seine Texte sehr aktuell und relativ leicht zu lesen ihr werdet ihn mögen Harry Heine wuchs in einem jüdischen Haushalt auf er war 13 Jahre alt als Nappo

hier in Düsseldorf einen Zoo schon als Schüler begann er Gedichte zu schreiben beruflich sollte er eigentlich im Bankgeschäft arbeiten aber dafür hatte er kein Talent also versuchte er es erst mit einem eigenen Geschäft für Stoffe das aber bald pleite war dann begann er zu studieren er probierte es mit Jura und mit Geschichte besuchte verschiedene Vorlesungen mit 25 Jahren veröffentlichte er erste Gedichte es war eine aufregende Zeit für ihn er wechselte die Städte und die Universitäten er beendete sein Jurastudium und wurde Promo

auch an politischen Ereignissen und die Zensur mit der er in Deutschland leben musste führten Heinrich Heine nach Paris er wanderte nach Frankreich aus 

Comparing the translations

Since I don’t speak German I cannot judge how well it did. Clearly it didn’t capture all the words but I wanted to see if what it returned makes any sense anyway. So I put both in Google Translate and this is how they compare:

Translation of the original transcript:

When I think of Germany at night, I'm about to go to sleep. "Have you ever heard that phrase before? He is always quoted when there are problems in Germany. The sentence is by Heinrich Heine. He was one of the most important German poets. But do not worry: even if he was born on December 13, 1797, his lyrics are very up to date and relatively easy to read. You will like him!

Harry Heine grew up in a Jewish household. He was 13 years old when Napoleon moved in Dusseldorf. Even as a student, he began writing poetry. Professionally, he was supposed to work in banking, but he had no talent for that. So he first tried his own business for fabrics, which was soon broke. Then he began to study. He tried law and history, attended various lectures.

At the age of 25 he published his first poems. It was an exciting time for him. He changed cities and universities, he completed his law studies and received his doctorate. To improve his chances as a lawyer, he was baptized Protestant, so he turned his back on Judaism and became a Christian. Hence the new name: Christian Johann Heinrich Heine. Later he often regretted baptism.

When you read Heine's works, you will find that they are special. They are often critical, but often also ironic and humorous. He plays with the language. But he can also be very angry and condescending to write about people. His criticism also of political events and the censorship with which he had to live in Germany led Heinrich Heine to Paris. He emigrated to France.	

Translation of Google’s results:

I think of Germany in the night then I'm about to sleep Did you ever hear this sentence He is always quoted when there are problems in Germany The sentence comes from Heinrich Heine He was one of the most important German poets but do not be afraid he was born on December 13, 1797 his lyrics are very up to date and relatively easy to read you will like him Harry Heine grew up in a Jewish household he was 13 years old as Nappo

Here in Dusseldorf a zoo as a student he began to write poetry professionally he should actually work in the banking business but for that he had no talent so he first tried his own business for fabrics but soon broke and then began to study he tried it with Jura and with history attended various lectures at age 25 he published his first poems it was an exciting time for him he changed the cities and the universities he finished his law studies and became promo

also in political events and the censorship with which he had to live in Germany led Heinrich Heine to Paris he emigrated to France

The translations of the podcast are very close, especially in the first part. The API missed some sentences, but when you read its output you can at least get a general understanding of what the text is about. It’s maybe not a good read, and it’s not good if you’re interested in details, but it’s probably good enough.
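For a rougher but more objective comparison than reading the translations, the two German texts could be compared word by word. The Python sketch below uses difflib from the standard library. It is a simple similarity ratio of my own choosing, not a proper word error rate, and the two sample sentences are short excerpts from the texts above.

```python
import difflib
import re


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def word_similarity(reference, hypothesis):
    """Return a 0..1 ratio of how closely the word sequences match."""
    matcher = difflib.SequenceMatcher(None, normalize(reference), normalize(hypothesis))
    return matcher.ratio()


reference = "Er war 13 Jahre alt, als Napoleon in Düsseldorf einzog."
hypothesis = "er war 13 Jahre alt als Nappo hier in Düsseldorf einen Zoo"
print(round(word_similarity(reference, hypothesis), 2))
```

Normalizing case and punctuation first matters here because the API output has neither.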

Conclusion

Speech to text can be very useful backed with automated real-time translations. Google Speech API supports real time speech recognition as well so it may be interesting to put Translation API in use as well and develop a tool to get real time translations but that’s for another blog post.

Resources

devops git, devops, powershell comments

Having lots of projects and assets stored on GitHub I thought it might be a good idea to create periodical backups of my entire GitHub account (all repositories, both public and private). The beauty of it is since Git is open source, this way I can migrate my account to anywhere and even host it on my own server on AWS.

Challenges

With the above goal in mind, I started to outline what’s necessary to achieve this task:

  1. Automate calling GitHub API to get all repos including private ones. (Of course one should be aware of the GitHub API rate limit, which is currently 5000 requests per hour. If you use up all your allowance with your scripts you may not be able to use it yourself. The good thing is that they return how many calls are left before you exceed your quota in the x-ratelimit-remaining HTTP header of their responses.)
  2. Pull all the latest versions for all branches. Overwrite local versions in cases of conflict.
  3. Find a way to easily transfer a git repository (A compressed single file version rather than individual files) if transferring to another medium is required (such as an S3 bucket)
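As a small illustration of the rate limit point in the first item, a script could check the x-ratelimit-remaining header before making more calls. This is a Python sketch with invented helper names; the reserve of 500 calls is an arbitrary assumption.

```python
def remaining_calls(headers):
    """Read x-ratelimit-remaining from a dict of response headers."""
    # Header names are case-insensitive, so normalize the keys first
    lowered = {name.lower(): value for name, value in headers.items()}
    return int(lowered["x-ratelimit-remaining"])


def should_pause(headers, reserve=500):
    """Return True when the script should stop to leave some quota unused."""
    return remaining_calls(headers) <= reserve


print(should_pause({"X-RateLimit-Remaining": "4987"}))
```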

With these challenges ahead, I first started looking into getting the repos from GitHub:

Consuming GitHub API via PowerShell

First, I shopped around for existing libraries for this task, such as PowerShellForGitHub by Microsoft, but it didn’t work for me. I couldn’t even get the samples on their Wiki to run: it kept giving a cmdlet not found error, so I gave up.

Found a nice video on Channel 9 about consuming REST APIs via PowerShell which uses GitHub API as a case study. It was perfect for me as my goal was to use GitHub API anyway. And since this is a generic approach to consuming APIs it can come in handy in the future as well. It’s quite easy using basic authentication.

Authorization

The first step is to create a Personal Access Token with repo scope. (Make sure to copy the value before you close the page, as there is no way to retrieve it afterwards.)

After the access token has been obtained, I had to generate authorization header as shown in the Channel 9 video:

$token = '<YOUR GITHUB ACCOUNT NAME>:<PERSONAL ACCESS TOKEN>'
$base64Token = [System.Convert]::ToBase64String([char[]]$token)
$headers = @{
    Authorization = 'Basic {0}' -f $base64Token
};

$response = Invoke-RestMethod -Headers $headers -Uri https://api.github.com/user/repos

This way I was able to get the repositories including the private ones, but by default it returns 30 records per page so I had to traverse the pages.
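The header construction isn’t specific to PowerShell. For comparison, here is roughly the same thing as a Python sketch: join the account name and token with a colon, Base64-encode the result and prefix it with Basic. The function name is made up and the credentials are dummy values.

```python
import base64


def basic_auth_header(username, token):
    """Build the Authorization header for HTTP basic authentication."""
    raw = f"{username}:{token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Dummy credentials for illustration only
print(basic_auth_header("user", "pass"))
```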

Handling pagination

GitHub sends the next and the last page URLs in link header:

<https://api.github.com/user/repos?page=2>; rel="next", <https://api.github.com/user/repos?page=3>; rel="last"
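For illustration, a header like the one above could be split into its rel/URL pairs with a short Python sketch. The function name and the regular expression are my own assumptions and not part of any GitHub tooling.

```python
import re

# Matches entries of the form <url>; rel="name"
LINK_PATTERN = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def parse_link_header(value):
    """Return a dict mapping each rel value (next, last, ...) to its URL."""
    return {rel: url for url, rel in LINK_PATTERN.findall(value)}


header = ('<https://api.github.com/user/repos?page=2>; rel="next", '
          '<https://api.github.com/user/repos?page=3>; rel="last"')
links = parse_link_header(header)
print(links.get("next"))
```

When there is no next entry, links.get("next") returns None, which is a convenient stop condition for a paging loop.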

The challenge here is that the Invoke-RestMethod response doesn’t seem to allow access to the headers, which is a huge bummer as there is useful info in the headers, as shown in the screenshot:

GitHub response headers in Postman

At this point, I wanted to use PSGitHub mentioned in the video but as of this writing it doesn’t support getting all repositories. In fact in a note it says “We need to figure out how to handle data pagination” which made me think we are on the same page here (no pun intended!)

GitHub supports a page size parameter (e.g. per_page=50) but the documentation says the maximum value is 100. Although it is tempting to use that, as it would bring all my repos and leave some room for future ones as well, I wanted to go with a more permanent solution. So I decided to keep requesting pages as long as objects are returned, like this:

$page = 1

Do
{
    $response = Invoke-RestMethod -Headers $headers -Uri "https://api.github.com/user/repos?page=$page"
    
    foreach ($obj in $response)
    {
        Write-Host ($obj.id)
    }
    
    $page = $page + 1
}
While ($response.Count -gt 0)

Now in the foreach loop of course I have to do something with the repo information instead of just printing the id.
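The same “keep going until a page comes back empty” logic, written as a Python sketch that collects the repos instead of printing them. The fetch_page parameter is a hypothetical callable standing in for the actual API request, which makes the loop easy to try out without touching the network.

```python
def fetch_all_repos(fetch_page):
    """Call fetch_page(1), fetch_page(2), ... until an empty page is returned."""
    repos = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        repos.extend(batch)
        page += 1
    return repos


# Fake two-page API used only to demonstrate the loop
fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
all_repos = fetch_all_repos(lambda page: fake_pages.get(page, []))
print([repo["id"] for repo in all_repos])
```

The trade-off is one extra request at the end to discover the empty page, in exchange for not needing the response headers at all.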

Cloning / pulling repositories

At this point I was able to get all my repositories. GitHub API only handles account information so now I needed to be able to run actual git commands to get my code.

First I installed PowerShell on Mac, which is quite simple as specified in the documentation:

brew tap caskroom/cask
brew cask install powershell

With Git already installed on my machine, all that was left was using Git commands in the PowerShell terminal to clone or update a repo, such as:

git fetch --all
git reset --hard origin/master

Since this is just going to be a backup copy I don’t want to deal with merge conflicts, so I’m just overwriting everything local.

Another approach could be deleting the old repo and cloning it from scratch, but I think it would be a bit wasteful to do that every time for each and every repository.
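The clone-or-update decision could look something like the Python sketch below. It only builds the git commands rather than running them. The function name is invented, and treating master as the default branch is an assumption carried over from the commands above.

```python
import os


def backup_commands(clone_url, target_dir, default_branch="master"):
    """Return the git commands needed to create or refresh a backup copy."""
    if os.path.isdir(os.path.join(target_dir, ".git")):
        # Already cloned: fetch everything and overwrite the local state
        return [
            ["git", "-C", target_dir, "fetch", "--all"],
            ["git", "-C", target_dir, "reset", "--hard", f"origin/{default_branch}"],
        ]
    # First run for this repository: clone it from scratch
    return [["git", "clone", clone_url, target_dir]]


# The URL and folder are placeholders
for command in backup_commands("https://github.com/user/repo.git", "backups/repo"):
    print(" ".join(command))
```

Each returned list can be handed to subprocess.run as is.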

Putting it all together

Now that I have all the bits and pieces, I have to glue them together in a meaningful script that can be scheduled, and here it is:

Conclusion and Future Improvements

This version accomplishes the basic task of backing up an entire GitHub account but it can be improved in a few ways. Maybe I can post a follow-up article including those improvements. A few ideas that come to mind are:

  • Get Gists (private and public) as well.
  • Add option to exclude repos by name or by type (i.e. get only private ones or get all except repo123)
  • Add an option to export them to a “non-git” medium such as an S3 bucket using git bundle (which turns out to be a great tool to pack everything in a repository in a single file)
  • Create a Docker image that contains all the necessary software (Git, PowerShell, backup script etc) so that it can be distributed without any setup requirements.

Resources