devops git, powershell

Having lots of projects and assets stored on GitHub, I thought it might be a good idea to create periodic backups of my entire GitHub account (all repositories, both public and private). The beauty of it is that since Git is open source, I can migrate my account anywhere and even host it on my own server on AWS.

Challenges

With the above goal in mind, I started to outline what’s necessary to achieve this task:

  1. Automate calling the GitHub API to get all repos, including private ones. (Of course, one should be aware of the GitHub API rate limit, which is currently 5,000 requests per hour. If you use up all your allowance with your scripts, you may not be able to use the API yourself. The good thing is that they return how many calls are left before you exceed your quota in the x-ratelimit-remaining HTTP header of their responses.)
  2. Pull all the latest versions for all branches. Overwrite local versions in cases of conflict.
  3. Find a way to easily transfer a Git repository (a compressed, single-file version rather than individual files) if transferring to another medium is required (such as an S3 bucket).

With these challenges ahead, I first started looking into getting the repos from GitHub:

Consuming GitHub API via PowerShell

First, I shopped around for existing libraries for this task, such as PowerShellForGitHub by Microsoft, but it didn’t work for me. I couldn’t even get the samples on their Wiki to run; it kept giving a “cmdlet not found” error, so I gave up.

I found a nice video on Channel 9 about consuming REST APIs via PowerShell, which uses the GitHub API as a case study. It was perfect for me as my goal was to use the GitHub API anyway, and since this is a generic approach to consuming APIs it can come in handy in the future as well. It’s quite easy using basic authentication.

Authorization

The first step is to create a Personal Access Token with the repo scope. (Make sure to copy the value before you close the page; there is no way to retrieve it afterwards.)

After the access token has been obtained, I had to generate the authorization header as shown in the Channel 9 video:

# Build a Basic authentication header from the account name and the personal access token
$token = '<YOUR GITHUB ACCOUNT NAME>:<PERSONAL ACCESS TOKEN>'
$base64Token = [System.Convert]::ToBase64String([char[]]$token)
$headers = @{
    Authorization = 'Basic {0}' -f $base64Token
}

# Ask the GitHub API for the authenticated user's repositories (public and private)
$response = Invoke-RestMethod -Headers $headers -Uri 'https://api.github.com/user/repos'

This way I was able to get the repositories, including the private ones, but by default it returns 30 records per page, so I had to traverse the pages.

Handling pagination

GitHub sends the next and the last page URLs in the Link header:

<https://api.github.com/user/repos?page=2>; rel="next", <https://api.github.com/user/repos?page=3>; rel="last"

The challenge here is that the Invoke-RestMethod response doesn’t seem to allow access to the headers, which is a huge bummer as there is useful info in them, as shown in the screenshot:

GitHub response headers in Postman
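For what it’s worth, Invoke-WebRequest does return the raw response, headers included, so the Link header can at least be inspected that way; a quick sketch (the exact header-name casing may vary):

# Invoke-WebRequest keeps the raw response, so the pagination info in the Link
# header can be read directly
$raw = Invoke-WebRequest -UseBasicParsing -Headers $headers -Uri 'https://api.github.com/user/repos'
$raw.Headers['Link']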

At this point, I wanted to use PSGitHub, which is mentioned in the video, but as of this writing it doesn’t support getting all repositories. In fact, a note in the project says “We need to figure out how to handle data pagination”, which made me think we are on the same page here (no pun intended!)

GitHub supports a page size parameter (e.g. per_page=50), but the documentation says the maximum value is 100. Although it is tempting to use that, as it would bring all my repos and leave some room for future ones as well, I wanted to go with a more permanent solution. So I decided to keep requesting pages as long as there are objects returned, like this:

$page = 1

Do
{
    # Request one page of repositories at a time
    $response = Invoke-RestMethod -Headers $headers -Uri "https://api.github.com/user/repos?page=$page"
    
    foreach ($obj in $response)
    {
        Write-Host ($obj.id)
    }
    
    $page = $page + 1
}
# Keep requesting pages until GitHub returns an empty one
While ($response.Count -gt 0)

Now, in the foreach loop, I of course have to do something with the repo information instead of just printing the id.
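For example, instead of printing the id, the loop body can collect each repo’s name and clone URL into a list for the cloning step below (a sketch; clone_url is one of the fields the API returns for every repository):

# Collected across all pages; used later to clone or update each repository
$allRepos = @()

# ...and inside the Do block, instead of the Write-Host loop:
$allRepos += $response | Select-Object name, clone_url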

Cloning / pulling repositories

At this point I was able to get all my repositories. The GitHub API only handles account information, so now I needed to be able to run actual Git commands to get my code.

First, I installed PowerShell on my Mac, which is quite simple, as specified in the documentation:

brew tap caskroom/cask
brew cask install powershell

With Git already installed on my machine, all that was left was running Git commands in the PowerShell terminal to clone or update each repo, such as:

git fetch --all
git reset --hard origin/master

Since this is just going to be a backup copy, I don’t want to deal with merge conflicts, so I just overwrite everything local.

Another approach could be deleting the old repo and cloning it from scratch, but I think it would be a bit wasteful to do that every time for each and every repository.
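Here is a sketch of that per-repository logic, using the repo list gathered from the API and a hypothetical backup folder (Git credentials for the private repos are assumed to be configured already, e.g. via a credential helper):

$backupRoot = "C:\GitHubBackup"   # hypothetical backup location

foreach ($repo in $allRepos)
{
    $repoPath = Join-Path $backupRoot $repo.name

    if (Test-Path $repoPath)
    {
        # Existing copy: overwrite local state with whatever is on the remote
        git -C $repoPath fetch --all
        git -C $repoPath reset --hard origin/master
    }
    else
    {
        # First backup run for this repo: clone it
        git clone $repo.clone_url $repoPath
    }
}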

Putting it all together

Now that I have all the bits and pieces, I have to glue them together in a meaningful script that can be scheduled, and here it is:

Conclusion and Future Improvements

This version accomplishes the basic task of backing up an entire GitHub account, but it can be improved in a few ways. Maybe I can post a follow-up article covering those improvements. A few ideas that come to mind are:

  • Get Gists (private and public) as well.
  • Add an option to exclude repos by name or by type (i.e. get only private ones, or get all except repo123)
  • Add an option to export them to a “non-git” medium such as an S3 bucket using git bundle, which turns out to be a great tool for packing everything in a repository into a single file (see the sketch after this list)
  • Create a Docker image that contains all the necessary software (Git, PowerShell, the backup script, etc.) so that it can be distributed without any setup requirements.
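As a teaser for the git bundle idea, this is roughly what that step could look like (the paths are made up):

# Pack every branch and tag of the repository into a single self-contained file,
# which can then be copied to any medium (an S3 bucket, an external disk, etc.)
git -C C:\GitHubBackup\repo123 bundle create C:\GitHubBackup\repo123.bundle --all

# Restoring it later is just a clone from the bundle file:
# git clone C:\GitHubBackup\repo123.bundle repo123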


awsdevops s3, mysql, powershell

I have an application that uses a MySQL database. Because of cost concerns it’s running on an EC2 instance instead of RDS. As it’s not a managed environment, the burden of backing up my data falls on me. This is a small step-by-step guide that details how I’m backing up my MySQL database to AWS S3 with PowerShell.

PART 01 - AWS SETUP

  1. Create a bucket (e.g. “application-backups”) on AWS S3 using the AWS Management Console.
  2. Create a new IAM user (e.g. “upload-backup-to-s3”).
  3. Create a new policy using the Management Console. The policy will only give enough permissions to put objects into a single S3 bucket:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::xxxxxxx"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": [
                "arn:aws:s3:::xxxxxxx/*"
            ]
        }
    ]
}

In the S3 bucket properties, copy the ARN and replace the x’s above with it.

  4. Customize the S3 bucket lifecycle settings to determine how long you want old backups retained in your bucket. In my case I set them to expire after 21 days, which I think is a large enough window for relevant database backups. I probably wouldn’t restore anything older than 21 days anyway.

  5. [Optional - For email notifications] Create an SES user by clicking SES -> SMTP Settings -> Create My SMTP Credentials

Make sure you don’t make the mistake I made, which was creating an IAM user with a policy that can send emails. On the SMTP Settings page there’s a note right below the button:

Your SMTP user name and password are not the same as your AWS access key ID and secret access key. Do not attempt to use your AWS credentials to authenticate yourself against the SMTP endpoint.

When you click the button, it creates an IAM user too, but the secret key it generates is 44 bytes, whereas the IAM user I had created had a secret key of 40 characters. Anyway, the bottom line is that in order to be able to send emails via SES, create the user as described above and all should be fine.
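With those SMTP credentials in hand, sending the notification email from PowerShell can be as simple as the sketch below; the region endpoint, addresses and credential values are placeholders, and the From address must be verified in SES:

# SES SMTP endpoint for the region the credentials were created in (placeholder)
$smtpServer = "email-smtp.eu-west-1.amazonaws.com"

# The user name/password generated by the "Create My SMTP Credentials" button
$smtpPassword = ConvertTo-SecureString "<SES SMTP PASSWORD>" -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential ("<SES SMTP USER NAME>", $smtpPassword)

Send-MailMessage -SmtpServer $smtpServer -Port 587 -UseSsl -Credential $credential `
    -From "backup@example.com" -To "me@example.com" `
    -Subject "MySQL backup uploaded to S3" `
    -Body "The nightly database backup completed and was uploaded successfully."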

PART 02 - POWERSHELL SCRIPT

  1. Download and install AWS Tools for Windows PowerShell (https://aws.amazon.com/powershell/)

  2. Create a script as shown below. In a nutshell what the script does is:

a. Execute mysqldump command (Comes with MySQL Server)

b. Zip the backup file (which reduces the size significantly)

c. Upload the zip file to S3 bucket

d. Send a notification email using SES

e. Delete the local files

This is the full script:

  • For SQL Server databases, there’s a PowerShell cmdlet called Backup-SqlDatabase, but for MySQL I think the most straightforward way is using mysqldump, which comes with MySQL Server.

  • For password-protected zip files, you can take a look at this article (I haven’t tried it myself)
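For reference, here is a minimal sketch of steps (a) to (e); the paths, database name, MySQL credentials and bucket name are placeholders, and the AWS keys of the upload user are assumed to be already configured for AWS Tools for PowerShell:

# (a) Dump the database with mysqldump (placeholder path, credentials and database name)
$timestamp = Get-Date -Format "yyyyMMdd-HHmmss"
$dumpFile  = "C:\Backups\mydb-$timestamp.sql"
$zipFile   = "C:\Backups\mydb-$timestamp.zip"

& "C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqldump.exe" --user=backupuser --password=secret --result-file=$dumpFile mydb

# (b) Zip the dump; plain-text SQL dumps compress very well
Compress-Archive -Path $dumpFile -DestinationPath $zipFile

# (c) Upload the zip to the S3 bucket created in Part 01
Write-S3Object -BucketName "application-backups" -File $zipFile -Key "mydb-$timestamp.zip"

# (d) Send the notification email via SES SMTP (e.g. with Send-MailMessage as shown earlier)

# (e) Clean up the local files
Remove-Item $dumpFile, $zipFile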

Final step: Schedule the script by using Windows Task Scheduler

This is quite straightforward. Just create a task and schedule it for how often you want to back up your database.

In the Actions section, enter “powershell” as “Program/script” and the path of your PowerShell script as the argument, and that’s it.
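If you prefer to script this step as well, the same task can be registered from PowerShell; a sketch with a hypothetical script path and schedule:

# Run the backup script every day at 02:00 (hypothetical path and schedule)
$action  = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-File C:\Scripts\Backup-MySqlToS3.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At 2am

Register-ScheduledTask -TaskName "MySQL backup to S3" -Action $action -Trigger $trigger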


hobby synology, music, streaming

I’ve been a Spotify customer for quite a long time, but I recently realized that I wasn’t using it enough to justify 10 quid per month. Amazon made a great offer of a 4-month subscription for only £0.99 and I’m trying that out now, but the quality of the service hasn’t impressed me so far. Then it dawned on me: I already have lots of MP3s from my old archives, I have a fast internet connection and I have a Synology. Why not just build my own streaming service?

One device to rule them all: Synology

Every day I’m growing more fond of my Synology and regretting all the time I didn’t utilize it fully.

For streaming audio, we need server and client software. The server side comes with Synology: Audio Station.

The Server

Using Synology Audio Station is a breeze. You simply connect to the Synology over the network and copy your albums into the music folder. Try to have cover art named “cover.jpg” so that your albums show up nicely in the user interface.

The Client

Synology has a suite of iOS applications which are available in the Apple App Store. The one I’m using for audio streaming is called DS Audio.

Using Synology’s Control Panel, you can create a specific user for listening to music only. This way, even if that account is compromised, the attacker will only have read-only access to your music library.

Connecting to the server

There are two ways of connecting to your server:

  1. Dynamic DNS
  2. Quick Connect (QC)

Dynamic DNS is built-in functionality, but you’d need a Synology account. Basically, your Synology pings their server so that it can detect IP changes.

QC is the way I chose to go. It’s a proprietary technology by Synology. The nice thing about QC is that when you are connected to your local network it uses the internal IP, so it doesn’t use mobile data. When you’re outside, it uses the external IP and connects over the Internet.

Features

  • You can download all the music you want from your own library without any limitations. There’s no limit set for manual downloads. For automatic downloads you can choose from no caching to caching everything or choose a fixed size from 250MB to 20GB.
  • When you’re offline you don’t need to log in. On the login form there’s a link to Downloaded Songs, so you can skip logging in and go straight to your local cache.
  • You can pin your favourite albums to home screen.
  • Creating a playlist or adding songs to playlists is cumbersome (on iPhone at least):
    • Select a song and tap on … next to the song.
    • Tap Add. This will add the song to the play queue.
    • Tap the Play button in the top right corner.
    • Tap the playlist icon in the top right corner.
    • Tap the same icon again, which is now in the top left corner, to go into edit mode.
    • Now tap the radio buttons to the left of the songs to select them.
    • When done, tap the icon in the bottom left corner. This will open the Add to Playlist screen (finally!)
    • Here you can choose an existing playlist or create a new one by tapping the + icon.

Considering how easily this can be done in the Spotify client, this really needs to be improved.

  • In the Library or Downloaded Songs sections, you can organise your music by Album, Artist, Composer, Genre and Folder. Of course, in order for the Artist/Composer/Genre classification to work, you have to have your music properly tagged.
  • The client has a Radio feature with built-in support for SHOUTCast.

SHOUTCast

  • You can rate songs. There’s a built-in Top Rated playlist. By rating them, you can play your favourite songs without needing to add them to playlists, which is a neat feature.

Conclusion

I think having full control over my own music is great, and even though the DS Audio client has some drawbacks it’s worth it as it’s completely free. You can also set it up as a secondary streaming service in addition to your favourite paid one, just in case, so that you have a backup solution.
