Categories: aws, dev, hobby · Tags: gadget, raspberry_pi, route53

Everything is *-as-a-service nowadays and books are no exception. I have a Safari Books Online subscription which I enjoy a lot. It is extremely convenient to have thousands of books at your fingertips. But… DIY still rules! There are times you may still want your own collection, and it doesn’t have to be just an e-book collection. And on top of it all, it’s purely fun and educational.

Ingredients

Basically all you need is an up-and-running Raspberry Pi. If you have one, you can skip this section. These are just the components I used in my implementation:

Keyboard and display are needed for the initial setup. Once it’s connected to the network you can do everything over SSH.

Calibre, my old friend!

I’ve been a fan of Calibre for many years now. With Calibre you can manage any document collection you want. I love its user interface which allows me to easily tag and categorize my collections. Also it can convert between a wide range of formats. And when I plug in my Kindle it automatically recognizes the device and I can right-click on a book and send to device very easily. Check out this page for a full list of features.

My favorite feature is that it can act as a server. I mainly use Stanza on my iPad and connect to my Calibre server to download comic books over WiFi. The downside of running it locally on my computer is that the machine needs to be on, and I have to enable the content server in Calibre manually before connecting from the iPad.

Here comes the project

Instead, what I’d like to have is

  • An online server available all the time: The Raspberry Pi is a very power-efficient little monster, so I can keep it running all the time

  • Isolated from my main machine: For security reasons I don’t want to open a port on my desktop computer

  • Accessible both from inside and outside: Of course I could just launch a cheap AWS instance and use it as the content server but

    • It’s not as fun!
    • If I need to move around GBs of data local network rocks!

Also, as I said it’s mostly for fun so I don’t have to justify it completely to myself :-)

Roll up your sleeves!

Step 0: Setup Raspberry Pi

If you haven’t done it already you can easily set it up by following the Quick Start Guide on raspberrypi.org

Step 1: Install Calibre on Raspberry Pi

This one was a bit harder than I expected. The official site says “Please do not use your distribution provided calibre package, as those are often buggy/outdated. Instead use the Binary install described below.”

and the suggested command is

sudo -v && wget -nv -O- https://raw.githubusercontent.com/kovidgoyal/calibre/master/setup/linux-installer.py | sudo python -c "import sys; main=lambda:sys.stderr.write('Download failed\n'); exec(sys.stdin.read()); main()"

Alas, after running the command I got the following error:

Calibre installation error

I asked in the Calibre forums about the error and was advised to build it from source, because the compiled version is for Intel processors and doesn’t work on the ARM processor the Raspberry Pi has. The instructions for building from source are on the same page, but I haven’t tried it myself.

As a fallback method I simply used apt-get to install it:

sudo apt-get update && sudo apt-get install calibre

It worked fine but the version is 0.8.51 (the latest release at the time of this writing is 2.20.0, so you can see it’s a little bit outdated!). The content server was implemented a long time ago, though, so for all intents and purposes it’s good enough for this project.

Step 2: Run it as server

Now that we have Calibre installed we can run the content server from command line:

calibre-server --with-library=/home/pi/calibre/myLibrary --daemonize

This will run the process in the background (because of the --daemonize flag), but if the Pi restarts it will not start automatically. To fix that I added the command to crontab by first entering the following command

crontab -e

and adding the following line after the comments

@reboot calibre-server --with-library=/home/pi/calibre/myLibrary --daemonize

so that the same command is run after every reboot.

Crontab on Raspberry Pi

Now let’s test if we’re online. By default, Calibre starts serving on port 8080 with no authentication required. So just find the IP address of the Raspberry Pi and connect to it over HTTP from your machine, such as http://{Local IP}:8080

and voila!

Now we can add some books and start using it from any machine on the network.
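This quick browser test can also be scripted. Here is a minimal Python sketch (the host is an assumption — use your Pi’s local IP — and it assumes the default port 8080 with no authentication enabled):

```python
import urllib.request


def server_is_up(host, port=8080, timeout=5):
    """Return True if the Calibre content server answers on host:port."""
    url = "http://%s:%d/" % (host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, DNS failure, etc.
        return False


# Example: server_is_up("192.168.1.10")  # replace with your Pi's local IP
```

Running this from cron on another machine is an easy way to get notified if the Pi ever drops off the network.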

Step 3: Add some books

First I uploaded some files to a different folder using WinSCP. If you are not on Windows I’m sure you can find a similar tool to transfer files to Raspberry Pi.

We can add books by using calibredb command like this:

calibredb add Raspberry_Pi_Education_Manual.pdf --with-library=/home/pi/calibre/myLibrary

Please note that if you try to use calibre instead of calibredb you’d get the following error:

calibre: cannot connect to X server 

Because calibre is the GUI application and there is no X server running on the Pi, we cannot use it directly; instead we add books using calibredb.

Calibre always copies the files to its own library so once the books are added you can delete the original ones.

After the files are added refresh the page and you should get something like this:

At this point we can download the books on any machine on the local network.
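If you have a whole folder of books, the calibredb call above is easy to wrap in a small Python helper. This is just a sketch: the extension list is my assumption, and calibredb must be on the PATH.

```python
import subprocess
from pathlib import Path

LIBRARY = "/home/pi/calibre/myLibrary"  # same library path as above
FORMATS = {".epub", ".mobi", ".azw3", ".pdf"}  # formats I happen to upload


def add_books(folder, library=LIBRARY):
    """Add every recognized e-book in `folder` to the Calibre library."""
    added = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in FORMATS:
            # Same command as above, one invocation per file.
            subprocess.run(
                ["calibredb", "add", str(path), "--with-library=" + library],
                check=True,
            )
            added.append(path.name)
    return added
```

Since Calibre copies the files into its own library, the source folder can be wiped once the helper finishes.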

Step 4: Connect from clients

  • Kindle

Kindle has an experimental browser (I currently have a Paperwhite, I don’t know about the newer versions). So to download books, I simply go to Settings -> Experimental Browser and enter the URL of my content server (http://{Local IP}:8080):

And after you download the file you can go to home and enjoy the book on your Kindle.

Please note that Kindle cannot download PDFs. When I tried to download Raspberry Pi manual I got the following error

Only files with the extension .AZW, .PRC, .MOBI or .TXT can be downloaded to your Kindle.

So make sure you upload the right file formats.
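Based on that error message, a tiny filter like this can save you from uploading formats the Kindle will reject (the extension set is taken verbatim from the message above):

```python
import os

# Extensions the Kindle browser accepts, per the error message above.
KINDLE_FORMATS = {".azw", ".prc", ".mobi", ".txt"}


def kindle_downloadable(filename):
    """True if the Kindle experimental browser can download this file."""
    return os.path.splitext(filename)[1].lower() in KINDLE_FORMATS
```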

  • iPad / Stanza

This is my favorite app on iPad. It’s not even on AppStore anymore but I still use it to download books from Calibre.

All I had to do was click Get Books and it found the OPDS server on the network automatically so I could browse and download books right away.

Stanza

  • iPad / Web

Alternatively you can just browse to the server and open books with any eBook reader app available on your iPad.

Calibre UI on iPad

[Optional] Step 5: Setup Port Forwarding

For internal usage we are done! If you want to access your library over the Internet you have to define a port forwarding rule. The way to do it is completely dependent on your router, so you have to fiddle with your router’s administration interface.

Basically you map an external port to an internal IP and port.

For example I mapped port 7373 to local 192.168.175:8080 so whenever I connect to my home network’s external IP on port 7373 I get my Calibre user interface.

I recommend running the server with the --username and --password flags so that only authenticated users can browse your library:

calibre-server --with-library=/home/pi/calibre/myLibrary --username={USERNAME} --password={PASSWORD} --daemonize

[Optional] Step 6: Setup Dynamic DNS

If you have a static IP you don’t need this step at all, but personal broadband connections generally don’t come with static IPs. My solution for this was using AWS Route 53 and updating the DNS record with the AWS Python SDK (Boto).

First I had to install pip to be able to install boto

sudo apt-get install python-pip

Then boto itself

sudo pip install boto

I created an IAM user that only has access to a single domain on Route 53, which I use for this kind of stuff, and added its credentials to the AWS credentials file as explained in the nice and brief tutorial here

The script calls AWS’s external IP checker and stores the result in currentIP. Then it gets the hosted zone and loops through all the record sets. When it finds the subdomain I’m going to use for Calibre (‘calibre.volki.info.’) it updates the IP address with currentIP and commits the change. Thanks to AWS, that’s all it takes to create a Dynamic DNS application.

import boto.route53
import urllib2

# The checker returns the IP followed by a newline; strip it before use.
currentIP = urllib2.urlopen("http://checkip.amazonaws.com/").read().strip()

conn = boto.connect_route53()
zone = conn.get_zone("volki.info.")
change_set = boto.route53.record.ResourceRecordSets(conn, zone.id)

for rrset in conn.get_all_rrsets(zone.id):
    if rrset.name == 'calibre.volki.info.':
        # UPSERT creates the record if it's missing, updates it otherwise.
        u = change_set.add_change("UPSERT", rrset.name, rrset.type, ttl=60)
        u.add_value(currentIP)
        results = change_set.commit()

Of course this script needs to be added to crontab and run every 5-10 minutes, for example (the script path here is hypothetical, use wherever you saved it):

*/10 * * * * python /home/pi/update_dns.py

If the external IP changes there might be some disruption to the service, but it should only take a few minutes before it’s resolved.

With this script now we can access our library over the Internet and we don’t have to worry about changes in the IP address.
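If you want to sanity-check the external-IP lookup on its own, here is the same first step written as a Python 3 equivalent (the script above is Python 2 / urllib2; this version is just illustrative):

```python
import urllib.request


def get_external_ip(url="http://checkip.amazonaws.com/", timeout=10):
    """Return this network's external IP as reported by the checker URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # The service responds with the IP address followed by a newline.
        return resp.read().decode("ascii").strip()
```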


Categories: dev · Tags: neo4j, graph_databases, cypher

Visual tools are nice and all but they are not as fun as playing with a query language. When you write your own queries the possibilities are endless! In this post I’ll cover the basics of Cypher query language. Cypher is a declarative, SQL-like, pattern-matching query language for Neo4J graph databases.

Basic CRUD Operations

MATCH

The MATCH clause lets you define patterns to search the database with. It is similar to SELECT in the RDBMS world.

MATCH (n) RETURN n

The query above returns all nodes in the database. In this example n is a temporary variable to store the result. The results can be filtered based on labels such as:

MATCH (n:Person) RETURN n

Property-based filters can be used in both MATCH and WHERE clauses. For example the two queries below return the same results:

MATCH (n:Movie {title: "Ninja Assassin"})
RETURN n

MATCH (n:Movie)
WHERE n.title = "Ninja Assassin"
RETURN n

Instead of returning the entire node we can just select some specific properties. If a node doesn’t have the property it simply returns null.

MATCH (a:Person)
RETURN a.name, a.born

Query returning all actors' names and years they were born in

Results can be filtered by relationships as well. For example the query below returns the movies Tom Hanks acted in

MATCH (p {name:"Tom Hanks"})-[:ACTED_IN]->(m)
RETURN p, m

Query results for movies Tom Hanks acted in

Sometimes we might need to get the relationship type back from the query. In that case we can use the type() function:

MATCH (a)-[r1]->(m)<-[r2]-(a)
RETURN a.name, m.title, type(r1), type(r2)

Actors having multiple relationships with the same movie

Relationship type and direction can be omitted in the queries. The following query returns the names of the movies that have “some sort of relationship” with Tom Hanks:

MATCH (p {name: "Tom Hanks"})--(m:Movie)
RETURN m.title

Movie with Tom Hanks

OPTIONAL MATCH: The OPTIONAL keyword fills in the missing parts of the results with nulls. For example the query below returns one row containing null, because there is no outgoing relationship from the movie The Matrix. If we didn’t use OPTIONAL we would get an empty result set.

MATCH (a:Movie {title: 'The Matrix'})
OPTIONAL MATCH (a)-[]->(d)
RETURN d  

CREATE

To add new data, the CREATE clause is used. In its simplest form the following query creates a (rather useless) empty node:

CREATE ()

Labels and properties can be set while creating new nodes such as:

CREATE (a:Actor {name: "Leonard Nimoy", born: 1931, died: 2015})

Relationships are created with CREATE as well. For example the following query creates 2 nodes with Person label and creates a MARRIED_TO relationship between them:

CREATE (NedStark:Person {firstname: "Eddard", lastname: "Stark", alias: "Ned", gender: "male"}),
	   (CatelynStark:Person {firstname: "Catelyn", lastname: "Stark", maidenName: "Tully", gender: "female"})
CREATE (NedStark)-[:MARRIED_TO]->(CatelynStark)

Relationships can have properties too:

CREATE
	(WaymarRoyce)-[:MEMBER_OF {order:"Ranger"}]->(NightsWatch)

Properties can be added or updated by using SET keyword such as:

MATCH (p:Person {firstname: "Eddard"})
SET p.title = "Lord of Winterfell"
RETURN p

The above query adds a “title” property to the nodes with label Person and with firstname “Eddard”.

An existing property can be deleted by REMOVE keyword

MATCH (p:Person {firstname: "Eddard"})
REMOVE p.aliasList

MERGE

MERGE can be used to create new nodes/relationships, or to update them if they already exist. For an update to happen, all the properties in the pattern must match. For example the following query adds a new property to the node named House Stark of Winterfell:

CREATE (houseStark:House {name: "House Stark of Winterfell", words: "Winter is coming"})

MERGE (houseStark {name: "House Stark of Winterfell", words: "Winter is coming"})
SET houseStark.founded = "Age of Heroes"

The following one, on the other hand, creates a new node with the same name (because the words properties don’t match):

MERGE (houseStark:House {name: "House Stark of Winterfell", words: "Winter is coming!!!"})
SET houseStark.founded = "Age of Heroes"
RETURN houseStark

I find MERGE helpful when you have a long Cypher query that you might run multiple times. If you just use CREATE, every time you run the query you will end up with new nodes. If you use MERGE it will not give any errors and will not create duplicate nodes.

Another way to prevent this is to use unique constraints. For example the following query will enforce uniqueness on the _id property for nodes labelled Book.

CREATE CONSTRAINT ON (book:Book) ASSERT book._id IS UNIQUE

Now we can run this query and create the books:

CREATE 
	(AGameOfThrones:Book {_id: "7abe0a1c-b9bd-4f00-b094-a82dfb32b053", title: "A Game of Thrones", orderInSeries: 1, released:"August 1996"}),
 	(AClashOfKings:Book {_id: "051dae64-dfdb-4134-bc43-f6d2b4b57d37", title: "A Clash of Kings", orderInSeries: 2, released:"November 1998"}),
 	(AStormOfSwords:Book {_id: "e9baa042-2fc8-49a6-adcc-6dd455f0ba12", title: "A Storm of Swords", orderInSeries: 3, released:"August 2000"}),
 	(AFeastOfCrows:Book {_id: "edffaa47-0110-455a-9390-ad8becf5c549", title: "A Feast for Crows", orderInSeries: 4, released:"October 2005"}),
 	(ADanceWithDragons:Book {_id: "5a21b80e-f4c4-4c15-bfa9-1a3d7a7992e3", title: "A Dance with Dragons", orderInSeries: 5, released:"July 2011"}),
 	(TheWindsOfWinter:Book {_id: "77144f63-46fa-49ef-8acf-350cdc20bf07", title: "The Winds of Winter", orderInSeries: 6 })

If we try to run it again we get the following error:

Cypher constraint error

A unique constraint requires the existing data to be unique as well. So if we run the book creation query twice before the constraint exists, then try to create the constraint, we get an error:

To delete the constraint we use DROP keyword such as

DROP CONSTRAINT ON (book:Book) ASSERT book._id IS UNIQUE

DELETE

To delete nodes we first find them with MATCH. Then, instead of RETURN as in the above examples, we use DELETE to remove them.

MATCH (b:Book {_id: "7abe0a1c-b9bd-4f00-b094-a82dfb32b053"})
DELETE b

The above query would only work if the node doesn’t have any relationships. To delete a node as well as its relationships we can use the following:

MATCH (b {_id: "7abe0a1c-b9bd-4f00-b094-a82dfb32b053"})-[r]-()
DELETE b, r

We can generalize the above query to clean the entire database:

MATCH (n)
OPTIONAL MATCH (n)-[r]-() 
DELETE n, r

I find this quite useful during test and development phase as I need to start over quite often.

Useful keywords

  • ORDER BY

It is almost mandatory in all SQL variants; eventually you have to sort the data. For example the following returns all actors ordered by their names in alphabetical order:

MATCH (a:Person)
RETURN a.name
ORDER BY a.name

  • LIMIT

Sometimes your query may return a lot of rows that you’re not interested in; for example you may want only the top n results. In this case you can restrict the number of rows returned with the LIMIT keyword, such as:

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN a.name, count(m) AS movies
ORDER BY movies DESC
LIMIT 3

The above query returns the top 3 actors who have the most ACTED_IN relationships with movies in descending order.

  • UNION

Another familiar keyword from the RDBMS world is UNION. As in standard SQL, the returned columns must match in number and name. For example the following will fail:

MATCH (p:Person)
RETURN p.name
UNION
MATCH (m:Movie)
RETURN m.title

But by using an alias it can be fixed like this:

MATCH (p:Person)
RETURN p.name AS name
UNION
MATCH (m:Movie)
RETURN m.title AS name

Measuring Query Performance

The PROFILE keyword is very helpful as it lets you see the query plan and optimize it. It is very simple to use: you just put it before the MATCH clause, such as

PROFILE MATCH (n:Person {name: "Tom Hanks"})
RETURN n

This is obviously a very simplistic example, but I strongly recommend the GraphAware article on Labels vs Indexed Properties, where PROFILE is heavily used to identify which option performs better in a given scenario. It is also a great read to learn more about modelling a Neo4J database.

Conclusion

Cypher is quite powerful and can be very expressive in the right hands. It is too comprehensive to cover in a single blog post; I just wanted this post to contain samples of the basic queries. I hope it can be of some help to someone new to Cypher, in conjunction with the references I listed below.

Resources

Categories: aws · Tags: news

The AWS team is certainly keeping itself very busy. If you turn your head away for a while you end up falling behind on the new features. In this post I’ll cover some of the fairly new features and improvements I’ve recently discovered and found useful and interesting.

Reserved Instance pricing model improvements

In the new simplified model they introduced 3 options: No Upfront, Partial Upfront and Full Upfront. No Upfront is a great option as you don’t need to pay a big amount up front and just start paying less. Reserved instances are generally great for servers that run 24/7. For example, currently if you launch a t2.micro Linux instance in the EU (Ireland) region you pay $0.014/hour. If you switch to the No Upfront option the hourly rate falls to $0.010. (This assumes the server runs continuously; you pay $7.30/month regardless of the instance’s state.)

AWS Reserved Instance Pricing
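The numbers above are easy to sanity-check with AWS’s usual 730-hours-per-month approximation (the rates are the ones quoted above; always check the current price list before relying on them):

```python
HOURS_PER_MONTH = 730  # AWS's standard 8760 hours/year divided by 12


def monthly_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Cost of keeping an instance running (or reserved) for a full month."""
    return round(hourly_rate * hours, 2)


on_demand = monthly_cost(0.014)   # t2.micro on-demand in EU (Ireland)
no_upfront = monthly_cost(0.010)  # the No Upfront reserved rate: $7.30
```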

Auto recovery

This is a very cool feature that helps a great deal with system maintenance. It’s not available in all regions yet (just N. Virginia as of this writing) but I’m guessing it will be rolled out to other regions soon. With this feature we can create an alarm for the StatusCheckFailed_System metric and select “Recover this instance” as an action

AWS Instance Recovery

So when a machine is unresponsive it can be restarted automatically. Considering how many issues a simple restart can solve I think it’s a great feature :-)

Resource Groups

I’ve always found the AWS Management Console a very dangerous place, as all machines (dev, test, prod) are pooled and listed in the same place. In a careless moment you can easily terminate a production machine instead of a temporary test machine (that’s why I always turn on termination protection).

Resource groups, as the name implies, are meant to group related items. Solving the problem I mentioned above is not their primary goal (maybe it’s just me who sees it as a problem), but that’s what I’ll initially use them for.

AWS Resource Groups Overview

Using resource groups is dead simple: just add a tag to your resources (I added one called “type”), create a resource group, and use that tag as a filter. Now, instead of listing all EC2 instances, I can just click on the Dev group and play around with my development servers without even seeing the production machines.

AWS Resource Group Details

Lambda

AWS Lambda is a compute service that allows you to execute your code without launching an EC2 instance. It’s in preview mode at the moment. It’s not meant to run full-blown applications and you have no control over the actual machines running the code. It’s great for simple event handlers, such as triggering a function when an image is uploaded. Currently I’m working on an application that’s going to use this service, so more on that later.

Aurora

I haven’t tried this one myself yet; it’s currently in the preview stage. The premise is that it’s fully compatible with MySQL and costs 1/10th as much as a regular MySQL instance. It’s not quite clear whether it’s a fork of MySQL like MariaDB or an independent RDBMS, but either way being able to use it as a drop-in MySQL replacement sounds good. As an AWS fan I’m sure that performance-wise it will be everything they say it is, so no doubts there. I signed up for the preview and am waiting for approval. When I get to try it out I’ll post my impressions in a separate blog post.

Conclusion

There is big competition going on between the cloud computing platforms these days. Prices are constantly going down and new features are being added, so it’s quite a lot of fun to follow the news in this market. I’ll try to keep a closer eye on new AWS features from now on.
