dev debug

Even though APIs are RESTful these days sometimes you might need to interact with a SOAP-based web service.

I had to consume an SOAP XML web service the other day and encountered a strange problem while debugging my test application. The application was working fine but when I tried to debug it was closing silently. So as a first step I opened Debug –> Exceptions and checked all to let the application break upon any type of exception it gets.

Break when any exception is thrown

After running the application with this setting at least I was able to see what the fuss was all about:

SOAP BindingFailure exception

There are various approaches to resolve this issue and finally I found the correct one on a Stackoverflow answer: Go to Project Properties and in the Build tab turn on Generate Serialization Assembly setting.

Turn on Generate Serialization Assembly setting

When this setting it turned on, it generates an assembly named {Your Application Name}.XmlSerializers.dll.

Out of curiosity I peeked into the assembly with ILSpy and it looks like this:

XmlSerializers.dll in ILSpy

Basically it just generates an abstract class deriving from XmlSerializer (named XmlSerializer1) and generates a bunch of sealed child classes deriving from that class.

I’ve never been a fan of auto-generated code anyway and looks like I’m not going to need it in my code but it’s used in the background by the framework. I added links to a few Stackoverflow answers related to that assembly though. I gather the advantage of turning it on is reduced startup time and being able to debug in this case. The disadvantage is increased deployment size which I think is negligible in this day and age so I’ll just keep it on and debug happily ever after!

Resources

awsdevhobby gadget, raspberry_pi, route53

Everything is *-as-a-service nowadays and books are no exception. I have a Safari Books Online subscription which I enjoy a lot. It is extremely convenient to have thousands of books at your fingertips. But… DIY still rules! There are times you may still want to have your own collection and it doesn’t just have to be an e-book collection. And on top of all it’s purely fun and educational.

Ingredients

Basically all you need is an up-and-running Raspberry Pi. If you have one, you can skip this section. These are just the components I used in my implementation:

Keyboard and display are needed for the initial setup. Once it’s connected to network you can do everything over SSH.

Calibre, my old friend!

I’ve been a fan of Calibre for many years now. With Calibre you can manage any document collection you want. I love its user interface which allows me to easily tag and categorize my collections. Also it can convert between a wide range of formats. And when I plug in my Kindle it automatically recognizes the device and I can right-click on a book and send to device very easily. Check out this page for a full list of features.

My favorite feature is that it can act as a server. I mainly use Stanza on my iPad and connect to my Calibre server to download comic books over WiFi. The downside of running it locally on my computer is that the machines needs to be on and I have to enable the content server on Calibre manually before connecting from iPad.

Here comes the project

Instead, what I’d like to have is

  • An online server available all the time: Raspberry pi is very power-efficient little monster so I can keep it running

  • Isolated from my main machine: For security reasons I don’t want to open a port on my desktop computer

  • Accessible both from inside and outside: Of course I could just launch a cheap AWS instance and use it as the content server but

    • It’s not as fun!
    • If I need to move around GBs of data local network rocks!

Also, as I said it’s mostly for fun so I don’t have to justify it completely to myself :-)

Roll up your sleeves!

Step 0: Setup Raspberry Pi

If you haven’t done it already you can easily set it up by following the Quick Start Guide on raspberrypi.org

Step 1: Install Calibre on Raspberry Pi

This one was a bit harder than I expected. The official site says “Please do not use your distribution provided calibre package, as those are often buggy/outdated. Instead use the Binary install described below.”

and the suggested command is

sudo -v && wget -nv -O- https://raw.githubusercontent.com/kovidgoyal/calibre/master/setup/linux-installer.py | sudo python -c "import sys; main=lambda:sys.stderr.write('Download failed\n'); exec(sys.stdin.read()); main()"

Alas, after running the command I got the following error:

Calibre installation error

I asked in the Calibre forums about the error and I was advised to build it from source code. Because the compiled version is for Intel processors and it doesn’t work on an ARM processor which Raspberry Pi has. The instructions for building it from source is on the same page but I haven’t tried it myself.

As a fallback method I simply used apt-get to install it:

sudo apt-get update && sudo apt-get install calibre

It worked fine but the version is 0.8.51 (latest release at the time of this writing is 2.20.0 so you can see it’s a little bit outdated!). Content server has been implemented long time ago so for all intents and purposes it’s good enough for this project.

Step 2: Run it as server

Now that we have Calibre installed we can run the content server from command line:

calibre-server --with-library=/home/pi/calibre/myLibrary --daemonize

This will run the process in the background (because of the –daemonize flag) but id the Pi restarts it will not run automatically. To fix that I added the command to crontab by first entering the following command

crontab -e

and adding the following line after the comments

@reboot calibre-server --with-library=/home/pi/calibre/myLibrary --daemonize

so that the same command is run after every reboot.

Crontab on Raspberry Pi

Now let’s test if we’re online. By default, Calibre starts serving on port 8080 with no authentication required. So just find the IP address of the Raspberry Pi and try to connect it over HTTP from your machine such as http://{Local IP}:8080

and voila!

Now we can add some books and start using it from any machine on the network.

Step 3: Add some books

First I uploaded some files to a different folder using WinSCP. If you are not on Windows I’m sure you can find a similar tool to transfer files to Raspberry Pi.

We can add books by using calibredb command like this:

calibredb add Raspberry_Pi_Education_Manual.pdf --with-library=/home/pi/calibre/myLibrary

Please note if you try to use calibre instead of calibredb you’d get the following error:

calibre: cannot connect to X server 

Because we are using the GUI we cannot use calibre directly, instead we add it using calibredb.

Calibre always copies the files to its own library so once the books are added you can delete the original ones.

After the files are added refresh the page and you should get something like this:

At this point we can download the books on any machine on the local network.

Step 4: Connect from clients

  • Kindle

Kindle has an experimental browser (I currently have a Paperwhite, I don’t know about the newer versions). So to download books, I simply go to Settings -> Experimental Browser and enter the URL of my content server (http://{Local IP}:8080):

And after you download the file you can go to home and enjoy the book on your Kindle.

Please note that Kindle cannot download PDFs. When I tried to download Raspberry Pi manual I got the following error

Only files with the extension .AZW, .PRC, .MOBI or .TXT can be downloaded to your Kindle.

So make sure you upload the the right file formats.

  • iPad / Stanza

This is my favorite app on iPad. It’s not even on AppStore anymore but I still use it to download books from Calibre.

All I had to do was click Get Books and it found the OPDS server on the network automatically so I could browse and download books right away.

Stanza

  • iPad / Web

Alternatively you can just browse to server and open it with any eBook reader app available on your iPad.

Calibre UI on iPad

[Optional] Step 5: Setup Port Forwarding

For internal usage we are done! If you want to access your library over the Internet you have to define port forwarding rule. The way to do it is completely dependant on your router so you have to fiddle with your router’s administration interface.

Basically you map an external port to an internal IP and port.

For example I mapped port 7373 to local 192.168.175:8080 so whenever I connect to my home network’s external IP on port 7373 I get my Calibre user interface.

I recommend running the server with –username and –password flags so that only authenticated users can browse your library.

[Optional] Step 6: Setup Dynamic DNS

If you have a static IP you don’t need this step at all but generally personal broadbands don’t come with static IPs. My solution for this was using AWS Route 53 and updating the DNS using AWS Python SDK (Boto).

First I had to install pip to be able to install boto

sudo apt-get install python3-pip

Then boto itself

sudo pip install boto

I created an IAM user that only has access to a single domain which I use for this kind of stuff on Route 53 and added its credentials to the AWS credentials file as explained in the nice and brief tutorial here

The script calls AWS’s external IP checker and stores it in currentIP. Then gets the hosted zone and loops through all the record sets. When it finds the subdomain I’m going to use for Calibre (‘calibre.volki.info.’) it updates the IP address with the currentIP and commits the changes. Thanks to AWS that’s all it takes to create a Dynamic DNS application.

import boto.route53
import urllib2
currentIP = urllib2.urlopen("http://checkip.amazonaws.com/").read()

conn = boto.connect_route53()
zone = conn.get_zone("volki.info.")
change_set = boto.route53.record.ResourceRecordSets(conn, '{HOSTED_ZONE_ID}')

for rrset in conn.get_all_rrsets(zone.id):
    if rrset.name == 'calibre.volki.info.':
        u = change_set.add_change("UPSERT", rrset.name, rrset.type, ttl=60)
        rrset.resource_records[0] = currentIP
        u.add_value(rrset.resource_records[0])
        results = change_set.commit()

Of course this script needs to be added to crontab and should be run every 5-10 minutes. If the external IP changes there might be some disturbance to the service but it should just take a few minutes before it’s resolved.

With this script now we can access our library over the Internet and we don’t have to worry about changes in the IP address.

Resources

dev neo4j, graph_databases, cypher

Visual tools are nice and all but they are not as fun as playing with a query language. When you write your own queries the possibilities are endless! In this post I’ll cover the basics of Cypher query language. Cypher is a declarative, SQL-like, pattern-matching query language for Neo4J graph databases.

Basic CRUD Operations

MATCH

Match clause allows to define patterns to search the database. It is similar to SELECT in the RDBMS-world.

MATCH (n) RETURN n

The query above returns all nodes in the database. In this example n is a temporary variable to store the result. The results can be filtered based on labels such as:

MATCH (n:Person) RETURN n

Property-based filters can be used in both MATCH and WHERE clauses. For example the two queries below return the same results:

MATCH (n:Movie {title: "Ninja Assassin"})
RETURN n
MATCH (n:Movie)
WHERE n.title = "Ninja Assassin"
RETURN n

Instead of returning the entire node we can just select some specific properties. If a node doesn’t have the property it simply returns null.

MATCH (a:Person)
RETURN a.name, a.born

Query returning all actors' names and years they were born in

Results can be filtered by relationships as well. For example the query below returns the movies Tom Hanks acted in

MATCH (p {name:"Tom Hanks"})-[:ACTED_IN]->(m)
RETURN p, m

Query results for movies Tom Hanks acted in

Sometimes we might need to learn the relationship from the query. In that case we can use TYPE function to get the name of the relationship:

MATCH (a)-[r1]->(m)<-[r2]-(a)
RETURN a.name, m.title, type(r1), type(r2)

Actors having multiple relationships with the same movie

Relationship type and direction can be omitted in the queries. The following query returns the names of the movies that have “some sort of relationship” with Tom Hanks:

Movie with Tom Hanks

OPTIONAL MATCH: OPTIONAL keyword fills in the missing parts in the results with nulls. For example the query below returns 1 row as null because there is no outgoing relationship from the movie The Matrix. If we didn’t use OPTIONAL we would have an empty resultset.

MATCH (a:Movie {title: 'The Matrix'})
OPTIONAL MATCH (a)-[]->(d)
RETURN d  

CREATE

To add new data CREATE query is used. In the simplest form the following query creates (a rather useless) node:

CREATE ()

Labels and properties can be set while creating new nodes such as:

CREATE (a:Actor {name: "Leonard Nimoy", born: 1931, died: 2015})

Relationships are created with CREATE as well. For example the following query creates 2 nodes with Person label and creates a MARRIED_TO relationship between them:

CREATE (NedStark:Person {firstname: "Eddard", lastname: "Stark", alias: "Ned", gender: "male"}),
	   (CatelynStark:Person {firstname: "Catelyn", lastname: "Stark", maidenName: "Tully", gender: "female"})
CREATE (NedStark)-[:MARRIED_TO]->(CatelynStark)

Relationships can have properties too:

CREATE
	(WaymarRoyce)-[:MEMBER_OF {order:"Ranger"}]->(NightsWatch)

Properties can be added or updated by using SET keyword such as:

MATCH (p:Person {firstname: "Eddard"})
SET p.title = "Lord of Winterfell"
RETURN p

The above query adds a “title” property to the nodes with label Person and with firstname “Eddard”.

An existing property can be deleted by REMOVE keyword

MATCH (p:Person {firstname: "Eddard"})
REMOVE p.aliasList

MERGE

Merge can be used to create new nodes/relationships or update them if they already exist. In the case of update, all the existing properties must match. For example the following query adds a new property to the node named House Stark of Winterfell

CREATE (houseStark:House {name: "House Stark of Winterfell", words: "Winter is coming"})

MERGE (houseStark {name: "House Stark of Winterfell", words: "Winter is coming"})
SET houseStark.founded = "Age of Heroes"

The following one, on the other hand creates a new node with the same name (because the words properties don’t match):

MERGE (houseStark:House {name: "House Stark of Winterfell", words: "Winter is coming!!!"})
SET houseStark.founded = "Age of Heroes"
RETURN houseStark

I find helpful when you have a long Cyper query and you might run it multiple times. If you just use CREATE every time you run the query you will end up with new nodes. If you use MERGE it will not give any errors and will not create new nodes.

Another way to prevent this is unique constraints. For example the following query will enforce uniqueness on _id property for nodes labelled as Book.

CREATE CONSTRAINT ON (book:Book) ASSERT book._id IS UNIQUE

Now we can run this query and create the books:

CREATE 
	(AGameOfThrones:Book {_id: "7abe0a1c-b9bd-4f00-b094-a82dfb32b053", title: "A Game of Thrones", orderInSeries: 1, released:"August 1996"}),
 	(AClashOfKings:Book {_id: "051dae64-dfdb-4134-bc43-f6d2b4b57d37", title: "A Clash of Kings", orderInSeries: 2, released:"November 1998"}),
 	(AStormOfSwords:Book {_id: "e9baa042-2fc8-49a6-adcc-6dd455f0ba12", title: "A Storm of Swords", orderInSeries: 3, released:"August 2000"}),
 	(AFeastOfCrows:Book {_id: "edffaa47-0110-455a-9390-ad8becf5c549", title: "A Feast for Crows", orderInSeries: 4, released:"October 2005"}),
 	(ADanceWithDragons:Book {_id: "5a21b80e-f4c4-4c15-bfa9-1a3d7a7992e3", title: "A Dance with Dragons", orderInSeries: 5, released:"July 2011"}),
 	(TheWindsOfWinter:Book {_id: "77144f63-46fa-49ef-8acf-350cdc20bf07", title: "The Winds of Winter", orderInSeries: 6 })

If we try to run it again we get the following error:

Cypher constraint error

Unique constraint enforces the existing data to be unique as well. So if we run the book creation query twice, before the constraint, then try to create the constraint we get an error:

To delete the constraint we use DROP keyword such as

DROP CONSTRAINT ON (book:Book) ASSERT book._id IS UNIQUE

DELETE

To delete nodes first we find them by MATCH. Instead of RETURN in the above examples we use DELETE to remove them.

MATCH (b:Book {_id: "7abe0a1c-b9bd-4f00-b094-a82dfb32b053"})
DELETE b

The above query would only work if the node doesn’t have any relationships. To delete a node as well as its relationships we can use the following:

MATCH (b {_id: "7abe0a1c-b9bd-4f00-b094-a82dfb32b053"})-[r]-()
DELETE b, r

We can generalize the above query to clean the entire database:

MATCH (n)
OPTIONAL MATCH (n)-[r]-() 
DELETE n, r

I find this quite useful during test and development phase as I need to start over quite often.

Useful keywords

  • ORDER BY

It is almost mandatory in all SQL-variants. Eventually you have to sort the data. For example the following returns all actors order by their names in alphabetical order:

  • LIMIT

Sometimes your query may return a lot of values that you’re not interested in. For example you may want to get top n results. In this case you can restrict the number of rows returned by LIMIT keyword such as:

The above query returns the top 3 actors who have the most ACTED_IN relationships with movies in descending order.

  • UNION

Another similar keyword from RDBMS is UNION. Similar to standard SQL, the returned columns must have the same number and names. For example the following will fail:

MATCH (p:Person)
RETURN p.name
UNION
MATCH (m:Movie)
RETURN m.title

But with using an alias it can be fixed like this:

MATCH (p:Person)
RETURN p.name AS name
UNION
MATCH (m:Movie)
RETURN m.title AS name

Measuring Query Performance

PROFILE keyword is very helpful as it lets you see the query plan and optimize it. It is very simple to use: You just put it before the MATCH clause such as

This is obviously a very simplistic example but I strongly recommend the GrapAware article on Labels vs Indexed Properties. PROFILE is heavily used to identify which option performs better in a given scenario. Also a great read to learn more about modelling Neo4J database.

Conclusion

Cypher is quite powerful and it can be very expressive in the right hands. It is quite comprehensive to cover in a blog post. I just wanted this post to contain various samples for the basic queries. I hope it can be of some help for someone new to Cypher in conjunction with the references I listed below.

Resources