dev neo4j, graph_databases

Graph databases are getting more popular every day. I played around with it in the past but never covered it extensively. My goal is now to first cover the basics of graph databases (Neo4J in particular), cover Cypher (a SQL-Like Query Language for Neo4J) and build a full-blown project using these. So this will the first post in a multi-part series.

Neo4J is providing nice training materials. Also I’m currently enjoying active Safari Books Online and Pluralsight subscriptions so I thought it might be a good time to conduct a comprehensive research and go through all of these resources. So without further ado, here’s what I’ve gathered on Graph Databases:

Why Graph Databases

Main focus of graph databases is the relationships between objects. In a graph database, every object is represented with a node (aka a vertex in graph theory) and nodes are connected to each other with relationships (aka an edge).

Graph databases are especially powerful tools for heavily connected data such as social networks. When you try to model a complex real-world system you end up having a lot of entities and connections among them. At this point a traditional relational model starts to be sluggish and hard to maintain and this is where graph databases come to rescue..

Players

Turns out there are many implementations and it’s a broader concept as they have different attributes. You can check this Wikipedia article to see what I mean. As of this writing there were 41 different systems mentioned in the article. A few of the players in the field are:

  • Neo4J: Most popular and one of the oldest in the field. My main focus will be on Neo4J throughout my research
  • FlockDB: An open-source distributed graph database developed by Twitter. It’s much simpler than Neo4J as it focuses on specific problems only.
  • Trinity: A research project from Microsoft. I wish it was released because probably it would come with native .NET clients and integration but looks like it’s dead already as there is activity since late 2012 on its page.

There are three dominant graph models in the industry:

  • The property graph
  • Resource Description Framework (RDF) triples
  • Hypergraphs

The Property Graph Model

  • It contains nodes and relationships
  • Nodes contain properties (key-value pairs)
  • Relationships are named and directed, and always have a start and end node
  • Relationships can also contain properties
  • No prior modelling is needed but it helps to understand the domain. The advice is start with no schema requirements and enforce a schema as you get closer to production.

Basic concepts

As I will be using Neo4J, I decided to focus on the basic concepts of Neo4J databases (the current version I’m using is 2.1 and 2.2M03 which is still in beta)

  1. Nodes - Graph data records
  2. Relationships - Connect nodes. They must have a name and a direction. Sometimes direction has semantic value and sometimes the connection if bothways like a MARRIED_TO relationship. It doesn’t matter which way you define it both nodes are “married to” each other. But for example a “LOVES” relationship doesn’t have to be bothways so the direction matters.
  3. Properties - Named data values
  4. Labels - Introduced in v2.0 They are used to tag items like Book, Person etc. A node can have multiple labels.

Conclusion

Graph databases are on the rise as can be seen clearly from the chart (taken froom db-engines.com):

db-egines popularity chart

It feels very natural to model a database as a graph as they can handle relationships very well and in real-life there are many complex relationships in semi-structured data. So especially at the beginning starting without a schema and have your model and data mature over time makes perfect sense. So it is understandable why graph databases are gaining traction everyday.

In the next post I will delve into Cypher - the query language of Neo4J. What good is a database if you can’t run queries on it anyway, right? :-)

Resources

misc review, gadget, raspberry_pi

I already had 3 Raspberry Pis so you might think it’s ridiculous to get excited for a 4th one but you would be wrong! Because this time they revised the hardware and made major upgrades.

Raspberry Pi 2

The new Pi is rocking a quad-core Cortex A7 processor and it comes with 1GB memory. Rest of the specs remained the same (including the price!) but the processor and memory made a huge impact.

For more detailed comparison check the benchmark results on the official website

I tried using a web browser in the old versions and it was painfully slow. So I decided to use the previous versions for background tasks like a security camera or network scanning. The B+ is powerful enough to run XBMC and play videos but even that is slow when it transitions between different screens. So seeing the new Pi can be a full desktop replacement is really exciting.

What else is new

Pi 2 comes with a few applications installed like Mathematica and Wolfram, utilities like a PDF reader and text editor and even Minecraft!

Minecraft on Raspberry Pi 2

What’s next

I don’t have a project in mind at the moment for this one yet. Since the architecture has breaking changes a lot of existing OS versions don’t work on it yet like Raspbmc and Kali Linux. Maybe I can wait and install Kali Linux on it and practice some security features. Or hopefully Microsoft releases Windows 10 for IoT soon enough so I can play with it. I’ll keep posting as I find more about my new toy! In the meantime I’m open to all project ideas so please leave a comment if you anything in mind!

Resources

dev fsharp

I finished the Pluralsight course finally. I’m still studying F# 2 pomodoros a day but lately lost he motivation to publish the notes. In this post I’ll tidy up the notes. In order to maintain my cadence I think I had better develop more stuff instead of covering theoretical subjects.

More Notes

  • Upcasting / Downcasting To upcast :> operator is used. Alternatively the keyword upcast can be used to achieve the same results. It performs a static cast and the cast is determined at runtime.

Downcasting is performed by the :?> operator or downcast keyword.

  • Abstract classes Abstract classes are decorated with **[]**. To mark members **abstract** keyword is used:
abstract Area : float with get
abstract Perimeter : float  with get
abstract Name : string with get
  • obj is shortcut for System.Object
  • do bindings perform actions when the object is constructed. do bindings are usually put after let bindings so that code in the do binding can execute with a fully initialized object.
  • unit is equivalent of void
  • tabs are not allowed as they can indicate an unknown number of space characters and as spaces and indents matter
  • ‘a means generic. By default a function f is infered as ‘a -> bool meaning it takes a general parameter and returns boolean
  • Providing an incomplete list of functions result in a new function (currying)
  • Getting values from tuples: fst gets the first value, snd gets the second value
  • You can attach elements to a list by using the :: (cons) operator
  • @ operator Concatenates two lists.
  • Exceptions can be thrown using raise function. Reraise function rethrows an exception
  • BigInt (C# and VB) don’t have support for arbitrarily long integers

Resources