
It always helps to know the ecosystem to get the maximum performance out of any development platform. In this post I will cover some of the tools that can be used to manage a Neo4J database.

Neo4J Browser

This tool comes out of the box and is very handy for running Cypher queries and getting visual results. I covered some of what you can do with it in an earlier post, but it’s capable of much more, so it’s a very nice tool to have in your toolbelt. I think its only shortcoming is that you cannot edit data visually. It would be very helpful to be able to add nodes or create relationships manually in a drag & drop fashion, which would save the time spent writing queries from scratch every time. Maybe that feature will arrive in a future release.

Linkurious

This is a paid online service. Unfortunately they don’t have a trial option. You can sign up for an online demo though.

Linkurious tutorial

It has a nice intuitive user interface that allows you to search and edit data via the UI.

Linkurious tutorial

When I tried to edit data or use the “Save as a Neo4J Database” feature, I got errors.

Linkurious server error

I don’t know if it’s a limitation of the demo version or their system was having a bad day but I’m not convinced to fork over €249 for this tool yet.

UPDATE: After I published this post I was informed that the adding/editing feature is disabled in the demo. So the error messages were intentional and not because of a system failure.

Neoclipse

This is a free and open-source desktop application written in Java. It has some flaws (e.g. sometimes you have to reconnect to the server to see the effects of your changes) but in general it’s a nice tool for visual editing. You can also run Cypher queries.

Another neat feature to further embellish the visualization is assigning icons to nodes.

Neoclipse preferences

You can specify a folder that contains your images. The image name must match the property specified in the “Node icon filename properties” textbox.

For example, when I ran my sample Simpsons Cypher script I got the following graph:

Neoclipse visualisation with icons

Note that the values that should match the filenames are case-sensitive, and you may need to reconnect to the server to see the changes (refresh doesn’t cut it). I learned it the hard way :-)
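To catch case mismatches before reconnecting, a small script can compare the property values against the files in the icon folder. This is only a sketch: the function name and sample values are hypothetical, and it assumes the file name without its extension has to equal the property value, which I have not verified against the Neoclipse documentation.

```python
from pathlib import Path


def missing_icons(icon_values, icon_dir):
    """Return the icon values that have no exactly matching image file.

    Assumption (not confirmed by Neoclipse docs): the file name without
    its extension must equal the property value, case-sensitively.
    """
    # Collect file stems from the folder, e.g. "Homer" for "Homer.png".
    stems = {p.stem for p in Path(icon_dir).iterdir() if p.is_file()}
    # Python string comparison is case-sensitive, so "Marge" != "marge".
    return sorted(v for v in icon_values if v not in stems)
```

Calling `missing_icons(["Homer", "Marge"], "icons")` on a folder that holds `Homer.png` and `marge.png` would report `Marge` as unmatched.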

Managing the properties of a node is very easy. You can edit the current values in-place on the grid and add new values by right-clicking on the Properties grid, selecting New and choosing the type of the property.

Neoclipse - Adding new property

It’s a nice tool for quickly editing data but it can easily be a memory hog too! Once I noticed it was using 1.5GB of RAM when the graph had only around 100 nodes and relationships, so I have some performance concerns about it with large datasets.

Gephi

Gephi is a general visualisation tool and thanks to its plugin support it can be extended to support Neo4J databases. (You need at least JDK 7 to install the plugin.)

Gephi plugins

You can download and install the Neo4J plugin manually or better yet you can just select Tools -> Plugins -> Available Plugins and search Neo4J.

Once installed you can then import a Neo4J database by selecting File -> Neo4J Database -> Full Import

Make sure to shut down the Neo4J server before the import or you will get this very informative error message from Gephi:

Gephi database in use error

Apparently it locks the database as well so if you try to run Neo4J again while Gephi is still running you get this error:

Neo4J lock error

Clearly they don’t play well together so it’s best to run them separately.

After the import you can view the graph visually like this

Gephi imported Neo4J database

Doesn’t look as impressive as Neoclipse IMHO!

To view the data you can switch to Data Laboratory tab:

Gephi Nodes

Gephi Edges

In theory you can add nodes and export the database but that wasn’t a very successful endeavour for me. When you add a new node you can set the label but you cannot edit any properties.

Gephi new node

So that wasn’t quite helpful. I don’t know what you can do with a graph without any properties.

Gephi is still in the beta phase. Also, as plugins are developed by third parties, there might be some inconsistencies. I’ll leave this tool for the moment, but it looks promising so I’ll put a pin in it for the time being.

Tom Sawyer Perspectives

This is also a generic visualisation tool that can work with multiple data sources. It can integrate with Neo4J as well as InfiniteGraph, a distributed graph database.

Downloading the trial software is a bit tricky. First you apply for an account. Your application is processed manually. After you are accepted, you first apply for a code to evaluate the product. Luckily it’s handled automatically and you receive the code right away. You then enter the code to have the privilege(!) to submit another form that details what type of project you’re planning to develop, what programming language you are using etc. That application is also processed manually. Currently I’m still waiting to be granted a trial license so I will not review the software for the time being. If I get to try it someday I will update this post.

Comparison

Tool | Price | Requirements | Pros | Cons
Neo4J Browser | Free | Web browser | Comes with the server; rich feature set | No visual editing
Linkurious | €249 | Web browser | Visual editing | Expensive
Neoclipse | Free / Open Source | Java 1.6 | Easy to use and edit data; nice decoration options | High memory usage; glitches may cause disruption

Prototyping tools

My main goal in this quest was to find a tool that would allow me to edit data visually to speed up data entry. Apart from tools that can directly manipulate data, there are also a bunch of modelling tools. I won’t cover them in depth in this post, but a short list might be helpful as a pointer.

OmniGraffle

OmniGraffle is a general purpose diagramming tool. It’s neither free nor cheap but it has an iPad version so you can keep modelling on the go!

Arrow Tool

Arrow Tool is an open-source project developed by a Neo4J developer. It’s as simple as it gets and helps you to quickly create a model.

Conclusion

This is by no means an exhaustive list of the tools on the market. As graph databases gain more traction, the number of such tools will likely keep growing.

Visual tools can help a great deal in making sense of the data and seeing how it is connected. But you need good Cypher skills to run complex queries. In the next post I’ll go over Cypher and cover the basics.


I’m currently in search of a good tool to manage my graph database data. As they say, a picture is worth a thousand words, so visualization is very important to understand the nature of the data and make some sense of it. If nothing else, it’s simply more fun!

Covering a user interface may seem unnecessary since they are generally pretty straightforward, and the Neo4J browser is no exception. But after using it for some time I discovered some neat features that are not immediately obvious, so I’ll go over them in this post. I will be using the beta version (v2.2 M04), so if you are using an older version you can compare the upcoming features with your current version.

You gotta know your tools to get the maximum benefit out of them so let’s dig in!

Accessing the Data

A major change in this version is that authentication is turned on by default.

Neo4J Authentication

So if you install it somewhere that’s accessible from the Internet, you have a layer of protection.

If you close that login window by accident you can run the following command to bring it back:

:server connect

Similarly, if you want to terminate the connection you can run

:server disconnect

Until v2.2 M03 you got an authorization token back after the connection was established, but since v2.2 M04 you don’t get a token. This is because the authorization header is calculated by Base64-encoding the username:password pair.

Welcome message

If you need to connect using other tools, you need to send the authorization header with every request. Its value is the Base64-encoded username and password, separated by a colon.

So in my case I changed the password to pass (I know it’s a horrible password, this is just a test installation :-))

So my authorization header value becomes:

Username:Password --> neo4j:pass
Base64-encoded value --> bmVvNGo6cGFzcw==
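
If you want to double-check the encoded value yourself, a few lines of Python will do. This is a minimal sketch; the function name is my own, and the credentials are just the test values from above.

```python
import base64


def encode_credentials(username, password):
    """Base64-encode a 'username:password' pair."""
    pair = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(pair).decode("ascii")


print(encode_credentials("neo4j", "pass"))  # bmVvNGo6cGFzcw==
```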

And with the correct authorization header I can get an HTTP 200 for the GET request for all the labels:

Results for all labels
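
For reference, here is roughly how such a request could be put together from a script. Treat it as a sketch under assumptions: I’m assuming the server runs on localhost:7474, that the labels live under `/db/data/labels`, and that the header uses the standard HTTP `Basic` scheme prefix. The function name is hypothetical.

```python
import base64
import urllib.request


def labels_request(base_url, username, password):
    """Build (but do not send) a GET request that lists all labels."""
    pair = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(pair).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/db/data/labels",  # assumed REST endpoint for labels
        headers={
            "Authorization": f"Basic {token}",  # assumed Basic scheme
            "Accept": "application/json",
        },
        method="GET",
    )


req = labels_request("http://localhost:7474", "neo4j", "pass")
# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read())
```

Building the request separately from sending it makes it easy to inspect the header before hitting the server.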

Helpful commands and shortcuts

  • First things first: Help!

To discover all the commands and links to references you can simply run the help command:

:help

Help output

  • Clean your mess with Clear

Browser runs each query in a separate window and after using it a while you can have a long history of old queries. You can get rid of them easily once and for all by this command:

:clear

  • Use Escape to switch to full-screen query view

You can use the Escape key to toggle a full-screen query window. Just press the Esc key and you’ll hide the previous query windows.

Toggle full screen

This is especially handy if you are dealing with long Cypher queries and could use some space. You can toggle back to the old view (query and output windows below) by pressing the Esc key again. Alternatively, it will toggle back if you just run the query.

  • Use Shift+Enter for multi-line mode

Normally the Enter key runs the query, but most of the time you need to work with queries spanning multiple lines. To accomplish that you can simply use Shift+Enter to start a new line and switch to multi-line mode. Once you switch to multi-line mode, line numbers appear and Enter no longer runs your query but starts a new line. To run queries in multi-line mode you can use the Ctrl+Enter key combination.

  • Name your saved queries in favorites

There are some queries you run quite often. For example, in my development environment I tend to delete everything and start from scratch, so I saved that query to the favorites by clicking the star button in the query window. The nice thing is that if you add a comment at the beginning of your query, the browser is smart enough to use it as the name of the query, so you don’t have to guess which one is which when you have a lot of saved queries.

Named favorite

  • Play around with the look & feel

With the latest version you can mess around with the style sheet that is used to visualize the results. In the favorites tab there is a section called “Styling / Graph Style Sheet”. You can see the styles in use by clicking the Graph Style Sheet button. It’s not editable in the editor, but you can export it to a .grass file by clicking the Export to File icon.

Graph Style Sheet

After you make your changes you can import it back by dropping it to the narrow band at the bottom of the dialog window.

Import style sheet

You can still get the original styles back by clicking on the icon next to export (that looks like a fire extinguisher!) so feel free to mess with it.

  • Use Ctrl+Up/Down arrow to navigate through old queries

You can browse through your query history using Ctrl+Up/Down arrow. I find this shortcut especially helpful when you quickly need to go back to the previous query.

  • Click on a query to get it back

Once you run a query, the query itself and its output are encapsulated in a separate window so you can browse through old queries. If you need to run a query again you can simply click on the query. As you can see in the image below, a dashed line appears under the query when you hover over it to indicate it’s a link. When you click the query it populates the query window so you don’t need to copy/paste it.

Link to query

Conclusion

Currently the browser doesn’t allow editing data visually but for running Cypher queries it’s a great tool. If you have any tips and tricks to suggest or corrections to make, please leave a comment below.

UPDATE: I published this post using v2.2 M03 but as Michael Hunger kindly pointed out there were some changes in v2.2 M04 that made my post outdated from the get-go so I updated it accordingly. If you find any inconsistencies please let me know using the comments below.


Graph databases are getting more popular every day. I played around with them in the past but never covered them extensively. My goal now is to first cover the basics of graph databases (Neo4J in particular), then cover Cypher (a SQL-like query language for Neo4J), and finally build a full-blown project using these. So this will be the first post in a multi-part series.

Neo4J provides nice training materials. I also currently have active Safari Books Online and Pluralsight subscriptions, so I thought it might be a good time to do some comprehensive research and go through all of these resources. So without further ado, here’s what I’ve gathered on graph databases:

Why Graph Databases

The main focus of graph databases is the relationships between objects. In a graph database, every object is represented by a node (aka a vertex in graph theory) and nodes are connected to each other by relationships (aka edges).

Graph databases are especially powerful tools for heavily connected data such as social networks. When you try to model a complex real-world system you end up with a lot of entities and connections among them. At this point a traditional relational model starts to become sluggish and hard to maintain, and this is where graph databases come to the rescue.

Players

It turns out there are many implementations, and it’s a broad concept as they differ in their attributes. You can check this Wikipedia article to see what I mean. As of this writing there are 41 different systems mentioned in the article. A few of the players in the field are:

  • Neo4J: Most popular and one of the oldest in the field. My main focus will be on Neo4J throughout my research
  • FlockDB: An open-source distributed graph database developed by Twitter. It’s much simpler than Neo4J as it focuses on specific problems only.
  • Trinity: A research project from Microsoft. I wish it had been released because it would probably come with native .NET clients and integration, but it looks like it’s dead already as there has been no activity on its page since late 2012.

There are three dominant graph models in the industry:

  • The property graph
  • Resource Description Framework (RDF) triples
  • Hypergraphs

The Property Graph Model

  • It contains nodes and relationships
  • Nodes contain properties (key-value pairs)
  • Relationships are named and directed, and always have a start and end node
  • Relationships can also contain properties
  • No prior modelling is needed but it helps to understand the domain. The advice is to start with no schema requirements and enforce a schema as you get closer to production.
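
To make the model a bit more concrete, the points above can be sketched as plain Python data classes. This is only an illustration of the concepts, not how Neo4J stores anything internally; the class names and the sample property values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Properties are free-form key-value pairs; no schema is required.
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    name: str    # relationships are always named
    start: Node  # ...and directed: they have a start node
    end: Node    # ...and an end node
    properties: dict = field(default_factory=dict)


# Hypothetical sample data
homer = Node({"name": "Homer"})
marge = Node({"name": "Marge", "hair": "blue"})
married = Relationship("MARRIED_TO", homer, marge, {"status": "happy"})
```

Note that the two nodes carry different sets of properties, which is exactly the kind of flexibility you get from not fixing a schema up front.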

Basic concepts

As I will be using Neo4J, I decided to focus on the basic concepts of Neo4J databases (the versions I’m using are 2.1 and 2.2 M03, the latter still in beta).

  1. Nodes - Graph data records
  2. Relationships - Connect nodes. They must have a name and a direction. Sometimes the direction has semantic value and sometimes the connection goes both ways, like a MARRIED_TO relationship: it doesn’t matter which way you define it, both nodes are “married to” each other. A “LOVES” relationship, on the other hand, doesn’t have to go both ways, so the direction matters.
  3. Properties - Named data values
  4. Labels - Introduced in v2.0. They are used to tag items like Book, Person etc. A node can have multiple labels.
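
The point about direction is easier to see with a toy example. The sketch below stores every relationship once with a direction and lets the caller decide whether to respect it when looking things up. Everything in it (names, labels, the `related` helper) is hypothetical sample code, not a Neo4J API.

```python
# Each relationship is stored once as (start, NAME, end).
relationships = [
    ("Homer", "MARRIED_TO", "Marge"),
    ("Milhouse", "LOVES", "Lisa"),
]

# A node can carry more than one label.
labels = {"Homer": {"Person", "Employee"}, "Lisa": {"Person", "Student"}}


def related(node, name, rels, directed=True):
    """Return the nodes connected to `node` by relationships called `name`.

    With directed=False the stored direction is ignored.
    """
    found = set()
    for start, rel_name, end in rels:
        if rel_name != name:
            continue
        if start == node:
            found.add(end)
        elif not directed and end == node:
            found.add(start)
    return found


# MARRIED_TO is read both ways, so Marge is married to Homer as well.
print(related("Marge", "MARRIED_TO", relationships, directed=False))
# LOVES is read with its direction: Lisa does not love Milhouse back.
print(related("Lisa", "LOVES", relationships))
```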

Conclusion

Graph databases are on the rise, as can be seen clearly from the chart (taken from db-engines.com):

db-engines popularity chart

It feels very natural to model a database as a graph, as graphs handle relationships very well and in real life there are many complex relationships in semi-structured data. So, especially at the beginning, starting without a schema and letting your model and data mature over time makes perfect sense. It is understandable why graph databases are gaining traction every day.

In the next post I will delve into Cypher - the query language of Neo4J. What good is a database if you can’t run queries on it anyway, right? :-)
