From JSON to Neo4J
I had a simple JSON file containing a bunch of users and their followers. Each entry looked like this:
{
  "user-id": 2,
  "username": "user_2",
  "avatar": "URL",
  "following": [
    "user_10",
    "user_6",
    "user_1"
  ],
  "followers": [
    "user_10"
  ]
}
Importing that data into a graph database felt like a good exercise, as it wasn’t something I had done before.
As my go-to language is C# and I had some experience with Cypher before, my initial instinct was to develop a tool to generate Cypher statements from the JSON.
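The generator code that follows iterates over a `_userList` whose loading isn't shown. As a minimal sketch only, here is one way that list could be populated. It assumes the file holds a JSON array of objects shaped like the sample above and uses `System.Text.Json`; the `User` and `UserLoader` names are hypothetical, and the original tool may well have used a different JSON library.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical model; property names follow the generator code (userid, username, ...).
public class User
{
    // The JSON key contains a hyphen, so it has to be mapped explicitly.
    [JsonPropertyName("user-id")]
    public int userid { get; set; }

    public string username { get; set; } = "";
    public string avatar { get; set; } = "";
    public List<string> following { get; set; } = new();
    public List<string> followers { get; set; } = new();
}

public static class UserLoader
{
    // Assumption: the input is a JSON array of user objects.
    public static List<User> Parse(string json) =>
        JsonSerializer.Deserialize<List<User>>(json) ?? new List<User>();
}
```

With this in place, something like `_userList = UserLoader.Parse(File.ReadAllText(path))` would feed the two generator methods, where `path` is wherever the JSON file lives.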
First I created the nodes:
private void GenerateNodesCypher()
{
    // Path is relative to the working directory of the running tool.
    string filename = @"..\..\output\nodes.cql";
    var output = new StringBuilder();

    // Emit one CREATE statement per user; each must end with a semicolon.
    foreach (var user in _userList)
    {
        string s = $"CREATE ({user.username}:User {{ userid: {user.userid} , username: '{user.username}', avatar: '{user.avatar}' }} ); ";
        output.AppendLine(s);
    }

    File.WriteAllText(filename, output.ToString());
}
and then the relationships:
private void GenerateRelationshipsCypher()
{
    // Path is relative to the working directory of the running tool.
    string filename = @"..\..\output\relationships.cql";
    var output = new StringBuilder();

    // Emit one MATCH ... CREATE statement per "following" entry.
    foreach (var user in _userList)
    {
        foreach (var following in user.following)
        {
            string s = $"MATCH (a), (b) WHERE a.username = '{user.username}' AND b.username = '{following}' CREATE (a)-[:FOLLOWING]->(b); ";
            output.AppendLine(s);
        }
    }

    File.WriteAllText(filename, output.ToString());
}
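One caveat with both generators: they interpolate values straight into single-quoted Cypher string literals, so a username or avatar URL containing a single quote or a backslash would produce a broken statement. The helper below is a small sketch of one way to guard against that. The `CypherText` class name is hypothetical, and it only covers string literals, not the node identifier used in `CREATE (user_1:User ...)`.

```csharp
public static class CypherText
{
    // Escapes a value for use inside a single-quoted Cypher string literal.
    // Backslashes are doubled first so the escapes added for quotes survive.
    public static string Escape(string value) =>
        value.Replace("\\", "\\\\").Replace("'", "\\'");
}
```

It would be used inside the interpolated strings, for example `username: '{CypherText.Escape(user.username)}'`.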
So I ended up with two .cql files. The nodes file looked like this:
CREATE (user_1:User { userid: 1 , username: 'user_1', avatar: 'URL' });
CREATE (user_2:User { userid: 2 , username: 'user_2', avatar: 'URL' });
CREATE (user_3:User { userid: 3 , username: 'user_3', avatar: 'URL' });
and the relationships file like this:
MATCH (a), (b) WHERE a.username = 'user_1' AND b.username = 'user_26' CREATE (a)-[:FOLLOWING]->(b);
MATCH (a), (b) WHERE a.username = 'user_1' AND b.username = 'user_53' CREATE (a)-[:FOLLOWING]->(b);
MATCH (a), (b) WHERE a.username = 'user_2' AND b.username = 'user_6' CREATE (a)-[:FOLLOWING]->(b);
I then used neo4j-shell to execute the files and import the data. It was no bed of roses, though. Below are the problems I faced along the way and how I got around them, in case the experience helps someone else.
Issues along the way and lessons learned
Running multiple statements on Neo4J browser
First I tried to run the CREATE statements in the Neo4J browser, which turned out to be problematic because it cannot run multiple statements separated by semicolons. So I removed the semicolons, but then it started giving me this error:
WITH is required between CREATE and MATCH
I found a workaround for that on Stack Overflow. The following works:
MATCH (a), (b) WHERE a.username = 'user_1' AND b.username = 'user_14' CREATE (a)-[:FOLLOWING]->(b)
WITH 1 as dummy
MATCH (a), (b) WHERE a.username = 'user_1' AND b.username = 'user_22' CREATE (a)-[:FOLLOWING]->(b);
The problem now was dirty data. If, for instance, user_14 didn’t exist, the MATCH returned no rows, so the rest of the query stopped executing and no further relationships were created. I had a few nodes like that, so this method didn’t work for me after all.
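One way to sidestep dangling references like that is to drop them before any Cypher is generated. The sketch below is not what I did at the time. It assumes the users have been reduced to a dictionary from username to the list of usernames they follow, and the `FollowFilter` and `ValidPairs` names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FollowFilter
{
    // Returns (follower, followed) pairs, keeping only pairs whose target
    // is itself a key in the dictionary, i.e. a user that will get a node.
    public static List<(string From, string To)> ValidPairs(
        Dictionary<string, List<string>> followingByUser)
    {
        var known = new HashSet<string>(followingByUser.Keys);

        return followingByUser
            .SelectMany(kv => kv.Value
                .Where(to => known.Contains(to))
                .Select(to => (From: kv.Key, To: to)))
            .ToList();
    }
}
```

Each surviving pair can then be turned into one MATCH ... CREATE statement, so no statement depends on a node that was never created.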
Starting the shell was not as easy as I’d imagined
I installed Neo4J with the default settings, and to start the shell I just navigated to the install directory and ran the batch file. I got an error instead of my shell:
C:\Program Files\Neo4j Community\bin>Neo4jShell.bat
The system cannot find the path specified.
Error: Could not find or load main class org.neo4j.shell.StartClient
Apparently the way to run the shell is to open the Neo4J server application and click Options -> Command Prompt.
This launches the Neo4J Command Prompt, and the rest is easy:
neo4jshell -file nodes.cql
neo4jshell -file relationships.cql
“Exiting with unterminated multi-line input”
The final issue was that statements in a .cql file must be terminated with a semicolon for the shell to execute them. Otherwise it gives the above warning and quits.
So after all this hassle my data is in the Neo4J database, ready to be queried to death!
Next I’ll investigate converting the data into CSV to achieve the same results with the LOAD CSV command, and look into visualizing this data.