Neo4j: A Developer’s Perspective

Authored by: Deep Mistry, Developer at Open Software Integrators

Introduction
In the age of NoSQL databases, where a new database seems to pop up every week, it is not surprising that an even larger number of articles about them are written every day. So when I started writing this blog post on Neo4j, instead of describing how freaking awesome it is, I aimed to address the most common issues that a “regular” developer faces. By regular, I mean a developer who is familiar with databases in general and knows the basics of Neo4j, but is a novice when it comes to actually using it.

A brief overview for those not familiar with Neo4j: Neo4j is a graph database. A graph database uses concepts from graph theory to store data. Graph theory is the study of graphs, which are structures containing vertices and edges or, in other words, nodes and relationships. So, in a graph database, data is modeled in terms of nodes and relationships. At first glance, Neo4j seems pretty similar to any other graph database model we have encountered before. It has nodes and relationships, which are interconnected to form a complex graph, and you traverse the graph in a specific pattern to get the desired results.
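
To make the node-and-relationship model concrete, here is a minimal plain-Java sketch. It is not Neo4j's API: the class and field names (PropertyGraphSketch, Node, Relationship, relateTo) and the sample data are invented purely for illustration, assuming only that nodes and relationships both carry properties and that relationships are typed and directed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical classes, NOT Neo4j's API.
class PropertyGraphSketch {

    static class Node {
        final long id;
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();

        Node(long id) {
            this.id = id;
        }

        // Connect this node to another one with a typed, directed relationship.
        Relationship relateTo(Node end, String type) {
            Relationship rel = new Relationship(this, end, type);
            outgoing.add(rel);
            return rel;
        }
    }

    static class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    public static void main(String[] args) {
        Node alice = new Node(1);
        alice.properties.put("name", "Alice");
        Node bob = new Node(2);
        bob.properties.put("name", "Bob");

        // Relationships can carry properties too.
        Relationship friendship = alice.relateTo(bob, "FRIEND_OF");
        friendship.properties.put("since", 2010);

        // Step from a node, over its relationships, to the nodes at the other end.
        for (Relationship rel : alice.outgoing) {
            System.out.println(rel.start.properties.get("name")
                    + " -[" + rel.type + "]-> "
                    + rel.end.properties.get("name"));
        }
    }
}
```

A real graph database adds persistence, transactions and indexing on top of this shape, but the mental model of "properties on both nodes and relationships" stays the same.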

How is it different from other graph databases? Aside from the fact that it is the only truly open source stable graph database available, it has some qualities which are hard to find in other graph databases or NoSQL databases for that matter.
- Written in Java: Has very strong integration with Java and other popular languages like Ruby, Python, Scala and more.
- It has a strong integration with the popular Spring framework as one of the sub-projects of the Spring Data project.
- Stable releases with good documentation, large number of forums and a sizeable community.

Apart from these, certain features of the database itself make it stand out among the others.
- Full ACID transactions
- REST support with HTTP.
- Both nodes and relationships can have properties as well as indices.
- Indices also exist on keys and relationships
- An SQL-like query language integrated with it: Cypher
- The general graph traversal language: Gremlin
- Standalone, or embeddable into most applications in the languages stated above
- Good visualization tools and a self-contained web interface

So, coming back to the question at hand: how do I, as a developer, use Neo4j?

Installing/Starting Neo4j:
1. Download Neo4j
2. Untar the downloaded package using the command:

tar -xzvf neo4j-enterprise-1.8-SNAPSHOT-unix.tar.gz

Starting the neo4j server:
The neo4j package comes bundled with a neo4j server (Jetty).
- Execute the following command to start the server:

cd $NEO4J_HOME
bin/neo4j start

Exploring the neo4j server:
- You can browse to http://localhost:7474/webadmin/ (default) and see the Neo4j web interface.

There are 5 main tabs here.
- Dashboard: This shows you the basic information for the server. It charts activity, such as the number of nodes and relationships created, in a graph format (no pun intended).
- Data Browser: This section displays the data (nodes and relationships) in an actual graph format to facilitate easy visualization of the graph database.
- Console: This section, as the name suggests, is a place where we can query the existing graph using Cypher or Gremlin queries.
- Server Info: This section basically provides extensive details about the server deployment.
- Index Manager: This section allows the user to create and manage indexes. Neo4j has both node and relationship indexes.

These sections allow us to perform extensive operations on the database, both visually and using the command line.

Working with Neo4j
So far we have seen pretty much everything that is required to work efficiently with Neo4j. Now we focus on the actual coding part. I used the Neo4j Java driver mainly because I am a Java backend guy; however, Neo4j has drivers for several other languages which are just as good, if not better.

Basically we start off all Neo4j operations with an object described by the GraphDatabaseService interface. We can wire in the dependency as follows:

GraphDatabaseService graphDb =
    new GraphDatabaseFactory().newEmbeddedDatabase(path_To_DB);

From that point onward, it's pretty much about managing database transactions and using the methods offered by the GraphDatabaseService object.

Following are some of the methods offered:
createNode - Create a Node
getNodeById - Find a Node directly without using the Index.
getRelationshipById - Find a Relationship directly without the Index
getReferenceNode - Find the Reference Node
getAllNodes (@Deprecated) - Find all nodes in the database (not recommended)
getRelationshipTypes (@Deprecated) - Find all Relationship types
shutdown - Shutdown connection to the Neo4j database
beginTx - Begin the database transaction
index - Get the IndexManager for the database.

This interface has additional methods which are irrelevant at this point. For curious readers, the JavaDoc has extensive documentation about them.

There are several implementations of this interface:
AbstractGraphDatabase
EmbeddedGraphDatabase
EmbeddedReadOnlyGraphDatabase
InternalAbstractGraphDatabase

Of these, the most commonly used implementation is EmbeddedGraphDatabase.

Additionally, there are other implementations for special use cases. For example, ImpermanentGraphDatabase is used for unit tests and wipes the database clean each time it is initialized, while BatchGraphDatabase is used for inserting a large amount of data using the batch inserter.

The API changes frequently, so it is very important to know exactly which Neo4j version we are using and the correct API for it.

As mentioned before, we use the GraphDatabaseService object for all CRUD operations. Neo4j provides separate index types for Node and Relationship objects. These objects should always be managed using indices, i.e. while creating a Node or a Relationship, we should add it to its respective index. A Node or a Relationship object should always be retrieved from the database using its index. Performing

To retrieve the data, we can use the index to fetch Node(s) and Relationship(s) directly. Aside from that, the Java API for Neo4j has the TraversalDescription interface, with which we can describe a complex way to traverse our graph and fetch the results.

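// RelTypes is assumed to be an application-defined enum implementing RelationshipType
TraversalDescription td = Traversal.description()
    .breadthFirst()
    .relationships(RelTypes.FRIEND_OF, Direction.OUTGOING);
// traverse() needs a start node; startNode is assumed to be a Node fetched earlier
Traverser traverser = td.traverse(startNode);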

Aside from this, we can simply move forward from node to relationship to node by using Node.getRelationships() and Relationship.getStartNode() or Relationship.getEndNode().
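
For readers new to traversals, the following plain-Java sketch shows what a breadth-first walk over outgoing FRIEND_OF relationships amounts to. It deliberately does not use Neo4j's traversal framework: the BreadthFirstSketch class, its methods and the sample names are all made up for illustration, and it assumes each node is visited at most once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Illustration only: a plain-Java stand-in for a graph, NOT Neo4j's traversal framework.
class BreadthFirstSketch {

    // Outgoing FRIEND_OF edges, keyed by the name of the start node (hypothetical data).
    private final Map<String, List<String>> friendOf = new HashMap<>();

    void addFriendOf(String from, String to) {
        List<String> targets = friendOf.get(from);
        if (targets == null) {
            targets = new ArrayList<>();
            friendOf.put(from, targets);
        }
        targets.add(to);
    }

    // Visit nodes level by level, following only outgoing edges, each node at most once.
    List<String> traverse(String start) {
        Set<String> visited = new LinkedHashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> next = friendOf.get(current);
            if (next == null) {
                continue; // no outgoing relationships from this node
            }
            for (String neighbour : next) {
                if (visited.add(neighbour)) { // add() returns false if already seen
                    queue.add(neighbour);
                }
            }
        }
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        BreadthFirstSketch graph = new BreadthFirstSketch();
        graph.addFriendOf("Alice", "Bob");
        graph.addFriendOf("Alice", "Carol");
        graph.addFriendOf("Bob", "Dave");
        graph.addFriendOf("Dave", "Alice"); // cycle: Alice is not visited twice
        System.out.println(graph.traverse("Alice")); // prints [Alice, Bob, Carol, Dave]
    }
}
```

Direct friends (Bob, Carol) come out before friends of friends (Dave), which is the ordering a breadth-first description asks for. Stepping manually from node to relationship to node gives the same reachability, but leaves the visiting order and cycle handling up to us.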

Viewing the results:
After adding nodes and relationships, we can open http://localhost:7474/webadmin/ and view the graph visually. This greatly helps in clarifying the exact structure and visualizing the stored data. This method works well only while we are dealing with a handful of nodes; trying to visualize a database exceeding 30 nodes and relationships is pretty much like viewing a mesh which hardly makes sense.

Summarizing, from my experience, getting started with Neo4j is pretty basic and it is easy to set up. Everything from downloading the Neo4j community edition to setting up a simple single-node instance and using the Java driver for CRUD operations is trivial. Neo4j can be an awkward tool to work with initially. Understanding how data is stored based on graph theory can be very vague, especially to people coming from the SQL world; however, once the basics are set in stone, it is exceptionally easy to pick up.

Links:
http://eclipse.sys-con.com/node/2377557
http://docs.neo4j.org/chunked/milestone/

Comments

licensing

@Abe caution with licenses is indeed useful, but I believe that one can still use the community edition under the regular GPL for proprietary applications. See http://www.neo4j.org/learn/licensing :

"""
If you don't need any of the reliability features in the Advanced or Enterprise editions, then you're free to use the Community edition of Neo4j Server under a GPL license – which means you can use it anywhere i.e. similarly to MySQL. Used in this way, only changes you make to the Neo4j software itself should be open-sourced and shared with the community.
"""

-- Andrew Ball, @cortextual

KLS

@Abe
I don't think that interpretation of the (A)GPL with Java is correct. Read https://www.gnu.org/licenses/lgpl-java.html

Abe

For me, the problem with Neo4j is that it's AGPL licensed. If I wrote a Web app that used it that I made available to the public, I would have to open source the whole Web app because the AGPL includes network connections under the umbrella of "distribution."
