Stop Thinking, Just Do!

Sungsoo Kim's Blog

Neo4j

tagsTags

22 June 2013


Neo4j is the leading implementation of a property graph database. It is written pre- dominantly in Java and leverages a custom storage format and the facilities of the Java Transaction Architecture (JTA) to provide XA transactions. The Java API offers an object-oriented way of working with the nodes and relationships of the graph. Traversals are expressed with a fluent API. Being a graph database, Neo4j offers a number of graph algorithms like shortest path, Dijkstra, or A* out of the box.

Neo4j integrates a transactional, pluggable indexing subsystem that uses Lucene as the default. The index is used primarily to locate starting points for traversals. Its second use is to support unique entity creation. To start using Neo4j’s embedded Java database, add the org.neo4j:neo4j:<version> dependency to your build setup, and you’re ready to go.

Example 7-1 lists the code for creating nodes and relationships with prop- erties within transactional bounds. It shows how to access and read them later.

Example 7-1. Neo4j Core API Demonstration
GraphDatabaseService gdb = new EmbeddedGraphDatabase("path/to/database");

Transaction tx=gdb.beginTx(); 

try {
	Node dave = gdb.createNode(); 
	dave.setProperty("email","dave@dmband.com"); 
	gdb.index().forNodes("Customer").add(dave,"email",dave.getProperty("email"); 
	Node iPad = gdb.createNode();
	iPad.setProperty("name","Apple iPad");
	Relationship rel=dave.createRelationshipTo(iPad,Types.RATED);
	rel.setProperty("stars",5);
	tx.success(); 
} finally {
	tx.finish(); 
}

// to access the data
Node dave = gdb.index().forNodes("Customer").get("email","david@dmband.com").getSingle();   
for (Relationship rating : dave.getRelationships(Direction.OUTGOING, Types.RATED)) {  
	aggregate(rating.getEndNode(), rating.getProperty("stars")); 
}

With the declarative Cypher query language, Neo4j makes it easier to get started for everyone who knows SQL from working with relational databases. Developers as well as operations and business users can run ad-hoc queries on the graph for a variety of use cases. Cypher draws its inspiration from a variety of sources: SQL, SparQL, ASCII- Art, and functional programming. The core concept is that the user describes the pat- terns to be matched in the graph and supplies starting points. The database engine then efficiently matches the given patterns across the graph, enabling users to define so- phisticated queries like “find me all the customers who have friends who have recently bought similar products.” Like other query languages, it supports filtering, grouping, and paging. Cypher allows easy creation, deletion, update, and graph construction.

The Cypher statement in Example 7-2 shows a typical use case. It starts by looking up a customer from an index and then following relationships via his orders to the products he ordered. Filtering out older orders, the query then calculates the top 20 largest vol- umes he purchased by product.

Example 7-2. Sample Cypher statement
START		customer=node:Customer(email = "dave@dmband.com") 
MATCH		customer-[:ORDERED]->order-[item:LINEITEM]->product 
WHERE		order.date > 20120101 
RETURN		product.name, sum(item.amount) AS product 
ORDER BY	products DESC 
LIMIT		20		

Being written in Java, Neo4j is easily embeddable in any Java application which refers to single-instance deployments. However, many deployments of Neo4j use the standalone Neo4j server, which offers a convenient HTTP API for easy interaction as well as a comprehensive web interface for administration, exploration, visualization, and monitoring purposes. The Neo4j server is a simple download, and can be uncom- pressed and started directly.

It is possible to run the Neo4j server on top of an embedded database, which allows easy access to the web interface for inspection and monitoring.

Neo4j server web interface

http://sungsoo.github.com/images/neo4j.png

In the web interface, you can see statistics about your database. In the data browser, you can find nodes by ID, with index lookups, and with cypher queries (click the little blue question mark for syntax help), and switch to the graph visualizer with the right hand button to explore your graph visually. The console allows you to enter Cypher statements directly or even issue HTTP requests. Server Info lists JMX beans, which, especially in the Enterprise edition, come with much more information.

As an open source product, Neo4j has a very rich and active ecosystem of contributors, community members, and users. Neo Technology, the company sponsoring the development of Neo4j, makes sure that the open source licensing (GPL) for the community edition, as well as the professional support for the enterprise editions, promote the continuous development of the product.

To access Neo4j, you have a variety of drivers available, most of them being maintained by the community. There are libraries for many programming languages for both the embedded and the server deployment mode. Some are maintained by the Neo4j team, Spring Data Neo4j being one of them.

References

[1] Mark Pollack, Oliver Gierke, Thomas Risberg, Jon Brisbin, and Michael Hunger, Spring Data: Modern Data Access for Enterprise Java, pp. 102-103, O’REILLY, 2013.


comments powered by Disqus