[hibernate-dev] [OGM] Neo4J discussions

Thu Oct 17 07:36:40 EDT 2013

Davide and I had an interesting discussion on OxM and Neo4J with
Nicolas. We also touched on the JCA support and the Neo4J team's lack of
interest in it.

This is a dump of the conversation and a way to continue the
conversation in the open.

## JCA support

The JCA support was provided without help from the Neo4J team.
Its original author has unplugged from the internet.

## Blueprints

Nicolas recommended that we look at Blueprints, which offers an
abstraction over Neo4J APIs with clearer concepts. Blueprints is
implemented for other graph databases, so we would gain support for them
too for "free".

https://github.com/tinkerpop/blueprints/wiki

Davide looked at it and our use of Neo4J seems simple enough that we
could switch to Blueprints easily.

## Mapping model

We discussed the various mapping strategies for representing an
object graph in a GraphDB.

### Hibernate OGM's approach (aka classic)

    An entity is mapped as a node. A property is added to save the name of the
    table that maps the entity.
    Attributes of the entity are mapped as properties.
    A unidirectional association is mapped as a neo4j Relationship where the
    starting node is the owner of the relationship. The type of the
    relationship is the name of the association.
    A bidirectional association is mapped using two relationships, one for each
    direction.
    A sequence is stored in a node with the value stored in a property. Every
    time we need to update the value of the sequence we acquire a lock on the
    node.

    Nodes and relationships are stored in two separate Lucene indexes, and
    I use a Lucene query when I need to retrieve them.
    I've developed the code using Neo4j 1.9.4 (I'm now moving to Neo4j 2).
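
To make the node and relationship rules above concrete, here is a minimal in-memory sketch. It does not use the Neo4j or Hibernate OGM APIs: `ClassicMappingSketch`, `Node`, `Relationship` and the `_table` property name are all made up for illustration, and the Lucene indexes and the sequence node are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a property graph; these are NOT Neo4j or OGM classes.
class ClassicMappingSketch {

    // Hypothetical name of the property that records which table maps the entity.
    static final String TABLE_PROPERTY = "_table";

    // A node is a bag of properties.
    static final class Node {
        final Map<String, Object> properties = new LinkedHashMap<>();
    }

    // A directed, typed relationship between two nodes.
    static final class Relationship {
        final Node start;
        final Node end;
        final String type;

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Relationship> relationships = new ArrayList<>();

    // Entity -> node: the table name and every attribute become properties.
    Node mapEntity(String table, Map<String, Object> attributes) {
        Node node = new Node();
        node.properties.put(TABLE_PROPERTY, table);
        node.properties.putAll(attributes);
        nodes.add(node);
        return node;
    }

    // Unidirectional association -> one relationship starting at the owner,
    // typed with the association name.
    Relationship mapUnidirectional(Node owner, Node target, String associationName) {
        Relationship relationship = new Relationship(owner, target, associationName);
        relationships.add(relationship);
        return relationship;
    }

    // Bidirectional association -> two relationships, one per direction.
    void mapBidirectional(Node a, String roleOnA, Node b, String roleOnB) {
        mapUnidirectional(a, b, roleOnA);
        mapUnidirectional(b, a, roleOnB);
    }
}
```

With this model, finding an entity by attribute value means scanning or indexing node properties, which is why the real implementation relies on Lucene.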

### Gaedo (aka vertex per property)

    Each element (be it a model
    entity or one of its properties) was mapped directly as a node in the graph.
    Look here for more info:
    http://riduidel.github.io/gaedo/site/0.4.21/gaedo-blueprints/1_gaedo_graph_storage.html
    The main advantage of this approach is that we do not have to browse
    (and maintain) an index to find objects for which the property "name"
    has the value "foo". Instead, we directly locate the vertex holding
    that value, then back-navigate the links named "name" to find objects
    ... well, I guess you can see the advantage of this approach.

    Which takes me back to that divide between properties stored as
    properties of a vertex, or as distant vertices.
    Each approach has its advantages:
     - one vertex per literal ensures easy search through graph navigation
     languages (be it gremlin, cypher, or any other one you find) and
     some nice-to-architect similarity for all vertices (in my case, each
     graph vertex has two properties mimicking the ones found in RDF
     concepts: kind and value). The cost is more lock contention (it
     indeed seems neo4j requires a lock on each vertex when adding/removing
     an edge connecting them)
     - one property per literal avoids those locking issues, but obviously
     makes search really dependent upon indices, and furthermore impossible
     to write using a "pure" graph query language, as queries will sometimes
     rely upon index navigation, and sometimes upon graph navigation,
     which may be hard to differentiate - or maybe not.
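
For comparison, the vertex per property idea can be sketched the same way. This is only an illustration of the lookup described above, not Gaedo's code: `VertexPerPropertySketch` and its methods are hypothetical names, and the `HashMap` merely stands in for however the store reaches the unique vertex holding a literal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration only; these are NOT Gaedo, Blueprints or Neo4j classes.
class VertexPerPropertySketch {

    // Every vertex carries the same two properties: kind and value.
    static final class Vertex {
        final String kind;   // "entity" or "literal"
        final Object value;  // the entity id, or the literal itself
        final List<Edge> incoming = new ArrayList<>();

        Vertex(String kind, Object value) {
            this.kind = kind;
            this.value = value;
        }
    }

    // An edge from an entity to a literal, labelled with the property name.
    static final class Edge {
        final Vertex from;
        final Vertex to;
        final String label;

        Edge(Vertex from, Vertex to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    // Assumption: one shared vertex per distinct literal value.
    private final Map<Object, Vertex> literals = new HashMap<>();

    Vertex addEntity(Object id) {
        return new Vertex("entity", id);
    }

    // Setting a property adds an edge to the shared literal vertex.
    void setProperty(Vertex entity, String name, Object value) {
        Vertex literal = literals.computeIfAbsent(value, v -> new Vertex("literal", v));
        literal.incoming.add(new Edge(entity, literal, name));
    }

    // Locate the literal vertex, then back-navigate the edges with that name.
    List<Vertex> findByProperty(String name, Object value) {
        List<Vertex> owners = new ArrayList<>();
        Vertex literal = literals.get(value);
        if (literal == null) {
            return owners;
        }
        for (Edge edge : literal.incoming) {
            if (edge.label.equals(name)) {
                owners.add(edge.from);
            }
        }
        return owners;
    }
}
```

Note that every `setProperty` call touches the shared literal vertex. In a store that locks both ends of an edge when it is added or removed, that shared vertex is where the locking cost mentioned above would show up.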

### Mapping conclusion

After some back and forth, we concluded that the vertex per property
approach does not offer many advantages, including in query performance.
Quite the contrary, the classic and natural approach has a few
advantages of its own (lower deadlock rate, possibly faster queries, etc.).
So we will start with this approach in Hibernate OGM and see where it
leads us.

### Data points

Some info on the Neo4J storage
http://digitalstain.blogspot.fr/2011/11/rooting-out-redundancy-new-neo4j.html

## Id generators

It would be nice to implement a generator that uses Neo4J's internal id
property generated for each node. While it is not monotonic, it is a
viable id generator option.

Note that sequence mapping still requires storing the sequence seed, and
the Neo4J id property won't be of help here.
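
As a rough illustration of why the seed has to live somewhere, here is a sketch of a sequence kept as a single value guarded by a lock, in the spirit of the sequence node described in the classic mapping. `SequenceNodeSketch` is a hypothetical name, and a `ReentrantLock` stands in for the lock acquired on the node; this is not how OGM or Neo4j implement it.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: a plain Java object playing the role of the sequence node.
class SequenceNodeSketch {

    // Stands in for the lock acquired on the sequence node.
    private final ReentrantLock lock = new ReentrantLock();

    // Stands in for the property holding the current sequence value (the seed).
    private long value;

    SequenceNodeSketch(long initialValue) {
        this.value = initialValue;
    }

    // Returns the current value and advances the stored value by incrementSize.
    long next(int incrementSize) {
        lock.lock();
        try {
            long current = value;
            value = current + incrementSize;
            return current;
        } finally {
            lock.unlock();
        }
    }
}
```

A generator based on the internal node id would not need any of this, but it also would not give the caller control over the initial value or the increment.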

That's a condensed version of the discussion. We can continue from there
if needed.

Emmanuel