[dna-issues] [JBoss JIRA] Commented: (DNA-466) JCR requires connectors to expose UUID as identifier property

Thursday, 9 July 2009

    [ https://jira.jboss.org/jira/browse/DNA-466?page=com.atlassian.jira.plugin... ]

Randall Hauch commented on DNA-466:
-----------------------------------

Okay, after several weeks of working on this issue (including several failed attempts at
more direct fixes to address the problem), I was finally able to commit the code.  The new
design is obviously much improved when it comes to getting and using the paths.  But it
also relies upon the paths and no longer expects UUIDs from connectors (other than in
cases where nodes are referenceable, per the JCR specification).

The new design broke the functionality in SessionCache into two distinct parts.  A new
GraphSession class (in org.jboss.dna.graph.session) now handles the job of maintaining a
cached representation of a portion of the graph structure, including accumulating changes
and batching various connector requests.  The JCR-specific functionality is now in
SessionCache (still in org.jboss.dna.jcr) and is responsible for checking constraints,
ensuring proper node and property types, maintaining cached representations of type names,
and managing the JCR Node and Property implementation objects.  One big plus is that
it helps separate the different kinds of logic, making testing easier.

The GraphSession class maintains a structured tree of node objects, ensuring that this
transient structure stays in sync as various nodes are created, moved, deleted, cloned, or
changed.  The session's behavior is patterned after JCR: it maintains transient state
and changes, and these changes are pushed to the store upon 'save()'.  The node
structure is not designed to be extended; rather, each node has a payload object that can
be configured and customized (via generics).  Similarly, each node holds a property
info object for each property, and each property info contains a payload object that is
also customizable.  (SessionCache uses payload classes that store/cache the JCR-specific data.)
Quite a few hooks are available for inserting logic at various points in the lifecycle;
for example, when a node is being loaded into the session, a hook is called to allow a
client to populate the property info objects and child nodes, given the Graph node object.
This allows SessionCache to insert its own logic, including the determination of the
primary type and child node definition, as well as caching any JCR-specific information in
the payload.
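
To make the payload-and-hook idea concrete, here is a minimal sketch.  It is not the
actual GraphSession code: every type name below (NodeSketch, PropertyInfoSketch,
LoadHookSketch, SessionSketch) is hypothetical, and the "nt:unstructured" fallback is
only an illustrative default.  It just shows a final node class that is customized
through generic payloads, and a load hook that lets a JCR-like layer fill them in.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types are the real GraphSession API.

/** Per-property info object carrying a client-defined payload. */
final class PropertyInfoSketch<P> {
    final String name;
    final Object value;
    final P payload;

    PropertyInfoSketch(String name, Object value, P payload) {
        this.name = name;
        this.value = value;
        this.payload = payload;
    }
}

/** Final on purpose: clients customize the payload rather than subclassing. */
final class NodeSketch<N, P> {
    final String name;
    N payload;
    final Map<String, PropertyInfoSketch<P>> properties = new LinkedHashMap<>();
    final List<NodeSketch<N, P>> children = new ArrayList<>();

    NodeSketch(String name) {
        this.name = name;
    }
}

/** Hook called while a node is being loaded into the session. */
interface LoadHookSketch<N, P> {
    void onLoad(NodeSketch<N, P> node, Map<String, Object> storedProperties);
}

final class SessionSketch<N, P> {
    private final LoadHookSketch<N, P> hook;

    SessionSketch(LoadHookSketch<N, P> hook) {
        this.hook = hook;
    }

    /** Creates the transient node, then lets the client populate it. */
    NodeSketch<N, P> load(String name, Map<String, Object> storedProperties) {
        NodeSketch<N, P> node = new NodeSketch<>(name);
        hook.onLoad(node, storedProperties);
        return node;
    }
}

final class PayloadDemo {
    /** A JCR-like layer that caches the primary type name as the node payload. */
    static NodeSketch<String, String> loadExample() {
        SessionSketch<String, String> session = new SessionSketch<String, String>((node, stored) -> {
            for (Map.Entry<String, Object> e : stored.entrySet()) {
                // The property payload here is just the Java type name of the value.
                String typeName = e.getValue().getClass().getSimpleName();
                node.properties.put(e.getKey(),
                        new PropertyInfoSketch<>(e.getKey(), e.getValue(), typeName));
            }
            Object primaryType = stored.get("jcr:primaryType");
            // Illustrative default only; the real logic consults node type definitions.
            node.payload = primaryType == null ? "nt:unstructured" : primaryType.toString();
        });
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("jcr:primaryType", "nt:folder");
        stored.put("size", 42L);
        return session.load("docs", stored);
    }

    public static void main(String[] args) {
        NodeSketch<String, String> node = loadExample();
        System.out.println(node.name + " -> " + node.payload);
    }
}
```

The point of the shape is that the graph-level session never needs to know what the
payload types mean; only the hook supplied by the JCR layer does.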

As alluded to in a previous comment, the GraphSession structure maintains a Location
object for every node.  The Path object in that Location is created from the Path of the
parent node, which means that the ChildPath implementation can be used for all Path
objects.  This means that each Path.Segment appears only once but is reused in every Path
object on all nodes at or below that spot.  The Location and Path objects are also made to
reflect the same-name-sibling (SNS) indexes, so as the SNS indexes change, the paths are recomputed.  (Yes,
this makes moves, deletes, and reorders more expensive, especially for large subgraphs,
but then almost every other operation becomes much faster.)  And like the previous
SessionCache implementation, the GraphSession uses a MultiMap to efficiently manage the
children of a node, without having to compute the SNS indexes (other than when updating
the Location objects, which is done only when needed).
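
The segment-sharing idea can be sketched roughly as follows.  This is an assumption-laden
illustration, not the real Path, Path.Segment, or ChildPath code: SegmentSketch and
ChildPathSketch are hypothetical names.  Each path holds only its own last segment and a
reference to its parent's path, so sibling paths share the same parent object.

```java
// Hypothetical sketch: the real Path, Path.Segment, and ChildPath classes differ.

/** Immutable name plus same-name-sibling (SNS) index, e.g. "b[2]". */
final class SegmentSketch {
    final String name;
    final int snsIndex; // 1-based, as in JCR

    SegmentSketch(String name, int snsIndex) {
        this.name = name;
        this.snsIndex = snsIndex;
    }

    @Override
    public String toString() {
        return snsIndex == 1 ? name : name + "[" + snsIndex + "]";
    }
}

/** A path that stores only its last segment plus a reference to the parent's path. */
final class ChildPathSketch {
    static final ChildPathSketch ROOT = new ChildPathSketch(null, null);

    final ChildPathSketch parent; // null only for the root
    final SegmentSketch last;     // null only for the root

    private ChildPathSketch(ChildPathSketch parent, SegmentSketch last) {
        this.parent = parent;
        this.last = last;
    }

    ChildPathSketch child(String name, int snsIndex) {
        return new ChildPathSketch(this, new SegmentSketch(name, snsIndex));
    }

    /** Same parent path, new SNS index (needed when siblings are reordered or removed). */
    ChildPathSketch withIndex(int snsIndex) {
        return new ChildPathSketch(parent, new SegmentSketch(last.name, snsIndex));
    }

    @Override
    public String toString() {
        if (parent == null) return "/";
        String prefix = parent.parent == null ? "" : parent.toString();
        return prefix + "/" + last;
    }
}

final class ChildPathDemo {
    public static void main(String[] args) {
        ChildPathSketch a = ChildPathSketch.ROOT.child("a", 1);
        ChildPathSketch first = a.child("b", 1);
        ChildPathSketch second = a.child("b", 2);
        System.out.println(first + " " + second);          // both reuse the path object for /a
        System.out.println(first.parent == second.parent); // shared parent instance
        System.out.println(second.withIndex(1));           // path after an SNS index change
    }
}
```

Note that withIndex() produces a new path object, so any descendants still pointing at
the old one would have to be rebuilt as well.  That is the cost mentioned above for
moves, deletes, and reorders on large subgraphs.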

The GraphSession also takes a different approach to storing the transient changes.  The
old SessionCache design kept immutable snapshots of each node state as read from the
store.  The idea was that the changes could be easily discarded while keeping the
last-read state.  But in practice, the only time the changes are discarded is via the
javax.jcr.Session.refresh() or javax.jcr.Node.refresh() methods, which also throw out any
cached state.  Thus, the old design unnecessarily kept the "read" states
separate from the mutable "changed" states.  The new design simply keeps the
structure in sync with all changes, and each node maintains some state about whether it is
new or has changed, and whether any nodes below it have changed.  This makes it easy to
walk the structure to find out what areas are changed, what's new, and what hasn't
even been loaded yet.  (The new design uses visitors to accomplish the walking
activities.)
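
As a rough illustration of the change tracking, consider the sketch below.  The names
(TrackedNode, StatusSketch, NodeVisitorSketch, ChangeWalkSketch) are hypothetical and the
real classes are richer; in particular, the "not yet loaded" state is left out.  Each node
records whether it is new or changed and whether anything below it changed, so a visitor
can skip clean subtrees entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real GraphSession node and visitor types differ.

enum StatusSketch { UNCHANGED, NEW, CHANGED }

interface NodeVisitorSketch {
    void visit(TrackedNode node);
}

final class TrackedNode {
    final String name;
    final TrackedNode parent;
    final List<TrackedNode> children = new ArrayList<>();
    StatusSketch status = StatusSketch.UNCHANGED;
    boolean changedBelow; // true if any descendant is new or changed

    /** Represents a node as read from the store (unchanged). */
    TrackedNode(String name, TrackedNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }

    /** Creates a transient child that does not exist in the store yet. */
    TrackedNode addChild(String childName) {
        TrackedNode child = new TrackedNode(childName, this);
        child.status = StatusSketch.NEW;
        child.flagAncestors();
        return child;
    }

    void markChanged() {
        if (status != StatusSketch.NEW) status = StatusSketch.CHANGED;
        flagAncestors();
    }

    private void flagAncestors() {
        // Stop at the first ancestor already flagged: everything above it is flagged too.
        for (TrackedNode n = parent; n != null && !n.changedBelow; n = n.parent) {
            n.changedBelow = true;
        }
    }
}

final class ChangeWalkSketch {
    /** Depth-first walk that does not descend into subtrees without changes. */
    static void walk(TrackedNode node, NodeVisitorSketch visitor) {
        visitor.visit(node);
        if (!node.changedBelow) return;
        for (TrackedNode child : node.children) {
            walk(child, visitor);
        }
    }

    static List<String> changedNames(TrackedNode root) {
        List<String> result = new ArrayList<>();
        walk(root, node -> {
            if (node.status != StatusSketch.UNCHANGED) result.add(node.name + ":" + node.status);
        });
        return result;
    }

    public static void main(String[] args) {
        TrackedNode root = new TrackedNode("root", null);
        TrackedNode a = new TrackedNode("a", root);
        TrackedNode b = new TrackedNode("b", root);
        a.addChild("c");
        b.markChanged();
        System.out.println(changedNames(root));
    }
}
```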

So, apart from the faster access to paths, the new design was also meant to get around the difficulty
of the javax.jcr.Node and javax.jcr.Property implementation objects requiring UUIDs.  In
the old design, these implementation objects were very lightweight - almost every activity
required looking up in the SessionCache the corresponding node/property information object
and then delegating to methods on that object.  All nodes were identified by UUID, so each
javax.jcr.Node and javax.jcr.Property implementation object simply had to know the UUID. 
Unfortunately, this design doesn't work when the node has no UUID.  (Earlier, I
experimented with creating artificial UUIDs, but this is not durable and causes problems
when 'save()' or 'refresh()' are performed, causing the javax.jcr.Node and
javax.jcr.Property implementation objects to become stale and unable to find the
corresponding node/property info objects.)

In the new design, we still have very lightweight javax.jcr.Node and javax.jcr.Property
implementation objects that still hold on to a unique key.  However, rather than a single
UUID, the key is a session-specific NodeId (which can become invalid) and a durable
Location of the node.  (The javax.jcr.Property implementations maintain a reference to the
javax.jcr.Node implementation object and the property's name).  The SessionCache is
able to find the corresponding GraphSession node/property given this combined key, and can
almost always locate the node/property because the key has the last-known path.  Also,
because the Location objects *might* have identification properties, these are used when
the path is not sufficient or is no longer valid.  There are cases when this is not
successful, but those cases correspond to those in the JCR specification (e.g., nodes
being deleted or nodes being changed in other sessions).  In these cases, the JCR client
gets an InvalidItemStateException and has to retrieve the node from the session again.
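
The lookup order described above might look something like the following sketch.  All
names here are hypothetical (ItemKeySketch, NodeLookupSketch), nodes are represented by
plain strings, and StaleItemSketchException merely stands in for
javax.jcr.InvalidItemStateException so that the example compiles without the JCR API on
the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch; not the actual SessionCache lookup code.

/** Stand-in for javax.jcr.InvalidItemStateException. */
final class StaleItemSketchException extends RuntimeException {
    StaleItemSketchException(String path) {
        super("Item no longer available: " + path);
    }
}

/** Combined key: session-specific id plus durable location information. */
final class ItemKeySketch {
    final long nodeId; // session-specific, can become invalid
    final String path; // last-known path
    final UUID uuid;   // optional identification property, may be null

    ItemKeySketch(long nodeId, String path, UUID uuid) {
        this.nodeId = nodeId;
        this.path = path;
        this.uuid = uuid;
    }
}

final class NodeLookupSketch {
    private final Map<Long, String> byNodeId = new HashMap<>();
    private final Map<String, String> byPath = new HashMap<>();
    private final Map<UUID, String> byUuid = new HashMap<>();

    void register(long nodeId, String path, UUID uuid, String node) {
        byNodeId.put(nodeId, node);
        byPath.put(path, node);
        if (uuid != null) byUuid.put(uuid, node);
    }

    /** Simulates save() or refresh(): session-specific ids are discarded. */
    void invalidateNodeIds() {
        byNodeId.clear();
    }

    /** Simulates a change that makes the last-known path unresolvable. */
    void forgetPath(String path) {
        byPath.remove(path);
    }

    /** Tries the NodeId first, then the last-known path, then identification properties. */
    String find(ItemKeySketch key) {
        String node = byNodeId.get(key.nodeId);
        if (node == null) node = byPath.get(key.path);
        if (node == null && key.uuid != null) node = byUuid.get(key.uuid);
        if (node == null) throw new StaleItemSketchException(key.path);
        return node;
    }

    public static void main(String[] args) {
        NodeLookupSketch lookup = new NodeLookupSketch();
        UUID id = UUID.randomUUID();
        lookup.register(1L, "/a/b", id, "node-b");
        ItemKeySketch key = new ItemKeySketch(1L, "/a/b", id);
        lookup.invalidateNodeIds();           // id is stale, path still works
        System.out.println(lookup.find(key));
        lookup.forgetPath("/a/b");            // path is stale, UUID still works
        System.out.println(lookup.find(key));
    }
}
```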

There are a few outstanding items and known issues that have to be resolved.  However,
these will be tracked as separate issues. In fact, this issue will likely be dependent on
a few of those - when they're finished, this issue can be marked as resolved.

...
 JCR requires connectors to expose UUID as identifier property
 -------------------------------------------------------------

                 Key: DNA-466
                 URL: https://jira.jboss.org/jira/browse/DNA-466
             Project: DNA
          Issue Type: Bug
          Components: JCR
    Affects Versions: 0.5
            Reporter: Randall Hauch
            Assignee: Randall Hauch
            Priority: Blocker
             Fix For: 0.6

 The JCR implementation currently expects connectors to return a UUID as the identifier
property.  This is obviously incorrect, as it goes against several of the connectors
we've already implemented.  In particular, the SessionCache is expecting to find the
UUID, and submits requests to the source with only the UUID (not the path). 
-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
