Hi guys,
I've took a look at the implementation of JDBCCacheLoader and here are some
thoughts on it.
There is an alternative way of persisting trees into the database. It has
certain advantages compared to the straight forward solution of each node
keeping a reference to its parent ( a.k.a. Adjacency List Model). The basic
idea is to traverse the tree in preorder and add some indexing info to each
node - you can find a nice and simple description of the model here:
http://www.sitepoint.com/print/hierarchical-data-database. This indexing
info will be further used for optimizing fetching and removing operations.
The big advantage is that all the recursive calls for the loadSubtree and
removeSubtree operations are nicely avoided. Drawback - insertions is
slightly more time consuming.
I've made a comparison between this approach and the existing implementation
and here are some figures. Methods that are affected are: remove(Fqn),
loadState(Fqn subtree, ..) and put(Fqn, value)
1) remove(Fqn). Current recursive implementation performs about pow(m,n)
database calls. M = the average # of children, n the depth of the subtree.
The new approach would reduce it to a fix value(3 calls - retrieve the node,
delete it together with all its children, update indexes)
2) loadState(Fqn subtree, ..) - similar to remove; pow(m,n) database calls,
2 queries for loading it.
3) put(Fqn, value) - here is the drawback. Normally a new update should be
performed in order to shift the indexes. An optimization can be performed
though. By indexing with a step of lets say 10, we'll be assured that the
next 9 insertion will not conflict, so the drawback would be an update for
each 9 insertions - not a big deal I would say.
If you guys find it usefully, really glad to go ahead with an implementation
and compare the performance figures...
Cheers,
Mircea