Hi guys,
I've took a look at the implementation of JDBCCacheLoader and here are some thoughts on it.
There is an alternative way of persisting trees into the database. It has certain advantages compared to the straight forward solution of each node keeping a reference to its parent (
a.k.a. Adjacency List Model). The basic idea is to traverse the tree in preorder and add some indexing info to each node - you can find a nice and simple description of the model here:
http://www.sitepoint.com/print/hierarchical-data-database. This indexing info will be further used for optimizing fetching and removing operations.
The big advantage is that all the recursive calls for the loadSubtree and removeSubtree operations are nicely avoided. Drawback - insertions is slightly more time consuming.
I've made a comparison between this approach and the existing implementation and here are some figures. Methods that are affected are: remove(Fqn), loadState(Fqn subtree, ..) and put(Fqn, value)
1) remove(Fqn). Current recursive implementation performs about pow(m,n) database calls. M = the average # of children, n the depth of the subtree. The new approach would reduce it to a fix value(3 calls - retrieve the node, delete it together with all its children, update indexes)
2) loadState(Fqn subtree, ..) - similar to remove; pow(m,n) database calls, 2 queries for loading it.
3) put(Fqn, value) - here is the drawback. Normally a new update should be performed in order to shift the indexes. An optimization can be performed though. By indexing with a step of lets say 10, we'll be assured that the next 9 insertion will not conflict, so the drawback would be an update for each 9 insertions - not a big deal I would say.
If you guys find it usefully, really glad to go ahead with an implementation and compare the performance figures...
Cheers,
Mircea