Hi guys,
I've drafted the alternative implementation of JDBC cache loader. Just a
brief, the base idea is to to traverse the tree in preorder and add some
indexing info to each node.
The affected methods are remove(Fqn), loadState(Fqn subtree, ..) and
put(Fqn, value).
How I've tested the performance:
a tree with a TREE_DEPTH depth is created. Each node in the tree has a
CHILDREN number of children. By default I've used TREE_DEPTH=7 and
CHILDREN=3 which gave a total number of 3280 children. A node is selected at
each depth from disjunct subtrees. ( i.e. in the benchmark used 7 nodes
are selected). On all those nodes a certain operation is performed:
loadState or remove. For benchmarking the put operation, the in memory tree
nodes are Collections.shuffle(List<Fqn>), then they are (randomly) added to
the database/classLoader.put. Also additional testing was performed in the
case of put for deep trees (100 - 200 nodes)
Results:
On the load and delete the new implementation is much more better. The
new deletion time is linear compared with the old approach where recursive
db calls lead to an exponential raise(see - remove_node_all.PNG). Relatively
the same in the case of loading subtrees(see - remove_node_all.PNG).
The problem is in the case of additions. When adding all 3280 nodes the time
is constantly 10 times bigger than when using existing implementation.
Profiling shewed that the update_indexes operation behaves really badly -
nothing to do in this area. Even if the operation is totally moved in the
database (mysql 5.0.X has limitations on working with triggers so I've used
a stored procedure) is doesn't cause an significant increase.
Another approach.
Another approach I've consider is to reduce the recursive DB calls to a fix
number. All operation rely on the fact that the entire fqn is persisted for
each node. E.g. when deleting node(/a/b/c) I'll remove all records that
have a path starting with (/a/b/c/). Same in the case of node loading. It
proved to work even better than in the case of first implementation.
Recursion in the case of insertions - I've solved this by
a) querying for the first persited ancestor
b) add all the children in a batch pr statement.
Interestingly enough this always-2-queries approach is 30% slower than the
old recursive impl. Profiling showed that the db driver (mysql-connector/J)
takes the same time for executing a batch (new impl) as it takes in the case
of individual calls (old impl).Also the firstAncestor query takes quite long
on mysql. Another observation is that new impl proves much better in the
case of deeper trees, and fails up-2 50% in the case of flatter trees. An
interesting thing would be to see it working on an Oracle/SqlServ... The
good news is that old implementation can be used instead - or a policy can
be defined to use the new impl where tree structure requires it.
Naturally I'd chose the second approach - if you don't see any problem with
it
I'll stop here :)
Cheers,
Mircea