JDBCCacheLoader performance

Sunday, 28 January 2007

Hi guys,

I've drafted the alternative implementation of JDBC cache loader. Just a
brief, the base idea is to to traverse the tree in preorder and add some
indexing info to each node.
The affected methods are remove(Fqn), loadState(Fqn subtree, ..) and
put(Fqn, value).

How I've tested the performance:
 a tree with a TREE_DEPTH depth is created. Each node in the tree has a
CHILDREN number of children. By default I've used TREE_DEPTH=7 and
CHILDREN=3 which gave a total number of 3280 children. A node is selected at
each depth from disjunct  subtrees.  ( i.e. in the benchmark used 7 nodes
are selected). On all those nodes a certain operation is performed:
loadState or remove. For benchmarking the put operation, the in memory tree
nodes are Collections.shuffle(List<Fqn>), then they are (randomly) added to
the database/classLoader.put. Also additional testing was performed in the
case of put for deep trees (100 - 200 nodes)

Results:
On the load and delete the new implementation is  much  more better.  The
new deletion time is linear compared with the old approach where recursive
db calls lead to an exponential raise(see - remove_node_all.PNG). Relatively
the same in the case of  loading subtrees(see - remove_node_all.PNG).
The problem is in the case of additions. When adding all 3280 nodes the time
is constantly 10 times bigger than when using existing implementation.
Profiling shewed that the update_indexes operation behaves really badly -
nothing to do in this area. Even if the operation is totally moved in the
database (mysql 5.0.X has limitations on working with triggers so I've used
a stored procedure) is doesn't cause an significant increase.

Another approach.
Another approach I've consider is to reduce the recursive DB calls to a fix
number. All operation rely on the fact that the entire fqn is persisted for
each node. E.g.  when deleting node(/a/b/c)  I'll remove all records that
have a path starting with (/a/b/c/). Same in the case of node loading. It
proved to work even better than in the case of first implementation.
Recursion in the case of insertions - I've solved this by
a) querying  for the  first persited ancestor
b)  add all the  children  in a  batch  pr statement.
Interestingly enough this always-2-queries approach is 30% slower than the
old recursive impl. Profiling showed that the db driver (mysql-connector/J)
takes the same time for executing a batch (new impl) as it takes in the case
of individual calls (old impl).Also the firstAncestor query takes quite long
on mysql.  Another observation is that new impl proves much better in the
case of deeper trees, and fails up-2 50% in the case of flatter trees. An
interesting thing would be to see it working on an Oracle/SqlServ... The
good news is that old implementation can be used instead - or a policy can
be defined to use the new impl where tree structure requires it.

Naturally I'd chose the second approach - if you don't see any problem with
it

I'll stop here :)

Cheers,
Mircea

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006