Nice one Mircea.

Some comments:

* Re: your "second impl", For the sake of general portability I'd stay away from prepared statements/triggers and strive for a Java-only solution.  I'd leave optimisations such as prepared statements for people who want to further customise the JDBCCacheLoader for their specific DBMS product.  Then again, this could be a configuration param where the "default" is to use the Java-only solution, and *if* the name and signature of a prepared statement is provided for a batch insert then use this.

* Again in the case of your second approach, you're using the fact that Fqns "start with" a particular string (e.g., "/a/b/c/") and removing these when you need to call a remove ("/a/b/c") is fine, only it still assumes that Fqns are comprised only of Strings.  This is fine, this limitation currently exists in the JDBCCacheLoader anyway, just highlighting.

* "Also the firstAncestor query takes quite long on mysql." - What is this query?  Is this a case of, if I am adding ("/a/b/c") you need to do a SELECT ... WHERE FQN="/a/b" ?Why is this specifically slower than any other SELECT on an indexed column? 

* "Another observation is that new impl proves much better in the case of deeper trees, and fails up-2 50% in the case of flatter trees." - what do you mean by fails?  I hope this "just" means worse performance rather than functional failure?  :-)  And if so, how much worse is this with flatter trees, and what is this threshold?  2 nodes deep, 3 nodes deep, etc?

Cheers,
--
Manik Surtani

Lead, JBoss Cache
JBoss, a division of Red Hat

Email: manik@jboss.org
Telephone: +44 7786 702 706
MSN: manik@surtani.org
Yahoo/AIM/Skype: maniksurtani



On 29 Jan 2007, at 00:42, Mircea Markus wrote:

Hi guys,

I've drafted the alternative implementation of JDBC cache loader. Just a brief, the base idea is to to traverse the tree in preorder and add some indexing info to each node.
The affected methods are remove(Fqn), loadState(Fqn subtree, ..) and put(Fqn, value).

How I've tested the performance:
 a tree with a TREE_DEPTH depth is created. Each node in the tree has a CHILDREN number of children. By default I've used TREE_DEPTH=7 and CHILDREN=3 which gave a total number of 3280 children. A node is selected at each depth from disjunct  subtrees.  ( i.e. in the benchmark used 7 nodes are selected). On all those nodes a certain operation is performed: loadState or remove. For benchmarking the put operation, the in memory tree nodes are Collections.shuffle(List<Fqn>), then they are (randomly) added to the database/classLoader.put. Also additional testing was performed in the case of put for deep trees (100 - 200 nodes)


Results:
On the load and delete the new implementation is  much  more better.  The new deletion time is linear compared with the old approach where recursive db calls lead to an exponential raise(see - remove_node_all.PNG). Relatively the same in the case of  loading subtrees(see - remove_node_all.PNG).
The problem is in the case of additions. When adding all 3280 nodes the time is constantly 10 times bigger than when using existing implementation. Profiling shewed that the update_indexes operation behaves really badly - nothing to do in this area. Even if the operation is totally moved in the database (mysql 5.0.X has limitations on working with triggers so I've used a stored procedure) is doesn't cause an significant increase.

Another approach.
Another approach I've consider is to reduce the recursive DB calls to a fix number. All operation rely on the fact that the entire fqn is persisted for each node. E.g.  when deleting node(/a/b/c)  I'll remove all records that have a path starting with (/a/b/c/). Same in the case of node loading. It proved to work even better than in the case of first implementation.
Recursion in the case of insertions - I've solved this by
a) querying  for the  first persited ancestor
b)  add all the  children  in a  batch  pr statement. 
Interestingly enough this always-2-queries approach is 30% slower than the old recursive impl. Profiling showed that the db driver (mysql-connector/J) takes the same time for executing a batch (new impl) as it takes in the case of individual calls (old impl).Also the firstAncestor query takes quite long on mysql.  Another observation is that new impl proves much better in the case of deeper trees, and fails up-2 50% in the case of flatter trees. An interesting thing would be to see it working on an Oracle/SqlServ... The good news is that old implementation can be used instead - or a policy can be defined to use the new impl where tree structure requires it.

Naturally I'd chose the second approach - if you don't see any problem with it

I'll stop here :)

Cheers,
Mircea





<load_state_all.PNG>
<remove_node_all.PNG>
_______________________________________________
jbosscache-dev mailing list
jbosscache-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev