[jbosscache-dev] JDBCCacheLoader performance

Mon Jan 29 06:34:05 EST 2007

Nice one Mircea.

Some comments:

* Re: your "second impl", For the sake of general portability I'd  
stay away from prepared statements/triggers and strive for a Java- 
only solution.  I'd leave optimisations such as prepared statements  
for people who want to further customise the JDBCCacheLoader for  
their specific DBMS product.  Then again, this could be a  
configuration param where the "default" is to use the Java-only  
solution, and *if* the name and signature of a prepared statement is  
provided for a batch insert then use this.

* Again in the case of your second approach, you're using the fact  
that Fqns "start with" a particular string (e.g., "/a/b/c/") and  
removing these when you need to call a remove ("/a/b/c") is fine,  
only it still assumes that Fqns are comprised only of Strings.  This  
is fine, this limitation currently exists in the JDBCCacheLoader  
anyway, just highlighting.

* "Also the firstAncestor query takes quite long on mysql." - What is  
this query?  Is this a case of, if I am adding ("/a/b/c") you need to  
do a SELECT ... WHERE FQN="/a/b" ?Why is this specifically slower  
than any other SELECT on an indexed column?

* "Another observation is that new impl proves much better in the  
case of deeper trees, and fails up-2 50% in the case of flatter  
trees." - what do you mean by fails?  I hope this "just" means worse  
performance rather than functional failure?  :-)  And if so, how much  
worse is this with flatter trees, and what is this threshold?  2  
nodes deep, 3 nodes deep, etc?

Cheers,
--
Manik Surtani

Lead, JBoss Cache
JBoss, a division of Red Hat

Email: manik at jboss.org
Telephone: +44 7786 702 706
MSN: manik at surtani.org
Yahoo/AIM/Skype: maniksurtani

On 29 Jan 2007, at 00:42, Mircea Markus wrote:

> Hi guys,
>
> I've drafted the alternative implementation of JDBC cache loader.  
> Just a brief, the base idea is to to traverse the tree in preorder  
> and add some indexing info to each node.
> The affected methods are remove(Fqn), loadState(Fqn subtree, ..)  
> and put(Fqn, value).
>
> How I've tested the performance:
>  a tree with a TREE_DEPTH depth is created. Each node in the tree  
> has a CHILDREN number of children. By default I've used  
> TREE_DEPTH=7 and CHILDREN=3 which gave a total number of 3280  
> children. A node is selected at each depth from disjunct   
> subtrees.  ( i.e. in the benchmark used 7 nodes are selected). On  
> all those nodes a certain operation is performed: loadState or  
> remove. For benchmarking the put operation, the in memory tree  
> nodes are Collections.shuffle(List<Fqn>), then they are (randomly)  
> added to the database/classLoader.put. Also additional testing was  
> performed in the case of put for deep trees (100 - 200 nodes)
>
>
> Results:
> On the load and delete the new implementation is  much  more  
> better.  The new deletion time is linear compared with the old  
> approach where recursive db calls lead to an exponential raise(see  
> - remove_node_all.PNG). Relatively the same in the case of  loading  
> subtrees(see - remove_node_all.PNG).
> The problem is in the case of additions. When adding all 3280 nodes  
> the time is constantly 10 times bigger than when using existing  
> implementation. Profiling shewed that the update_indexes operation  
> behaves really badly - nothing to do in this area. Even if the  
> operation is totally moved in the database (mysql 5.0.X has  
> limitations on working with triggers so I've used a stored  
> procedure) is doesn't cause an significant increase.
>
> Another approach.
> Another approach I've consider is to reduce the recursive DB calls  
> to a fix number. All operation rely on the fact that the entire fqn  
> is persisted for each node. E.g.  when deleting node(/a/b/c)  I'll  
> remove all records that have a path starting with (/a/b/c/). Same  
> in the case of node loading. It proved to work even better than in  
> the case of first implementation.
> Recursion in the case of insertions - I've solved this by
> a) querying  for the  first persited ancestor
> b)  add all the  children  in a  batch  pr statement.
> Interestingly enough this always-2-queries approach is 30% slower  
> than the old recursive impl. Profiling showed that the db driver  
> (mysql-connector/J) takes the same time for executing a batch (new  
> impl) as it takes in the case of individual calls (old impl).Also  
> the firstAncestor query takes quite long on mysql.  Another  
> observation is that new impl proves much better in the case of  
> deeper trees, and fails up-2 50% in the case of flatter trees. An  
> interesting thing would be to see it working on an Oracle/ 
> SqlServ... The good news is that old implementation can be used  
> instead - or a policy can be defined to use the new impl where tree  
> structure requires it.
>
> Naturally I'd chose the second approach - if you don't see any  
> problem with it
>
> I'll stop here :)
>
> Cheers,
> Mircea
>
>
>
>
>
> <load_state_all.PNG>
> <remove_node_all.PNG>
> _______________________________________________
> jbosscache-dev mailing list
> jbosscache-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/jbosscache-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/jbosscache-dev/attachments/20070129/3ca152a4/attachment.html