[jbosscache-dev] JDBCCacheLoader performance
Manik Surtani
manik at jboss.org
Mon Jan 29 06:34:05 EST 2007
Nice one Mircea.
Some comments:
* Re: your "second impl", For the sake of general portability I'd
stay away from prepared statements/triggers and strive for a Java-
only solution. I'd leave optimisations such as prepared statements
for people who want to further customise the JDBCCacheLoader for
their specific DBMS product. Then again, this could be a
configuration param where the "default" is to use the Java-only
solution, and *if* the name and signature of a prepared statement is
provided for a batch insert then use this.
* Again in the case of your second approach, you're using the fact
that Fqns "start with" a particular string (e.g., "/a/b/c/") and
removing these when you need to call a remove ("/a/b/c") is fine,
only it still assumes that Fqns are comprised only of Strings. This
is fine, this limitation currently exists in the JDBCCacheLoader
anyway, just highlighting.
* "Also the firstAncestor query takes quite long on mysql." - What is
this query? Is this a case of, if I am adding ("/a/b/c") you need to
do a SELECT ... WHERE FQN="/a/b" ?Why is this specifically slower
than any other SELECT on an indexed column?
* "Another observation is that new impl proves much better in the
case of deeper trees, and fails up-2 50% in the case of flatter
trees." - what do you mean by fails? I hope this "just" means worse
performance rather than functional failure? :-) And if so, how much
worse is this with flatter trees, and what is this threshold? 2
nodes deep, 3 nodes deep, etc?
Cheers,
--
Manik Surtani
Lead, JBoss Cache
JBoss, a division of Red Hat
Email: manik at jboss.org
Telephone: +44 7786 702 706
MSN: manik at surtani.org
Yahoo/AIM/Skype: maniksurtani
On 29 Jan 2007, at 00:42, Mircea Markus wrote:
> Hi guys,
>
> I've drafted the alternative implementation of JDBC cache loader.
> Just a brief, the base idea is to to traverse the tree in preorder
> and add some indexing info to each node.
> The affected methods are remove(Fqn), loadState(Fqn subtree, ..)
> and put(Fqn, value).
>
> How I've tested the performance:
> a tree with a TREE_DEPTH depth is created. Each node in the tree
> has a CHILDREN number of children. By default I've used
> TREE_DEPTH=7 and CHILDREN=3 which gave a total number of 3280
> children. A node is selected at each depth from disjunct
> subtrees. ( i.e. in the benchmark used 7 nodes are selected). On
> all those nodes a certain operation is performed: loadState or
> remove. For benchmarking the put operation, the in memory tree
> nodes are Collections.shuffle(List<Fqn>), then they are (randomly)
> added to the database/classLoader.put. Also additional testing was
> performed in the case of put for deep trees (100 - 200 nodes)
>
>
> Results:
> On the load and delete the new implementation is much more
> better. The new deletion time is linear compared with the old
> approach where recursive db calls lead to an exponential raise(see
> - remove_node_all.PNG). Relatively the same in the case of loading
> subtrees(see - remove_node_all.PNG).
> The problem is in the case of additions. When adding all 3280 nodes
> the time is constantly 10 times bigger than when using existing
> implementation. Profiling shewed that the update_indexes operation
> behaves really badly - nothing to do in this area. Even if the
> operation is totally moved in the database (mysql 5.0.X has
> limitations on working with triggers so I've used a stored
> procedure) is doesn't cause an significant increase.
>
> Another approach.
> Another approach I've consider is to reduce the recursive DB calls
> to a fix number. All operation rely on the fact that the entire fqn
> is persisted for each node. E.g. when deleting node(/a/b/c) I'll
> remove all records that have a path starting with (/a/b/c/). Same
> in the case of node loading. It proved to work even better than in
> the case of first implementation.
> Recursion in the case of insertions - I've solved this by
> a) querying for the first persited ancestor
> b) add all the children in a batch pr statement.
> Interestingly enough this always-2-queries approach is 30% slower
> than the old recursive impl. Profiling showed that the db driver
> (mysql-connector/J) takes the same time for executing a batch (new
> impl) as it takes in the case of individual calls (old impl).Also
> the firstAncestor query takes quite long on mysql. Another
> observation is that new impl proves much better in the case of
> deeper trees, and fails up-2 50% in the case of flatter trees. An
> interesting thing would be to see it working on an Oracle/
> SqlServ... The good news is that old implementation can be used
> instead - or a policy can be defined to use the new impl where tree
> structure requires it.
>
> Naturally I'd chose the second approach - if you don't see any
> problem with it
>
> I'll stop here :)
>
> Cheers,
> Mircea
>
>
>
>
>
> <load_state_all.PNG>
> <remove_node_all.PNG>
> _______________________________________________
> jbosscache-dev mailing list
> jbosscache-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/jbosscache-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/jbosscache-dev/attachments/20070129/3ca152a4/attachment.html
More information about the jbosscache-dev
mailing list