[jboss-user] [JBossCache] - Data gravitation synchronization issue
do-not-reply at jboss.com
Thu Sep 27 04:56:05 EDT 2007
We have isolated what we think is a synchronization issue during data gravitation across multiple nodes using buddy replication. We have a unit test demonstrating the issue, which I can send to anyone interested.
What appears to happen is this: when two nodes are involved in a data gravitation, multiple data gravitation cleanup commands are sometimes issued, and one of them blocks the other. The calling node then times out after "buddyCommunicationTimeout" milliseconds (a timeout which seems to be logged only at debug level?) and returns null, making it look like the requested data does not exist in the cache. Further investigation reveals two global transactions on the data-holding cache: one holds an identity lock (write) on a backup data node, and the other is waiting to lock the same node.
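To make the symptom concrete, here is a minimal sketch in plain Java of how a timed-out lock acquisition can end up looking like a cache miss. It is not JBoss Cache code: the class TimedLockLookupSketch, its methods and the single ReentrantLock are hypothetical stand-ins for the backup data node and its identity lock, and the timeout parameter only plays the role that "buddyCommunicationTimeout" plays in the description above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a data-holding node; NOT JBoss Cache code.
class TimedLockLookupSketch {
    private final ReentrantLock nodeLock = new ReentrantLock();
    private final String value;

    TimedLockLookupSketch(String value) {
        this.value = value;
    }

    /** Returns the value, or null if the lock is not acquired within timeoutMillis. */
    String lookup(long timeoutMillis) {
        try {
            if (!nodeLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                // The caller cannot tell this apart from "no such data".
                return null;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        try {
            return value;
        } finally {
            nodeLock.unlock();
        }
    }

    /** Performs a lookup while another thread holds the lock; returns what the caller saw. */
    static String lookupWhileBlocked(String value, long timeoutMillis) {
        TimedLockLookupSketch node = new TimedLockLookupSketch(value);
        CountDownLatch locked = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // Plays the part of the first transaction, which holds the write lock.
        Thread holder = new Thread(() -> {
            node.nodeLock.lock();
            try {
                locked.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                node.nodeLock.unlock();
            }
        });
        holder.start();
        try {
            locked.await();
            return node.lookup(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            release.countDown();
            try {
                holder.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupWhileBlocked("payload", 100));               // prints "null"
        System.out.println(new TimedLockLookupSketch("payload").lookup(100)); // prints "payload"
    }
}
```

The data exists in both calls; only the contended one returns null. A caller that wants to tell the two cases apart would need the timeout reported separately, for example as an exception or at least a log entry above debug level.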
Depending on "LockAcquisitionTimeout", the blocked request may eventually continue, but we have seen several different consequences depending on whether a user transaction is involved. Sometimes the lock seems to disappear; sometimes it doesn't, and the application is in effect completely blocked (as the JGroups thread will be holding a lock on a NakReceiverWindow).
This behavior only occurs (as far as we've seen) when there's quite a bit of concurrent access, in particular when data is added on one node but accessed immediately on another, i.e. when addition and gravitation occur in quick succession. We have tried disabling auto gravitation and generally playing around with the configuration, but with no effect.
This looks like a synchronization issue, and the unit test I can send along shows that it is intermittent. Sometimes a test passes only to fail on the next run, and sometimes a particular test fails when run standalone but succeeds if another test was run immediately before, making it look like burn-in affects the result (which of course it may).
Our unit test tries to model high concurrency in three variations: one with a simple data gravitation, one with gravitation followed by a modification (a subsequent put on the cache), and one with gravitation and modification within a user transaction. The last of these scenarios is basically what our real application is doing.
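For readers who want to build something similar before the real test is available, the following is only a rough sketch of how the first variation (add, then read immediately) might be driven under concurrency. It is not the unit test mentioned above: the class ConcurrentReadAfterWriteHarness, the method countMisses and the key format are made up for illustration, and the put and get callbacks are placeholders where the calls against two different cache instances would be plugged in.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical harness; the callbacks stand in for "put on node A" and "get on node B".
class ConcurrentReadAfterWriteHarness {

    /** Runs `tasks` put-then-get rounds on `threads` workers; returns how many reads were null. */
    static int countMisses(int threads, int tasks,
                           BiConsumer<String, String> put,
                           Function<String, String> get) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(tasks);
        AtomicInteger misses = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            final String key = "/data/" + i; // made-up key format
            pool.execute(() -> {
                try {
                    start.await();               // release all workers at the same moment
                    put.accept(key, "value-" + key);
                    if (get.apply(key) == null) { // a null here is the symptom described above
                        misses.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        start.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return misses.get();
    }

    public static void main(String[] args) {
        // A single in-memory map as a sanity baseline: no reads should be missed.
        ConcurrentMap<String, String> store = new ConcurrentHashMap<>();
        System.out.println(countMisses(8, 1000, store::put, store::get)); // prints 0
    }
}
```

Against a single in-memory map the miss count is always zero, which makes it a useful baseline: any non-zero count after swapping in real cache calls points at the replication or gravitation path rather than at the harness. The second and third variations would add a further put, and a surrounding user transaction, inside the worker body.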
I've spent considerable time looking at this, so feel free to ask questions. Also, tell me where to send the unit test if you want to have a look at it. The unit test fails repeatedly on a Java 5 / Linux / dual-core / JBoss Cache 2.5.0.GA setup.
/Lars J. Nilsson
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4089200#4089200
Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4089200