Hi Dan,
Re: http://goo.gl/TGwrP
There are a few of these failures in the Hot Rod server and client testsuites, and they're
easy to replicate locally. It seems that cache operations issued right after a cache has
started are rather problematic.
Running HotRodReplicationTest locally, I was able to replicate the issue when testing
topology changes. Please find the log file attached, but here are the
interesting bits:
1. A new view installation is being prepared with NodeA and NodeB:
2011-10-24 14:36:09,046 4221 TRACE [org.infinispan.cacheviews.CacheViewsManagerImpl]
(OOB-1,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) ___hotRodTopologyCache:
Preparing cache view CacheView{viewId=4, members=[NodeA-63227, NodeB-15806]}, committed
view is CacheView{viewId=3, members=[NodeA-63227, NodeB-15806, NodeC-17654]}
…
2011-10-24 14:36:09,047 4222 DEBUG [org.infinispan.statetransfer.StateTransferLockImpl]
(OOB-1,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) Blocking new transactions
2011-10-24 14:36:09,047 4222 TRACE [org.infinispan.statetransfer.StateTransferLockImpl]
(OOB-1,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) Acquiring exclusive state
transfer shared lock, shared holders: 0
2011-10-24 14:36:09,047 4222 TRACE [org.infinispan.statetransfer.StateTransferLockImpl]
(OOB-1,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) Acquired state transfer lock
in exclusive mode
2. The cluster coordinator discovers a view change and asks NodeA and NodeB to remove
NodeC from the topology view:
2011-10-24 14:36:09,048 4223 TRACE
[org.infinispan.interceptors.InvocationContextInterceptor]
(OOB-3,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) Invoked with command
RemoveCommand{key=NodeC-17654, value=null, flags=null} and InvocationContext
[NonTxInvocationContext{flags=null}]
3. NodeB has not yet finished installing the cache view, so that remove times out:
2011-10-24 14:36:09,049 4224 ERROR
[org.infinispan.interceptors.InvocationContextInterceptor]
(OOB-3,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) ISPN000136: Execution error
org.infinispan.distribution.RehashInProgressException: Timed out waiting for the
transaction lock
A way to solve this is to stop relying on cluster view changes and instead wait for the
cache view to be installed before doing the operations. Is there any way to wait until
then?
One option would be to have some kind of CacheView installed callback. That could work
well, because I could register a CacheView listener for the Hot Rod topology cache, check
its callbacks for isPre=false, and then do the cache ops safely.
Otherwise, code like the one I use for keeping the Hot Rod topology is going to be
racing against your cache view installation code.
You seem to have some pieces in place for this, i.e. CacheViewListener, but it seems only
designed for internal core/ work.
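To make the callback idea a bit more concrete, here's a rough sketch of what I have in
mind. Note that everything in it is made up for illustration: CacheViewInstalledListener,
its viewInstalled method and DeferredTopologyUpdater are hypothetical names, not existing
Infinispan API, and a plain Map stands in for the Hot Rod topology cache. The only point
is that the topology entries get touched once the view has been installed (isPre=false),
never while it is still being prepared.

```java
import java.util.List;
import java.util.Map;

// Hypothetical callback interface, NOT existing Infinispan API.
// isPre=true: the cache view is still being prepared.
// isPre=false: the cache view has been installed.
interface CacheViewInstalledListener {
    void viewInstalled(int viewId, List<String> members, boolean isPre);
}

// Hypothetical helper that keeps a topology map in sync with the installed view.
class DeferredTopologyUpdater implements CacheViewInstalledListener {

    // Stand-in for the Hot Rod topology cache (node name -> node address).
    private final Map<String, String> topologyCache;

    DeferredTopologyUpdater(Map<String, String> topologyCache) {
        this.topologyCache = topologyCache;
    }

    @Override
    public void viewInstalled(int viewId, List<String> members, boolean isPre) {
        if (isPre) {
            // View installation still in progress: doing cache ops now is what
            // races with the state transfer lock, so skip.
            return;
        }
        // View is installed: drop entries for nodes that are no longer members.
        topologyCache.keySet().retainAll(members);
    }
}
```

With something like this, the Hot Rod server would register the updater on the topology
cache instead of reacting to the cluster view change directly, so the remove for
NodeC-17654 would only happen after view 4 had been committed.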
Any other suggestions?
Cheers,
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache