Hi.
We are currently testing a scenario of starting up an additional cache to a cluster where
replication is running. We have been using 2.0.0 GA with Jgroups 2.5.0, but have
encountered problems with this (mainly 'cache not in started state'). So we
decided to try 2.1.0 CR2.
It seems that by using 2.1.0 CR2 and jgroups 2.6.1 (not the bundled version), we get the
most stable node up and down. However, sometimes when starting up a second cache we get
the lock timeout described in the stacktraces below.
| ----------------- Server 1 (Producer) -----------------
|
| org.jboss.cache.lock.TimeoutException: failure acquiring lock:
fqn=/_BUDDY_BACKUP_/192.168.1.135_50641//31,
caller=Thread[Incoming,192.168.1.112:32903,10,Thread Pools], lock=read
owners=[Thread[Incoming,192.168.1.112:32903,10,Thread Pools]] (activeReaders=1,
activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0)
| at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:528)
| at
org.jboss.cache.interceptors.PessimisticLockInterceptor$LockManager.acquire(PessimisticLockInterceptor.java:680)
| at
org.jboss.cache.interceptors.PessimisticLockInterceptor.acquireNodeLock(PessimisticLockInterceptor.java:481)
| at
org.jboss.cache.interceptors.PessimisticLockInterceptor.lock(PessimisticLockInterceptor.java:377)
| at
org.jboss.cache.interceptors.PessimisticLockInterceptor.invoke(PessimisticLockInterceptor.java:193)
| at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
| at
org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:34)
| at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
| at
org.jboss.cache.interceptors.NotificationInterceptor.invoke(NotificationInterceptor.java:32)
| at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
| at
org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:298)
| at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:130)
| at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
| at
org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:97)
| at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
| at
org.jboss.cache.interceptors.InvocationContextInterceptor.invoke(InvocationContextInterceptor.java:62)
| at org.jboss.cache.CacheImpl.invokeMethod(CacheImpl.java:4004)
| at org.jboss.cache.CacheImpl._replicate(CacheImpl.java:2861)
| at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
| at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
| at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
| at java.lang.reflect.Method.invoke(Method.java:597)
| at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:330)
| at
org.jboss.cache.marshall.InactiveRegionAwareRpcDispatcher.handle(InactiveRegionAwareRpcDispatcher.java:77)
| at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:624)
| at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:533)
| at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:365)
| at
org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:746)
| at org.jgroups.JChannel.up(JChannel.java:1151)
| at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:362)
| at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:427)
| at
org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:242)
| at org.jgroups.protocols.FRAG2.up(FRAG2.java:205)
| at org.jgroups.protocols.pbcast.GMS.up(GMS.java:767)
| at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
| at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:579)
| at org.jgroups.protocols.UNICAST.up(UNICAST.java:250)
| at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:714)
| at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
| at org.jgroups.protocols.FD.up(FD.java:322)
| at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:300)
| at org.jgroups.protocols.MERGE2.up(MERGE2.java:145)
| at org.jgroups.protocols.Discovery.up(Discovery.java:246)
| at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1535)
| at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1484)
| at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
| at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
| at java.lang.Thread.run(Thread.java:619)
| Caused by: org.jboss.cache.lock.TimeoutException: write lock for
/_BUDDY_BACKUP_/192.168.1.135_50641//31 could not be acquired after 10000 ms. Locks: Read
lock owners: [Thread[Incoming,192.168.1.112:32903,10,Thread Pools]]
| Write lock owner: null
| (caller=Thread[Incoming,192.168.1.112:32903,10,Thread Pools], lock info: read
owners=[Thread[Incoming,192.168.1.112:32903,10,Thread Pools]] (activeReaders=1,
activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0))
| at org.jboss.cache.lock.IdentityLock.acquireWriteLock0(IdentityLock.java:244)
| at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:167)
| at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:497)
| ... 47 more
|
| ----------------- Server 2 (Replica) -----------------
|
| WARN CacheImpl - replication failure with method_call MethodName: _put;
MethodIdInteger: 3; Args: (null, /_BUDDY_BACKUP_/192.168.1.135_50641//31, 8, , true)
exception
| org.jboss.cache.lock.TimeoutException: failure acquiring lock:
fqn=/_BUDDY_BACKUP_/192.168.1.135_50641//31,
caller=Thread[Incoming,192.168.1.112:32903,10,Thread Pools], lock=read
owners=[Thread[Incoming, 192.168.1.112:32903,10,Thread Pools]] (activeReaders=1,
activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0)
| at org.jboss.cache.lock.IdentityLock.acquire (IdentityLock.java:528)
| at
org.jboss.cache.interceptors.PessimisticLockInterceptor$LockManager.acquire(PessimisticLockInterceptor.java:680)
| at org.jboss.cache.interceptors.PessimisticLockInterceptor.acquireNodeLock
(PessimisticLockInterceptor.java:481)
| at
org.jboss.cache.interceptors.PessimisticLockInterceptor.lock(PessimisticLockInterceptor.java:377)
| at
org.jboss.cache.interceptors.PessimisticLockInterceptor.invoke(PessimisticLockInterceptor.java
:193)
| at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
| at
org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:34)
| at org.jboss.cache.interceptors.Interceptor.invoke (Interceptor.java:77)
| at
org.jboss.cache.interceptors.NotificationInterceptor.invoke(NotificationInterceptor.java:32)
| at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
| at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod
(TxInterceptor.java:298)
| at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:130)
| at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
| at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke
(CacheMgmtInterceptor.java:97)
| at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:77)
| at
org.jboss.cache.interceptors.InvocationContextInterceptor.invoke(InvocationContextInterceptor.java:62)
| at org.jboss.cache.CacheImpl.invokeMethod(CacheImpl.java:4004)
| at org.jboss.cache.CacheImpl._replicate(CacheImpl.java:2861)
| at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
| at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:39)
| at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
| at java.lang.reflect.Method.invoke(Method.java:597)
| at org.jgroups.blocks.MethodCall.invoke (MethodCall.java:330)
| at
org.jboss.cache.marshall.InactiveRegionAwareRpcDispatcher.handle(InactiveRegionAwareRpcDispatcher.java:77)
| at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java
:624)
| at
org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:533)
| at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:365)
| at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up
(MessageDispatcher.java:746)
| at org.jgroups.JChannel.up(JChannel.java:1151)
| at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:362)
| at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:427)
| at
org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:242)
| at org.jgroups.protocols.FRAG2.up(FRAG2.java:205)
| at org.jgroups.protocols.pbcast.GMS.up(GMS.java:767)
| at org.jgroups.protocols.pbcast.STABLE.up (STABLE.java:234)
| at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:579)
| at org.jgroups.protocols.UNICAST.up(UNICAST.java:250)
| at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:714)
| at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
| at org.jgroups.protocols.FD.up(FD.java:322)
| at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:300)
| at org.jgroups.protocols.MERGE2.up (MERGE2.java:145)
| at org.jgroups.protocols.Discovery.up(Discovery.java:246)
| at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1535)
| at org.jgroups.protocols.TP$IncomingPacket.run(TP.java :1484)
| at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
| at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
| at java.lang.Thread.run (Thread.java:619)
| Caused by: org.jboss.cache.lock.TimeoutException: write lock for
/_BUDDY_BACKUP_/192.168.1.135_50641//31 could not be acquired after 10000 ms. Locks: Read
lock owners: [Thread[Incoming, 192.168.1.112:32903,10,Thread Pools]]
| Write lock owner: null
| (caller=Thread[Incoming,192.168.1.112:32903,10,Thread Pools], lock info: read
owners=[Thread[Incoming, 192.168.1.112:32903,10,Thread Pools]] (activeReaders=1,
activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0))
| at org.jboss.cache.lock.IdentityLock.acquireWriteLock0(IdentityLock.java:244)
| at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:167)
| at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:497)
| ... 47 more
|
If we inspect the threads involved we see that thread A is waiting on thread B which is
waiting on thread A. So we have (I think) something similar to a distributed dead lock
scenario which is resolved by a lock timeout (after configured timeout).
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4114710#...
Reply to the post :
http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&a...