[infinispan-issues] [JBoss JIRA] Commented: (ISPN-1317) JGroupsDistSync.flushWaitGate appears to be left open
Galder Zamarreño (JIRA)
jira-events at lists.jboss.org
Mon Aug 8 12:46:24 EDT 2011
[ https://issues.jboss.org/browse/ISPN-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619439#comment-12619439 ]
Galder Zamarreño commented on ISPN-1317:
----------------------------------------
Actually, this might be a different issue altogether, more related to concurrent state transfer requests. Assuming NodeA is the node that should provide state, and NodeB the one that should apply the state:
- NodeB starts up and requests state transfer from NodeA, sending a StateTransferControlCommand and opening the flush wait latch.
- NodeA takes the StateTransferControlCommand and opens the flush wait latch too.
- NodeA's latch.await() succeeds and it is able to write the commit log.
- NodeB receives the state, closes the flush wait latch, and sends NodeA a StateTransferControlCommand request so that NodeA closes it too.
Now, the logs attached to JBPAPP-6929 would appear to show that while a node (michal-linhard-37465) is requesting state from another (michal-linhard-12702), another node's (michal-linhard-61619) concurrent state transfer request to michal-linhard-12702 might be closing the wait latch. I'll provide a more detailed view tomorrow.
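To make the suspected interleaving concrete, here is a minimal sketch of a non-counting open/close gate shared by two requesters. It is not the actual JGroupsDistSync code: the class and method names (GateSketch, BooleanGate, providerPassesAfterInterleaving) are hypothetical, and the sketch assumes the gate keeps only a single open/closed flag, which I have not confirmed against the implementation.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not Infinispan's JGroupsDistSync.
final class GateSketch {

    /** A gate with a single open/closed flag: it does not track who opened it. */
    static final class BooleanGate {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition opened = lock.newCondition();
        private boolean open;

        void open() {
            lock.lock();
            try {
                open = true;
                opened.signalAll();
            } finally {
                lock.unlock();
            }
        }

        void close() {
            lock.lock();
            try {
                open = false;
            } finally {
                lock.unlock();
            }
        }

        /** Waits until the gate is open; returns false if the timeout elapses first. */
        boolean await(long timeout, TimeUnit unit) {
            long nanos = unit.toNanos(timeout);
            lock.lock();
            try {
                while (!open) {
                    if (nanos <= 0) {
                        return false;
                    }
                    nanos = opened.awaitNanos(nanos);
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } finally {
                lock.unlock();
            }
        }
    }

    /** Replays the suspected ordering on one thread so the result is deterministic. */
    static boolean providerPassesAfterInterleaving() {
        BooleanGate gate = new BooleanGate();
        gate.open();   // requester 1 sends the enabling control command
        gate.open();   // requester 2 sends the enabling control command
        gate.close();  // requester 2 finishes and sends the disabling control command
        // The provider now tries to generate state for requester 1.
        return gate.await(50, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        System.out.println("provider passed gate: " + providerPassesAfterInterleaving());
    }
}
```

Under these assumptions the provider's wait for the first requester times out, because the second requester's close is indistinguishable from the first requester's own. Whether this is what actually happens in the JBPAPP-6929 logs still needs to be confirmed.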
> JGroupsDistSync.flushWaitGate appears to be left open
> -----------------------------------------------------
>
> Key: ISPN-1317
> URL: https://issues.jboss.org/browse/ISPN-1317
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.0.0.FINAL
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Fix For: 5.1.0.ALPHA1, 5.1.0.FINAL
>
>
> Logs in JBPAPP-6929 show:
> {code}15:40:22,698 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (STREAMING_STATE_TRANSFER-sender-1,default,michal-linhard-12702)
> ISPN000095: Caught while responding to state transfer request: org.infinispan.statetransfer.StateTransferException:
> java.util.concurrent.TimeoutException: Timed out waiting for a cluster-wide sync to be acquired. (timeout = 60 seconds)
> at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:162) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
> at org.infinispan.remoting.InboundInvocationHandlerImpl.generateState(InboundInvocationHandlerImpl.java:248) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.getState(JGroupsTransport.java:590) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:690) [jgroups-2.12.1.Final.jar:]
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:771) [jgroups-2.12.1.Final.jar:]
> at org.jgroups.JChannel.up(JChannel.java:1484) [jgroups-2.12.1.Final.jar:]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1074) [jgroups-2.12.1.Final.jar:]
> at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:477) [jgroups-2.12.1.Final.jar:]
> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderHandler.process(STREAMING_STATE_TRANSFER.java:651) [jgroups-2.12.1.Final.jar:]
> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderThreadSpawner$1.run(STREAMING_STATE_TRANSFER.java:580) [jgroups-2.12.1.Final.jar:]
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [:1.6.0_25]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [:1.6.0_25]
> at java.lang.Thread.run(Thread.java:662) [:1.6.0_25]
> Caused by: java.util.concurrent.TimeoutException: Timed out waiting for a cluster-wide sync to be acquired. (timeout = 60 seconds)
> at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.blockUntilAcquired(JGroupsDistSync.java:62) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferManagerImpl.generateTransactionLog(StateTransferManagerImpl.java:196) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:152) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
> ... 12 more{code}
> Now, what's odd about this is that the JGroupsDistSync.flushWaitGate behind it is only acquired/released while a state transfer control command is sent, and the logs show that both the enabling (acquiring) and disabling (releasing) state transfer control commands were built:
> {code}15:39:20,902 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4)
> dests=[michal-linhard-37465], command=StateTransferControlCommand{enabled=true}, mode=SYNCHRONOUS, timeout=480000
> ...
> 15:39:21,074 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4)
> dests=[michal-linhard-37465], command=StateTransferControlCommand{enabled=false}, mode=SYNCHRONOUS, timeout=480000{code}
> There are no other references to StateTransferControlCommand, so how can it be that flushWaitGate is open?