[infinispan-issues] [JBoss JIRA] Updated: (ISPN-1317) JGroupsDistSync.flushWaitGate appears to be left open

Galder Zamarreño (JIRA) jira-events at lists.jboss.org
Tue Aug 9 03:55:24 EDT 2011


     [ https://issues.jboss.org/browse/ISPN-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Galder Zamarreño updated ISPN-1317:
-----------------------------------

    Attachment: 1317-analysis.txt


Indeed this is a different issue. Basically, concurrent state transfer requests appear to lead to premature flush wait gate closures. StateTransferControlCommand needs a bit more logging in perform() to confirm my suspicions, but I'm pretty sure about this.

State transfer is being phased out in favour of rehashing that applies to replication too, so I won't be looking into this issue immediately.
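
To illustrate the suspected race, here is a minimal, hypothetical Java sketch. It is not Infinispan's JGroupsDistSync code. The Gate class, its counting flag and the GateRaceSketch helper are invented for illustration. It assumes that StateTransferControlCommand{enabled=true} opens the gate and enabled=false closes it. With a plain on/off gate, the first of two overlapping requests closes the gate while the second still needs it open, so the second wait times out. A gate that counts its holders does not have that problem.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical gate model; NOT the real JGroupsDistSync.flushWaitGate. */
final class Gate {
    private final boolean counting; // true: track holders; false: plain on/off flag
    private int holders;            // the gate is open while holders > 0

    Gate(boolean counting) {
        this.counting = counting;
    }

    /** Assumed to correspond to StateTransferControlCommand{enabled=true}. */
    synchronized void open() {
        holders = counting ? holders + 1 : 1;
        notifyAll();
    }

    /** Assumed to correspond to StateTransferControlCommand{enabled=false}. */
    synchronized void close() {
        holders = counting ? Math.max(0, holders - 1) : 0;
    }

    /** Waits until the gate is open; returns false on timeout or interruption. */
    synchronized boolean await(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (holders == 0) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

final class GateRaceSketch {
    /** Replays one interleaving of two overlapping state transfer requests. */
    static boolean secondRequestGetsThrough(Gate gate) {
        gate.open();           // request A enables state transfer
        gate.open();           // request B enables it while A is still running
        gate.close();          // request A finishes and disables it
        return gate.await(50); // B's state generation still needs the gate open
    }

    public static void main(String[] args) {
        System.out.println("on/off gate:   " + secondRequestGetsThrough(new Gate(false)));
        System.out.println("counting gate: " + secondRequestGetsThrough(new Gate(true)));
    }
}
```

In this model the on/off gate reports false (the analogue of the 60 second timeout in the stack trace below), while the counting gate reports true. Whether the real code path behaves this way still needs the extra logging in perform() to confirm.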

> JGroupsDistSync.flushWaitGate appears to be left open
> -----------------------------------------------------
>
>                 Key: ISPN-1317
>                 URL: https://issues.jboss.org/browse/ISPN-1317
>             Project: Infinispan
>          Issue Type: Bug
>          Components: State transfer
>    Affects Versions: 5.0.0.FINAL
>            Reporter: Galder Zamarreño
>            Assignee: Galder Zamarreño
>             Fix For: 5.1.0.ALPHA1, 5.1.0.FINAL
>
>         Attachments: 1317-analysis.txt
>
>
> Logs in JBPAPP-6929 show:
> {code}15:40:22,698 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (STREAMING_STATE_TRANSFER-sender-1,default,michal-linhard-12702)
>  ISPN000095: Caught while responding to state transfer request: org.infinispan.statetransfer.StateTransferException: 
> java.util.concurrent.TimeoutException: Timed out waiting for a cluster-wide sync to be acquired. (timeout = 60 seconds)
> 	at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:162) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
> 	at org.infinispan.remoting.InboundInvocationHandlerImpl.generateState(InboundInvocationHandlerImpl.java:248) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
> 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.getState(JGroupsTransport.java:590) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
> 	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:690) [jgroups-2.12.1.Final.jar:]
> 	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:771) [jgroups-2.12.1.Final.jar:]
> 	at org.jgroups.JChannel.up(JChannel.java:1484) [jgroups-2.12.1.Final.jar:]
> 	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1074) [jgroups-2.12.1.Final.jar:]
> 	at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:477) [jgroups-2.12.1.Final.jar:]
> 	at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderHandler.process(STREAMING_STATE_TRANSFER.java:651) [jgroups-2.12.1.Final.jar:]
> 	at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderThreadSpawner$1.run(STREAMING_STATE_TRANSFER.java:580) [jgroups-2.12.1.Final.jar:]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [:1.6.0_25]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [:1.6.0_25]
> 	at java.lang.Thread.run(Thread.java:662) [:1.6.0_25]
> Caused by: java.util.concurrent.TimeoutException: Timed out waiting for a cluster-wide sync to be acquired. (timeout = 60 seconds)
> 	at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.blockUntilAcquired(JGroupsDistSync.java:62) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
> 	at org.infinispan.statetransfer.StateTransferManagerImpl.generateTransactionLog(StateTransferManagerImpl.java:196) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
> 	at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:152) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
> 	... 12 more{code}
> Now, what's odd about this is that the JGroupsDistSync.flushWaitGate behind it is only acquired/released while the state transfer control command is sent, and the logs show that both the enabling (acquiring) and disabling (releasing) state transfer control commands were built:
> {code}15:39:20,902 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) 
> dests=[michal-linhard-37465], command=StateTransferControlCommand{enabled=true}, mode=SYNCHRONOUS, timeout=480000
> ...
> 15:39:21,074 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) 
> dests=[michal-linhard-37465], command=StateTransferControlCommand{enabled=false}, mode=SYNCHRONOUS, timeout=480000{code}
> There are no other references to StateTransferControlCommand, so how can it be that flushWaitGate is open?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       


