Galder Zamarreño updated ISPN-1317:
-----------------------------------
Attachment: 1317-analysis.txt
Indeed, this is a different issue. Concurrent state transfer requests appear to
lead to premature flush wait gate closures. StateTransferControlCommand needs more
logging in perform() to confirm my suspicions, but I'm fairly confident about this.
State transfer is being phased out in favour of rehashing that also applies to replication,
so I won't be looking into this issue immediately.
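The suspected interleaving can be illustrated with a minimal Java sketch. This is not the actual JGroupsDistSync code: the Gate class, its methods and the replay() helper are hypothetical stand-ins, and the order of the four calls is only an assumption about how two overlapping state transfer requests might interleave. It shows how one request's release could close a shared gate before a second request has finished waiting on it.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class GateRaceSketch {

    /** Hypothetical stand-in for the real gate: a latch that can be opened and closed repeatedly. */
    static final class Gate {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition opened = lock.newCondition();
        private boolean open; // starts closed

        void open() {
            lock.lock();
            try {
                open = true;
                opened.signalAll();
            } finally {
                lock.unlock();
            }
        }

        void close() {
            lock.lock();
            try {
                open = false;
            } finally {
                lock.unlock();
            }
        }

        /** Waits until the gate is open; returns false on timeout or interruption. */
        boolean await(long timeout, TimeUnit unit) {
            long nanos = unit.toNanos(timeout);
            lock.lock();
            try {
                while (!open) {
                    if (nanos <= 0) {
                        return false;
                    }
                    nanos = opened.awaitNanos(nanos);
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } finally {
                lock.unlock();
            }
        }
    }

    /** Replays the assumed interleaving; returns whether request B's wait succeeded. */
    static boolean replay() {
        Gate gate = new Gate();
        gate.open();  // request A: control command with enabled=true
        gate.open();  // request B: control command with enabled=true (gate already open)
        gate.close(); // request A: control command with enabled=false
        // Request B's state provider now waits on a gate that A has already closed.
        return gate.await(100, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        System.out.println("B acquired sync: " + replay()); // prints "B acquired sync: false"
    }
}
```

A plain boolean gate keeps no record of how many requests are relying on it, which is why the first release affects the second request in this sketch. Whether the real gate behaves this way is exactly what the extra logging in perform() would need to confirm.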
JGroupsDistSync.flushWaitGate appears to be left open
-----------------------------------------------------
Key: ISPN-1317
URL: https://issues.jboss.org/browse/ISPN-1317
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.0.0.FINAL
Reporter: Galder Zamarreño
Assignee: Galder Zamarreño
Fix For: 5.1.0.ALPHA1, 5.1.0.FINAL
Attachments: 1317-analysis.txt
Logs in JBPAPP-6929 show:
{code}15:40:22,698 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (STREAMING_STATE_TRANSFER-sender-1,default,michal-linhard-12702) ISPN000095: Caught while responding to state transfer request: org.infinispan.statetransfer.StateTransferException: java.util.concurrent.TimeoutException: Timed out waiting for a cluster-wide sync to be acquired. (timeout = 60 seconds)
    at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:162) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
    at org.infinispan.remoting.InboundInvocationHandlerImpl.generateState(InboundInvocationHandlerImpl.java:248) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.getState(JGroupsTransport.java:590) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:690) [jgroups-2.12.1.Final.jar:]
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:771) [jgroups-2.12.1.Final.jar:]
    at org.jgroups.JChannel.up(JChannel.java:1484) [jgroups-2.12.1.Final.jar:]
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1074) [jgroups-2.12.1.Final.jar:]
    at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:477) [jgroups-2.12.1.Final.jar:]
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderHandler.process(STREAMING_STATE_TRANSFER.java:651) [jgroups-2.12.1.Final.jar:]
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderThreadSpawner$1.run(STREAMING_STATE_TRANSFER.java:580) [jgroups-2.12.1.Final.jar:]
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [:1.6.0_25]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [:1.6.0_25]
    at java.lang.Thread.run(Thread.java:662) [:1.6.0_25]
Caused by: java.util.concurrent.TimeoutException: Timed out waiting for a cluster-wide sync to be acquired. (timeout = 60 seconds)
    at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.blockUntilAcquired(JGroupsDistSync.java:62) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
    at org.infinispan.statetransfer.StateTransferManagerImpl.generateTransactionLog(StateTransferManagerImpl.java:196) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
    at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:152) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
    ... 12 more{code}
Now, what's odd about this is that the JGroupsDistSync.flushWaitGate behind it is
only acquired/released while the state transfer control command is sent, and the logs show
that both the enabling (acquiring) and disabling (releasing) state transfer control commands
were built:
{code}15:39:20,902 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) dests=[michal-linhard-37465], command=StateTransferControlCommand{enabled=true}, mode=SYNCHRONOUS, timeout=480000
...
15:39:21,074 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) dests=[michal-linhard-37465], command=StateTransferControlCommand{enabled=false}, mode=SYNCHRONOUS, timeout=480000{code}
There are no other references to StateTransferControlCommand, so how can it be that
flushWaitGate is open?
--
This message is automatically generated by JIRA.
For more information on JIRA, see: