[infinispan-dev] ISPN-83 (FLUSH removal) causing state transfer timeouts

Paul Ferraro paul.ferraro at redhat.com
Wed Feb 23 13:09:28 EST 2011


CR2 was unusable for me because of ISPN-934.

On Wed, 2011-02-23 at 11:54 -0500, Vladimir Blagojevic wrote:
> Paul,
> 
> How is the testsuite tolerating 4.2.1CR2? I removed FLUSH in CR2 release 
> [1].
> 
> Vladimir
> 
> [1] https://issues.jboss.org/browse/ISPN-83
> 
> 
> On 11-02-23 11:18 AM, Paul Ferraro wrote:
> > After testing AS6 with Infinispan 4.2.1.CR3, I started seeing a slew of
> > testsuite failures all due to state transfer timeouts.  These are
> > REPL_SYNC tests, with state transfer enabled:
> > e.g.
> >
> > 2011-02-23 10:59:46,771 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Initiating state transfer process
> > 2011-02-23 10:59:46,771 INFO  [org.infinispan.remoting.rpc.RpcManagerImpl] Trying to fetch state from T510-29294
> > 2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] GET_STATE: asking T510-29294 for state, passing down a SUSPEND_STABLE event, timeout=10000
> > 2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STABLE] suspending message garbage collection
> > 2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STABLE] resume task started, max_suspend_time=11000
> > 2011-02-23 10:59:46,778 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] StateProviderThreadSpawner listening at 192.168.0.3:37173...
> > 2011-02-23 10:59:46,778 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Responding to state requester T510-23375 with address 192.168.0.3:37173 and digest T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
> > 2011-02-23 10:59:46,781 DEBUG [org.jgroups.protocols.pbcast.NAKACK]
> > [overwriteDigest()]
> > existing digest:  T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
> > new digest:       T510-29294: [0 : 1 (1)], T510-23375: [0 : 0 (0)]
> > resulting digest: T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
> > 2011-02-23 10:59:46,781 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Connecting to state provider /192.168.0.3:37173, original buffer size was 43690 and was reset to 8192
> > 2011-02-23 10:59:46,782 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Connected to state provider, my end of the socket is /192.168.0.3:38305 passing inputstream up...
> > 2011-02-23 10:59:46,783 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] StateProviderThreadSpawner listening at 192.168.0.3:37173...
> > 2011-02-23 10:59:46,783 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Accepted request for state transfer from /192.168.0.3:38305 handing of to PooledExecutor thread
> > 2011-02-23 10:59:46,786 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Running on Thread[STREAMING_STATE_TRANSFER-sender-1,T510-29294,5,JGroups]. Accepted request for state transfer from /192.168.0.3:38305, original buffer size was 25434 and was reset to 8192, passing outputstream up...
> > 2011-02-23 10:59:46,789 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Applying state
> > 2011-02-23 10:59:46,789 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Generating state.  Can provide? true
> > 2011-02-23 10:59:46,822 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Writing 0 StoredEntries to stream
> > 2011-02-23 10:59:46,833 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] State generated, closing object stream
> > 2011-02-23 10:59:46,834 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] State writer is closing the socket
> > 2011-02-23 10:59:56,774 DEBUG [org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
> > 2011-02-23 10:59:56,834 WARN  [org.infinispan.remoting.rpc.RpcManagerImpl] Could not find available peer for state, backing off and retrying
> > 2011-02-23 10:59:56,837 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] Caught while requesting or applying state
> > org.infinispan.statetransfer.StateTransferException: org.infinispan.util.concurrent.TimeoutException: Timed out after 10 seconds waiting for a response from T510-29294
> >          at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:332)
> >          at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:199)
> >          at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:582)
> >          at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:712)
> >          at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:772)
> >          at org.jgroups.JChannel.up(JChannel.java:1422)
> >          at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
> >          at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:525)
> >          at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:464)
> >          at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:225)
> >          at org.jgroups.protocols.FRAG2.up(FRAG2.java:190)
> >          at org.jgroups.protocols.FlowControl.up(FlowControl.java:419)
> >          at org.jgroups.protocols.FlowControl.up(FlowControl.java:401)
> >          at org.jgroups.protocols.pbcast.GMS.up(GMS.java:888)
> >          at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
> >          at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:615)
> >          at org.jgroups.protocols.UNICAST.up(UNICAST.java:295)
> >          at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:707)
> >          at org.jgroups.protocols.BARRIER.up(BARRIER.java:120)
> >          at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:169)
> >          at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
> >          at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
> >          at org.jgroups.protocols.Discovery.up(Discovery.java:292)
> >          at org.jgroups.protocols.PING.up(PING.java:67)
> >          at org.jgroups.protocols.TP.passMessageUp(TP.java:1093)
> >          at org.jgroups.protocols.TP.access$100(TP.java:56)
> >          at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1633)
> >          at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1615)
> >          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >          at java.lang.Thread.run(Thread.java:679)
> > Caused by: org.infinispan.util.concurrent.TimeoutException: Timed out after 10 seconds waiting for a response from T510-29294
> >          at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:267)
> >          at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:116)
> >          at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:394)
> >          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:102)
> >          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:126)
> >          at org.infinispan.statetransfer.StateTransferManagerImpl.mimicPartialFlushViaRPC(StateTransferManagerImpl.java:300)
> >          at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:253)
> >          at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:321)
> >          ... 30 more
> >
> > The problematic tests use the default jgroups configuration
> > (jgroups-udp.xml from the infinispan-core jar).  If I override the
> > default jgroups config and add FLUSH, these tests pass.
> > Note that there is no cache activity on the 1st cache prior to the time
> > the 2nd node joins and starts its cache.
> > Does the cache need to be configured differently if FLUSH is not used?
> >
> > Thoughts?
> >
> > _______________________________________________
> > infinispan-dev mailing list
> > infinispan-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
> 




More information about the infinispan-dev mailing list