[infinispan-dev] ISPN-83 (FLUSH removal) causing state transfer timeouts

Galder Zamarreño galder at redhat.com
Thu Feb 24 06:31:45 EST 2011


Indeed, Michal just confirmed that restoring FLUSH makes his test work fine.

Vladimir, I've renamed https://issues.jboss.org/browse/ISPN-937 and assigned to you, is that Ok?

Cheers,

On Feb 24, 2011, at 11:29 AM, Galder Zamarreño wrote:

> Interesting stuff, Michal had reported a very similar issue: https://issues.jboss.org/browse/ISPN-937
> 
> And I had thought it might be related to the rehashing changes that also changed some of the state transfer code.
> 
> I've asked Michal to try with a udp stack that contains FLUSH.
> 
> On Feb 23, 2011, at 7:09 PM, Paul Ferraro wrote:
> 
>> CR2 was unusable for me because of ISPN-934.
>> 
>> On Wed, 2011-02-23 at 11:54 -0500, Vladimir Blagojevic wrote:
>>> Paul,
>>> 
>>> How is the testsuite tolerating 4.2.1CR2? I removed FLUSH in CR2 release 
>>> [1].
>>> 
>>> Vladimir
>>> 
>>> [1] https://issues.jboss.org/browse/ISPN-83
>>> 
>>> 
>>> On 11-02-23 11:18 AM, Paul Ferraro wrote:
>>>> After testing AS6 with Infinispan 4.2.1.CR3, I started seeing a slew of
>>>> testsuite failures all due to state transfer timeouts.  These are
>>>> REPL_SYNC tests, with state transfer enabled:
>>>> e.g.
>>>> 
>>>> 2011-02-23 10:59:46,771 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Initiating state transfer process
>>>> 2011-02-23 10:59:46,771 INFO  [org.infinispan.remoting.rpc.RpcManagerImpl] Trying to fetch state from T510-29294
>>>> 2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] GET_STATE: asking T510-29294 for state, passing down a SUSPEND_STABLE event, timeout=10000
>>>> 2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STABLE] suspending message garbage collection
>>>> 2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STABLE] resume task started, max_suspend_time=11000
>>>> 2011-02-23 10:59:46,778 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] StateProviderThreadSpawner listening at 192.168.0.3:37173...
>>>> 2011-02-23 10:59:46,778 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Responding to state requester T510-23375 with address 192.168.0.3:37173 and digest T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
>>>> 2011-02-23 10:59:46,781 DEBUG [org.jgroups.protocols.pbcast.NAKACK]
>>>> [overwriteDigest()]
>>>> existing digest:  T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
>>>> new digest:       T510-29294: [0 : 1 (1)], T510-23375: [0 : 0 (0)]
>>>> resulting digest: T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
>>>> 2011-02-23 10:59:46,781 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Connecting to state provider /192.168.0.3:37173, original buffer size was 43690 and was reset to 8192
>>>> 2011-02-23 10:59:46,782 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Connected to state provider, my end of the socket is /192.168.0.3:38305 passing inputstream up...
>>>> 2011-02-23 10:59:46,783 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] StateProviderThreadSpawner listening at 192.168.0.3:37173...
>>>> 2011-02-23 10:59:46,783 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Accepted request for state transfer from /192.168.0.3:38305 handing of to PooledExecutor thread
>>>> 2011-02-23 10:59:46,786 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Running on Thread[STREAMING_STATE_TRANSFER-sender-1,T510-29294,5,JGroups]. Accepted request for state transfer from /192.168.0.3:38305, original buffer size was 25434 and was reset to 8192, passing outputstream up...
>>>> 2011-02-23 10:59:46,789 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Applying state
>>>> 2011-02-23 10:59:46,789 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Generating state.  Can provide? true
>>>> 2011-02-23 10:59:46,822 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Writing 0 StoredEntries to stream
>>>> 2011-02-23 10:59:46,833 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] State generated, closing object stream
>>>> 2011-02-23 10:59:46,834 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] State writer is closing the socket
>>>> 2011-02-23 10:59:56,774 DEBUG [org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
>>>> 2011-02-23 10:59:56,834 WARN  [org.infinispan.remoting.rpc.RpcManagerImpl] Could not find available peer for state, backing off and retrying
>>>> 2011-02-23 10:59:56,837 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] Caught while requesting or applying state
>>>> org.infinispan.statetransfer.StateTransferException: org.infinispan.util.concurrent.TimeoutException: Timed out after 10 seconds waiting for a response from T510-29294
>>>>        at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:332)
>>>>        at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:199)
>>>>        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:582)
>>>>        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:712)
>>>>        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:772)
>>>>        at org.jgroups.JChannel.up(JChannel.java:1422)
>>>>        at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
>>>>        at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:525)
>>>>        at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:464)
>>>>        at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:225)
>>>>        at org.jgroups.protocols.FRAG2.up(FRAG2.java:190)
>>>>        at org.jgroups.protocols.FlowControl.up(FlowControl.java:419)
>>>>        at org.jgroups.protocols.FlowControl.up(FlowControl.java:401)
>>>>        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:888)
>>>>        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
>>>>        at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:615)
>>>>        at org.jgroups.protocols.UNICAST.up(UNICAST.java:295)
>>>>        at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:707)
>>>>        at org.jgroups.protocols.BARRIER.up(BARRIER.java:120)
>>>>        at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:169)
>>>>        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
>>>>        at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
>>>>        at org.jgroups.protocols.Discovery.up(Discovery.java:292)
>>>>        at org.jgroups.protocols.PING.up(PING.java:67)
>>>>        at org.jgroups.protocols.TP.passMessageUp(TP.java:1093)
>>>>        at org.jgroups.protocols.TP.access$100(TP.java:56)
>>>>        at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1633)
>>>>        at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1615)
>>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>        at java.lang.Thread.run(Thread.java:679)
>>>> Caused by: org.infinispan.util.concurrent.TimeoutException: Timed out after 10 seconds waiting for a response from T510-29294
>>>>        at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:267)
>>>>        at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:116)
>>>>        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:394)
>>>>        at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:102)
>>>>        at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:126)
>>>>        at org.infinispan.statetransfer.StateTransferManagerImpl.mimicPartialFlushViaRPC(StateTransferManagerImpl.java:300)
>>>>        at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:253)
>>>>        at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:321)
>>>>        ... 30 more
>>>> 
>>>> The problematic tests use the default jgroups configuration
>>>> (jgroups-udp.xml from the infinispan-core jar).  If I override the
>>>> default jgroups config and add FLUSH, these tests pass.
>>>> Note that there is no cache activity on the 1st cache prior to the time
>>>> the 2nd node joins and starts its cache.
>>>> Does the cache need to be configured differently if FLUSH is not used?
>>>> 
>>>> Thoughts?
>>>> 
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> 
>> 
>> 
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> 
> --
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache
> 
> 
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache




More information about the infinispan-dev mailing list