[infinispan-dev] ISPN-83 (FLUSH removal) causing state transfer timeouts
Vladimir Blagojevic
vblagoje at redhat.com
Wed Feb 23 11:54:34 EST 2011
Paul,
How is the testsuite tolerating 4.2.1CR2? I removed FLUSH in CR2 release
[1].
Vladimir
[1] https://issues.jboss.org/browse/ISPN-83
On 11-02-23 11:18 AM, Paul Ferraro wrote:
> After testing AS6 with Infinispan 4.2.1.CR3, I started seeing a slew of
> testsuite failures all due to state transfer timeouts. These are
> REPL_SYNC tests, with state transfer enabled:
> e.g.
>
> 2011-02-23 10:59:46,771 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Initiating state transfer process
> 2011-02-23 10:59:46,771 INFO [org.infinispan.remoting.rpc.RpcManagerImpl] Trying to fetch state from T510-29294
> 2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] GET_STATE: asking T510-29294 for state, passing down a SUSPEND_STABLE event, timeout=10000
> 2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STABLE] suspending message garbage collection
> 2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STABLE] resume task started, max_suspend_time=11000
> 2011-02-23 10:59:46,778 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] StateProviderThreadSpawner listening at 192.168.0.3:37173...
> 2011-02-23 10:59:46,778 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Responding to state requester T510-23375 with address 192.168.0.3:37173 and digest T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
> 2011-02-23 10:59:46,781 DEBUG [org.jgroups.protocols.pbcast.NAKACK]
> [overwriteDigest()]
> existing digest: T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
> new digest: T510-29294: [0 : 1 (1)], T510-23375: [0 : 0 (0)]
> resulting digest: T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
> 2011-02-23 10:59:46,781 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Connecting to state provider /192.168.0.3:37173, original buffer size was 43690 and was reset to 8192
> 2011-02-23 10:59:46,782 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Connected to state provider, my end of the socket is /192.168.0.3:38305 passing inputstream up...
> 2011-02-23 10:59:46,783 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] StateProviderThreadSpawner listening at 192.168.0.3:37173...
> 2011-02-23 10:59:46,783 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Accepted request for state transfer from /192.168.0.3:38305 handing of to PooledExecutor thread
> 2011-02-23 10:59:46,786 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Running on Thread[STREAMING_STATE_TRANSFER-sender-1,T510-29294,5,JGroups]. Accepted request for state transfer from /192.168.0.3:38305, original buffer size was 25434 and was reset to 8192, passing outputstream up...
> 2011-02-23 10:59:46,789 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Applying state
> 2011-02-23 10:59:46,789 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Generating state. Can provide? true
> 2011-02-23 10:59:46,822 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Writing 0 StoredEntries to stream
> 2011-02-23 10:59:46,833 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] State generated, closing object stream
> 2011-02-23 10:59:46,834 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] State writer is closing the socket
> 2011-02-23 10:59:56,774 DEBUG [org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
> 2011-02-23 10:59:56,834 WARN [org.infinispan.remoting.rpc.RpcManagerImpl] Could not find available peer for state, backing off and retrying
> 2011-02-23 10:59:56,837 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] Caught while requesting or applying state
> org.infinispan.statetransfer.StateTransferException: org.infinispan.util.concurrent.TimeoutException: Timed out after 10 seconds waiting for a response from T510-29294
> at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:332)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:199)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:582)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:712)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:772)
> at org.jgroups.JChannel.up(JChannel.java:1422)
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:525)
> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:464)
> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:225)
> at org.jgroups.protocols.FRAG2.up(FRAG2.java:190)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:419)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:401)
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:888)
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
> at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:615)
> at org.jgroups.protocols.UNICAST.up(UNICAST.java:295)
> at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:707)
> at org.jgroups.protocols.BARRIER.up(BARRIER.java:120)
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:169)
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
> at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
> at org.jgroups.protocols.Discovery.up(Discovery.java:292)
> at org.jgroups.protocols.PING.up(PING.java:67)
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1093)
> at org.jgroups.protocols.TP.access$100(TP.java:56)
> at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1633)
> at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1615)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:679)
> Caused by: org.infinispan.util.concurrent.TimeoutException: Timed out after 10 seconds waiting for a response from T510-29294
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:267)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:116)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:394)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:102)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:126)
> at org.infinispan.statetransfer.StateTransferManagerImpl.mimicPartialFlushViaRPC(StateTransferManagerImpl.java:300)
> at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:253)
> at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:321)
> ... 30 more
>
> The problematic tests use the default jgroups configuration
> (jgroups-udp.xml from the infinispan-core jar). If I override the
> default jgroups config and add FLUSH, these tests pass.
> Note that there is no cache activity on the 1st cache prior to the time
> the 2nd node joins and starts its cache.
> Does the cache need to be configured differently if FLUSH is not used?
>
> Thoughts?
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
More information about the infinispan-dev
mailing list