Paul,
How is the testsuite tolerating 4.2.1.CR2? I removed FLUSH in the CR2 release
[1].
Vladimir
[1]
After testing AS6 with Infinispan 4.2.1.CR3, I started seeing a slew of
testsuite failures, all due to state transfer timeouts. These are
REPL_SYNC tests with state transfer enabled, e.g.
2011-02-23 10:59:46,771 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl]
Initiating state transfer process
2011-02-23 10:59:46,771 INFO [org.infinispan.remoting.rpc.RpcManagerImpl] Trying to
fetch state from T510-29294
2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER]
GET_STATE: asking T510-29294 for state, passing down a SUSPEND_STABLE event,
timeout=10000
2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STABLE] suspending message
garbage collection
2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STABLE] resume task started,
max_suspend_time=11000
2011-02-23 10:59:46,778 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER]
StateProviderThreadSpawner listening at 192.168.0.3:37173...
2011-02-23 10:59:46,778 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER]
Responding to state requester T510-23375 with address 192.168.0.3:37173 and digest
T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
2011-02-23 10:59:46,781 DEBUG [org.jgroups.protocols.pbcast.NAKACK]
[overwriteDigest()]
existing digest: T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
new digest: T510-29294: [0 : 1 (1)], T510-23375: [0 : 0 (0)]
resulting digest: T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
2011-02-23 10:59:46,781 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER]
Connecting to state provider /192.168.0.3:37173, original buffer size was 43690 and was
reset to 8192
2011-02-23 10:59:46,782 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER]
Connected to state provider, my end of the socket is /192.168.0.3:38305 passing
inputstream up...
2011-02-23 10:59:46,783 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER]
StateProviderThreadSpawner listening at 192.168.0.3:37173...
2011-02-23 10:59:46,783 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER]
Accepted request for state transfer from /192.168.0.3:38305 handing of to PooledExecutor
thread
2011-02-23 10:59:46,786 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER]
Running on Thread[STREAMING_STATE_TRANSFER-sender-1,T510-29294,5,JGroups]. Accepted
request for state transfer from /192.168.0.3:38305, original buffer size was 25434 and was
reset to 8192, passing outputstream up...
2011-02-23 10:59:46,789 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl]
Applying state
2011-02-23 10:59:46,789 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl]
Generating state. Can provide? true
2011-02-23 10:59:46,822 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl]
Writing 0 StoredEntries to stream
2011-02-23 10:59:46,833 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl]
State generated, closing object stream
2011-02-23 10:59:46,834 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER]
State writer is closing the socket
2011-02-23 10:59:56,774 DEBUG [org.jgroups.protocols.pbcast.STABLE] resuming message
garbage collection
2011-02-23 10:59:56,834 WARN [org.infinispan.remoting.rpc.RpcManagerImpl] Could not find
available peer for state, backing off and retrying
2011-02-23 10:59:56,837 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] Caught while requesting or applying state
org.infinispan.statetransfer.StateTransferException: org.infinispan.util.concurrent.TimeoutException: Timed out after 10 seconds waiting for a response from T510-29294
    at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:332)
    at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:199)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:582)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:712)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:772)
    at org.jgroups.JChannel.up(JChannel.java:1422)
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:525)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:464)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:225)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:190)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:419)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:401)
    at org.jgroups.protocols.pbcast.GMS.up(GMS.java:888)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
    at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:615)
    at org.jgroups.protocols.UNICAST.up(UNICAST.java:295)
    at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:707)
    at org.jgroups.protocols.BARRIER.up(BARRIER.java:120)
    at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:169)
    at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
    at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
    at org.jgroups.protocols.Discovery.up(Discovery.java:292)
    at org.jgroups.protocols.PING.up(PING.java:67)
    at org.jgroups.protocols.TP.passMessageUp(TP.java:1093)
    at org.jgroups.protocols.TP.access$100(TP.java:56)
    at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1633)
    at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1615)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Caused by: org.infinispan.util.concurrent.TimeoutException: Timed out after 10 seconds waiting for a response from T510-29294
    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:267)
    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:116)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:394)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:102)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:126)
    at org.infinispan.statetransfer.StateTransferManagerImpl.mimicPartialFlushViaRPC(StateTransferManagerImpl.java:300)
    at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:253)
    at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:321)
    ... 30 more
The problematic tests use the default JGroups configuration
(jgroups-udp.xml from the infinispan-core jar). If I override the
default JGroups config and add FLUSH, these tests pass.
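
To be concrete about the workaround: a sketch of the tail of a copied
jgroups-udp.xml with FLUSH appended as the top protocol. The surrounding
protocol names follow the stock UDP stack; the attribute values shown
here are illustrative, not the exact ones from my override:

```xml
<!-- Sketch: end of a copied jgroups-udp.xml, with pbcast.FLUSH added
     as the topmost protocol. Attribute values are illustrative. -->
<config>
  <!-- ... transport, discovery, failure detection, NAKACK, etc. ... -->
  <pbcast.GMS print_local_addr="true" join_timeout="3000"/>
  <FRAG2 frag_size="60000"/>
  <pbcast.STREAMING_STATE_TRANSFER/>
  <pbcast.FLUSH timeout="0"/>
</config>
```

With this stack in place (and the cache manager pointed at the custom
file instead of the bundled one), the REPL_SYNC state transfer tests pass.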
Note that there is no cache activity on the first cache before the
second node joins and starts its cache.
Does the cache need to be configured differently if FLUSH is not used?
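
For reference, the 10-second timeout in the log corresponds to the state
retrieval timeout in the 4.x cache configuration. A sketch of the relevant
fragment (element and attribute names per the 4.x schema as I understand
it; the cache name and replTimeout value are illustrative):

```xml
<!-- Sketch: Infinispan 4.x replicated cache with state transfer enabled.
     timeout="10000" matches the 10s timeout seen in the log above;
     raising it is one obvious knob, but it shouldn't be needed when the
     state being transferred is empty. -->
<namedCache name="replicatedCache">
  <clustering mode="replication">
    <sync replTimeout="15000"/>
    <stateRetrieval fetchInMemoryState="true" timeout="10000"/>
  </clustering>
</namedCache>
```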
Thoughts?
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev