Indeed, Michal just confirmed that restoring FLUSH makes his test work fine.
Vladimir, I've renamed the issue and assigned it to you; is that OK?
Cheers,
On Feb 24, 2011, at 11:29 AM, Galder Zamarreño wrote:
Interesting stuff; Michal had reported a very similar issue:
https://issues.jboss.org/browse/ISPN-937
I had thought it might be related to the rehashing changes that also changed some of
the state transfer code.
I've asked Michal to try with a UDP stack that contains FLUSH.
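
For reference, "a UDP stack that contains FLUSH" just means the stock jgroups-udp.xml from the
infinispan-core jar with pbcast.FLUSH re-added at the top of the stack. A minimal sketch only;
protocol attributes are omitted and should be taken from the file shipped with your JGroups version:

    <config xmlns="urn:org:jgroups">
        <!-- UDP, PING, MERGE2, FD_SOCK, FD_ALL, BARRIER, pbcast.NAKACK,
             UNICAST, pbcast.STABLE, pbcast.GMS, flow control and FRAG2
             exactly as in the stock jgroups-udp.xml -->
        <pbcast.STREAMING_STATE_TRANSFER/>
        <!-- re-added so state transfer runs under a cluster-wide flush -->
        <pbcast.FLUSH/>
    </config>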
On Feb 23, 2011, at 7:09 PM, Paul Ferraro wrote:
> CR2 was unusable for me because of ISPN-934.
>
> On Wed, 2011-02-23 at 11:54 -0500, Vladimir Blagojevic wrote:
>> Paul,
>>
>> How is the testsuite tolerating 4.2.1.CR2? I removed FLUSH in the CR2 release [1].
>>
>> Vladimir
>>
>> [1] https://issues.jboss.org/browse/ISPN-83
>>
>>
>> On 11-02-23 11:18 AM, Paul Ferraro wrote:
>>> After testing AS6 with Infinispan 4.2.1.CR3, I started seeing a slew of
>>> testsuite failures, all due to state transfer timeouts. These are
>>> REPL_SYNC tests with state transfer enabled, e.g.:
>>>
>>> 2011-02-23 10:59:46,771 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Initiating state transfer process
>>> 2011-02-23 10:59:46,771 INFO [org.infinispan.remoting.rpc.RpcManagerImpl] Trying to fetch state from T510-29294
>>> 2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] GET_STATE: asking T510-29294 for state, passing down a SUSPEND_STABLE event, timeout=10000
>>> 2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STABLE] suspending message garbage collection
>>> 2011-02-23 10:59:46,773 DEBUG [org.jgroups.protocols.pbcast.STABLE] resume task started, max_suspend_time=11000
>>> 2011-02-23 10:59:46,778 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] StateProviderThreadSpawner listening at 192.168.0.3:37173...
>>> 2011-02-23 10:59:46,778 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Responding to state requester T510-23375 with address 192.168.0.3:37173 and digest T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
>>> 2011-02-23 10:59:46,781 DEBUG [org.jgroups.protocols.pbcast.NAKACK] [overwriteDigest()]
>>> existing digest:  T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
>>> new digest:       T510-29294: [0 : 1 (1)], T510-23375: [0 : 0 (0)]
>>> resulting digest: T510-23375: [0 : 0 (0)], T510-29294: [0 : 1 (1)]
>>> 2011-02-23 10:59:46,781 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Connecting to state provider /192.168.0.3:37173, original buffer size was 43690 and was reset to 8192
>>> 2011-02-23 10:59:46,782 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Connected to state provider, my end of the socket is /192.168.0.3:38305 passing inputstream up...
>>> 2011-02-23 10:59:46,783 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] StateProviderThreadSpawner listening at 192.168.0.3:37173...
>>> 2011-02-23 10:59:46,783 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Accepted request for state transfer from /192.168.0.3:38305 handing of to PooledExecutor thread
>>> 2011-02-23 10:59:46,786 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] Running on Thread[STREAMING_STATE_TRANSFER-sender-1,T510-29294,5,JGroups]. Accepted request for state transfer from /192.168.0.3:38305, original buffer size was 25434 and was reset to 8192, passing outputstream up...
>>> 2011-02-23 10:59:46,789 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Applying state
>>> 2011-02-23 10:59:46,789 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Generating state. Can provide? true
>>> 2011-02-23 10:59:46,822 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] Writing 0 StoredEntries to stream
>>> 2011-02-23 10:59:46,833 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] State generated, closing object stream
>>> 2011-02-23 10:59:46,834 DEBUG [org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER] State writer is closing the socket
>>> 2011-02-23 10:59:56,774 DEBUG [org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
>>> 2011-02-23 10:59:56,834 WARN [org.infinispan.remoting.rpc.RpcManagerImpl] Could not find available peer for state, backing off and retrying
>>> 2011-02-23 10:59:56,837 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] Caught while requesting or applying state
>>> org.infinispan.statetransfer.StateTransferException: org.infinispan.util.concurrent.TimeoutException: Timed out after 10 seconds waiting for a response from T510-29294
>>>     at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:332)
>>>     at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:199)
>>>     at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:582)
>>>     at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:712)
>>>     at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:772)
>>>     at org.jgroups.JChannel.up(JChannel.java:1422)
>>>     at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
>>>     at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:525)
>>>     at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:464)
>>>     at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:225)
>>>     at org.jgroups.protocols.FRAG2.up(FRAG2.java:190)
>>>     at org.jgroups.protocols.FlowControl.up(FlowControl.java:419)
>>>     at org.jgroups.protocols.FlowControl.up(FlowControl.java:401)
>>>     at org.jgroups.protocols.pbcast.GMS.up(GMS.java:888)
>>>     at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
>>>     at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:615)
>>>     at org.jgroups.protocols.UNICAST.up(UNICAST.java:295)
>>>     at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:707)
>>>     at org.jgroups.protocols.BARRIER.up(BARRIER.java:120)
>>>     at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:169)
>>>     at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
>>>     at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
>>>     at org.jgroups.protocols.Discovery.up(Discovery.java:292)
>>>     at org.jgroups.protocols.PING.up(PING.java:67)
>>>     at org.jgroups.protocols.TP.passMessageUp(TP.java:1093)
>>>     at org.jgroups.protocols.TP.access$100(TP.java:56)
>>>     at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1633)
>>>     at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1615)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>     at java.lang.Thread.run(Thread.java:679)
>>> Caused by: org.infinispan.util.concurrent.TimeoutException: Timed out after 10 seconds waiting for a response from T510-29294
>>>     at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:267)
>>>     at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:116)
>>>     at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:394)
>>>     at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:102)
>>>     at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:126)
>>>     at org.infinispan.statetransfer.StateTransferManagerImpl.mimicPartialFlushViaRPC(StateTransferManagerImpl.java:300)
>>>     at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:253)
>>>     at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:321)
>>>     ... 30 more
>>>
>>> The problematic tests use the default JGroups configuration
>>> (jgroups-udp.xml from the infinispan-core jar). If I override the
>>> default JGroups config and add FLUSH, these tests pass.
>>> Note that there is no cache activity on the 1st cache before the
>>> 2nd node joins and starts its cache.
>>> Does the cache need to be configured differently if FLUSH is not used?
>>>
>>> Thoughts?
>>>
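
Regarding the question above about configuring the cache without FLUSH: for context, the kind of
cache configuration these REPL_SYNC tests imply looks roughly like the sketch below. This is
Infinispan 4.2-style XML written from memory; verify element and attribute names against the
shipped schema, the cluster name is hypothetical, and the timeout values are illustrative only:

    <infinispan xmlns="urn:infinispan:config:4.2">
        <global>
            <!-- hypothetical cluster name -->
            <transport clusterName="as6-cluster"/>
        </global>
        <default>
            <clustering mode="replication">
                <sync replTimeout="15000"/>
                <!-- state transfer enabled; timeout value is illustrative -->
                <stateRetrieval fetchInMemoryState="true" timeout="10000"/>
            </clustering>
        </default>
    </infinispan>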
>>
>
>
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache