Ryan Emerson resolved ISPN-10343.
---------------------------------
Resolution: Done
LocalCacheStateTransferTest random failures
-------------------------------------------
Key: ISPN-10343
URL: https://issues.jboss.org/browse/ISPN-10343
Project: Infinispan
Issue Type: Bug
Components: Test Suite - Core
Affects Versions: 10.0.0.Beta3
Reporter: Dan Berindei
Assignee: Pedro Ruivo
Priority: Major
Labels: testsuite_stability
Fix For: 10.0.0.CR3
Attachments:
master_20190622-0130_LocalCacheStateTransferTest-infinispan-core.log.gz,
master_20190622-0130_threaddump-org_infinispan_xsite_statetransfer_LocalCacheStateTransferTest_testStateTransferWithClusterIdle-2019-06-22-28963.log

NodeA starts the xsite state transfer before its bridge cluster view has been updated, so the
START_RECEIVE command is dropped without reaching NodeB. NodeA then sends a cancel command
(FINISH_RECEIVE), which does reach NodeB, but before NodeB has updated its own bridge cluster
view, so NodeB's response is dropped. NodeA then waits 20 minutes for that response (unless
the JVM is killed first).
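A minimal, self-contained Java model of the interleaving above may make the race easier to follow. It is a hypothetical sketch, not Infinispan or JGroups code: the `Node` class, its `sitesView` set and its `routeTo` method are stand-ins for each node's bridge cluster view and for TEST_RELAY2's "no route ... dropping message" behavior, and the 20-minute wait is replaced by a short, configurable timeout.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of the race in ISPN-10343; not Infinispan or JGroups code.
public class DroppedXSiteResponseModel {

    // Stand-in for a node plus its current bridge cluster (sites) view.
    static final class Node {
        final String site;
        final Set<String> sitesView = ConcurrentHashMap.newKeySet();

        Node(String site) {
            this.site = site;
            sitesView.add(site); // in this model a node always sees its own site
        }

        // Mimics TEST_RELAY2: a message to a site missing from the view is dropped (returns false).
        boolean routeTo(String targetSite) {
            return sitesView.contains(targetSite);
        }
    }

    // Replays the interleaving from the log; returns true only if NodeA gets the cancel response.
    static boolean cancelResponseArrives(long timeoutMillis) throws InterruptedException {
        Node nodeA = new Node("LON-1");
        Node nodeB = new Node("NYC-2");
        CompletableFuture<String> cancelResponse = new CompletableFuture<>();

        // 1. NodeA sends START_RECEIVE before its bridge view contains NYC-2: dropped.
        boolean startDelivered = nodeA.routeTo(nodeB.site);

        // 2. NodeA's bridge view is updated, NodeB's is not yet.
        nodeA.sitesView.add(nodeB.site);

        // 3. The start failed, so NodeA sends the cancel command (FINISH_RECEIVE); it reaches NodeB.
        if (!startDelivered && nodeA.routeTo(nodeB.site)) {
            // 4. NodeB replies, but has no route back to LON-1, so the response is dropped.
            if (nodeB.routeTo(nodeA.site)) {
                cancelResponse.complete("SuccessfulResponse(null)");
            }
        }

        // 5. NodeB's bridge view is updated only after its response was dropped.
        nodeB.sitesView.add(nodeA.site);

        try {
            // In the report NodeA waits 20 minutes here; the model uses a short timeout instead.
            cancelResponse.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Prints "cancel response arrived: false"
        System.out.println("cancel response arrived: " + cancelResponseArrives(100));
    }
}
```

The model is deliberately sequential so the outcome is deterministic: the response is lost because NodeB's view lags behind NodeA's, not because of any timing in the model itself.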
{noformat}
01:40:54,271 INFO (testng-Test:[]) [TestSuiteProgress] Test starting: org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.testStateTransferWithClusterIdle
01:40:54,274 INFO (testng-Test:[]) [CLUSTER] [Context=Test][Context=Test-NodeA-48836] ISPN100005: Site 'NYC-2' is online.
01:40:54,277 TRACE (testng-Test:[]) [JGroupsTransport] Test-NodeA-48836 sending backup request 2 to SiteMaster(NYC-2): XSiteStateTransferControlCommand{control=START_RECEIVE, siteName='null', statusOk=false, cacheName='Test'}
01:40:54,277 ERROR (testng-Test:[]) [TEST_RELAY2] Test-NodeA-48836: no route to NYC-2: dropping message
01:40:54,313 TRACE (jgroups-5,bridge-org.infinispan.xsite.statetransfer.Test,_Test-NodeA-48836:LON-1:[]) [TEST_RELAY2] [Relayer _Test-NodeA-48836:LON-1] view: [_Test-NodeA-48836:LON-1|1] (2) [_Test-NodeA-48836:LON-1, _Test-NodeB-37463:NYC-2]
01:40:54,313 TRACE (jgroups-5,bridge-org.infinispan.xsite.statetransfer.Test,_Test-NodeA-48836:LON-1:[]) [JGroupsTransport] Sites view changed: up [NYC-2], down [], new view is [NYC-2, LON-1]
01:40:54,347 TRACE (testng-Test:[]) [JGroupsBackupResponse] Communication error with site NYC-2
org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node null was suspected
	at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:34) ~[classes/:?]
	at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31) ~[classes/:?]
	at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17) ~[classes/:?]
	at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23) ~[classes/:?]
	at org.infinispan.remoting.transport.jgroups.SingleSiteRequest.receiveResponse(SingleSiteRequest.java:50) ~[classes/:?]
	at org.infinispan.remoting.transport.jgroups.SingleSiteRequest.sitesUnreachable(SingleSiteRequest.java:68) ~[classes/:?]
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$siteUnreachable$7(JGroupsTransport.java:1229) ~[classes/:?]
	at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60) ~[classes/:?]
	at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1603) ~[?:?]
	at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60) ~[classes/:?]
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.siteUnreachable(JGroupsTransport.java:1227) ~[classes/:?]
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:130) ~[classes/:?]
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1446) ~[classes/:?]
	at org.jgroups.JChannel.up(JChannel.java:756) ~[jgroups-4.1.1.Final.jar:4.1.1.Final]
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:914) ~[jgroups-4.1.1.Final.jar:4.1.1.Final]
	at org.jgroups.protocols.relay.RELAY2.handleMessage(RELAY2.java:533) ~[jgroups-4.1.1.Final.jar:4.1.1.Final]
	Suppressed: org.infinispan.util.logging.TraceException
		at org.infinispan.remoting.transport.jgroups.JGroupsBackupResponse.waitForBackupToFinish(JGroupsBackupResponse.java:93) [classes/:?]
		at org.infinispan.remoting.transport.RetryOnFailureXSiteCommand.execute(RetryOnFailureXSiteCommand.java:64) [classes/:?]
		at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.controlStateTransferOnRemoteSite(XSiteStateTransferManagerImpl.java:343) [classes/:?]
		at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.startPushState(XSiteStateTransferManagerImpl.java:136) [classes/:?]
		at org.infinispan.xsite.XSiteAdminOperations.pushState(XSiteAdminOperations.java:276) [classes/:?]
		at org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.startStateTransfer(LocalCacheStateTransferTest.java:99) [test-classes/:?]
		at org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.testStateTransferWithClusterIdle(LocalCacheStateTransferTest.java:53) [test-classes/:?]
...
01:40:54,348 TRACE (testng-Test:[]) [JGroupsTransport] Test-NodeA-48836 sending backup request 4 to SiteMaster(NYC-2): XSiteStateTransferControlCommand{control=FINISH_RECEIVE, siteName='null', statusOk=false, cacheName='Test'}
01:40:54,348 TRACE (testng-Test:[]) [TEST_RELAY2] routing message to SiteMaster(NYC-2) via _Test-NodeB-37463:NYC-2
01:40:54,349 DEBUG (remote-thread-Test-NodeB-p37359-t2:[]) [XSiteStateConsumerImpl] Ending state transfer from LON-1
01:40:54,349 TRACE (remote-thread-Test-NodeB-p37359-t2:[]) [JGroupsTransport] Test-NodeB-37463 sending response for request 4 to Test-NodeA-48836:LON-1: SuccessfulResponse(null)
01:40:54,349 ERROR (remote-thread-Test-NodeB-p37359-t2:[]) [TEST_RELAY2] Test-NodeB-37463: no route to LON-1: dropping message
01:40:54,350 TRACE (jgroups-6,Test-NodeB-37463:[]) [TEST_RELAY2] [Relayer _Test-NodeB-37463:NYC-2] view: [_Test-NodeA-48836:LON-1|1] (2) [_Test-NodeA-48836:LON-1, _Test-NodeB-37463:NYC-2]
01:40:54,350 TRACE (jgroups-6,Test-NodeB-37463:[]) [JGroupsTransport] Sites view changed: up [NYC-2, LON-1], down [], new view is [NYC-2, LON-1]
... 5 mins later ...
[ERROR] Test org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.testStateTransferWithClusterIdle has been running for more than 300 seconds. Interrupting the test thread and dumping threads of the test suite process and its children.
"testng-LocalCacheStateTransferTest" #17 prio=5 os_prio=0 cpu=26949.68ms elapsed=898.86s tid=0x00007f527d399800 nid=0x7147 waiting on condition [0x00007f5203cfb000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.3/Native Method)
	- parking to wait for <0x00000000c8300010> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.3/LockSupport.java:234)
	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.3/CompletableFuture.java:1798)
	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.3/ForkJoinPool.java:3128)
	at java.util.concurrent.CompletableFuture.timedGet(java.base@11.0.3/CompletableFuture.java:1868)
	at java.util.concurrent.CompletableFuture.get(java.base@11.0.3/CompletableFuture.java:2021)
	at org.infinispan.remoting.transport.jgroups.JGroupsBackupResponse.waitForBackupToFinish(JGroupsBackupResponse.java:87)
	at org.infinispan.remoting.transport.RetryOnFailureXSiteCommand.execute(RetryOnFailureXSiteCommand.java:64)
	at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.controlStateTransferOnRemoteSite(XSiteStateTransferManagerImpl.java:343)
	at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.handleFailure(XSiteStateTransferManagerImpl.java:328)
	at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.startPushState(XSiteStateTransferManagerImpl.java:147)
	at org.infinispan.xsite.XSiteAdminOperations.pushState(XSiteAdminOperations.java:276)
	at org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.startStateTransfer(LocalCacheStateTransferTest.java:99)
	at org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.testStateTransferWithClusterIdle(LocalCacheStateTransferTest.java:53)
{noformat}
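The thread dump shows the test thread parked in `CompletableFuture.get` with a timeout, reached via `handleFailure`, waiting for the FINISH_RECEIVE response that NodeB already tried to send. One possible test-side mitigation (an assumption, not necessarily how ISPN-10343 was actually fixed) is to wait until every node's sites view contains both sites before calling `pushState`. The `SitesViewGuard` class and its `awaitSites` helper below are hypothetical, and how the supplier reads a node's current sites view is left open.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical test helper; not an Infinispan API.
public class SitesViewGuard {

    // Polls viewSupplier until it contains every expected site or the timeout elapses.
    static boolean awaitSites(Supplier<Set<String>> viewSupplier, Set<String> expected,
                              long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            if (viewSupplier.get().containsAll(expected)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // the view never converged within the timeout
            }
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for NodeB's sites view, which initially only contains its own site.
        Set<String> nodeBView = ConcurrentHashMap.newKeySet();
        nodeBView.add("NYC-2");

        // Simulate NodeB's bridge view being updated shortly after the test starts.
        Thread viewUpdater = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            nodeBView.add("LON-1");
        });
        viewUpdater.start();

        boolean ready = awaitSites(() -> nodeBView, Set.of("NYC-2", "LON-1"), 5, TimeUnit.SECONDS);
        System.out.println("both sites visible: " + ready);
        viewUpdater.join();
    }
}
```

Polling with a deadline keeps the test from blocking indefinitely if the views never converge: the helper returns `false`, and the test can fail with a clear message instead of being interrupted after 300 seconds.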