]
Dan Berindei updated ISPN-7064:
-------------------------------
Status: Pull Request Sent (was: Coding In Progress)
Git Pull Request:
RPC to leaver times out instead of finishing immediately
--------------------------------------------------------
Key: ISPN-7064
URL:
https://issues.jboss.org/browse/ISPN-7064
Project: Infinispan
Issue Type: Bug
Components: Core, Test Suite - Core
Affects Versions: 9.0.0.Alpha4, 8.2.4.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Critical
Labels: testsuite_stability
Fix For: 9.0.0.Beta1
This causes random failures in
{{ClusterTopologyManagerTest.testAbruptLeaveAfterGetStatus2}}. Node D leaves and nodes E
and F start a rebalance. Then node F also leaves, but {{DISCARD}} is enabled first, so the
{{CacheTopologyControlCommand(LEAVE)}} synchronous RPC blocks until node F installs a view
by itself.
{noformat}
14:05:17,187 INFO (Incoming-1,Test-NodeF-12732:[]) [JGroupsTransport] ISPN000094:
Received new cluster view for channel ISPN: [Test-NodeE-18953|3] (2) [Test-NodeE-18953,
Test-NodeF-12732]
14:05:17,261 TRACE (testng-Test:[]) [JGroupsTransport] dests=[Test-NodeE-18953],
command=CacheTopologyControlCommand{cache=testCache, type=LEAVE, sender=Test-NodeF-12732,
joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null,
availabilityMode=null, actualMembers=null, throwable=null, viewId=3}, mode=SYNCHRONOUS,
timeout=240000
14:05:22,546 DEBUG (Thread-88606:[]) [GMS] Test-NodeF-12732: installing view
[Test-NodeF-12732|4] (1) [Test-NodeF-12732]
14:05:22,589 DEBUG (testng-Test:[]) [LocalTopologyManagerImpl] Error sending the leave
request for cache testCache to coordinator
{noformat}
Node F then cancels its inbound state transfers by sending a synchronous
{{StateRequestCommand(CANCEL_STATE_TRANSFER)}} RPC to node F. This RPC sometimes hits
JGRP-2103 and it only times out after 5 minutes, causing the test to fail.
{noformat}
14:05:22,608 TRACE (testng-Test:[]) [JGroupsTransport] dests=[Test-NodeE-18953],
command=StateRequestCommand{cache=testCache, origin=Test-NodeF-12732,
type=CANCEL_STATE_TRANSFER, topologyId=7, segments=[130, 3, ...]},
mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=240000
14:05:22,608 TRACE (testng-Test:[]) [RequestCorrelator] Test-NodeF-12732: invoking
multicast RPC [req-id=134161]
14:05:22,609 DEBUG (Thread-88606:[]) [JGroupsTransport] New view accepted:
[Test-NodeF-12732|4] (1) [Test-NodeF-12732]
14:09:22,608 TRACE (timeout-thread-Test-NodeF-p48767-t1:[]) [JGroupsTransport] Responses:
Responses{
Test-NodeE-18953: sender=Test-NodeE-18953, received=false, suspected=true}
14:09:22,608 TRACE (timeout-thread-Test-NodeF-p48767-t1:[]) [RpcManagerImpl] Replication
exception
org.infinispan.util.concurrent.TimeoutException: Replication timeout
at
org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$2(JGroupsTransport.java:659)
~[classes/:?]
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
~[?:1.8.0_101]
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
~[?:1.8.0_101]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
[?:1.8.0_101]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
[?:1.8.0_101]
at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:47)
[classes/:?]
at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:15)
[classes/:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_101]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[?:1.8.0_101]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[?:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[?:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[?:1.8.0_101]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
Suppressed: org.infinispan.remoting.RpcException: Test-NodeE-18953 was suspected
at
org.infinispan.remoting.transport.jgroups.JGroupsTransport.addSuppressedExceptions(JGroupsTransport.java:707)
~[classes/:?]
at
org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$2(JGroupsTransport.java:659)
~[classes/:?]
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
~[?:1.8.0_101]
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
~[?:1.8.0_101]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
[?:1.8.0_101]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
[?:1.8.0_101]
at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:47)
[classes/:?]
at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:15)
[classes/:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_101]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[?:1.8.0_101]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[?:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[?:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[?:1.8.0_101]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
14:09:22,609 DEBUG (testng-Test:[]) [InboundTransferTask] Caught an exception while
cancelling state transfer for segments [130, 3, ...] from Test-NodeE-18953
{noformat}