Sebastian Łaskawiec closed ISPN-6399.
-------------------------------------
Timeout updating the JGroups view after killing one node
--------------------------------------------------------
Key: ISPN-6399
URL: https://issues.jboss.org/browse/ISPN-6399
Project: Infinispan
Issue Type: Bug
Components: Test Suite - Core
Affects Versions: 8.2.0.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 8.2.1.Final, 9.0.0.Alpha1, 9.0.0.Final
{{GMS}} can sometimes delay the processing of a join/leave request because of JGRP-2028.
Joiners retry automatically after {{GMS.join_timeout}}, so the impact on them is limited.
Leavers, however, never resend their leave requests, so a leave request can stay
unprocessed for much longer.
Normally, the FD/FD_ALL/FD_SOCK protocols would wake up the ViewHandler thread. However,
most of our tests remove the FD* protocols from the stack, unless the test uses
{{DISCARD}}. That means the leave request can be delayed until another node leaves. In the
log below, NodeB's request is delivered at 16:35:56 but only applied at 16:37:07, when
NodeD also leaves:
{noformat}
16:35:56,247 TRACE (testng-ClusterListenerDistAddListenerTest:) [GMS] NodeB-8309: sending
LEAVE request to NodeA-45395
16:35:56,268 TRACE (OOB-1,NodeA-45395:) [TCP_NIO2] NodeA-45395: received [dst:
NodeA-45395, src: NodeB-8309 (3 headers), size=0 bytes, flags=OOB], headers are GMS:
GmsHeader[LEAVE_REQ]: mbr=NodeB-8309, UNICAST3: DATA, seqno=22, TP: [cluster_name=ISPN]
16:35:56,268 TRACE (OOB-1,NodeA-45395:) [UNICAST3] NodeA-45395: delivering NodeB-8309#22
16:36:07,263 ERROR (testng-ClusterListenerDistAddListenerTest:) [UnitTestTestNGListener]
Test
testMemberJoinsAndRetrievesClusterListenersButMainListenerNodeDiesBeforeInstalled(org.infinispan.notifications.cachelistener.cluster.ClusterListenerDistAddListenerTest)
failed.
org.infinispan.util.concurrent.TimeoutException: Timed out before caches had complete
views. Expected 3 members in each view. Views are as follows: [[NodeA-45395|3] (4)
[NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165], [NodeA-45395|3] (4) [NodeA-45395,
NodeB-8309, NodeC-53222, NodeD-55165], [NodeA-45395|3] (4) [NodeA-45395, NodeB-8309,
NodeC-53222, NodeD-55165]]
16:37:07,341 TRACE (testng-ClusterListenerDistAddListenerTest:) [GMS] NodeD-55165:
sending LEAVE request to NodeA-45395
16:37:07,361 TRACE (OOB-4,NodeA-45395:) [TCP_NIO2] NodeA-45395: received [dst:
NodeA-45395, src: NodeD-55165 (3 headers), size=0 bytes, flags=OOB], headers are GMS:
GmsHeader[LEAVE_REQ]: mbr=NodeD-55165, UNICAST3: DATA, seqno=21, TP: [cluster_name=ISPN]
16:37:07,361 TRACE (OOB-4,NodeA-45395:) [UNICAST3] NodeA-45395: delivering
NodeD-55165#21
16:37:07,361 TRACE (ViewHandler,NodeA-45395:) [GMS] NodeA-45395: joiners=[],
suspected=[], leaving=[NodeB-8309], new view: [NodeA-45395|4] (3) [NodeA-45395,
NodeC-53222, NodeD-55165]
{noformat}
{{FD_ALL}} is pretty cheap: it just sends a message every second, without opening any new
sockets. So I think we should enable {{FD_ALL}} by default in our tests, and only enable
{{FD_SOCK}} when a test asks for it with {{TransportFlags.withFD(true)}}.
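
For illustration, the proposal could be sketched roughly as follows. This is a self-contained toy model, not the actual test-suite code: the {{FdStackSketch}} class, its {{Flags}} type and the abbreviated protocol list are hypothetical and exist only for this example. Only the {{withFD(true)}} switch mirrors the flag named above.

```java
import java.util.ArrayList;
import java.util.List;

final class FdStackSketch {

    /** Hypothetical stand-in for the test flags; only the FD switch is modelled. */
    static final class Flags {
        private boolean withFD;

        Flags withFD(boolean enabled) {
            this.withFD = enabled;
            return this;
        }

        boolean isWithFD() {
            return withFD;
        }
    }

    /**
     * Returns the protocol names a test stack would contain under the proposal.
     * The list is abbreviated (assumption): only protocols mentioned in this
     * issue are included.
     */
    static List<String> protocolsFor(Flags flags) {
        List<String> stack = new ArrayList<>();
        stack.add("TCP_NIO2");
        if (flags.isWithFD()) {
            stack.add("FD_SOCK"); // opens extra sockets, so only on request
        }
        stack.add("FD_ALL");      // cheap heartbeat, always present
        stack.add("UNICAST3");
        stack.add("GMS");
        return stack;
    }

    public static void main(String[] args) {
        // Default: FD_ALL only.
        System.out.println(protocolsFor(new Flags()));
        // prints [TCP_NIO2, FD_ALL, UNICAST3, GMS]

        // Test explicitly asks for failure detection: FD_SOCK is added too.
        System.out.println(protocolsFor(new Flags().withFD(true)));
        // prints [TCP_NIO2, FD_SOCK, FD_ALL, UNICAST3, GMS]
    }
}
```

With this split, every test keeps a failure-detection protocol in its stack, so a pending leave request should no longer depend on another node leaving, while the socket cost of {{FD_SOCK}} is paid only by tests that opt in.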