[infinispan-issues] [JBoss JIRA] (ISPN-9801) ClusterTopologyManagerImpl hangs when restarting a node with FORK

Sunday, 27 January 2019

     [
https://issues.jboss.org/browse/ISPN-9801?page=com.atlassian.jira.plugin....
]

Tristan Tarrant updated ISPN-9801:
----------------------------------
    Fix Version/s: 10.0.0.Beta1
                       (was: 10.0.0.Alpha3)

...
 ClusterTopologyManagerImpl hangs when restarting a node with FORK
 -----------------------------------------------------------------

                 Key: ISPN-9801
                 URL: https://issues.jboss.org/browse/ISPN-9801
             Project: Infinispan
          Issue Type: Bug
          Components: Core
    Affects Versions: 10.0.0.Alpha1, 9.4.3.Final
            Reporter: Dan Berindei
            Assignee: Dan Berindei
            Priority: Major
             Fix For: 10.0.0.Beta1, 9.4.6.Final

 When a server is restarted with `kill -9` or similar, both the old node and the new one
can be in the JGroups view for a while. Normally this shouldn't be a problem, but
sometimes the new node doesn't receive the {{HeartBeatCommand}}, so the coordinator's
request never completes and the coordinator cannot process any new view updates.
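To make the hang easier to follow, here is a minimal Java sketch, assuming a simplified model of a request sent to several members: it completes only when every target has either replied or dropped out of the cluster view. The class {{MultiTargetRequestSketch}} and its methods are hypothetical illustrations, not Infinispan's {{MultiTargetRequest}} API, and the sketch assumes the coordinator does not send the request to itself.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical model (not Infinispan code): a request to several targets that
// completes only when each target has replied or has left the cluster view.
final class MultiTargetRequestSketch {
    private final Set<String> pending;
    private final CompletableFuture<Void> done = new CompletableFuture<>();

    MultiTargetRequestSketch(List<String> targets) {
        this.pending = new HashSet<>(targets);
        checkDone();
    }

    // A target answered the request.
    synchronized void onResponse(String sender) {
        pending.remove(sender);
        checkDone();
    }

    // A new view was installed: targets missing from it no longer count.
    synchronized void onViewChange(List<String> newMembers) {
        pending.retainAll(newMembers);
        checkDone();
    }

    synchronized Set<String> pendingTargets() {
        return new HashSet<>(pending);
    }

    CompletableFuture<Void> future() {
        return done;
    }

    private void checkDone() {
        if (pending.isEmpty()) {
            done.complete(null);
        }
    }

    public static void main(String[] args) {
        // Replay of request 9 from the log (assumption: sent to every member except the coordinator A).
        MultiTargetRequestSketch request9 = new MultiTargetRequestSketch(
                List.of("Test-NodeB", "Test-NodeC", "Test-NodeD", "Test-NodeE"));
        request9.onResponse("Test-NodeC");
        request9.onResponse("Test-NodeD");
        // Test-NodeB is stopped and removed from the view; Test-NodeE never answers.
        request9.onViewChange(List.of("Test-NodeA", "Test-NodeC", "Test-NodeD", "Test-NodeE"));
        System.out.println("done=" + request9.future().isDone()
                + " pending=" + request9.pendingTargets());
    }
}
```

Running {{main}} prints {{done=false pending=[Test-NodeE]}}: with C and D answered and B gone, the request is still waiting for E, so any caller that blocks on {{future()}}, such as a view-handling task, cannot move on to the next view.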
 {noformat}
 14:29:19,981 INFO  (jgroups-12,Test-NodeA:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|4] (5) [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
 14:29:19,982 TRACE (transport-thread-Test-NodeA-p4-t14:[ViewHandling]) [ClusterTopologyManagerImpl] Updating cluster members for all the caches. New list is [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
 14:29:19,982 TRACE (transport-thread-Test-NodeA-p4-t14:[ViewHandling]) [JGroupsTransport] Test-NodeA sending request 9 to all: org.infinispan.topology.HeartBeatCommand@1163beb6
 14:29:19,986 TRACE (jgroups-6,Test-NodeA:[]) [JGroupsTransport] Test-NodeA received response for request 9 from Test-NodeC: SuccessfulResponse(null)
 14:29:19,987 TRACE (jgroups-9,Test-NodeA:[]) [JGroupsTransport] Test-NodeA received response for request 9 from Test-NodeD: SuccessfulResponse(null)
 14:29:20,032 TRACE (jgroups-6,Test-NodeE:[]) [TCP_NIO2] Test-NodeE: received message batch of 1 messages from Test-NodeA
 14:29:20,032 TRACE (jgroups-6,Test-NodeE:[]) [NAKACK2] Test-NodeE: message Test-NodeA::39 was added to queue (not yet server)
 14:29:20,054 TRACE (jgroups-6,Test-NodeE:[]) [NAKACK2] Test-NodeE: received Test-NodeA#38
 14:29:20,054 TRACE (jgroups-6,Test-NodeE:[]) [NAKACK2] Test-NodeE: delivering Test-NodeA#38
 # not actually delivered :)
 14:29:20,054 TRACE (jgroups-6,Test-NodeE:[]) [MFC] Test-NodeA used 5 credits, 1999995 remaining
 14:29:20,149 INFO  (ForkThread-1,ForkChannelRestartTest:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|4] (5) [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
 14:29:21,119 DEBUG (testng-Test-1:[]) [ForkChannelRestartTest] Stopping channel Test-NodeB
 14:29:23,319 INFO  (VERIFY_SUSPECT.TimerThread-32,Test-NodeA:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|5] (4) [Test-NodeA, Test-NodeC, Test-NodeD, Test-NodeE]
 14:29:23,320 TRACE (remote-thread-Test-NodeA-p2-t1:[]) [MultiTargetRequest] Target Test-NodeB of request 9 left the cluster view
 {noformat}
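The {{NAKACK2}} trace for Test-NodeE says message {{Test-NodeA::39}} "was added to queue (not yet server)". A minimal Java sketch of what that line implies, assuming a simplified model in which a node that has not finished joining parks incoming messages and replays them once it becomes a server. {{BecomeServerQueueSketch}}, {{receive}} and {{becomeServer}} are hypothetical names; this is not JGroups' NAKACK2 code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical, simplified model (not JGroups code): messages that arrive before
// the node "becomes a server" are parked, then replayed once it does.
final class BecomeServerQueueSketch {
    private boolean isServer = false;
    private final Queue<String> parked = new ArrayDeque<>();
    private final List<String> delivered = new ArrayList<>();

    // Called for every incoming message.
    synchronized void receive(String msg) {
        if (!isServer) {
            parked.add(msg); // the "added to queue (not yet server)" case
            return;
        }
        delivered.add(msg);
    }

    // Called once joining completes; parked messages must be replayed here,
    // otherwise they are silently lost.
    synchronized void becomeServer() {
        isServer = true;
        while (!parked.isEmpty()) {
            delivered.add(parked.poll());
        }
    }

    synchronized List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    synchronized int parkedCount() {
        return parked.size();
    }

    public static void main(String[] args) {
        BecomeServerQueueSketch nodeE = new BecomeServerQueueSketch();
        nodeE.receive("Test-NodeA::39");
        System.out.println("before join: delivered=" + nodeE.delivered() + " parked=" + nodeE.parkedCount());
        nodeE.becomeServer();
        System.out.println("after join: delivered=" + nodeE.delivered() + " parked=" + nodeE.parkedCount());
    }
}
```

If the replay step were skipped, or raced with a concurrent delivery, a parked message would never reach Infinispan and its sender would never get a reply, which would be consistent with the missing response from Test-NodeE to request 9.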
 So far, it looks like it's a JGroups bug similar to JGRP-2294, but we need to
investigate further. 

--
This message was sent by Atlassian Jira
(v7.12.1#712002)
