[infinispan-issues] [JBoss JIRA] (ISPN-9801) ClusterTopologyManagerImpl hangs when restarting a node with FORK

Fri Nov 22 04:59:24 EST 2019

     [ https://issues.jboss.org/browse/ISPN-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tristan Tarrant updated ISPN-9801:
----------------------------------
    Sprint: Sprint 10.0.0.Alpha2, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32, DataGrid Sprint #33, DataGrid Sprint #34, DataGrid Sprint #35, DataGrid Sprint #36, DataGrid Sprint #37  (was: Sprint 10.0.0.Alpha2, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32, DataGrid Sprint #33, DataGrid Sprint #34, DataGrid Sprint #35, DataGrid Sprint #36)


> ClusterTopologyManagerImpl hangs when restarting a node with FORK
> -----------------------------------------------------------------
>
>                 Key: ISPN-9801
>                 URL: https://issues.jboss.org/browse/ISPN-9801
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 10.0.0.Alpha1, 9.4.3.Final
>            Reporter: dan.berindei
>            Assignee: dan.berindei
>            Priority: Major
>             Fix For: 9.4.7.Final, 10.0.0.Beta1
>
>
> When a server is killed with `kill -9` (or similar) and then restarted, both the old node and the new one can remain in the JGroups view for a while. Normally this is not a problem, but sometimes the new node never receives the {{HeartBeatCommand}}, and the coordinator then cannot process any new view updates.
> {noformat}
> 14:29:19,981 INFO  (jgroups-12,Test-NodeA:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|4] (5) [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
> 14:29:19,982 TRACE (transport-thread-Test-NodeA-p4-t14:[ViewHandling]) [ClusterTopologyManagerImpl] Updating cluster members for all the caches. New list is [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
> 14:29:19,982 TRACE (transport-thread-Test-NodeA-p4-t14:[ViewHandling]) [JGroupsTransport] Test-NodeA sending request 9 to all: org.infinispan.topology.HeartBeatCommand@1163beb6
> 14:29:19,986 TRACE (jgroups-6,Test-NodeA:[]) [JGroupsTransport] Test-NodeA received response for request 9 from Test-NodeC: SuccessfulResponse(null)
> 14:29:19,987 TRACE (jgroups-9,Test-NodeA:[]) [JGroupsTransport] Test-NodeA received response for request 9 from Test-NodeD: SuccessfulResponse(null)
> 14:29:20,032 TRACE (jgroups-6,Test-NodeE:[]) [TCP_NIO2] Test-NodeE: received message batch of 1 messages from Test-NodeA
> 14:29:20,032 TRACE (jgroups-6,Test-NodeE:[]) [NAKACK2] Test-NodeE: message Test-NodeA::39 was added to queue (not yet server)
> 14:29:20,054 TRACE (jgroups-6,Test-NodeE:[]) [NAKACK2] Test-NodeE: received Test-NodeA#38
> 14:29:20,054 TRACE (jgroups-6,Test-NodeE:[]) [NAKACK2] Test-NodeE: delivering Test-NodeA#38
> # not actually delivered :)
> 14:29:20,054 TRACE (jgroups-6,Test-NodeE:[]) [MFC] Test-NodeA used 5 credits, 1999995 remaining
> 14:29:20,149 INFO  (ForkThread-1,ForkChannelRestartTest:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|4] (5) [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
> 14:29:21,119 DEBUG (testng-Test-1:[]) [ForkChannelRestartTest] Stopping channel Test-NodeB
> 14:29:23,319 INFO  (VERIFY_SUSPECT.TimerThread-32,Test-NodeA:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|5] (4) [Test-NodeA, Test-NodeC, Test-NodeD, Test-NodeE]
> 14:29:23,320 TRACE (remote-thread-Test-NodeA-p2-t1:[]) [MultiTargetRequest] Target Test-NodeB of request 9 left the cluster view
> {noformat}
> So far, it looks like it's a JGroups bug similar to JGRP-2294, but we need to investigate further.
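
For readers following the log, the pattern it shows (a request broadcast to all members, with responses arriving from some targets but not others) can be modeled with a small, self-contained Java sketch. This is not Infinispan or JGroups code: the class `HeartbeatSketch`, the method `awaitResponses`, and the idea of marking a departed member's slot as done are all hypothetical names and simplifications chosen for illustration. It only shows how a bounded wait identifies the target that never answered, under the assumption that each target's response is tracked as a separate future.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical model of a multi-target request; not the Infinispan implementation.
class HeartbeatSketch {

    /**
     * Waits up to the given timeout for every pending response and returns
     * the members whose response is still outstanding (empty if all answered).
     */
    static Set<String> awaitResponses(Map<String, CompletableFuture<Void>> pending,
                                      long timeout, TimeUnit unit) {
        CompletableFuture<Void> all = CompletableFuture.allOf(
                pending.values().toArray(new CompletableFuture[0]));
        try {
            all.get(timeout, unit);
        } catch (TimeoutException e) {
            // At least one target did not answer in time; report it below.
        } catch (ExecutionException e) {
            // A response completed exceptionally; it still counts as "done".
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Set<String> missing = new TreeSet<>();
        for (Map.Entry<String, CompletableFuture<Void>> entry : pending.entrySet()) {
            if (!entry.getValue().isDone()) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // One response slot per target, named after the nodes in the log.
        Map<String, CompletableFuture<Void>> pending = new LinkedHashMap<>();
        for (String member : Arrays.asList("Test-NodeB", "Test-NodeC", "Test-NodeD", "Test-NodeE")) {
            pending.put(member, new CompletableFuture<>());
        }
        pending.get("Test-NodeC").complete(null); // response received
        pending.get("Test-NodeD").complete(null); // response received
        pending.get("Test-NodeB").complete(null); // assumption: a target that left the view is treated as done
        // Test-NodeE never completes, mirroring the message that was not delivered.
        System.out.println(awaitResponses(pending, 200, TimeUnit.MILLISECONDS));
    }
}
```

Running `main` prints `[Test-NodeE]`. Without the timeout, a caller blocking on `all.get()` would wait indefinitely, which is the kind of stall the issue title describes; whether a timeout is the appropriate fix for ISPN-9801 is not something this sketch claims.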

