[
https://issues.redhat.com/browse/JGRP-2435?page=com.atlassian.jira.plugin...
]
Dan Berindei commented on JGRP-2435:
------------------------------------
Yes, it's the problem I described in the delta view thread. After looking closer at
the logs I realized I have some the delta-view failure as well:
{noformat}
14:52:04,511 TRACE (mcast-receiver-3,NodeC:[]) [UDP] Received oob=false internal=false
message GMS: GmsHeader[VIEW], NAKACK2: [MSG, seqno=1], TP:
[cluster=org.infinispan.remoting.transport.Test]
14:52:04,511 TRACE (ucast-receiver-2,NodeC:[]) [UDP] Received oob=false internal=true
message GMS: GmsHeader[JOIN_RSP], UNICAST3: DATA, seqno=1, first, TP:
[cluster=org.infinispan.remoting.transport.Test]
14:52:04,529 TRACE (jgroups-5,NodeC:[]) [UDP] NodeC: received [NodeD to NodeC, 61 bytes,
flags=INTERNAL], headers are GMS: GmsHeader[JOIN_RSP], UNICAST3: DATA, seqno=1, first, TP:
[cluster=org.infinispan.remoting.transport.Test]
14:52:04,529 DEBUG (ForkThread-3,Test:[]) [GMS] NodeC: installing view [NodeD|1] (2)
[NodeD, NodeC]
14:52:04,532 TRACE (mcast-receiver-3,NodeC:[]) [UDP] Received oob=false internal=false
message GMS: GmsHeader[VIEW], NAKACK2: [MSG, seqno=2], TP:
[cluster=org.infinispan.remoting.transport.Test]
### 2nd view is ignored
14:52:04,532 TRACE (jgroups-4,NodeC:[]) [GMS] GmsImpl ignoring view update [NodeD|2] (3)
[NodeD, NodeC, NodeA]
14:52:04,543 TRACE (mcast-receiver-3,NodeC:[]) [UDP] Received oob=false internal=false
message GMS: GmsHeader[VIEW], NAKACK2: [MSG, seqno=3], TP:
[cluster=org.infinispan.remoting.transport.Test]
14:52:04,543 TRACE (jgroups-4,NodeC:[]) [GMS] Handling message GMS: GmsHeader[VIEW],
NAKACK2: [MSG, seqno=3], TP: [cluster=org.infinispan.remoting.transport.Test]
### 3rd view is delta view, can't be restored
14:52:04,543 WARN (jgroups-4,NodeC:[]) [GMS] NodeC: failed to create view from
delta-view; dropping view: java.lang.IllegalStateException: the view-id of the delta view
([NodeD|2]) doesn't match the current view-id ([NodeD|1]); discarding delta view
[NodeD|3], ref-view=[NodeD|2], joined=[NodeB]
14:52:04,543 TRACE (jgroups-4,NodeC:[]) [GMS] NodeC: sending request for full view to
NodeD
14:52:04,543 TRACE (jgroups-4,NodeC:[]) [UDP] NodeC: sending msg to NodeD, src=NodeC,
headers are GMS: GmsHeader[GET_CURRENT_VIEW], UNICAST3: DATA, seqno=3, TP:
[cluster=org.infinispan.remoting.transport.Test]
14:52:04,544 TRACE (ucast-receiver-2,NodeC:[]) [UDP] Received oob=true internal=true
message GMS: GmsHeader[VIEW], UNICAST3: DATA, seqno=2, TP:
[cluster=org.infinispan.remoting.transport.Test]
14:52:04,544 TRACE (jgroups-4,NodeC:[]) [UDP] NodeC: received [NodeD to NodeC, 91 bytes,
flags=OOB|INTERNAL], headers are GMS: GmsHeader[VIEW], UNICAST3: DATA, seqno=2, TP:
[cluster=org.infinispan.remoting.transport.Test]
14:52:04,544 TRACE (jgroups-4,NodeC:[]) [GMS] Handling message GMS: GmsHeader[VIEW],
UNICAST3: DATA, seqno=2, TP: [cluster=org.infinispan.remoting.transport.Test]
### Full 3rd view is also ignored
14:52:04,544 TRACE (jgroups-4,NodeC:[]) [GMS] GmsImpl ignoring view update [NodeD|3] (4)
[NodeD, NodeC, NodeA, NodeB]
14:52:04,545 TRACE (jgroups-4,NodeC:[]) [UDP] NodeC: sending msg to NodeD, src=NodeC,
headers are GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=4, TP:
[cluster=org.infinispan.remoting.transport.Test]
14:52:04,557 TRACE (ForkThread-3,Test:[]) [UDP] NodeC: sending msg to NodeD, src=NodeC,
headers are GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=6, TP:
[cluster=org.infinispan.remoting.transport.Test]
{noformat}
ClientGmsImpl ignores newer view during join
--------------------------------------------
Key: JGRP-2435
URL:
https://issues.redhat.com/browse/JGRP-2435
Project: JGroups
Issue Type: Bug
Affects Versions: 4.1.9
Reporter: Dan Berindei
Assignee: Bela Ban
Priority: Major
Fix For: 5.0, 4.2.0
We have random failures in a test that starts 4 nodes in parallel
({{org.infinispan.InitialClusterSizeTest.testInitialClusterSize}}). I get at least one
failure if I add {{@Test(invocationCount=100)}}, but I did not get any failure when I did
the same with
{{org.jgroups.tests.ConcurrentStartupTest.testConcurrentJoinWithLOCAL_PING()}}, maybe
because the Infinispan test sends some additional messages and sometimes changes how
messages are processed.
The problem seems to be that after receiving a {{JOIN_RSP}} and installing the first
view, the GMS implementation is still {{ClientGmsImpl}} when receiving the second view,
and the view is ignored. Because this second view already has all 4 members, there is no
other {{VIEW}} message and the test just times out.
I added some logs in TP and GMS for debugging, and this is what I see:
{noformat}
10:37:42,896 TRACE (ucast-receiver-2,Test-NodeC:[]) [UDP] Received oob=false
internal=true message GMS: GmsHeader[JOIN_RSP], UNICAST3: DATA, seqno=1, first, TP:
[cluster=org.infinispan.remoting.transport.InitialClusterSizeTest]
10:37:42,898 TRACE (mcast-receiver-3,Test-NodeC:[]) [UDP] Received oob=false
internal=false message GMS: GmsHeader[VIEW], NAKACK2: [MSG, seqno=1], TP:
[cluster=org.infinispan.remoting.transport.InitialClusterSizeTest]
10:37:42,906 TRACE (jgroups-4,Test-NodeC:[]) [UDP] Test-NodeC: received [Test-NodeB to
Test-NodeC, 61 bytes, flags=INTERNAL], headers are GMS: GmsHeader[JOIN_RSP], UNICAST3:
DATA, seqno=1, first, TP:
[cluster=org.infinispan.remoting.transport.InitialClusterSizeTest]
10:37:42,906 TRACE (jgroups-4,Test-NodeC:[]) [GMS] Handling message GMS:
GmsHeader[JOIN_RSP], UNICAST3: DATA, seqno=1, first, TP:
[cluster=org.infinispan.remoting.transport.InitialClusterSizeTest]
10:37:42,907 DEBUG (ForkThread-1,InitialClusterSizeTest:[]) [GMS] Test-NodeC: installing
view [Test-NodeB|1] (2) [Test-NodeB, Test-NodeC]
10:37:42,947 TRACE (mcast-receiver-3,Test-NodeC:[]) [UDP] Received oob=false
internal=false message GMS: GmsHeader[VIEW], NAKACK2: [MSG, seqno=2], TP:
[cluster=org.infinispan.remoting.transport.InitialClusterSizeTest]
10:37:42,948 TRACE (jgroups-5,Test-NodeC:[]) [UDP] Test-NodeC: received message batch of
from Test-NodeB: dest=null, sender=Test-NodeB
1:
#1: GMS: GmsHeader[VIEW], NAKACK2: [MSG, seqno=1], TP:
[cluster=org.infinispan.remoting.transport.InitialClusterSizeTest]
10:37:42,948 TRACE (jgroups-5,Test-NodeC:[]) [GMS] Handling message GMS: GmsHeader[VIEW],
NAKACK2: [MSG, seqno=1], TP:
[cluster=org.infinispan.remoting.transport.InitialClusterSizeTest]
10:37:42,948 TRACE (jgroups-5,Test-NodeC:[]) [UDP] Test-NodeC: received message batch of
from Test-NodeB: dest=null, sender=Test-NodeB
1:
#1: GMS: GmsHeader[VIEW], NAKACK2: [MSG, seqno=2], TP:
[cluster=org.infinispan.remoting.transport.InitialClusterSizeTest]
10:37:42,948 TRACE (jgroups-5,Test-NodeC:[]) [GMS] Handling message GMS: GmsHeader[VIEW],
NAKACK2: [MSG, seqno=2], TP:
[cluster=org.infinispan.remoting.transport.InitialClusterSizeTest]
### GmsImpl.handleViewChange() {}
10:37:42,948 TRACE (jgroups-5,Test-NodeC:[]) [GMS] GmsImpl ignoring view update
[Test-NodeB|2] (4) [Test-NodeB, Test-NodeC, Test-NodeA, Test-NodeD]
10:37:42,950 TRACE (jgroups-5,Test-NodeC:[]) [UDP] Test-NodeC: sending msg to Test-NodeB,
src=Test-NodeC, headers are GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=3, TP:
[cluster=org.infinispan.remoting.transport.InitialClusterSizeTest]
### The ack for the JOIN_RSP is only sent here
10:37:43,034 TRACE (ForkThread-1,InitialClusterSizeTest:[]) [UDP] Test-NodeC: sending msg
to Test-NodeB, src=Test-NodeC, headers are GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA,
seqno=4, TP: [cluster=org.infinispan.remoting.transport.InitialClusterSizeTest]
10:37:43,034 DEBUG (ForkThread-1,InitialClusterSizeTest:[]) [JGroupsTransport] Waiting
for 4 nodes, current view has 2
{noformat}
To help debugging, {{TP.passBatchUp}} should really log the headers of the messages in
the batch, and {{GMS}} and the {{GmsImpl}} subclasses should log at least a DEBUG message
every time they ignore a view (even when {{log_view_warnings==false}}). I also suggest
removing the default implementations from {{GmsImpl}} and updating the {{ClientGmsImpl}}
javadoc, because it talks about handling {{ViewChange}} instead of {{JoinResponse}}.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)