[jboss-jira] [JBoss JIRA] Commented: (JGRP-177) Join problem

Wednesday, 18 February 2009

    [ https://jira.jboss.org/jira/browse/JGRP-177?page=com.atlassian.jira.plugi... ]

Victor N commented on JGRP-177:
-------------------------------

Bela,
I am not sure whether my problem is exactly the same or just something similar.

I ran my test on 5 nodes (N1...N5) with a simple TCP config using TCPPING (based on
tcp.xml from the JGroups 2.7 sources). Everything worked for about 3 days, but then I
saw that only 4 nodes could see each other and receive messages from each other; one of
the nodes (N2) had been excluded from their view.
I looked into the logs, and what I found is interesting:
view at N1,N3,N4,N5 is {N1,N3,N4,N5}
view at N2 is {N1,N2,N3,N4,N5} - all 5 nodes!

N2 did not receive viewAccepted and continues sending messages to all other nodes (I
can see this in tcpdump), but those nodes know that N2 is not a member, so they respond
with "discarded message from non-member".

The situation did not change for several hours: N2 does not receive the updated view
and continues sending messages to all the nodes!
Why does N2 not receive the new view? Or why does it not react to the "discarded message
from non-member" error from the other nodes?
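
To make the asymmetry concrete, here is a minimal sketch in plain Java (no JGroups
dependency) that takes the view reported by each node and flags nodes whose own view
still lists peers that no longer list them. The class and method names are hypothetical
and only illustrate the check; the data is the two views quoted above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ViewDivergenceCheck {

    /**
     * Returns the nodes that still list a peer in their own view
     * although that peer's view no longer contains them.
     */
    public static Set<String> staleMembers(Map<String, Set<String>> viewsByNode) {
        Set<String> stale = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : viewsByNode.entrySet()) {
            String node = e.getKey();
            for (String peer : e.getValue()) {
                Set<String> peerView = viewsByNode.get(peer);
                // Skip peers whose view we do not know.
                if (peerView != null && !peerView.contains(node)) {
                    stale.add(node);
                }
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> views = new LinkedHashMap<>();
        Set<String> four = new TreeSet<>(Arrays.asList("N1", "N3", "N4", "N5"));
        for (String n : four) {
            views.put(n, four);
        }
        views.put("N2", new TreeSet<>(Arrays.asList("N1", "N2", "N3", "N4", "N5")));

        System.out.println(staleMembers(views)); // prints [N2]
    }
}
```

With the views from my logs, only N2 is reported: it believes it is in a 5-node group,
while the other four have already agreed on a view without it.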

...
 Join problem
 ------------

                 Key: JGRP-177
                 URL: https://jira.jboss.org/jira/browse/JGRP-177
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 2.2.8, 2.2.9, 2.2.9.1
            Reporter: Bela Ban
            Assignee: Bela Ban
             Fix For: 2.3

         Attachments: BaseJGroupsTestCase.java, jgroups.xml, JGroupsTestMain.java,
JGroupsTestRemote.java, test.zip

 I run a testcase that spawns 4 JGroups nodes in 4 separate java processes. Several nodes
are then restarted at random and try to reconnect to the group. 
 The first node sends a ping and counts the responses received by each node. 
 After a couple of iterations ranging from 20 to 100, some nodes are unable to join the
group. 
 I use JGroups 2.2.9 with a TCP based config (TCP / TCPPING / MERGE2 / FD or FD_SOCK /
VERIFY_SUSPECT / pbcast.NAKACK / pbcast.STABLE / VIEW_SYNC / pbcast.GMS ). 

 EXAMPLE 1 with FD_SOCK: 

 Node 0 
 WARN [GMS] failed to collect all ACKs (1) for view [127.0.0.1:7700|32] after 20000ms,
missing ACKs from [127.0.0.1:7701] (received=[127.0.0.1:7700]) 
 Ping result: {127.0.0.1:7701=3, 127.0.0.1:7700=3, 127.0.0.1:7703=3} 

 Node 1 
 WARN [NAKACK] 127.0.0.1:7701] discarded message from non-member 127.0.0.1:7702 
 WARN [NAKACK] 127.0.0.1:7701] discarded message from non-member 127.0.0.1:7702 

 Node 2 
 WARN [NAKACK] 127.0.0.1:7702] discarded message from non-member 127.0.0.1:7700 
 ERROR [FD_SOCK] received null cache; retrying 
 ERROR [FD_SOCK] received null cache; retrying 
 ERROR [FD_SOCK] received null cache; retrying 

 Node 3 
 WARN [NAKACK] 127.0.0.1:7703] discarded message from non-member 127.0.0.1:7702 
 WARN [NAKACK] 127.0.0.1:7703] discarded message from non-member 127.0.0.1:7702 

 EXAMPLE 2 with FD timeout="2000" max_tries="4": 

 Node 0 
 Ping result: {127.0.0.1:7701=0, 127.0.0.1:7700=2, 127.0.0.1:7703=2} 

 Node 1 
 WARN [GMS] handleJoin(127.0.0.1:7701)() should not be invoked on an instance of
org.jgroups.protocols.pbcast.ClientGmsImpl 
 WARN [GMS] join(127.0.0.1:7701) failed (coord=127.0.0.1:7701), retrying 
 WARN [GMS] handleJoin(127.0.0.1:7701)() should not be invoked on an instance of
org.jgroups.protocols.pbcast.ClientGmsImpl 
 WARN [GMS] join(127.0.0.1:7701) failed (coord=127.0.0.1:7701), retrying 

 Node 2 
 No ERROR or WARN messages. 

 Node 3 
 WARN [GMS] join(127.0.0.1:7703) failed (coord=127.0.0.1:7701), retrying 
 WARN [GMS] join(127.0.0.1:7703) failed (coord=127.0.0.1:7700), retrying 

 Is there something wrong with my JGroups config?
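
 For reading the "Ping result" lines above, a small helper like the following can list
 the members that are missing from a result or that answered zero times. This is only a
 sketch: it assumes the result is printed in the usual Java Map.toString() form, the
 class and method names are hypothetical, and the expected member list (ports 7700 to
 7703) is taken from the node addresses in the logs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PingResultCheck {

    /** Parses a map printed as "{a=1, b=2}" into member -> response count. */
    public static Map<String, Integer> parse(String printed) {
        Map<String, Integer> counts = new TreeMap<>();
        String body = printed.trim();
        body = body.substring(1, body.length() - 1); // strip '{' and '}'
        if (body.trim().isEmpty()) {
            return counts;
        }
        for (String entry : body.split(",")) {
            int eq = entry.lastIndexOf('=');
            String member = entry.substring(0, eq).trim();
            int count = Integer.parseInt(entry.substring(eq + 1).trim());
            counts.put(member, count);
        }
        return counts;
    }

    /** Returns expected members that are absent from the result or have a count of 0. */
    public static List<String> unresponsive(Collection<String> expected,
                                            Map<String, Integer> counts) {
        List<String> result = new ArrayList<>();
        for (String member : expected) {
            Integer c = counts.get(member);
            if (c == null || c == 0) {
                result.add(member);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList(
            "127.0.0.1:7700", "127.0.0.1:7701", "127.0.0.1:7702", "127.0.0.1:7703");

        String example1 = "{127.0.0.1:7701=3, 127.0.0.1:7700=3, 127.0.0.1:7703=3}";
        String example2 = "{127.0.0.1:7701=0, 127.0.0.1:7700=2, 127.0.0.1:7703=2}";

        System.out.println(unresponsive(expected, parse(example1))); // prints [127.0.0.1:7702]
        System.out.println(unresponsive(expected, parse(example2))); // prints [127.0.0.1:7701, 127.0.0.1:7702]
    }
}
```

 In both examples 127.0.0.1:7702 never shows up in the ping result, and in EXAMPLE 2
 127.0.0.1:7701 is listed but did not answer a single ping.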
-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
