[jboss-jira] [JBoss JIRA] Commented: (JGRP-177) Join problem
Victor N (JIRA)
jira-events at lists.jboss.org
Wed Feb 18 14:33:44 EST 2009
[ https://jira.jboss.org/jira/browse/JGRP-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12453137#action_12453137 ]
Victor N commented on JGRP-177:
-------------------------------
Bela,
I am not sure whether my problem is exactly the same or just something similar.
I ran my test on 5 nodes (N1...N5) with a simple TCP config using TCPPING (based on tcp.xml from the JGroups 2.7 sources). Everything worked for about 3 days, but then I saw that only 4 nodes could see each other and receive messages from each other, and one of the nodes (N2) was excluded from their view.
I looked into the logs, and they are interesting:
view at N1,N3,N4,N5 is {N1,N3,N4,N5}
view at N2 is {N1,N2,N3,N4,N5} - all 5 nodes!
N2 did not receive viewAccepted and continues sending messages to all other nodes (I can see this in tcpdump), but those nodes know that N2 is not a member, so they respond with "discarded message from non-member".
The situation has not changed for several hours: N2 does not receive the updated view and keeps sending messages to all the nodes!
Why does N2 not receive the new view? Or why does it not react to the "discarded message from non-member" error from the other nodes?
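To make the symptom concrete, here is a minimal Java sketch of the check I am doing by hand on the logs. It is not JGroups code: the class ViewCheck and the method findStaleNodes are hypothetical names, and it assumes the per-node views have already been collected (for example, copied from each node's log). It flags any node whose own view still lists peers that no longer list it back.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ViewCheck {

    // Hypothetical helper: 'views' maps each node name to the set of members
    // in that node's current view. A node is reported as stale if at least one
    // peer in its view has a view that does not contain the node itself.
    public static Set<String> findStaleNodes(Map<String, Set<String>> views) {
        Set<String> stale = new TreeSet<String>();
        for (Map.Entry<String, Set<String>> entry : views.entrySet()) {
            String node = entry.getKey();
            for (String peer : entry.getValue()) {
                Set<String> peerView = views.get(peer);
                // Skip peers whose view we did not collect.
                if (peerView != null && !peerView.contains(node)) {
                    stale.add(node);
                    break;
                }
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> views = new LinkedHashMap<String, Set<String>>();

        // View reported at N1, N3, N4 and N5.
        Set<String> four = new TreeSet<String>(Arrays.asList("N1", "N3", "N4", "N5"));
        for (String n : four) {
            views.put(n, four);
        }

        // View reported at N2: still all 5 nodes.
        views.put("N2", new TreeSet<String>(Arrays.asList("N1", "N2", "N3", "N4", "N5")));

        System.out.println(findStaleNodes(views));
    }
}
```

With the views listed above this prints [N2], i.e. N2 is the only node still holding a view that the rest of the group no longer agrees with.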
> Join problem
> ------------
>
> Key: JGRP-177
> URL: https://jira.jboss.org/jira/browse/JGRP-177
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.2.8, 2.2.9, 2.2.9.1
> Reporter: Bela Ban
> Assignee: Bela Ban
> Fix For: 2.3
>
> Attachments: BaseJGroupsTestCase.java, jgroups.xml, JGroupsTestMain.java, JGroupsTestRemote.java, test.zip
>
>
> I run a testcase that spawns 4 JGroups nodes in 4 separate java processes. Several nodes are then restarted at random and try to reconnect to the group.
> The first node sends a ping and counts the responses received by each node.
> After a couple of iterations ranging from 20 to 100, some nodes are unable to join the group.
> I use JGroups 2.2.9 with a TCP based config (TCP / TCPPING / MERGE2 / FD or FD_SOCK / VERIFY_SUSPECT / pbcast.NAKACK / pbcast.STABLE / VIEW_SYNC / pbcast.GMS ).
>
> EXAMPLE 1 with FD_SOCK:
>
> Node 0
> WARN [GMS] failed to collect all ACKs (1) for view [127.0.0.1:7700|32] after 20000ms, missing ACKs from [127.0.0.1:7701] (received=[127.0.0.1:7700])
> Ping result: {127.0.0.1:7701=3, 127.0.0.1:7700=3, 127.0.0.1:7703=3}
>
> Node 1
> WARN [NAKACK] 127.0.0.1:7701] discarded message from non-member 127.0.0.1:7702
> WARN [NAKACK] 127.0.0.1:7701] discarded message from non-member 127.0.0.1:7702
>
> Node 2
> WARN [NAKACK] 127.0.0.1:7702] discarded message from non-member 127.0.0.1:7700
> ERROR [FD_SOCK] received null cache; retrying
> ERROR [FD_SOCK] received null cache; retrying
> ERROR [FD_SOCK] received null cache; retrying
>
> Node 3
> WARN [NAKACK] 127.0.0.1:7703] discarded message from non-member 127.0.0.1:7702
> WARN [NAKACK] 127.0.0.1:7703] discarded message from non-member 127.0.0.1:7702
>
> EXAMPLE 2 with FD timeout="2000" max_tries="4":
>
> Node 0
> Ping result: {127.0.0.1:7701=0, 127.0.0.1:7700=2, 127.0.0.1:7703=2}
>
> Node 1
> WARN [GMS] handleJoin(127.0.0.1:7701)() should not be invoked on an instance of org.jgroups.protocols.pbcast.ClientGmsImpl
> WARN [GMS] join(127.0.0.1:7701) failed (coord=127.0.0.1:7701), retrying
> WARN [GMS] handleJoin(127.0.0.1:7701)() should not be invoked on an instance of org.jgroups.protocols.pbcast.ClientGmsImpl
> WARN [GMS] join(127.0.0.1:7701) failed (coord=127.0.0.1:7701), retrying
>
> Node 2
> No ERROR or WARN messages.
>
> Node 3
> WARN [GMS] join(127.0.0.1:7703) failed (coord=127.0.0.1:7701), retrying
> WARN [GMS] join(127.0.0.1:7703) failed (coord=127.0.0.1:7700), retrying
>
> Is there something wrong with my JGroups config?