[jboss-jira] [JBoss JIRA] Created: (JGRP-985) Admin Join problem -GMS flush by coordinator failed
Ronn C (JIRA)
jira-events at lists.jboss.org
Sun May 24 20:50:56 EDT 2009
Admin Join problem -GMS flush by coordinator failed
----------------------------------------------------
Key: JGRP-985
URL: https://jira.jboss.org/jira/browse/JGRP-985
Project: JGroups
Issue Type: Bug
Affects Versions: 2.7, 2.6.5, 2.6.4, 2.6.3
Environment: linux redhat 2.9.6, jdk 1.5
Reporter: Ronn C
Assignee: Bela Ban
Attachments: jgroup.tar.gz
I am experiencing a problem with JGroups when a node tries to join an existing cluster.
Occasionally, a new node joining an existing cluster hits this problem:
2009-05-21 12:04:02,568 [main] WARN org.jgroups.protocols.pbcast.GMS:144 - join(callisto.tmca.com.au-18715) sent to callisto.tmca.com.au-8185 timed out (after 3000 ms), retrying
The number of retries varies from a couple to retrying indefinitely.
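To put a number on the retries, something like the following could count the join-timeout warnings for a given joining node. This is only a sketch: the class and method names are made up for illustration, and the pattern is derived solely from the single WARN line quoted above, so it may need adjusting for other log layouts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JGroups: counts join-timeout warnings in a log.
class JoinRetryCounter {

    // Matches "join(<joiner>) sent to <coordinator> timed out (after <n> ms), retrying",
    // as seen in the GMS warning quoted in this report.
    private static final Pattern JOIN_TIMEOUT = Pattern.compile(
            "join\\((\\S+)\\) sent to (\\S+) timed out \\(after (\\d+) ms\\), retrying");

    /** Returns how many join-timeout warnings were logged for the given joiner. */
    static int countRetries(List<String> logLines, String joiner) {
        int count = 0;
        for (String line : logLines) {
            Matcher m = JOIN_TIMEOUT.matcher(line);
            if (m.find() && m.group(1).equals(joiner)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "2009-05-21 12:04:02,568 [main] WARN org.jgroups.protocols.pbcast.GMS:144 - "
                + "join(callisto.tmca.com.au-18715) sent to callisto.tmca.com.au-8185 "
                + "timed out (after 3000 ms), retrying");
        System.out.println(countRetries(log, "callisto.tmca.com.au-18715"));
    }
}
```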
Debugging the code, I discovered that before handling a join, the coordinator performs a GMS flush, and unless that flush succeeds it won't reply with a join response.
Sure enough, at the coordinator I see this log entry:
2009-05-21 12:05:25,902 [ViewHandler,callisto.tmca.com.au-8185] WARN org.jgroups.protocols.pbcast.GMS:749 - GMS flush by coordinator at callisto.tmca.com.au-8185 failed
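The behaviour described above can be illustrated with a toy model. This is not JGroups code and does not use any JGroups API. The class, the method names, and the `maxAttempts` cap are hypothetical. The model only captures the two observations from this report: the coordinator replies to a join only if its flush succeeds, and the joiner resends the join request whenever no reply arrives.

```java
import java.util.function.BooleanSupplier;

// Toy model of the join/flush interaction described in this report; not JGroups code.
class JoinHandshakeModel {

    /** Hypothetical coordinator: a join response is sent only if the flush succeeds. */
    static boolean handleJoin(BooleanSupplier flush) {
        return flush.getAsBoolean(); // false means no join response is sent
    }

    /**
     * Hypothetical joiner: resends the join request until it is answered or
     * maxAttempts is reached. Returns the number of attempts used, or -1 if
     * the join was never answered.
     */
    static int join(BooleanSupplier flush, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (handleJoin(flush)) {
                return attempt;
            }
            // A real joiner would wait for the join timeout here
            // (3000 ms in the log above) before retrying.
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Flush fails on the first two calls, then succeeds.
        BooleanSupplier flakyFlush = () -> ++calls[0] > 2;
        System.out.println("flaky flush: joined after " + join(flakyFlush, 10) + " attempts");
        // Flush never succeeds: the joiner exhausts its attempts.
        System.out.println("failing flush: result " + join(() -> false, 10));
    }
}
```

In this model a flush that never succeeds leaves the joiner retrying until the cap, which corresponds to the endless retries seen when the coordinator keeps logging the flush failure.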
I originally came across this problem in our production environment with 2.6.3, and I have been able to reproduce it reliably with that version. I have also tested with 2.7.0 and 2.8.0.alpha3: retries still occur, but generally the join sorts itself out within a minute. However, I've found that retries can still go on indefinitely on 2.8.0 if you repeat the test often enough.