[jboss-jira] [JBoss JIRA] Commented: (JGRP-985) Admin Join problem -GMS flush by coordinator failed

Thu Jun 4 19:30:56 EDT 2009

    [ https://jira.jboss.org/jira/browse/JGRP-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12470732#action_12470732 ] 

Ronn C commented on JGRP-985:
-----------------------------

Hi Vladimir,

There was no particular reason why UNICAST was commented out; I was just trying different settings. The configuration in which I originally hit the problem would have had the UNICAST entry.

We do implement state in our real application but in the test case I didn't need to.

I agree that it is hard to replicate on 2.8.0alpha3 (compared to 2.6.3). I spent a whole afternoon constantly at it and thought it was fixed. The next morning I turned up to work, tried it a few times, and it happened again.

Besides all of that, are you able to confirm whether this is an old problem that 2.8.0alpha3 specifically addressed, and whether the fix has been ported to 2.6.10.merge? We need to deploy to prod very soon, and I'm hesitant to use a library that is still in alpha. But if this is not a known issue and has not been fixed in 2.6.10.merge, then I need to know.

> Admin  Join problem -GMS flush by coordinator failed
> ----------------------------------------------------
>
>                 Key: JGRP-985
>                 URL: https://jira.jboss.org/jira/browse/JGRP-985
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.6.3, 2.6.4, 2.6.5, 2.7
>         Environment: linux redhat 2.9.6, jdk 1.5
>            Reporter: Ronn C
>            Assignee: Vladimir Blagojevic
>             Fix For: 2.6.11, 2.8
>
>         Attachments: jgroup.tar.gz
>
>
> I am experiencing a problem with JGroups when a node tries to join an existing cluster.
>  
> Occasionally, a new node joining an existing cluster hits this problem:
>  
> 2009-05-21 12:04:02,568 [main] WARN org.jgroups.protocols.pbcast.GMS:144 - join(callisto.tmca.com.au-18715) sent to callisto.tmca.com.au-8185 timed out (after 3000 ms), retrying 
>  
> The number of retries varies from a couple to retrying indefinitely.
>  
> Debugging the code, I discovered that before handling a join the coordinator performs a GMS flush, and unless that flush succeeds it won't reply with a join response.
>  
> Sure enough, at the coordinator I see this log:
> 2009-05-21 12:05:25,902 [ViewHandler,callisto.tmca.com.au-8185] WARN org.jgroups.protocols.pbcast.GMS:749 - GMS flush by coordinator at callisto.tmca.com.au-8185 failed 
>  
> I originally came across this problem in our prod environment with 2.6.3, where I have been able to replicate it reliably. I have also tested with 2.7.0 and 2.8.0.alpha3: retries still occur, but the join generally sorts itself out within a minute. However, I've found that retries can still continue indefinitely on 2.8.0 if you repeat the test often enough.
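
The join behaviour described in the report can be sketched as a small stand-alone simulation. This is not JGroups code: the class `JoinRetrySketch` and the methods `handleJoin` and `join` are hypothetical names, and the flush is modelled as a plain boolean supplier. It only illustrates why a joiner keeps retrying for as long as the coordinator's flush keeps failing.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of the reported behaviour; none of these names exist in JGroups.
final class JoinRetrySketch {

    // Coordinator side: a join response is sent only if the flush round succeeds.
    public static boolean handleJoin(BooleanSupplier flush) {
        return flush.getAsBoolean(); // flush failed -> no join response
    }

    // Joiner side: retry until a response arrives or maxAttempts is exhausted.
    // Returns the attempt number that succeeded, or -1 if the node never joined.
    public static int join(BooleanSupplier flush, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (handleJoin(flush)) {
                return attempt;
            }
            // A real joiner would wait for its join timeout here
            // (3000 ms in the log above) before sending the next request.
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Flush fails twice, then succeeds: the joiner gets in on the third attempt.
        System.out.println(join(() -> ++calls[0] >= 3, 10)); // prints 3
        // Flush never succeeds: the joiner exhausts its attempts.
        System.out.println(join(() -> false, 10));           // prints -1
    }
}
```

With an unbounded `maxAttempts`, the second case corresponds to the endless retrying seen in the logs.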
