[jboss-jira] [JBoss JIRA] (JGRP-2262) "Frozen" coordinator causes the whole cluster to hang

Pietro Paolini (JIRA) issues at jboss.org
Thu Apr 12 05:00:01 EDT 2018


     [ https://issues.jboss.org/browse/JGRP-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pietro Paolini updated JGRP-2262:
---------------------------------
    Description: 
This is the result of an investigation into a problem we experienced in our
application; the scenario was re-created by pausing the JVM with a debugger.

The discovery mechanism is JDBC_PING.

If the coordinator's JVM gets frozen (for whatever reason) before the coordinator sets itself as the cluster coordinator, and another node is started after that, the new node will be unable to join the cluster and will hang indefinitely.

This seems to be caused by the "continue" statement at

https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/pbcast/ClientGmsImpl.java:92
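
The effect of such a retry can be illustrated with a small model. This is only a sketch in plain Java, not the JGroups code: the class JoinLoopSketch, the Attempt enum and the maxAttempts limit are hypothetical names introduced for illustration. It assumes that a join attempt against a frozen coordinator never yields a response, so a loop that simply continues after every failed attempt can only terminate if something bounds it.

```java
import java.util.function.Supplier;

/** Simplified model of a join retry loop. NOT the JGroups source; all names are hypothetical. */
final class JoinLoopSketch {

    /** Outcome of one join attempt against the discovered coordinator. */
    enum Attempt { JOINED, NO_RESPONSE }

    /**
     * Retries the join until it succeeds or maxAttempts is exhausted.
     * Without the limit, a coordinator that never responds would keep
     * this loop running forever, which models the reported hang.
     *
     * @return the number of attempts used, or -1 if the joiner gave up
     */
    static int joinWithLimit(Supplier<Attempt> attempt, int maxAttempts) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.get() == Attempt.NO_RESPONSE) {
                continue; // retry; this is the step that repeats when nobody answers
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // A frozen coordinator never answers, so every attempt fails.
        Supplier<Attempt> frozen = () -> Attempt.NO_RESPONSE;
        System.out.println(joinWithLimit(frozen, 5)); // prints -1

        // A healthy coordinator that answers on the third attempt.
        int[] calls = {0};
        Supplier<Attempt> healthy =
                () -> ++calls[0] < 3 ? Attempt.NO_RESPONSE : Attempt.JOINED;
        System.out.println(joinWithLimit(healthy, 5)); // prints 3
    }
}
```

Whether a bounded retry, or falling back to a different candidate coordinator, would be the right fix inside JGroups is not something this sketch can answer; it only shows why an unconditional retry cannot recover from a peer that stays silent.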

I have prepared a simple application which can help in replicating the problem.

To replicate the problem:

1) Make sure the JGROUPSPING table is empty
2) Run the application from an IDE with a debugger attached so that the JVM is
    paused at line Main.java:67, and wait until it stops there.
3) Run a second instance of the application in non-debug mode, or with Gradle using "gradle run"; it will
    hang indefinitely

Depending on the UUID/IP address that is generated/assigned, this may not happen every time, but it happened quite often in my local tests.
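
To make the hang in step 3 observable without waiting indefinitely, a test harness could run the blocking connect call under a timeout. The following is a minimal sketch using only java.util.concurrent; ConnectGuardSketch and connectWithin are hypothetical names, and the Runnable stands in for whatever call connects the channel in the test application. It is an assumption that the blocked call reacts to thread interruption; if it does not, the daemon thread is simply abandoned.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper that reports whether a blocking connect finished in time. */
final class ConnectGuardSketch {

    /** Runs the given connect call; returns true if it completed within the timeout. */
    static boolean connectWithin(Runnable connect, long timeout, TimeUnit unit)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connect-guard");
            t.setDaemon(true); // do not keep the JVM alive if connect never returns
            return t;
        });
        Future<?> future = executor.submit(connect);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the blocked call, if it is interruptible
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("connect failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a connect that never returns, as in the reported scenario.
        Runnable hanging = () -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println(connectWithin(hanging, 200, TimeUnit.MILLISECONDS)); // prints false
        System.out.println(connectWithin(() -> { }, 200, TimeUnit.MILLISECONDS)); // prints true
    }
}
```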
       





  was:
This is the result of an investigation into a problem we experienced in our
application. In this scenario, it was re-created by pausing the JVM with a debugger.

The discovery mechanism is JDBC_PING.

If the coordinator's JVM gets frozen (for whatever reason) before the coordinator sets itself as the cluster coordinator, and another node is started after that, the new node will be unable to join the cluster and will hang indefinitely.

This seems to be caused by the "continue" statement at

https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/pbcast/ClientGmsImpl.java:92

I have prepared a simple application which can help in replicating the problem.

To replicate the problem:

1) Make sure the JGROUPSPING table is empty
2) Run the application from an IDE with a debugger attached so that the JVM is
    paused at line Main.java:67, and wait until it stops there.
3) Run a second instance of the application in non-debug mode, or with Gradle using "gradle run"; it will
    hang indefinitely

Depending on the UUID/IP address that is generated/assigned, this may not happen every time, but it happened quite often in my local tests.
       







> "Frozen" coordinator causes the whole cluster to hang
> -----------------------------------------------------
>
>                 Key: JGRP-2262
>                 URL: https://issues.jboss.org/browse/JGRP-2262
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.6.7
>            Reporter: Pietro Paolini
>            Assignee: Bela Ban
>         Attachments: jdbc_test.xml, jgroup.zip
>
>



--
This message was sent by Atlassian JIRA
(v7.5.0#75005)

