[
https://issues.jboss.org/browse/JGRP-2262?page=com.atlassian.jira.plugin....
]
Steven Schlansker commented on JGRP-2262:
-----------------------------------------
We've been running into hangs on cluster join. It's unclear to us whether this
issue is related -- the stack trace isn't exactly the same, but we definitely hang
after a network event causes the coordinator to move. We'd like to upgrade to 4.0.12
but it isn't out yet, unfortunately, so count this as another vote for a release soon
:)
"Frozen" coordinator causes the whole cluster to hang
-----------------------------------------------------
Key: JGRP-2262
URL: https://issues.jboss.org/browse/JGRP-2262
Project: JGroups
Issue Type: Bug
Affects Versions: 3.6.7
Reporter: Pietro Paolini
Assignee: Bela Ban
Fix For: 4.0.12, 3.6.16
Attachments: jdbc_test.xml, jgroup.zip
This is the result of an investigation into a problem we experienced in our
application. The scenario was re-created by pausing the JVM with a debugger.
The discovery mechanism is JDBC_PING.
If the coordinator's JVM is frozen (for whatever reason) before the coordinator
sets itself as the cluster coordinator, and another node is started after that,
the new node will be unable to join the cluster and will hang indefinitely.
This seems to be caused by the "continue" statement at
https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/...
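To illustrate the kind of loop described above, here is a toy model of a join loop in which "continue" retries the same unresponsive candidate. It is not JGroups code: the class JoinLoopSketch, the join method, the Outcome enum and the maxAttempts bound are all hypothetical names made up for this sketch, and the actual logic at the linked line may differ. The hang corresponds to running such a loop with no bound at all.

```java
import java.util.List;
import java.util.function.Predicate;

// Toy model only: none of these names come from JGroups.
final class JoinLoopSketch {

    /** Possible results of a join attempt in this toy model. */
    enum Outcome { JOINED, BECAME_COORDINATOR, GAVE_UP }

    /**
     * @param discovered  addresses returned by discovery (e.g. rows left in a ping table)
     * @param responds    whether a given address answers a join request
     * @param maxAttempts bound on retries; an unbounded loop would spin forever
     */
    static Outcome join(List<String> discovered, Predicate<String> responds, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (discovered.isEmpty()) {
                return Outcome.BECAME_COORDINATOR; // nobody else found: start a new cluster
            }
            String candidate = discovered.get(0);  // assumed coordinator
            if (!responds.test(candidate)) {
                continue;                          // retry the same (possibly frozen) candidate
            }
            return Outcome.JOINED;
        }
        return Outcome.GAVE_UP;                    // only reachable because of the bound
    }

    public static void main(String[] args) {
        List<String> table = List.of("frozen-node");
        System.out.println(join(table, addr -> false, 5));     // prints GAVE_UP
        System.out.println(join(table, addr -> true, 5));      // prints JOINED
        System.out.println(join(List.of(), addr -> false, 5)); // prints BECAME_COORDINATOR
    }
}
```

In this model the joiner only makes progress if the candidate answers or the discovery data is empty, which is consistent with the need to start from an empty JGROUPSPING when reproducing.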
I have prepared a simple application which can help in replicating the problem.
To replicate the problem:
1) Make sure the JGROUPSPING table is empty.
2) Run the application from an IDE with a debugger attached and a breakpoint at
Main.java:67, and wait for the JVM to pause there.
3) Run a second instance of the application in non-debug mode, or with Gradle
using "gradle run"; it will hang indefinitely.
Depending on the UUID generated and the IP address assigned, this may not happen
every time, but it happened quite often in my local tests.
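Since the hang is intermittent, it can help to wrap the join attempt in a timeout so that a reproduction run reports a hang instead of blocking forever. The following is a minimal sketch using only the JDK; JoinWatchdog and completesWithin are hypothetical names, and the Runnable stands in for whatever call joins the cluster in the test application.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a blocking task and reports whether it finished in time.
final class JoinWatchdog {

    /** Returns true if the task finished within the timeout, false if it appears hung. */
    static boolean completesWithin(Runnable task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "join-attempt");
            t.setDaemon(true); // a hung join must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; a truly stuck call may ignore it
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("join attempt failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a join that never returns on its own.
        Runnable hangingJoin = () -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException ignored) {
                // cancelled by the watchdog
            }
        };
        Runnable quickJoin = () -> { };

        System.out.println(completesWithin(quickJoin, 1000));  // prints true
        System.out.println(completesWithin(hangingJoin, 200)); // prints false
    }
}
```

The worker is a daemon thread because a join that ignores interruption would otherwise prevent the test JVM from exiting.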
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)