Vladimir Blagojevic resolved JGRP-1024.
---------------------------------------
Resolution: Done
The affected file is FLUSH.java. If a timeout occurs, rejectFlush is sent to all members.
Join and state transfer locks up entire cluster
-----------------------------------------------
Key: JGRP-1024
URL: https://jira.jboss.org/jira/browse/JGRP-1024
Project: JGroups
Issue Type: Bug
Affects Versions: 2.6.9, 2.6.10, 2.6.11
Reporter: Vladimir Blagojevic
Assignee: Vladimir Blagojevic
Fix For: 2.6.12, 2.8
When a new node joins a cluster and requests a join-and-state-transfer, coordinator C
receives the request and calls startFlush (CoordGmsImpl.java:400). The startFlush call might
fail due to a timeout (see FLUSH#startFlush(List<Address> flushParticipants)). This
causes the function to return false, and the caller then retries the start flush.
However, because the flushInProgress switch is still true, all repeated start flush requests
fail. Meanwhile, the joining node goes into a connect loop: it never receives a new view
because the flush failed at coordinator C, so it never issues the stopFlush that would
switch flushInProgress back to false. Coordinator C therefore never leaves its stuck state,
and the client stays stuck as well. The end result is a permanent lockup.
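
The failure mode described above can be illustrated with a small, self-contained model. This is only a sketch and not the actual FLUSH.java code: the class FlushGuardSketch, the resetOnTimeout switch, and the boolean parameter that stands in for "all participants responded before the timeout" are hypothetical names introduced for illustration. The sketch assumes that rejecting the flush clears the flushInProgress flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of the coordinator-side flush guard; NOT the real FLUSH.java. */
class FlushGuardSketch {
    private final AtomicBoolean flushInProgress = new AtomicBoolean(false);
    private final boolean resetOnTimeout; // false = behaviour in the report, true = assumed fix
    private int rejectsSent = 0;

    FlushGuardSketch(boolean resetOnTimeout) {
        this.resetOnTimeout = resetOnTimeout;
    }

    /**
     * Tries to start a flush.
     *
     * @param participantsRespondInTime stands in for waiting on flush responses;
     *                                  false simulates the timeout from the report
     * @return true if the flush started, false if it was refused or timed out
     */
    boolean startFlush(boolean participantsRespondInTime) {
        // Only one flush may be in progress at a time.
        if (!flushInProgress.compareAndSet(false, true)) {
            return false;
        }
        if (participantsRespondInTime) {
            return true; // flag stays set until stopFlush() is called
        }
        // Timeout path: without a reset, the flag stays true forever.
        if (resetOnTimeout) {
            rejectFlush();
        }
        return false;
    }

    /** Models sending rejectFlush to all members (assumed to clear the flag). */
    private void rejectFlush() {
        rejectsSent++;
        flushInProgress.set(false);
    }

    /** Normally issued once the joining node has received the new view. */
    void stopFlush() {
        flushInProgress.set(false);
    }

    boolean isFlushInProgress() {
        return flushInProgress.get();
    }

    int getRejectsSent() {
        return rejectsSent;
    }
}
```

With resetOnTimeout set to false, a single timed-out startFlush leaves flushInProgress set, so every retry is refused even when the participants would respond. With it set to true, the retry after a timeout can succeed.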