[ http://jira.jboss.com/jira/browse/JGRP-603?page=comments#action_12382877 ]
Vladimir Blagojevic commented on JGRP-603:
------------------------------------------
Bela described the scenario to reproduce this issue:
- View is V2 {A,B,C}
- Member C has address C:7602
- Member C is killed and *immediately* restarted (before A can exclude it)
- Member C has the *same* address: C:7602
- On JOIN, A will return the *existing view* V2 because it sees that C is still a member!
Does C's FLUSH.down() unblock in such a case? I don't think so; it only unblocks
on reception of the STOP-FLUSH message. However, this message is never received, because
the coord (A) never even starts the flush in this case!
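
To make the suspected hang concrete, here is a minimal toy model of the scenario. It is not JGroups code: the Coordinator class, handleJoin() and joinAndAwaitUnblock() are hypothetical names, and the blocking in FLUSH.down() is only approximated by a CountDownLatch that is released when a (simulated) STOP-FLUSH is sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class StaleJoinSketch {

    /** Toy coordinator (hypothetical, not the JGroups API). Starts at view V2 {A,B,C}. */
    static class Coordinator {
        int viewId = 2;
        final List<String> members = new ArrayList<>(Arrays.asList("A", "B", "C:7602"));
        boolean flushStarted = false;

        /** Handles a JOIN; releases stopFlush only if a flush round was actually run. */
        String handleJoin(String joiner, CountDownLatch stopFlush) {
            if (members.contains(joiner)) {
                // Joiner's address is still in the view: return the existing view.
                // No flush is started, so no STOP-FLUSH is ever sent.
                return view();
            }
            flushStarted = true;      // simulated START-FLUSH
            members.add(joiner);
            viewId++;
            stopFlush.countDown();    // simulated STOP-FLUSH
            return view();
        }

        String view() {
            return "V" + viewId + " " + members;
        }
    }

    /** Returns true if the joiner was unblocked (saw STOP-FLUSH) within the timeout. */
    static boolean joinAndAwaitUnblock(Coordinator coord, String joiner, long timeoutMs) {
        CountDownLatch stopFlush = new CountDownLatch(1);
        coord.handleJoin(joiner, stopFlush);
        try {
            return stopFlush.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Coordinator a = new Coordinator();
        // C was killed and restarted with the same address before A excluded it.
        System.out.println("restarted C unblocked: " + joinAndAwaitUnblock(a, "C:7602", 200));
        // A genuinely new member goes through the flush round and is released.
        System.out.println("new member D unblocked: " + joinAndAwaitUnblock(a, "D:7603", 200));
    }
}
```

In this model the restarted C stays blocked until the timeout expires, whereas a joiner with a new address is released immediately. Whether the real FLUSH.down() behaves the same way still needs to be confirmed against the protocol code.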
FLUSH: problems with TCP and concurrent startup/shutdowns
---------------------------------------------------------
Key: JGRP-603
URL: http://jira.jboss.com/jira/browse/JGRP-603
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assigned To: Vladimir Blagojevic
Priority: Critical
Fix For: 2.6
Attachments: 1.txt, 103.txt, 104.txt, 105.txt, 2.txt, 3.txt, jgtest.zip
The attached ZIP file has code that reproduces this.
In props.props, change pf.cluster.transport.protocol=udp to "tcp" or
"tcp-nio" if you want to test the different stacks.
To reproduce:
- Start a number of instances (e.g. 5) concurrently. This almost never works, even under
"udp". Joiners' JOIN requests time out and they have to retry (possibly
because the coord is busy with the FLUSH protocol). They become singleton members and
*never* merge !
- This works fine without FLUSH
- With "udp", it works almost always; with "tcp" it works 50% of the
time; with "tcp-nio" it almost never works
- Randomly kill and restart instances