[JBoss JIRA] Created: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

Bela Ban (JIRA)

Wednesday, 3 October 2007

3:29 a.m.

FLUSH: problems with TCP and concurrent startup/shutdowns
---------------------------------------------------------

Key: JGRP-603
URL: http://jira.jboss.com/jira/browse/JGRP-603
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assigned To: Vladimir Blagojevic

The attached ZIP file has code that reproduces this. In props.props, change pf.cluster.transport.protocol=udp to "tcp" or "tcp-nio" if you want to test the different stacks.

To reproduce:
- Start a number of instances (e.g. 5) concurrently. This almost never works, even under "udp". Joiners' JOIN requests time out and they have to retry (possibly because the coord is busy with the FLUSH protocol). They become singleton members and *never* merge!
- This works fine without FLUSH
- With "udp", it works almost always; with "tcp" it works 50% of the time; with "tcp-nio" it almost never works
- Randomly kill and restart instances

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

Bela Ban (JIRA)

Wednesday, 3 October

3:31 a.m.

New subject: [JBoss JIRA] Updated: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=all ]

Bela Ban updated JGRP-603:
--------------------------
Attachment: jgtest.zip

Bela Ban (JIRA)

3:33 a.m.

New subject: [JBoss JIRA] Updated: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=all ]

Bela Ban updated JGRP-603:
--------------------------
Fix Version/s: 2.6
Priority: Critical (was: Major)

Bela Ban (JIRA)

3:43 a.m.

New subject: [JBoss JIRA] Commented: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=comments#action_12380083 ]

Bela Ban commented on JGRP-603:
-------------------------------
Actually, this works really well with "udp" (I probably tested "tcp" when I thought I was testing "udp")... But there are definitely issues with "tcp". Most issues are with "tcp-nio", and we should look into those, although tcp-nio has not yet been really well tested (and supported).

Bela Ban (JIRA)

4:02 a.m.

New subject: [JBoss JIRA] Commented: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=comments#action_12380086 ]

Bela Ban commented on JGRP-603:
-------------------------------
Okay, tcp and tcp-nio now work too if the following changes are made:
- Remove UNICAST from the tcp and tcp-nio stacks
- Set suspect_on_send_failure="true" in TCP and TCP_NIO
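
The two changes above can be sketched as a small string transformation, assuming the stack is written in the JGroups 2.x plain-string form (protocols separated by ":", properties by ";"). This is only an illustration: the class name StackStringPatch and the sample stack in main are made up for this sketch and are not taken from the attached jgtest.zip, and nothing here calls the JGroups API.

```java
import java.util.ArrayList;
import java.util.List;

final class StackStringPatch {

    // Splits "A(x=1):B:C(y=2)" on ':' characters that sit outside parentheses.
    static List<String> splitProtocols(String stack) {
        List<String> out = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < stack.length(); i++) {
            char c = stack.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ':' && depth == 0) {
                out.add(stack.substring(start, i));
                start = i + 1;
            }
        }
        out.add(stack.substring(start));
        return out;
    }

    // Sets key=value on one protocol entry, replacing an existing value if present.
    static String setProperty(String protocol, String name, String key, String value) {
        List<String> props = new ArrayList<>();
        int open = protocol.indexOf('(');
        if (open >= 0) {
            String body = protocol.substring(open + 1, protocol.lastIndexOf(')'));
            for (String kv : body.split(";")) {
                if (!kv.isEmpty() && !kv.startsWith(key + "=")) {
                    props.add(kv);
                }
            }
        }
        props.add(key + "=" + value);
        return name + "(" + String.join(";", props) + ")";
    }

    // Applies both changes from the comment: drop UNICAST and set
    // suspect_on_send_failure=true on TCP / TCP_NIO.
    static String applyWorkaround(String stack) {
        List<String> patched = new ArrayList<>();
        for (String p : splitProtocols(stack)) {
            int open = p.indexOf('(');
            String name = open >= 0 ? p.substring(0, open) : p;
            if (name.equals("UNICAST")) {
                continue; // removed from the stack
            }
            if (name.equals("TCP") || name.equals("TCP_NIO")) {
                p = setProperty(p, name, "suspect_on_send_failure", "true");
            }
            patched.add(p);
        }
        return String.join(":", patched);
    }

    public static void main(String[] args) {
        // Illustrative stack only; the real stacks live in the attached test code.
        String before = "TCP(start_port=7800):TCPPING(initial_hosts=localhost[7800])"
                + ":pbcast.NAKACK:UNICAST(timeout=300,600):pbcast.GMS:pbcast.FLUSH";
        System.out.println(applyWorkaround(before));
    }
}
```

In an XML stack the equivalent would presumably be deleting the UNICAST element and adding the attribute to the transport element; the sketch only shows the intent of the two edits.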

Bela Ban (JIRA)

Monday, 15 October

4:33 p.m.

New subject: [JBoss JIRA] Updated: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=all ]

Bela Ban updated JGRP-603:
--------------------------
Attachment: 2.txt

Bela Ban (JIRA)

4:33 p.m.

New subject: [JBoss JIRA] Updated: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=all ]

Bela Ban updated JGRP-603:
--------------------------
Attachment: 3.txt

Bela Ban (JIRA)

4:33 p.m.

New subject: [JBoss JIRA] Updated: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=all ]

Bela Ban updated JGRP-603:
--------------------------
Attachment: 1.txt

Bela Ban (JIRA)

Tuesday, 16 October

3:42 p.m.

New subject: [JBoss JIRA] Updated: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=all ]

Bela Ban updated JGRP-603:
--------------------------
Attachment: 103.txt

Bela Ban (JIRA)

3:42 p.m.

New subject: [JBoss JIRA] Updated: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=all ]

Bela Ban updated JGRP-603:
--------------------------
Attachment: 104.txt

Bela Ban (JIRA)

3:42 p.m.

New subject: [JBoss JIRA] Updated: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=all ]

Bela Ban updated JGRP-603:
--------------------------
Attachment: 105.txt

Vladimir Blagojevic (JIRA)

3:52 p.m.

New subject: [JBoss JIRA] Commented: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=comments#action_12382877 ]

Vladimir Blagojevic commented on JGRP-603:
------------------------------------------
Bela described the scenario to reproduce this issue:
- View is V2 {A,B,C}
- Member C has address C:7602
- Member C is killed and *immediately* restarted (before A can exclude it)
- Member C has the *same* address: C:7602
- On JOIN, A will return the *existing view* V2 because it sees that C is still a member!

Does C's FLUSH.down() unblock in such a case? I don't think so; it only unblocks on reception of the STOP-FLUSH message. However, this message is never received, because the coord (A) never even starts the flush in this case!
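
The scenario can be illustrated with a toy model. This is a simplified sketch, not JGroups code: the Coordinator class, its handleJoin method, and the addresses A:7600, B:7601 and D:7603 are hypothetical, and the flush is reduced to a counter. It only shows why a restarted member that reuses its old address never triggers a flush, and so never gets the STOP-FLUSH it is waiting for.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class ReincarnationDemo {

    /** Toy coordinator: holds the current view and counts flush rounds. */
    static final class Coordinator {
        private final Set<String> members = new LinkedHashSet<>();
        private int viewId;
        private int flushRounds;

        Coordinator(int viewId, String... initialMembers) {
            this.viewId = viewId;
            for (String m : initialMembers) {
                members.add(m);
            }
        }

        /**
         * Handles a JOIN request. Returns true if a flush round ran, which in
         * this model is the only way the joiner receives STOP-FLUSH.
         */
        boolean handleJoin(String address) {
            if (members.contains(address)) {
                // The address is still in the view, so the existing view is
                // returned as-is: no new view, no flush, no STOP-FLUSH.
                return false;
            }
            flushRounds++;          // flush runs before the new view is installed
            members.add(address);
            viewId++;
            return true;
        }

        int viewId() {
            return viewId;
        }

        int flushRounds() {
            return flushRounds;
        }
    }

    public static void main(String[] args) {
        // View V2 {A,B,C}; C was killed but A has not excluded it yet.
        Coordinator a = new Coordinator(2, "A:7600", "B:7601", "C:7602");

        // C restarts immediately with the same address and sends JOIN.
        boolean restartedGetsStopFlush = a.handleJoin("C:7602");
        System.out.println("restarted C receives STOP-FLUSH: " + restartedGetsStopFlush
                + " (view id " + a.viewId() + ", flush rounds " + a.flushRounds() + ")");

        // A member with a fresh address goes through the normal path.
        boolean freshGetsStopFlush = a.handleJoin("D:7603");
        System.out.println("fresh D receives STOP-FLUSH: " + freshGetsStopFlush
                + " (view id " + a.viewId() + ", flush rounds " + a.flushRounds() + ")");
    }
}
```

In the model, the restarted C is left blocked because handleJoin takes the "already a member" branch, which matches the observation that the coord never starts the flush.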

Bela Ban (JIRA)

Tuesday, 6 November

8:29 a.m.

New subject: [JBoss JIRA] Commented: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=comments#action_12386176 ]

Bela Ban commented on JGRP-603:
-------------------------------
Solved by preventing the reincarnation issue

Bela Ban (JIRA)

8:29 a.m.

New subject: [JBoss JIRA] Resolved: (JGRP-603) FLUSH: problems with TCP and concurrent startup/shutdowns

[ http://jira.jboss.com/jira/browse/JGRP-603?page=all ]

Bela Ban resolved JGRP-603.
---------------------------
Resolution: Done

Solved by http://jira.jboss.com/jira/browse/JGRP-130
