[jboss-jira] [JBoss JIRA] (JGRP-1674) STOP_FLUSH race condition

Dennis Reed (JIRA) jira-events at lists.jboss.org
Fri Aug 9 15:32:26 EDT 2013


     [ https://issues.jboss.org/browse/JGRP-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Reed updated JGRP-1674:
------------------------------

    Description: 
There is a race condition in STOP_FLUSH when a node joins the cluster.

JOINER sends JOIN_REQ to MASTER
MASTER does a flush on the existing members (does NOT include JOINER)
MASTER sends JOIN_RSP
MASTER sends STOP_FLUSH

JOINER receives JOIN_RSP
JOINER fetches state, sends START_FLUSH
JOINER receives STOP_FLUSH from MASTER (which does not apply to JOINER, since JOINER was not part of the original FLUSH)

onStopFlush never verifies that the current node was part of the FLUSH being stopped, i.e. that the STOP_FLUSH is actually valid for the current node.
As a result, MASTER's STOP_FLUSH corrupts JOINER's own FLUSH by resetting all the member variables (and probably unblocking as well).
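
The missing check can be illustrated with a minimal, self-contained sketch. This is not the JGroups FLUSH code: the class FlushSketch, its fields, and its methods are all hypothetical, and the sketch assumes that a STOP_FLUSH carries the set of members whose flush it ends, which the report above does not state.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of one node's flush state; NOT the JGroups FLUSH protocol class.
class FlushSketch {
    private final String localAddress;
    private final Set<String> flushMembers = new HashSet<>();
    private boolean blocked;

    FlushSketch(String localAddress) {
        this.localAddress = localAddress;
    }

    // The local node starts (or joins) a flush over the given participants.
    void startFlush(Set<String> participants) {
        flushMembers.clear();
        flushMembers.addAll(participants);
        blocked = true;
    }

    // Behavior described in the report: any STOP_FLUSH resets local state.
    void onStopFlushUnguarded() {
        reset();
    }

    // Guarded variant. ASSUMPTION: the STOP_FLUSH carries the participants
    // of the flush it ends. Returns true only if the message was applied.
    boolean onStopFlushGuarded(Set<String> stoppedParticipants) {
        if (!stoppedParticipants.contains(localAddress)) {
            return false; // not our flush: leave local state untouched
        }
        reset();
        return true;
    }

    private void reset() {
        flushMembers.clear();
        blocked = false;
    }

    boolean isBlocked() {
        return blocked;
    }

    Set<String> currentFlushMembers() {
        return new HashSet<>(flushMembers);
    }

    // Small helper so the sketch does not depend on Java 9's Set.of.
    static Set<String> setOf(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }
}
```

In the scenario above, JOINER would call startFlush with {MASTER, A, JOINER} and then receive MASTER's STOP_FLUSH for {MASTER, A}. With onStopFlushUnguarded, JOINER's state is reset in the middle of its own flush; with onStopFlushGuarded, the message is ignored and JOINER stays blocked until its own flush is stopped.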


  was:
There is a race condition in STOP_FLUSH when a node joins the cluster.

JOINER sends JOIN_REQ to MASTER
MASTER does a flush on the existing members (START_FLUSH, ...)
MASTER sends JOIN_RSP
MASTER sends STOP_FLUSH

JOINER receives JOIN_RSP
JOINER fetches state, sends START_FLUSH
JOINER receives STOP_FLUSH from MASTER

onStopFlush never verifies that the current node was part of the FLUSH, and therefore is valid for the current node.
So this STOP_FLUSH corrupts JOINER's FLUSH by resetting all the member variables (and probably unblocking).



    
> STOP_FLUSH race condition
> -------------------------
>
>                 Key: JGRP-1674
>                 URL: https://issues.jboss.org/browse/JGRP-1674
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.6.21
>            Reporter: Dennis Reed
>            Assignee: Bela Ban


