[infinispan-issues] [JBoss JIRA] (ISPN-4743) Rebalance can hang after the coordinator and another node leave

Galder Zamarreño (JIRA) issues at jboss.org
Wed Oct 1 03:18:02 EDT 2014


     [ https://issues.jboss.org/browse/ISPN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Galder Zamarreño updated ISPN-4743:
-----------------------------------
        Status: Resolved  (was: Pull Request Sent)
    Resolution: Done


> Rebalance can hang after the coordinator and another node leave
> ---------------------------------------------------------------
>
>                 Key: ISPN-4743
>                 URL: https://issues.jboss.org/browse/ISPN-4743
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core, State Transfer
>    Affects Versions: 7.0.0.Beta2
>            Reporter: Dan Berindei
>            Assignee: Dan Berindei
>            Priority: Critical
>              Labels: testsuite_stability
>             Fix For: 7.0.0.CR1
>
>
> This caused a failure in {{ClusterTopologyManagerTest.testAbruptLeaveAfterGetStatus}}.
> When the coordinator changes, the new coordinator first sends a {{CacheTopologyControlCommand(type=CH_UPDATE)}} to reset any ongoing rebalance, then a {{CacheTopologyControlCommand(type=REBALANCE_START)}} to start a new rebalance with the remaining members. If another node leaves afterwards, the coordinator sends a third command, another {{CacheTopologyControlCommand(type=CH_UPDATE)}}, to remove the leaver from the consistent hashes (CHs).
> If one node (in this case the coordinator itself) processes the last {{CH_UPDATE}} before the other two commands, it will fail to confirm the rebalance, and the cache will stay in the "rebalancing" state until another node joins or leaves.
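
The ordering problem described above can be illustrated with a small standalone model. This is only a sketch and not Infinispan code: the class names, the integer topology ids, and in particular the rule that a node silently drops any command whose topology id is not newer than the one it has already installed are assumptions made for the illustration. Under those assumptions, a node that sees the last CH_UPDATE first never acts on REBALANCE_START and so never confirms the rebalance.

```java
import java.util.List;

// Hypothetical, simplified model of the race; none of these types are Infinispan classes.
class RebalanceOrderingSketch {
    enum Type { CH_UPDATE, REBALANCE_START }

    // Stand-in for a topology control command: a type plus a topology id.
    static final class Command {
        final Type type;
        final int topologyId;

        Command(Type type, int topologyId) {
            this.type = type;
            this.topologyId = topologyId;
        }
    }

    static final class Node {
        int installedTopologyId = 0;
        boolean rebalanceConfirmed = false;

        void handle(Command c) {
            // Assumption: commands that are not newer than the installed topology are dropped.
            if (c.topologyId <= installedTopologyId) {
                return;
            }
            installedTopologyId = c.topologyId;
            // In this model a node confirms only if it actually processes REBALANCE_START.
            if (c.type == Type.REBALANCE_START) {
                rebalanceConfirmed = true;
            }
        }
    }

    // Feeds the commands to a fresh node in the given order and reports
    // whether that node ended up confirming the rebalance.
    static boolean confirms(List<Command> order) {
        Node node = new Node();
        for (Command c : order) {
            node.handle(c);
        }
        return node.rebalanceConfirmed;
    }

    public static void main(String[] args) {
        Command reset = new Command(Type.CH_UPDATE, 1);        // reset ongoing rebalance
        Command start = new Command(Type.REBALANCE_START, 2);  // start new rebalance
        Command remove = new Command(Type.CH_UPDATE, 3);       // remove the second leaver

        System.out.println("in order:  " + confirms(List.of(reset, start, remove)));
        System.out.println("reordered: " + confirms(List.of(remove, reset, start)));
    }
}
```

With the commands delivered in the order they were sent, the model node confirms; with the last CH_UPDATE delivered first, both earlier commands are treated as stale and the confirmation never happens, which matches the hang described in the report.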





