[infinispan-issues] [JBoss JIRA] (ISPN-4310) StateResponse chunk with lastChunk=true from cancelled ST stops receiving data in next ST

Wednesday, 28 May 2014

     [ https://issues.jboss.org/browse/ISPN-4310?page=com.atlassian.jira.plugin.... ]

Dan Berindei updated ISPN-4310:
-------------------------------

              Status: Pull Request Sent  (was: Open)
    Git Pull Request: https://github.com/infinispan/infinispan/pull/2584

Keep track of the topology id at which state transfer was initiated, and reject state
response commands with an older topology id.

We can't require an exact topology id match, because a node (other than the coordinator)
leaving could increment the topology id without restarting the state transfer.

Note that the state transfer topology id is not necessarily the same as the topology id of
the {{REBALANCE_START}} command. If a node leaves, causing a
{{CH_UPDATE(rebalanceTopologyId + 1)}} command to be sent, other nodes may receive the
{{CH_UPDATE}} first and start the state transfer with the higher topology id.
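
A minimal sketch of the check described above, assuming a receiver that remembers the topology id at which its current state transfer started. The class and method names ({{StateTransferGuard}}, {{startStateTransfer}}, {{accept}}) are hypothetical and are not taken from the Infinispan code base or from the pull request; the sketch only illustrates why the comparison is "not older than" rather than "equal to".

```java
/** Hypothetical illustration only; not Infinispan's actual classes. */
class StateTransferGuard {
    // Topology id at which the current state transfer was initiated; -1 means none is in progress.
    private int stateTransferTopologyId = -1;

    /** Record the topology id in which this node started (or restarted) state transfer. */
    void startStateTransfer(int topologyId) {
        stateTransferTopologyId = topologyId;
    }

    /**
     * Accept a state response only if its topology id is not older than the one
     * in which the current state transfer started. An exact match is not required,
     * because a leaving node may bump the topology id without restarting the transfer.
     */
    boolean accept(int responseTopologyId) {
        return stateTransferTopologyId != -1
                && responseTopologyId >= stateTransferTopologyId;
    }

    public static void main(String[] args) {
        StateTransferGuard guard = new StateTransferGuard();
        guard.startStateTransfer(5);
        System.out.println(guard.accept(4)); // response from a cancelled, older state transfer
        System.out.println(guard.accept(5)); // response from the current state transfer
        System.out.println(guard.accept(6)); // topology id incremented without a restart
    }
}
```

With the transfer started at topology id 5, the response tagged 4 is rejected, while those tagged 5 and 6 are accepted.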

...
 StateResponse chunk with lastChunk=true from cancelled ST stops
receiving data in next ST

-----------------------------------------------------------------------------------------

                 Key: ISPN-4310
                 URL: https://issues.jboss.org/browse/ISPN-4310
             Project: Infinispan
          Issue Type: Bug
          Components: State Transfer
    Affects Versions: 6.0.2.Final, 7.0.0.Alpha4
            Reporter: Radim Vansa
            Assignee: Dan Berindei
            Priority: Critical

 1. A requests a segment from B (the response spans multiple chunks).
 2. B sends all chunks, but before A receives them, a new topology arrives and A cancels the
ST.
 3. Another topology arrives and A requests this segment again.
 4. A receives the old StateResponseCommand with lastChunk=true and concludes that it has
received all segments; it therefore discards further chunks.
 The result is an inconsistent cluster and, after further rebalances, completely lost data.
 This ought to be rare, but it was repeatedly observed when gracefully stopping the
coordinator on a 32-node cluster full of data.
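
The sequence above can be replayed with a small, simplified model. This is a sketch under stated assumptions, not the real receiver logic: {{Chunk}}, {{InboundSegment}} and {{replay}} are hypothetical names, the keys and topology ids are made up, and the model tracks a single segment with no topology id check at all.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the reported sequence; not Infinispan code. */
class StaleChunkDemo {
    /** Stand-in for one chunk of a state response. */
    static class Chunk {
        final int topologyId;
        final List<String> keys;
        final boolean lastChunk;

        Chunk(int topologyId, List<String> keys, boolean lastChunk) {
            this.topologyId = topologyId;
            this.keys = keys;
            this.lastChunk = lastChunk;
        }
    }

    /** Receiver-side tracking of one requested segment, ignoring the chunk's topology id. */
    static class InboundSegment {
        final List<String> applied = new ArrayList<>();
        boolean complete = false;

        void onChunk(Chunk chunk) {
            if (complete) {
                return; // transfer considered finished: further chunks are discarded
            }
            applied.addAll(chunk.keys);
            if (chunk.lastChunk) {
                complete = true;
            }
        }
    }

    static List<String> replay() {
        // Steps 1-2: the request made in topology 1 was cancelled,
        // but B's final chunk for it is still in flight.
        Chunk stale = new Chunk(1, List.of("k3"), true);
        // Step 3: A requests the same segment again in topology 2.
        InboundSegment retry = new InboundSegment();
        // Step 4: the stale final chunk arrives before the fresh ones.
        retry.onChunk(stale);
        retry.onChunk(new Chunk(2, List.of("k1", "k2"), false));
        retry.onChunk(new Chunk(2, List.of("k3"), true));
        return retry.applied;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [k3]: k1 and k2 were never applied
    }
}
```

In this model the stale chunk marks the retried request as complete, so both fresh chunks are dropped and only the data carried by the stale chunk is applied.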

--
This message was sent by Atlassian JIRA
(v6.2.3#6260)
