[JBoss JIRA] (ISPN-4546) Possible stale lock when the primary owner leaves during rebalance

Monday, 30 March 2015

     [ https://issues.jboss.org/browse/ISPN-4546?page=com.atlassian.jira.plugin.... ]

Galder Zamarreño updated ISPN-4546:
-----------------------------------
    Fix Version/s: 7.2.0.CR1
                       (was: 7.2.0.Beta2)

...
 Possible stale lock when the primary owner leaves during rebalance
 ------------------------------------------------------------------

                 Key: ISPN-4546
                 URL: https://issues.jboss.org/browse/ISPN-4546
             Project: Infinispan
          Issue Type: Bug
          Components: Core, State Transfer
    Affects Versions: 7.0.0.Alpha5
            Reporter: Dan Berindei
            Assignee: Dan Berindei
            Priority: Critical
             Fix For: 7.2.0.CR1

 Topology T: coordinator = A, owners(k) = [C, D], pending_owners(k) = null
 B sends prepareCommand(tx1, put(k, v)) to C, D
 D adds backup locks and replies
 C acquires lock, ready to send reply to B
 A starts installing topology T+1: owners(k) = [C, D], pending_owners(k) = [C, E]
 A, C and E install topology T+1, B and D do not
 E requests and receives tx data from C, including tx1
 C leaves
 B sees a SuspectException, sends rollbackCommand(tx1) to C, D
 D removes tx1
 C has already left, so B's rollback RPC ignores it as a leaver
 B reports to the user that the tx has been rolled back
 B and D install topology T+1 (optional)
 A starts installing topology T+2: owners(k) = [D], pending_owners(k) = [E]
 A, B, D, E all install topology T+2
 E requests and receives state from D, but E does not remove tx1
 A starts installing topology T+3: owners(k) = [E], pending_owners(k) = null
 E now has a stale backup lock on k
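
The sequence above can be replayed with a toy model. This is only an illustrative sketch, not Infinispan code: the class StaleLockScenario and all of its methods are hypothetical names, each node is reduced to the set of transactions holding a lock on k, and state transfer is assumed to be purely additive (the receiver never drops a transaction that the sender no longer has).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical, not Infinispan code) of the interleaving in the report. */
class StaleLockScenario {
    // Node name -> ids of transactions holding a (primary or backup) lock on k.
    final Map<String, Set<String>> locks = new HashMap<>();
    // Nodes that are still cluster members.
    final Set<String> members = new HashSet<>(List.of("A", "B", "C", "D", "E"));

    StaleLockScenario() {
        for (String node : members) {
            locks.put(node, new HashSet<>());
        }
    }

    /** The originator sends prepare to the owners it knows about. */
    void prepare(String tx, List<String> owners) {
        for (String owner : owners) {
            locks.get(owner).add(tx);
        }
    }

    /** Assumption: transferring transaction data only adds entries, never removes them. */
    void transferTransactions(String from, String to) {
        locks.get(to).addAll(locks.get(from));
    }

    void leave(String node) {
        members.remove(node);
    }

    /** Rollback reaches only targets that are still members; leavers are ignored. */
    void rollback(String tx, List<String> targets) {
        for (String target : targets) {
            if (members.contains(target)) {
                locks.get(target).remove(tx);
            }
        }
    }

    /** Replays the steps from the report and returns the locks left on E. */
    static Set<String> run() {
        StaleLockScenario s = new StaleLockScenario();
        List<String> ownersSeenByB = List.of("C", "D"); // B is still on topology T
        s.prepare("tx1", ownersSeenByB);      // C and D lock k for tx1
        s.transferTransactions("C", "E");     // T+1: E receives tx1 from C
        s.leave("C");                         // C leaves
        s.rollback("tx1", ownersSeenByB);     // D removes tx1, E is never contacted
        s.transferTransactions("D", "E");     // T+2: D has nothing that would clear tx1
        return s.locks.get("E");
    }

    public static void main(String[] args) {
        System.out.println("Locks on k held at E: " + run()); // prints [tx1]
    }
}
```

In this model the rollback never reaches E because B addresses it to the owners of topology T, and nothing E later receives from D tells it to drop tx1.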
 This seems very hard to reproduce in production: C would have to leave early enough that
B and D have not yet received the T+1 topology, but late enough to have already sent its
transaction data to E.
 A possible solution would be to catch any SuspectException during prepare/commit/rollback
(without ignoring leavers), wait for a new topology, and replicate the command again on
the new owners. Obviously, this wouldn't work with asynchronous
prepare/commit/rollback. 
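
A minimal sketch of the retry idea described above, under stated assumptions. Nothing here is Infinispan API: NodeSuspectedException is a hypothetical stand-in for SuspectException, and TopologyView, Transport and sendWithRetry are invented for illustration. The send is assumed to be synchronous, which is why the approach does not carry over to asynchronous prepare/commit/rollback.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: resend a command to the new owners after a target is suspected. */
class RetryOnSuspect {
    /** Stand-in for Infinispan's SuspectException (assumption, not the real class). */
    static class NodeSuspectedException extends RuntimeException {
        NodeSuspectedException(String node) {
            super("suspected: " + node);
        }
    }

    /** Hypothetical view of the installed topology for key k. */
    interface TopologyView {
        List<String> owners();  // nodes that must receive the command
        void awaitNext();       // blocks until a newer topology is installed
    }

    /** Hypothetical synchronous transport that does NOT ignore leavers. */
    interface Transport {
        void send(String command, List<String> targets);
    }

    /** Sends the command, and on suspicion waits for a new topology and sends again. */
    static List<String> sendWithRetry(String command, TopologyView topology,
                                      Transport transport, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            List<String> targets = topology.owners();
            try {
                transport.send(command, targets);
                return targets; // every target acknowledged the command
            } catch (NodeSuspectedException e) {
                topology.awaitNext(); // a target left: wait, then retry on the new owners
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }

    /** Replays the rollback from the report: C has left, the next topology has D and E. */
    static List<String> demo() {
        List<List<String>> topologies = List.of(List.of("C", "D"), List.of("D", "E"));
        Set<String> alive = Set.of("A", "B", "D", "E");
        int[] installed = {0}; // index of the topology currently installed on B
        TopologyView view = new TopologyView() {
            public List<String> owners() {
                return topologies.get(installed[0]);
            }
            public void awaitNext() {
                if (installed[0] < topologies.size() - 1) {
                    installed[0]++;
                }
            }
        };
        Transport transport = (command, targets) -> {
            for (String target : targets) {
                if (!alive.contains(target)) {
                    throw new NodeSuspectedException(target);
                }
            }
        };
        return sendWithRetry("rollback(tx1)", view, transport, 3);
    }

    public static void main(String[] args) {
        System.out.println("rollback(tx1) delivered to " + demo()); // prints [D, E]
    }
}
```

Because a command may reach a surviving owner more than once under this scheme, the sketch implicitly assumes that prepare, commit and rollback are idempotent on the receiver.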

--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
