[infinispan-issues] [JBoss JIRA] (ISPN-4546) Possible stale lock when the primary owner leaves during rebalance
Dan Berindei (JIRA)
issues at jboss.org
Fri Apr 17 05:30:19 EDT 2015
[ https://issues.jboss.org/browse/ISPN-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Berindei reopened ISPN-4546:
--------------------------------
LockControlCommands should be retried as well.
> Possible stale lock when the primary owner leaves during rebalance
> ------------------------------------------------------------------
>
> Key: ISPN-4546
> URL: https://issues.jboss.org/browse/ISPN-4546
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha5, 7.1.1.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.2.0.CR1, 7.2.0.Final
>
>
> Topology T: coordinator = A, owners(k) = [C, D], pending_owners(k) = null
> B sends prepareCommand(tx1, put(k, v)) to C, D
> D adds backup locks and replies
> C acquires lock, ready to send reply to B
> A starts installing topology T+1: owners(k) = [C, D], pending_owners(k) = [C, E]
> A, C and E install topology T+1, B and D do not
> E requests and receives tx data from C, including tx1
> C leaves
> B sees a SuspectException, sends rollbackCommand(tx1) to C, D
> D removes tx1
> C has left, but is ignored
> B reports to the user that the tx has been rolled back
> B and D install topology T+1 (optional)
> A starts installing topology T+2: owners(k) = [D], pending_owners(k) = [E]
> A, B, D, E all install topology T+2
> E requests and receives state from D, but it does not remove tx1
> A starts installing topology T+3: owners(k) = [E], pending_owners(k) = null
> E now has a stale backup lock on k
> It seems very hard to reproduce in production: C would have to leave soon enough so that B and D haven't received the T+1 topology yet, but late enough for it to send its transaction data to E.
> A possible solution would be to catch any SuspectException during prepare/commit/rollback (without ignoring leavers), wait for a new topology, and replicate the command again on the new owners. Obviously, this wouldn't work with asynchronous prepare/commit/rollback.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
More information about the infinispan-issues
mailing list