[JBoss JIRA] (ISPN-9109) Commands may be executed in the wrong order after a merge

Friday, 27 April 2018

Dan Berindei created ISPN-9109:
----------------------------------

             Summary: Commands may be executed in the wrong order after a merge
                 Key: ISPN-9109
                 URL: https://issues.jboss.org/browse/ISPN-9109
             Project: Infinispan
          Issue Type: Bug
          Components: Core
    Affects Versions: 9.3.0.Alpha1, 9.2.1.Final
            Reporter: Dan Berindei

This is related to ISPN-9104, but it applies to all commands in a REPL_SYNC cache.

We have a topology id check to avoid running commands from an older topology, but if the
cluster splits cleanly in 2, both partitions rebalance and install a topology with the
same id. After the partitions merge, NAKACK2 retransmits the commands that were broadcast
in one partition to the nodes in the other partition. Until the post-merge cache topology
update is received, those commands carry a matching topology id, so they are executed.
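A minimal Java sketch of why the check does not help here, assuming a hypothetical simplified model: `TopologyCheckSketch`, its `Node` and `Command` types, the starting id 5 and the "reject only older topology ids" rule are illustrative assumptions, not Infinispan classes or its exact check. Each partition installs one topology for the split and one for the rebalance, so both end up at the same id and a retransmitted command passes the check.

```java
import java.util.List;

// Hypothetical, simplified model; none of these types are Infinispan classes.
public class TopologyCheckSketch {

    // A replicated command tagged with the topology id of its sender.
    record Command(String name, int topologyId) {}

    static final class Node {
        final String name;
        int topologyId;

        Node(String name, int topologyId) {
            this.name = name;
            this.topologyId = topologyId;
        }

        // Assumed check: only reject commands from an older topology.
        boolean accepts(Command cmd) {
            return cmd.topologyId() >= topologyId;
        }
    }

    // True if a command broadcast in [AB] passes the check on C and D after the merge.
    public static boolean retransmittedCommandAccepted() {
        // Before the split, all four nodes share topology 5 (arbitrary value).
        Node a = new Node("A", 5), b = new Node("B", 5);
        Node c = new Node("C", 5), d = new Node("D", 5);

        // Clean split into [AB] and [CD]: each partition independently installs
        // one topology for the split and one for the rebalance, so both end at 7.
        for (Node n : List.of(a, b, c, d)) {
            n.topologyId += 2;
        }

        // A broadcasts a command in [AB] with its current topology id (7).
        Command cmd = new Command("lock(k)", a.topologyId);

        // After the merge, NAKACK2 retransmits it to C and D, which are still
        // at topology 7 because the post-merge topology has not arrived yet.
        return c.accepts(cmd) && d.accepts(cmd);
    }

    public static void main(String[] args) {
        System.out.println("accepted on C and D: " + retransmittedCommandAccepted());
    }
}
```

Running `main` prints `accepted on C and D: true`: in this model the id alone cannot tell a command from the other partition apart from a current one.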

The worst scenario is in a transactional cache, where node A in partition [AB] could
broadcast a lock acquisition command ({{LockControlCommand}} in a pessimistic
cache, or {{PrepareCommand}} in an optimistic cache), wait for the responses, and then
broadcast a lock release command (1-phase {{PrepareCommand}}, {{CommitCommand}}, or
{{TxCompletionNotificationCommand}}). In partition [AB], the
{{TxCompletionNotificationCommand}} is only sent after all the nodes have confirmed that
they acquired the lock. When partitions [AB] and [CD] merge, C and D receive both commands,
but there is no guarantee that they will process them in the right order. If the lock
release command runs first, it does nothing; the lock acquisition command then acquires
the lock, and no other command will ever release it.
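A minimal Java sketch of the resulting lock leak, assuming a hypothetical per-node lock table: `LockOrderSketch`, `acquire`, `release` and `runInOrder` are illustrative names, not Infinispan's lock manager API. As described above, releasing a lock the transaction does not hold is a no-op, so delivering the release before the acquisition leaves the lock held.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the lock table on node C; not Infinispan's API.
public class LockOrderSketch {

    // key -> transaction currently holding the lock
    private final Map<String, String> lockOwners = new HashMap<>();

    void acquire(String key, String tx) {
        // Take the lock only if the key is free (waiting is not modelled).
        lockOwners.putIfAbsent(key, tx);
    }

    void release(String key, String tx) {
        // Removes the entry only if tx holds it; otherwise this is a no-op.
        lockOwners.remove(key, tx);
    }

    String owner(String key) {
        return lockOwners.get(key);
    }

    // Applies the acquire/release commands for tx1 on key "k" in the given
    // delivery order and returns the lock owner afterwards (null = unlocked).
    public static String runInOrder(boolean releaseFirst) {
        LockOrderSketch nodeC = new LockOrderSketch();
        Runnable acquire = () -> nodeC.acquire("k", "tx1");
        Runnable release = () -> nodeC.release("k", "tx1");
        List<Runnable> delivered = releaseFirst
                ? List.of(release, acquire)   // order after a merge: not guaranteed
                : List.of(acquire, release);  // order seen inside partition [AB]
        delivered.forEach(Runnable::run);
        return nodeC.owner("k");
    }

    public static void main(String[] args) {
        System.out.println("acquire then release -> owner: " + runInOrder(false));
        System.out.println("release then acquire -> owner: " + runInOrder(true));
    }
}
```

Running `main` prints `owner: null` for the original order and `owner: tx1` for the reversed one; in the reversed case nothing is left that would release the lock.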

--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
