[infinispan-issues] [JBoss JIRA] (ISPN-9109) Commands may be executed in the wrong order after a merge
Dan Berindei (JIRA)
issues at jboss.org
Fri Apr 27 02:09:00 EDT 2018
Dan Berindei created ISPN-9109:
----------------------------------
Summary: Commands may be executed in the wrong order after a merge
Key: ISPN-9109
URL: https://issues.jboss.org/browse/ISPN-9109
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.3.0.Alpha1, 9.2.1.Final
Reporter: Dan Berindei
This is related to ISPN-9104, but it applies to any command in a REPL_SYNC cache.
We have a topology id check to avoid running commands from an older topology. However, if the cluster splits cleanly in two, both partitions rebalance independently and install a topology with the same id. After the partitions merge, NAKACK2 retransmits the commands that were broadcast in one partition to the nodes in the other partition. Those commands carry a topology id equal to the receivers' current one (until the post-merge cache topology update is received), so they pass the check and are executed.
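A minimal sketch of why the check lets the retransmitted commands through, assuming a simplified model in which a node drops any command whose topology id is lower than its own and executes everything else. The classes, the topology ids (6 for both partitions, 7 for the merge topology) and the delivery method are hypothetical and are not Infinispan's actual API; NAKACK2 retransmission is modelled simply as a later delivery of the same command to a node in the other partition.

```java
import java.util.ArrayList;
import java.util.List;

public class TopologyCheckDemo {

    // Hypothetical command: only the fields needed for the topology check.
    record Command(String name, int topologyId) {}

    // Hypothetical node model, not Infinispan's real classes.
    static class Node {
        final String name;
        int topologyId;
        final List<String> executed = new ArrayList<>();

        Node(String name, int topologyId) {
            this.name = name;
            this.topologyId = topologyId;
        }

        // Simplified check: commands from an older topology are dropped,
        // everything else is executed.
        boolean deliver(Command cmd) {
            if (cmd.topologyId() < topologyId) {
                return false;
            }
            executed.add(cmd.name());
            return true;
        }
    }

    public static void main(String[] args) {
        // After a clean split, each partition rebalances on its own and
        // both happen to install topology 6 (hypothetical id).
        Node a = new Node("A", 6); // partition [AB]
        Node c = new Node("C", 6); // partition [CD]

        // A command broadcast by A inside [AB] carries topology id 6.
        Command fromA = new Command("lock(k) from A", 6);
        a.deliver(fromA);

        // After the merge, the message is retransmitted to C. C has not yet
        // received the post-merge topology, so the stale command passes.
        System.out.println("C executed stale command: " + c.deliver(fromA)); // true

        // Once a higher merge topology is installed, the same command is dropped.
        c.topologyId = 7;
        System.out.println("After merge topology: " + c.deliver(fromA)); // false
    }
}
```

The only thing that would save node C in this model is having already installed the merge topology, which matches the window described above.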
The worst scenario is in a transactional cache. Node A in partition \[AB\] can broadcast a lock acquisition command ({{LockControlCommand}} in a pessimistic cache, or {{PrepareCommand}} in an optimistic cache), wait for the responses, and then broadcast a lock release command (1-phase {{PrepareCommand}}, {{CommitCommand}}, or {{TxCompletionNotificationCommand}}). In partition \[AB\], the {{TxCompletionNotificationCommand}} is only sent after all the nodes have confirmed that they acquired the lock. When partitions \[AB\] and \[CD\] merge, C and D receive both commands, but there is no guarantee that they will process them in the right order. If the lock release command runs first, it does nothing; the lock acquisition command then acquires the lock, and no other command will ever release it.
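The ordering problem can be reproduced with a toy lock table. This is a sketch under hypothetical assumptions, not Infinispan's lock manager: acquisition succeeds whenever the key is free (or already owned by the same transaction), and a release from a transaction that does not own the lock is a no-op. The class, enum and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LockOrderDemo {

    // Hypothetical lock table: key -> id of the owning transaction.
    static class LockTable {
        private final Map<String, String> owners = new HashMap<>();

        // Succeeds if the key is free or already owned by tx.
        boolean acquire(String key, String tx) {
            String owner = owners.putIfAbsent(key, tx);
            return owner == null || owner.equals(tx);
        }

        // Removes the lock only if tx owns it; a release that arrives
        // before the acquisition therefore does nothing.
        void release(String key, String tx) {
            owners.remove(key, tx);
        }

        String owner(String key) {
            return owners.get(key);
        }
    }

    enum Kind { ACQUIRE, RELEASE }

    // Hypothetical stand-in for the lock acquisition/release commands.
    record Cmd(Kind kind, String key, String tx) {}

    // Applies the commands in delivery order and returns the final lock owner.
    static String ownerAfter(List<Cmd> delivered, String key) {
        LockTable locks = new LockTable();
        for (Cmd cmd : delivered) {
            if (cmd.kind() == Kind.ACQUIRE) {
                locks.acquire(cmd.key(), cmd.tx());
            } else {
                locks.release(cmd.key(), cmd.tx());
            }
        }
        return locks.owner(key);
    }

    public static void main(String[] args) {
        Cmd lock = new Cmd(Kind.ACQUIRE, "k", "tx1");
        Cmd unlock = new Cmd(Kind.RELEASE, "k", "tx1");

        // Order guaranteed inside [AB]: acquire, then release.
        System.out.println("A/B order, owner = " + ownerAfter(List.of(lock, unlock), "k")); // null
        // Order C or D may see after the merge: release, then acquire.
        System.out.println("C/D order, owner = " + ownerAfter(List.of(unlock, lock), "k")); // tx1
    }
}
```

In the second ordering the key stays locked by tx1, and since tx1 has already completed, nothing in this model ever releases it.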
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)