[infinispan-issues] [JBoss JIRA] (ISPN-3421) State Transfer can leave keys on < numOwners nodes for transactional caches.

Dan Berindei (JIRA) issues at jboss.org
Tue Jan 7 10:28:33 EST 2014


    [ https://issues.jboss.org/browse/ISPN-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934193#comment-12934193 ] 

Dan Berindei commented on ISPN-3421:
------------------------------------

I think the problem is that A leaves the cluster before it has sent the CommitCommand to all the replicas: for example, B receives the commit, but C doesn't. When C sees a new view without A in it, it deletes all transactions that originated on A in {{TransactionTable.cleanupStaleTransactions()}} (at least if recovery is not enabled). [~an1310], did I get it right?

I think fixing this would require changing the CH update that happens after a leave to be done in two phases, just like rebalancing. In the first phase, all the members would install the new CH and stop accepting commands from the leaver. In the second phase, each member would delete the pending transactions from the leaver, but first it would check with the other owners whether any of them received a commit command. If any owner received the commit, the transaction should be committed; otherwise it should be removed.
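
To make the second phase concrete, here is a minimal sketch of the per-transaction decision only. It is not Infinispan code: {{LeaverTxResolver}}, {{Decision}} and the {{commitsReceived}} map are hypothetical names invented for illustration, and the sketch assumes each member can already ask the other owners which commits they received.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Infinispan. */
final class LeaverTxResolver {
    enum Decision { COMMIT, REMOVE }

    /**
     * Decides the fate of a pending transaction that originated on the leaver.
     *
     * @param txId            id of the pending transaction
     * @param owners          surviving owners of the keys the transaction touched
     * @param commitsReceived for each node, the ids of transactions whose commit it received
     */
    static Decision resolve(String txId,
                            Collection<String> owners,
                            Map<String, Set<String>> commitsReceived) {
        for (String owner : owners) {
            Set<String> seen = commitsReceived.getOrDefault(owner, Collections.emptySet());
            if (seen.contains(txId)) {
                // At least one owner applied the commit, so every owner must commit.
                return Decision.COMMIT;
            }
        }
        // No owner saw a commit, so the pending transaction can be dropped.
        return Decision.REMOVE;
    }
}
```

With owners B and C, where only B received the commit for a transaction, {{resolve}} returns {{COMMIT}}, so C would commit instead of deleting the transaction. A transaction that neither B nor C saw committed is removed.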

Enabling recovery may help a bit, since C won't delete transactions that originated on A outright. On the other hand, it doesn't keep backup locks for them: new transactions can update the same data while the originator is down, so we still have inconsistencies.
                
> State Transfer can leave keys on < numOwners nodes for transactional caches.
> ----------------------------------------------------------------------------
>
>                 Key: ISPN-3421
>                 URL: https://issues.jboss.org/browse/ISPN-3421
>             Project: Infinispan
>          Issue Type: Bug
>          Components: State transfer
>    Affects Versions: 5.2.7.Final
>            Reporter: Erik Salter
>            Assignee: Dan Berindei
>            Priority: Critical
>
> There's a hole in the state transfer mechanism that can occur when a node that was creating entries leaves the cluster after replicating the data to only some of the nodes.
> The problem occurs when the segment ownership of a remaining node doesn't change after the rebalance. Since state transfer does not request state for keys of which the node is already an owner, the cache could be left in a state where a key is resident on fewer than numOwners nodes. In addition, this could be any subset of the primary OR backup nodes.
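
The gap described in the issue can be illustrated with a small sketch. It is a simplified model, not Infinispan's actual state transfer code: {{StateRequestSketch}} and {{segmentsToRequest}} are hypothetical names, and the only behaviour assumed is the one stated above, namely that a node requests state only for segments it did not own before the rebalance.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model for illustration; not part of Infinispan. */
final class StateRequestSketch {
    /**
     * Segments a node would request from other owners after a rebalance:
     * only the ones it owns now but did not own before.
     */
    static Set<Integer> segmentsToRequest(Set<Integer> ownedBefore, Set<Integer> ownedAfter) {
        Set<Integer> added = new TreeSet<>(ownedAfter);
        added.removeAll(ownedBefore);
        return added;
    }
}
```

If a node owned segments 1 and 2 before the leave and still owns exactly those afterwards, the result is empty. Any key in those segments that the leaver never replicated to this node is therefore never fetched from the other owners.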

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

