[infinispan-issues] [JBoss JIRA] (ISPN-5046) PartitionHandling: split during commit can leave the cache inconsistent after merge

Dan Berindei (JIRA) issues at jboss.org
Tue Dec 9 07:58:40 EST 2014


    [ https://issues.jboss.org/browse/ISPN-5046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026243#comment-13026243 ] 

Dan Berindei edited comment on ISPN-5046 at 12/9/14 7:58 AM:
-------------------------------------------------------------

If A and B crashed, C and D will stay in degraded mode until the administrator manually changes the cache availability to Available. When that happens, C can decide that the transaction was not committed anywhere else and roll back T.

C should only give up when the cache is Available again. Of course, it wouldn't be a good idea to keep a thread blocked for all that time, but I think it's reasonable to block other transactions from updating the same keys.
Edit: We could probably end such transactions early with an AvailabilityException instead of trying to lock the key.
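
A minimal Java sketch of the "end early" idea, not Infinispan's actual locking code. `PendingKeyGuard` and `DegradedModeException` are hypothetical names; the exception only stands in for AvailabilityException. The guard fails a write up front when the cache is degraded and the key belongs to a transaction whose outcome is still unknown, instead of letting the caller wait on the lock.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for AvailabilityException.
class DegradedModeException extends RuntimeException {
    DegradedModeException(String message) {
        super(message);
    }
}

// Hypothetical guard tracking keys written by transactions whose outcome is unknown.
class PendingKeyGuard {
    private final Set<Object> keysOfInDoubtTxs = ConcurrentHashMap.newKeySet();
    private volatile boolean degraded;

    void setDegraded(boolean degraded) {
        this.degraded = degraded;
    }

    void markInDoubt(Object key) {
        keysOfInDoubtTxs.add(key);
    }

    void resolve(Object key) {
        keysOfInDoubtTxs.remove(key);
    }

    // Called before a new transaction tries to lock a key for writing.
    // Fails immediately rather than blocking until the cache is Available again.
    void checkWritable(Object key) {
        if (degraded && keysOfInDoubtTxs.contains(key)) {
            throw new DegradedModeException(
                "Key " + key + " is held by an in-doubt transaction while the cache is degraded");
        }
    }
}
```

Keys that are not touched by an in-doubt transaction pass the check, so the sketch only rejects the writes that would otherwise have blocked.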

A cache won't suddenly become available because another node joined: availability can only be restored when merging with other nodes from the latest stable cache topology (or when the admin intervenes manually).
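
To illustrate why a joiner does not help, here is a small Java sketch. It assumes a simple "strict majority of the stable topology members" rule, which is a simplification for illustration and not necessarily the exact check Infinispan performs; `AvailabilityCheck` and `canBecomeAvailable` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

class AvailabilityCheck {
    // Only members of the latest stable topology count; new joiners are ignored.
    // Assumption: a strict majority of the stable members is enough to become available.
    static boolean canBecomeAvailable(Set<String> stableMembers, Set<String> currentMembers) {
        Set<String> survivors = new HashSet<>(currentMembers);
        survivors.retainAll(stableMembers);
        return survivors.size() * 2 > stableMembers.size();
    }
}
```

With a stable topology of A, B, C, D, the partition C, D plus a new node E still has only two stable members and stays degraded under this rule, while a merge that brings B back gives three of four.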



> PartitionHandling: split during commit can leave the cache inconsistent after merge
> -----------------------------------------------------------------------------------
>
>                 Key: ISPN-5046
>                 URL: https://issues.jboss.org/browse/ISPN-5046
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core, State Transfer
>    Affects Versions: 7.0.2.Final, 7.1.0.Alpha1
>            Reporter: Dan Berindei
>            Priority: Critical
>             Fix For: 7.1.0.Beta1
>
>
> Say we have a cluster ABCD; a transaction T is started on A, with B as the primary owner and C as the backup owner. B and C both acknowledge the prepare, and the network splits into AB and CD right before A sends the commit command. Eventually A suspects C and D, but the commit still succeeds on B before C and D are suspected. Because SuspectExceptions are ignored for commit commands, the user won't see any error.
> However, C will eventually suspect A and B. When the CD cache topology is installed, it will roll back transaction T. After the merge, both partitions are in degraded mode, so we assume that they both have the latest data and the key is never updated on C.
> From C's point of view, this is very similar to ISPN-3421. The fix should also be similar: we could delay the transaction rollback on C until we get confirmation from B that T was not committed there. Since B is inaccessible, C will eventually get a SuspectException and the CD cache topology, at which point the cache is in degraded mode and C can wait for a merge. On merge, C should check the status of the transaction on B again, and either commit or roll back based on what B did.
> We also need to suspend the cleanup of completed transactions while the cache is in degraded mode; otherwise C might not find T on B after the merge.
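
The proposed fix can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the actual patch: `TxOutcome`, `CompletedTxLog` and `BackupOwner` are hypothetical names, and the sketch assumes that a transaction missing from B's completed-transaction log after the merge was never committed on B.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

enum TxOutcome { IN_DOUBT, COMMITTED, ROLLED_BACK }

// Hypothetical log of completed transactions kept by the primary owner (B).
class CompletedTxLog {
    private final Map<String, TxOutcome> completed = new ConcurrentHashMap<>();
    private volatile boolean degraded;

    void setDegraded(boolean degraded) {
        this.degraded = degraded;
    }

    void record(String txId, TxOutcome outcome) {
        completed.put(txId, outcome);
    }

    // Periodic cleanup does nothing while degraded, so entries survive until the merge.
    void cleanup() {
        if (!degraded) {
            completed.clear();
        }
    }

    // Returns null when the log has no entry for the transaction.
    TxOutcome outcomeOf(String txId) {
        return completed.get(txId);
    }
}

// Hypothetical backup owner (C): prepared transactions stay in doubt during the split.
class BackupOwner {
    private final Map<String, TxOutcome> prepared = new ConcurrentHashMap<>();

    void prepare(String txId) {
        prepared.put(txId, TxOutcome.IN_DOUBT);
    }

    TxOutcome stateOf(String txId) {
        return prepared.get(txId);
    }

    // After the merge, follow whatever the primary owner did.
    // Assumption: no COMMITTED entry on the primary means the commit never reached it.
    void onMerge(CompletedTxLog primaryLog) {
        prepared.replaceAll((txId, state) -> {
            if (state != TxOutcome.IN_DOUBT) {
                return state;
            }
            TxOutcome onPrimary = primaryLog.outcomeOf(txId);
            return onPrimary == TxOutcome.COMMITTED ? TxOutcome.COMMITTED : TxOutcome.ROLLED_BACK;
        });
    }
}
```

In the scenario above, B records T as committed while degraded, the cleanup leaves that entry alone, and C's `onMerge` then commits T instead of rolling it back.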




