Dan Berindei commented on ISPN-5274:
------------------------------------
Agreed, it is probably better to fail the transaction in this situation. The transaction
still won't be atomic, as some of the nodes will have already committed it, but failing it
would be better than the current outcome.
I believe it will be enough to fail the transaction if one of the commit command's
targets is missing from the view and the local partition is unavailable. To minimize the
chances of a partial commit, we should perform the check before sending the commit
command, but we also need to run it afterwards, in case the cluster splits while the commit
is in flight.
The "after" check could produce some false positives because it won't have access to the
actual responses to the commit command: if the partition becomes degraded right after the
commit succeeded, the transaction would still be failed. This will leave the cache
inconsistent, but I think it is an acceptable risk for now.
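The check proposed above can be sketched as follows. This is a minimal model under stated assumptions, not Infinispan code: `ClusterState`, `AvailabilityMode`, `CommitSender`, `PartialCommitException`, `checkTargets` and `commit` are hypothetical names introduced for illustration, and the cluster view is reduced to a set of node names. The check fails only when the local partition is degraded and at least one commit target is missing from the view, and it runs both before and after the commit is sent.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the check proposed in the comment. None of these
// types are Infinispan classes; they only model what the check needs.
public class CommitTargetCheck {

    // Hypothetical availability of the originator's partition.
    enum AvailabilityMode { AVAILABLE, DEGRADED }

    // Hypothetical failure raised instead of reporting a possibly partial commit.
    static class PartialCommitException extends RuntimeException {
        PartialCommitException(String message) { super(message); }
    }

    // What the originator currently knows: view members and availability.
    static final class ClusterState {
        volatile Set<String> members;
        volatile AvailabilityMode mode;

        ClusterState(Set<String> members, AvailabilityMode mode) {
            this.members = members;
            this.mode = mode;
        }
    }

    // Hypothetical stand-in for sending the commit command to its targets.
    interface CommitSender {
        void sendCommit(Collection<String> targets);
    }

    static Set<String> setOf(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Fail only if the partition is unavailable AND some target left the view.
    static void checkTargets(ClusterState state, Collection<String> targets, String phase) {
        if (state.mode == AvailabilityMode.AVAILABLE) {
            return;
        }
        for (String target : targets) {
            if (!state.members.contains(target)) {
                throw new PartialCommitException(phase + " commit: " + target
                        + " is not in the view and the partition is degraded");
            }
        }
    }

    static void commit(ClusterState state, CommitSender sender, Collection<String> targets) {
        // Before sending: minimizes the chance of a partial commit.
        checkTargets(state, targets, "before");
        sender.sendCommit(targets);
        // After sending: the cluster may have split while the commit was in flight.
        // Responses are not consulted, so this can be a false positive when the
        // partition degraded only after the commit had already succeeded.
        checkTargets(state, targets, "after");
    }

    public static void main(String[] args) {
        ClusterState state = new ClusterState(
                setOf("edg-perf10", "edg-perf11", "edg-perf12", "edg-perf13"),
                AvailabilityMode.AVAILABLE);
        // Simulate the split from the report: the other owners leave mid-commit.
        CommitSender splitting = targets -> {
            state.members = setOf("edg-perf10");
            state.mode = AvailabilityMode.DEGRADED;
        };
        try {
            commit(state, splitting, setOf("edg-perf11", "edg-perf12"));
            System.out.println("committed");
        } catch (PartialCommitException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Running `main` simulates the split from the report: the "before" check passes, the targets leave the view while the commit is in flight, and the "after" check fails the transaction instead of reporting success. Because the "after" check ignores the commit responses, it would also fire if the commit had in fact reached every owner, which is the false-positive risk described above.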
Inconsistent data after transaction rollback (with success on originator)
-------------------------------------------------------------------------
Key: ISPN-5274
URL: https://issues.jboss.org/browse/ISPN-5274
Project: Infinispan
Issue Type: Bug
Components: Core
Reporter: Matej Čimbora
Assignee: Dan Berindei
Scenario
Nodes: edg-perf[10-13], partition handling enabled
1. The transaction is started on edg-perf10
{code}
10:24:48,228 TRACE [org.infinispan.transaction.xa.TransactionXaAdapter]
(DefaultStressor-6) start called on tx
GlobalTransaction:<edg-perf10-20667>:38910:local
{code}
2. The value of key_000000000000065B is updated within the transaction
{code}
10:24:48,405 TRACE [org.radargun.service.InfinispanOperations$Cache] (DefaultStressor-6)
PUT cache=testCache key=key_000000000000065B value=[6 #15: 296, 501, 1109, 1119, 1459,
1544, 1999, 2083, 2130, 2257, 2298, 2784, 2941, 3009, 3147, ]
{code}
3. The transaction is successfully prepared on edg-perf11 and edg-perf12
{code}
10:24:48,559 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
(DefaultStressor-6) Responses: [sender=edg-perf11-61837, received=true, suspected=false]
[sender=edg-perf12-7305, received=true, suspected=false]
{code}
4. The transaction commit is issued
{code}
10:24:48,562 TRACE [org.infinispan.transaction.TransactionCoordinator]
(DefaultStressor-6) Committing transaction
GlobalTransaction:<edg-perf10-20667>:38910:local
{code}
5. The other participating nodes (edg-perf11 and edg-perf12) are suspected...
{code}
10:24:52,705 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(DefaultStressor-6) Target node edg-perf11-61837 left during remote call, ignoring
10:24:52,716 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(DefaultStressor-6) Target node edg-perf12-7305 left during remote call, ignoring
{code}
... because they had meanwhile received a new view without edg-perf10.
{code}
10:24:48,547 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-1,edg-perf12-7305) ISPN000093: Received new, MERGED cluster view:
MergeView::[edg-perf12-7305|20] (3) [edg-perf12-7305, edg-perf11-61837, edg-perf13-25187],
2 subgroups: [edg-perf13-25187|8] (1) [edg-perf13-25187], [edg-perf13-25187|19] (1)
[edg-perf13-25187]
{code}
Still, the transaction is committed on edg-perf10 and the updated entry is stored locally
{code}
10:24:52,894 TRACE [org.infinispan.statetransfer.CommitManager] (DefaultStressor-6)
Trying to commit. Key=key_000000000000065B. Operation Flag=null, L1 invalidation=false
10:24:52,896 TRACE [org.infinispan.statetransfer.CommitManager] (DefaultStressor-6)
Committing key=key_000000000000065B. It is a L1 invalidation or a normal put and no
tracking is enabled!
10:24:52,908 TRACE [org.infinispan.container.DefaultDataContainer] (DefaultStressor-6)
Creating new ICE for writing. Existing=ImmortalCacheEntry{key=key_000000000000065B,
value=[6 #14: 296, 501, 1109, 1119, 1459, 1544, 1999, 2083, 2130, 2257, 2298, 2784, 2941,
3009, ]}, metadata=EmbeddedMetadata{version=null}, new value=[6 #15: 296, 501, 1109, 1119,
1459, 1544, 1999, 2083, 2130, 2257, 2298, 2784, 2941, 3009, 3147, ]
{code}
6. The other nodes roll back the transaction
{code}
10:24:50,376 DEBUG [org.infinispan.transaction.TransactionTable] (transport-thread-10)
Rolling back transaction GlobalTransaction:<edg-perf10-20667>:38910:remote because
originator edg-perf10-20667 left the cluster
{code}
7. edg-perf10 receives a new view containing nodes edg-perf[10,11,13]. The incoming state
transfer overwrites the updated value
{code}
10:25:09,614 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-2,edg-perf10-20667) ISPN000093: Received new, MERGED cluster view:
MergeView::[edg-perf10-20667|22] (3) [edg-perf10-20667, edg-perf11-61837,
edg-perf13-25187], 4 subgroups: [edg-perf10-20667|15] (2) [edg-perf10-20667,
edg-perf11-61837], [edg-perf10-20667|19] (3) [edg-perf10-20667, edg-perf12-7305,
edg-perf11-61837], [edg-perf10-20667|18] (1) [edg-perf10-20667], [edg-perf10-20667|21] (2)
[edg-perf10-20667, edg-perf13-25187]
{code}
8. A subsequent get operation returns the outdated value
{code}
10:26:21,020 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-6)
Response(s) to ClusteredGetCommand{key=key_000000000000065B, flags=null} is
{edg-perf12-7305=SuccessfulResponse{responseValue=ImmortalCacheValue {value=[6 #14: 296,
501, 1109, 1119, 1459, 1544, 1999, 2083, 2130, 2257, 2298, 2784, 2941, 3009, ]}} }
{code}
From the client's perspective, this behavior is not transparent: since the transaction
completed successfully, the client can reasonably assume that the updated entry is present.
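The sequence in steps 2 to 8 can be replayed with a deliberately simplified model to show why the client ends up reading a stale value. This is a sketch under stated assumptions, not Infinispan code: each node is reduced to a map of committed entries plus a map of prepared writes, state transfer is modeled as copying one node's committed entry over another's, and `Node`, `replay` and the shorthand values `v14`/`v15` (standing for the `#14` and `#15` values in the logs) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of the ISPN-5274 sequence.
// Nothing here corresponds to real Infinispan classes.
public class LostUpdateModel {

    // A node keeps committed entries plus writes prepared by an in-flight tx.
    static final class Node {
        final Map<String, String> committed = new HashMap<>();
        final Map<String, String> prepared = new HashMap<>();

        void prepare(String key, String value) { prepared.put(key, value); }
        void commit() { committed.putAll(prepared); prepared.clear(); }
        void rollback() { prepared.clear(); }
        // Modeled state transfer: adopt the sender's committed copy of the key.
        void receiveState(Node from, String key) { committed.put(key, from.committed.get(key)); }
    }

    // Returns the value read after a transaction the client saw succeed.
    static String replay() {
        String key = "key_000000000000065B";
        Node perf10 = new Node(); // originator, which also stores the entry locally
        Node perf11 = new Node();
        Node perf12 = new Node();
        Node[] all = {perf10, perf11, perf12};
        for (Node n : all) {
            n.committed.put(key, "v14");
        }
        // Steps 2-3: the write is prepared on the originator and both remote owners.
        for (Node n : all) {
            n.prepare(key, "v15");
        }
        // Steps 4-5: the remote commits are dropped ("left during remote call,
        // ignoring"), yet the originator commits locally and reports success.
        perf10.commit();
        // Step 6: the remote owners roll back because the originator left.
        perf11.rollback();
        perf12.rollback();
        // Step 7: after the merge, state transfer overwrites the originator's copy.
        perf10.receiveState(perf12, key);
        // Step 8: the read returns the pre-transaction value.
        return perf10.committed.get(key);
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints v14
    }
}
```

In this model the originator reports success after committing only locally, so once the other partition's state wins the merge, no node holds `v15` any more, which is the outcome the client cannot detect.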