[infinispan-issues] [JBoss JIRA] (ISPN-5261) Transaction is recommited by joining node during merge

Matej Čimbora (JIRA) issues at jboss.org
Tue Mar 17 10:47:20 EDT 2015


    [ https://issues.jboss.org/browse/ISPN-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051027#comment-13051027 ] 

Matej Čimbora commented on ISPN-5261:
-------------------------------------

I came across a variation of the original problem described by this JIRA. The impact is slightly different - transaction is commited on TX originator (A) and one of the participating nodes (B), however it gets rollbacked on another node (C). When get(K) is invoked on A, B & C send response, however each of them sends a different 'version' of the entry, depending on success of the transaction on particular node.  

Scenario - 4 nodes - edg-perf01-04

1. Transaction is started on edg-perf01, K is updated within transaction
{code}
13:55:14,641 TRACE [org.infinispan.transaction.xa.TransactionXaAdapter] (DefaultStressor-1) start called on tx GlobalTransaction:<edg-perf01-60360>:181323:local
{code}

2. Transaction is successfully prepared on edg-perf02 & edg-perf03

3. edg-perf01 invokes a commit command, which is sent to 02 & 03, check messageId & seqno
{code}
13:55:36,061 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (DefaultStressor-1) real_dests=[edg-perf02-25521, edg-perf03-30958]
13:55:36,063 TRACE [org.jgroups.protocols.TCP] (DefaultStressor-1) edg-perf01-60360: sending msg to null, src=edg-perf01-60360, headers are RequestCorrelator: id=200, type=REQ, id=977458, rsp_expected=true, exclusion_list=[edg-perf01-60360], NAKACK2: [MSG, seqno=82297], 20903: slaveIndex=0, TCP: [cluster_name=default]
{code}

4. On edg-perf02, the message is received & delivered. Commit is performed & entry is stored in DataContainer
{code}
13:55:36,086 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (OOB-243,edg-perf02-25521) edg-perf02-25521: received edg-perf01-60360#82297
13:55:36,086 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (OOB-243,edg-perf02-25521) edg-perf02-25521: delivering edg-perf01-60360#82297
...
13:55:36,148 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-86) About to send back response SuccessfulResponse{responseValue=null}  for command CommitCommand {gtx=GlobalTransaction:<edg-perf01-60360>:181323:remote, cacheName='testCache', topologyId=228}
{code}

5. As edg-perf03 received a new view, which does not contain TX originator, transaction gets rollbacked & removed from transaction table
{code}
13:55:36,018 DEBUG [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (OOB-2643,edg-perf03-30958) New view accepted: [edg-perf03-30958|70] (1) [edg-perf03-30958]
...
13:55:36,774 DEBUG [org.infinispan.transaction.TransactionTable] (transport-thread-24) Rolling back transaction GlobalTransaction:<edg-perf01-60360>:181323:remote because originator edg-perf01-60360 left the cluster
{code}
edg-perf03 then receives a new view, requests retransmission & edg-perf01 sends the commit once again
{code}
13:55:38,719 DEBUG [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-1,edg-perf03-30958) New view accepted: MergeView::[edg-perf02-25521|71] (4) [edg-perf02-25521, edg-perf03-30958, edg-perf01-60360, edg-perf04-55740], 2 subgroups: [edg-perf02-25521|69] (3) [edg-perf02-25521, edg-perf03-30958, edg-perf01-60360], [edg-perf04-55740|68] (1) [edg-perf04-55740]
...
13:55:40,798 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (Timer-3,edg-perf03-30958) edg-perf03-30958: sending XMIT_REQ ((7): {82296-82302}) to edg-perf01-60360
13:55:40,811 TRACE [org.jgroups.protocols.TCP] (Timer-3,edg-perf03-30958) edg-perf03-30958: sending msg to edg-perf01-60360, src=edg-perf03-30958, headers are NAKACK2: [XMIT_REQ, sender=edg-perf01-60360], 20903: slaveIndex=2, TCP: [cluster_name=default]
13:55:41,055 TRACE [org.jgroups.protocols.TCP] (INT-428,edg-perf03-30958) edg-perf03-30958: received [dst: edg-perf03-30958, src: edg-perf01-60360 (4 headers), size=64 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|INTERNAL], headers are RequestCorrelator: id=200, type=REQ, id=977458, rsp_expected=true, exclusion_list=[edg-perf01-60360], NAKACK2: [XMIT_RSP, seqno=82297], 20903: slaveIndex=0, TCP: [cluster_name=default]
13:55:41,081 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (INT-428,edg-perf03-30958) edg-perf03-30958: received edg-perf01-60360#82297
13:55:41,081 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (INT-428,edg-perf03-30958) edg-perf03-30958: delivering edg-perf01-60360#82297
{code}
Transaction is not present in TX table and successful response is sent to edg-perf01 
{code}
13:55:41,177 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (INT-428,edg-perf03-30958) Attempting to execute command: CommitCommand {gtx=GlobalTransaction:<edg-perf01-60360>:181323:local, cacheName='testCache', topologyId=228} [sender=edg-perf01-60360]
13:55:41,197 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-978) Calling perform() on CommitCommand {gtx=GlobalTransaction:<edg-perf01-60360>:181323:remote, cacheName='testCache', topologyId=228}
13:55:41,197 TRACE [org.infinispan.commands.tx.AbstractTransactionBoundaryCommand] (remote-thread-978) Did not find a RemoteTransaction for GlobalTransaction:<edg-perf01-60360>:181323:remote
13:55:41,215 TRACE [org.jgroups.blocks.RequestCorrelator] (remote-thread-978) sending rsp for 977458 to edg-perf01-60360
13:55:41,215 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-978) About to send back response SuccessfulResponse{responseValue=null}  for command CommitCommand {gtx=GlobalTransaction:<edg-perf01-60360>:181323:remote, cacheName='testCache', topologyId=228}
{code}

6. edg-perf01 receives successful responses for commit command from both edg-perf02 & edg-perf03
{code}
13:55:41,230 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-1) Response(s) to CommitCommand {gtx=GlobalTransaction:<edg-perf01-60360>:181323:local, cacheName='testCache', topologyId=228} is {edg-perf03-30958=SuccessfulResponse{responseValue=null} , edg-perf02-25521=SuccessfulResponse{responseValue=null} }
{code}

7. Transaction is commited again in new topology, further not affecting K.
{code}
13:55:41,252 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-1) edg-perf01-60360 invoking CommitCommand {gtx=GlobalTransaction:<edg-perf01-60360>:181323:local, cacheName='testCache', topologyId=233} to recipient list [edg-perf02-25521, edg-perf03-30958, edg-perf04-55740] with options RpcOptions{timeout=60000, unit=MILLISECONDS, fifoOrder=false, totalOrder=false, responseFilter=null, responseMode=SYNCHRONOUS, skipReplicationQueue=false}
...
13:55:54,118 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-1) Response(s) to CommitCommand {gtx=GlobalTransaction:<edg-perf01-60360>:181323:local, cacheName='testCache', topologyId=233} is {edg-perf03-30958=SuccessfulResponse{responseValue=null} , edg-perf04-55740=SuccessfulResponse{responseValue=null} , edg-perf02-25521=SuccessfulResponse{responseValue=null} }
{code}

8. Later, when get(K) is invoked on edg-perf01, both 02 & 03 send their response. However, each of them possesses a different version of the entry, leading to unexpected return value when message from 03 is received first.

> Transaction is recommited by joining node during merge
> ------------------------------------------------------
>
>                 Key: ISPN-5261
>                 URL: https://issues.jboss.org/browse/ISPN-5261
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 7.1.1.Final
>            Reporter: Matej Čimbora
>            Assignee: Dan Berindei
>
> Transaction gets recommited after a node joins a cluster during merge and requests retransmission of previous messages, overwriting updates which happened till that point.
> 4 nodes - edg-perf01-04
> Scenario:
> 1. edg-perf01 begins a new transaction & updates value of K
> {code}
> 06:50:50,421 TRACE [org.infinispan.transaction.xa.TransactionXaAdapter] (DefaultStressor-7) start called on tx GlobalTransaction:<edg-perf01-19349>:3516:local
> {code}
> The latest view on edg-perf01 is
> {code}
> 06:50:35,103 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf01-19349) ISPN000093: Received new, MERGED cluster view: MergeView::[edg-perf01-19349|5] (3) [edg-perf01-19349, edg-perf04-20753, edg-perf02-19191], 1 subgroups: [edg-perf01-19349|4] (2) [edg-perf01-19349, edg-perf02-19191]
> {code}
> 2. Prepare command is sent with nodes edg-perf02 & edg-perf04 responding successfully
> {code}
> 06:50:51,186 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-7) edg-perf01-19349 invoking PrepareCommand ... topologyId=12} to recipient list [edg-perf02-19191, edg-perf04-20753, edg-perf01-19349] ...
> {code}
> check message id & seqno
> {code}
> (06:50:51,195 TRACE [org.jgroups.protocols.TCP] (DefaultStressor-7) edg-perf01-19349: sending msg to null, src=edg-perf01-19349, headers are RequestCorrelator: id=200, type=REQ, id=17134, rsp_expected=true, exclusion_list=[edg-perf01-19349], NAKACK2: [MSG, seqno=1020], 20903: slaveIndex=0, TCP: [cluster_name=default])
> {code}
> 3. Transaction gets commited & value of K gets updated multiple times on edg-perf01 from this point on
> 4. edg-perf03 receives a new view containing all nodes
> {code}
> 06:50:53,963 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf03-14171) ISPN000093: Received new, MERGED cluster view: MergeView::[edg-perf01-19349|6] (4) [edg-perf01-19349, edg-perf03-14171, edg-perf04-20753, edg-perf02-19191], 3 subgroups: [edg-perf03-14171|4] (2) [edg-perf03-14171, edg-perf04-20753], [edg-perf01-19349|5] (3) [edg-perf01-19349, edg-perf04-20753, edg-perf02-19191], [edg-perf01-19349|4] (2) [edg-perf01-19349, edg-perf02-19191]
> {code}
> 5. edg-perf03 apparently requests retransmission of previous messages (containing prepare from step#2) & edg-perf01 sends the response
> Request
> {code}
> 06:51:35,241 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (Timer-3,edg-perf03-14171) edg-perf03-14171: sending XMIT_REQ ((79): {1015-1093}) to edg-perf01-19349
> {code}
> Response
> {code}
> 06:51:35,299 TRACE [org.jgroups.protocols.TCP] (INT-14,edg-perf03-14171) edg-perf03-14171: received [dst: edg-perf03-14171, src: edg-perf01-19349 (4 headers), size=975 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|INTERNAL], headers are RequestCorrelator: id=200, type=REQ, id=17134, rsp_expected=true, exclusion_list=[edg-perf01-19349], NAKACK2: [XMIT_RSP, seqno=1020], 20903: slaveIndex=0, TCP: [cluster_name=default]
> {code}
> 6. edg-perf03 prepares the old transaction & eventually commits it again
> {code}
> 06:51:35,304 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (INT-14,edg-perf03-14171) Attempting to execute command: PrepareCommand.....gtx=GlobalTransaction:<edg-perf01-19349>:3516:local, cacheName='testCache', topologyId=12} [sender=edg-perf01-19349]
> 06:51:35,304 TRACE [org.infinispan.statetransfer.StateTransferLockImpl] (INT-14,edg-perf03-14171) Checking if transaction data was received for topology 12, current topology is 14
> 06:51:35,305 TRACE [org.infinispan.transaction.TransactionTable] (remote-thread-40) Created and registered remote transaction ... tx=GlobalTransaction:<edg-perf01-19349>:3516:remote, ...
> 06:51:35,308 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (remote-thread-40) edg-perf03-14171 invoking PrepareCommand ... gtx=GlobalTransaction:<edg-perf01-19349>:3516:remote, cacheName='testCache', topologyId=15} to recipient list [edg-perf04-20753, edg-perf02-19191] ...
> {code}
> Successful commit responses
> {code}
> 06:51:36,750 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (remote-thread-44) Response(s) to CommitCommand {gtx=GlobalTransaction:<edg-perf01-19349>:3516:remote, cacheName='testCache', topologyId=15} is {edg-perf02-19191=SuccessfulResponse{responseValue=null} , edg-perf04-20753=SuccessfulResponse{responseValue=null} }
> {code}
> 7. When get(K) is invoked on edg-perf01, stale value is returned (the one from step #6, ignoring updates which have happened since step #3) 



--
This message was sent by Atlassian JIRA
(v6.3.11#6341)



More information about the infinispan-issues mailing list