[
https://issues.jboss.org/browse/ISPN-5261?page=com.atlassian.jira.plugin....
]
Matej Čimbora commented on ISPN-5261:
-------------------------------------
I came across a variation of the original problem described by this JIRA. The impact is
slightly different - transaction is commited on TX originator (A) and one of the
participating nodes (B), however it gets rollbacked on another node (C). When get(K) is
invoked on A, B & C send response, however each of them sends a different
'version' of the entry, depending on success of the transaction on particular
node.
Scenario - 4 nodes - edg-perf01-04
1. Transaction is started on edg-perf01, K is updated within transaction
{code}
13:55:14,641 TRACE [org.infinispan.transaction.xa.TransactionXaAdapter]
(DefaultStressor-1) start called on tx
GlobalTransaction:<edg-perf01-60360>:181323:local
{code}
2. Transaction is successfully prepared on edg-perf02 & edg-perf03
3. edg-perf01 invokes a commit command, which is sent to 02 & 03, check messageId
& seqno
{code}
13:55:36,061 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
(DefaultStressor-1) real_dests=[edg-perf02-25521, edg-perf03-30958]
13:55:36,063 TRACE [org.jgroups.protocols.TCP] (DefaultStressor-1) edg-perf01-60360:
sending msg to null, src=edg-perf01-60360, headers are RequestCorrelator: id=200,
type=REQ, id=977458, rsp_expected=true, exclusion_list=[edg-perf01-60360], NAKACK2: [MSG,
seqno=82297], 20903: slaveIndex=0, TCP: [cluster_name=default]
{code}
4. On edg-perf02, the message is received & delivered. Commit is performed & entry
is stored in DataContainer
{code}
13:55:36,086 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (OOB-243,edg-perf02-25521)
edg-perf02-25521: received edg-perf01-60360#82297
13:55:36,086 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (OOB-243,edg-perf02-25521)
edg-perf02-25521: delivering edg-perf01-60360#82297
...
13:55:36,148 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl]
(remote-thread-86) About to send back response SuccessfulResponse{responseValue=null} for
command CommitCommand {gtx=GlobalTransaction:<edg-perf01-60360>:181323:remote,
cacheName='testCache', topologyId=228}
{code}
5. As edg-perf03 received a new view, which does not contain TX originator, transaction
gets rollbacked & removed from transaction table
{code}
13:55:36,018 DEBUG [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(OOB-2643,edg-perf03-30958) New view accepted: [edg-perf03-30958|70] (1)
[edg-perf03-30958]
...
13:55:36,774 DEBUG [org.infinispan.transaction.TransactionTable] (transport-thread-24)
Rolling back transaction GlobalTransaction:<edg-perf01-60360>:181323:remote because
originator edg-perf01-60360 left the cluster
{code}
edg-perf03 then receives a new view, requests retransmission & edg-perf01 sends the
commit once again
{code}
13:55:38,719 DEBUG [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-1,edg-perf03-30958) New view accepted: MergeView::[edg-perf02-25521|71] (4)
[edg-perf02-25521, edg-perf03-30958, edg-perf01-60360, edg-perf04-55740], 2 subgroups:
[edg-perf02-25521|69] (3) [edg-perf02-25521, edg-perf03-30958, edg-perf01-60360],
[edg-perf04-55740|68] (1) [edg-perf04-55740]
...
13:55:40,798 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (Timer-3,edg-perf03-30958)
edg-perf03-30958: sending XMIT_REQ ((7): {82296-82302}) to edg-perf01-60360
13:55:40,811 TRACE [org.jgroups.protocols.TCP] (Timer-3,edg-perf03-30958)
edg-perf03-30958: sending msg to edg-perf01-60360, src=edg-perf03-30958, headers are
NAKACK2: [XMIT_REQ, sender=edg-perf01-60360], 20903: slaveIndex=2, TCP:
[cluster_name=default]
13:55:41,055 TRACE [org.jgroups.protocols.TCP] (INT-428,edg-perf03-30958)
edg-perf03-30958: received [dst: edg-perf03-30958, src: edg-perf01-60360 (4 headers),
size=64 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|INTERNAL], headers are
RequestCorrelator: id=200, type=REQ, id=977458, rsp_expected=true,
exclusion_list=[edg-perf01-60360], NAKACK2: [XMIT_RSP, seqno=82297], 20903: slaveIndex=0,
TCP: [cluster_name=default]
13:55:41,081 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (INT-428,edg-perf03-30958)
edg-perf03-30958: received edg-perf01-60360#82297
13:55:41,081 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (INT-428,edg-perf03-30958)
edg-perf03-30958: delivering edg-perf01-60360#82297
{code}
Transaction is not present in TX table and successful response is sent to edg-perf01
{code}
13:55:41,177 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
(INT-428,edg-perf03-30958) Attempting to execute command: CommitCommand
{gtx=GlobalTransaction:<edg-perf01-60360>:181323:local,
cacheName='testCache', topologyId=228} [sender=edg-perf01-60360]
13:55:41,197 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl]
(remote-thread-978) Calling perform() on CommitCommand
{gtx=GlobalTransaction:<edg-perf01-60360>:181323:remote,
cacheName='testCache', topologyId=228}
13:55:41,197 TRACE [org.infinispan.commands.tx.AbstractTransactionBoundaryCommand]
(remote-thread-978) Did not find a RemoteTransaction for
GlobalTransaction:<edg-perf01-60360>:181323:remote
13:55:41,215 TRACE [org.jgroups.blocks.RequestCorrelator] (remote-thread-978) sending rsp
for 977458 to edg-perf01-60360
13:55:41,215 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl]
(remote-thread-978) About to send back response SuccessfulResponse{responseValue=null}
for command CommitCommand {gtx=GlobalTransaction:<edg-perf01-60360>:181323:remote,
cacheName='testCache', topologyId=228}
{code}
6. edg-perf01 receives successful responses for commit command from both edg-perf02 &
edg-perf03
{code}
13:55:41,230 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-1)
Response(s) to CommitCommand {gtx=GlobalTransaction:<edg-perf01-60360>:181323:local,
cacheName='testCache', topologyId=228} is
{edg-perf03-30958=SuccessfulResponse{responseValue=null} ,
edg-perf02-25521=SuccessfulResponse{responseValue=null} }
{code}
7. Transaction is commited again in new topology, further not affecting K.
{code}
13:55:41,252 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-1)
edg-perf01-60360 invoking CommitCommand
{gtx=GlobalTransaction:<edg-perf01-60360>:181323:local,
cacheName='testCache', topologyId=233} to recipient list [edg-perf02-25521,
edg-perf03-30958, edg-perf04-55740] with options RpcOptions{timeout=60000,
unit=MILLISECONDS, fifoOrder=false, totalOrder=false, responseFilter=null,
responseMode=SYNCHRONOUS, skipReplicationQueue=false}
...
13:55:54,118 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-1)
Response(s) to CommitCommand {gtx=GlobalTransaction:<edg-perf01-60360>:181323:local,
cacheName='testCache', topologyId=233} is
{edg-perf03-30958=SuccessfulResponse{responseValue=null} ,
edg-perf04-55740=SuccessfulResponse{responseValue=null} ,
edg-perf02-25521=SuccessfulResponse{responseValue=null} }
{code}
8. Later, when get(K) is invoked on edg-perf01, both 02 & 03 send their response.
However, each of them possesses a different version of the entry, leading to unexpected
return value when message from 03 is received first.
Transaction is recommited by joining node during merge
------------------------------------------------------
Key: ISPN-5261
URL:
https://issues.jboss.org/browse/ISPN-5261
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 7.1.1.Final
Reporter: Matej Čimbora
Assignee: Dan Berindei
Transaction gets recommited after a node joins a cluster during merge and requests
retransmission of previous messages, overwriting updates which happened till that point.
4 nodes - edg-perf01-04
Scenario:
1. edg-perf01 begins a new transaction & updates value of K
{code}
06:50:50,421 TRACE [org.infinispan.transaction.xa.TransactionXaAdapter]
(DefaultStressor-7) start called on tx
GlobalTransaction:<edg-perf01-19349>:3516:local
{code}
The latest view on edg-perf01 is
{code}
06:50:35,103 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-2,edg-perf01-19349) ISPN000093: Received new, MERGED cluster view:
MergeView::[edg-perf01-19349|5] (3) [edg-perf01-19349, edg-perf04-20753,
edg-perf02-19191], 1 subgroups: [edg-perf01-19349|4] (2) [edg-perf01-19349,
edg-perf02-19191]
{code}
2. Prepare command is sent with nodes edg-perf02 & edg-perf04 responding
successfully
{code}
06:50:51,186 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-7)
edg-perf01-19349 invoking PrepareCommand ... topologyId=12} to recipient list
[edg-perf02-19191, edg-perf04-20753, edg-perf01-19349] ...
{code}
check message id & seqno
{code}
(06:50:51,195 TRACE [org.jgroups.protocols.TCP] (DefaultStressor-7) edg-perf01-19349:
sending msg to null, src=edg-perf01-19349, headers are RequestCorrelator: id=200,
type=REQ, id=17134, rsp_expected=true, exclusion_list=[edg-perf01-19349], NAKACK2: [MSG,
seqno=1020], 20903: slaveIndex=0, TCP: [cluster_name=default])
{code}
3. Transaction gets commited & value of K gets updated multiple times on edg-perf01
from this point on
4. edg-perf03 receives a new view containing all nodes
{code}
06:50:53,963 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-2,edg-perf03-14171) ISPN000093: Received new, MERGED cluster view:
MergeView::[edg-perf01-19349|6] (4) [edg-perf01-19349, edg-perf03-14171, edg-perf04-20753,
edg-perf02-19191], 3 subgroups: [edg-perf03-14171|4] (2) [edg-perf03-14171,
edg-perf04-20753], [edg-perf01-19349|5] (3) [edg-perf01-19349, edg-perf04-20753,
edg-perf02-19191], [edg-perf01-19349|4] (2) [edg-perf01-19349, edg-perf02-19191]
{code}
5. edg-perf03 apparently requests retransmission of previous messages (containing prepare
from step#2) & edg-perf01 sends the response
Request
{code}
06:51:35,241 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (Timer-3,edg-perf03-14171)
edg-perf03-14171: sending XMIT_REQ ((79): {1015-1093}) to edg-perf01-19349
{code}
Response
{code}
06:51:35,299 TRACE [org.jgroups.protocols.TCP] (INT-14,edg-perf03-14171)
edg-perf03-14171: received [dst: edg-perf03-14171, src: edg-perf01-19349 (4 headers),
size=975 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|INTERNAL], headers are
RequestCorrelator: id=200, type=REQ, id=17134, rsp_expected=true,
exclusion_list=[edg-perf01-19349], NAKACK2: [XMIT_RSP, seqno=1020], 20903: slaveIndex=0,
TCP: [cluster_name=default]
{code}
6. edg-perf03 prepares the old transaction & eventually commits it again
{code}
06:51:35,304 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
(INT-14,edg-perf03-14171) Attempting to execute command:
PrepareCommand.....gtx=GlobalTransaction:<edg-perf01-19349>:3516:local,
cacheName='testCache', topologyId=12} [sender=edg-perf01-19349]
06:51:35,304 TRACE [org.infinispan.statetransfer.StateTransferLockImpl]
(INT-14,edg-perf03-14171) Checking if transaction data was received for topology 12,
current topology is 14
06:51:35,305 TRACE [org.infinispan.transaction.TransactionTable] (remote-thread-40)
Created and registered remote transaction ...
tx=GlobalTransaction:<edg-perf01-19349>:3516:remote, ...
06:51:35,308 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (remote-thread-40)
edg-perf03-14171 invoking PrepareCommand ...
gtx=GlobalTransaction:<edg-perf01-19349>:3516:remote, cacheName='testCache',
topologyId=15} to recipient list [edg-perf04-20753, edg-perf02-19191] ...
{code}
Successful commit responses
{code}
06:51:36,750 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (remote-thread-44)
Response(s) to CommitCommand {gtx=GlobalTransaction:<edg-perf01-19349>:3516:remote,
cacheName='testCache', topologyId=15} is
{edg-perf02-19191=SuccessfulResponse{responseValue=null} ,
edg-perf04-20753=SuccessfulResponse{responseValue=null} }
{code}
7. When get(K) is invoked on edg-perf01, stale value is returned (the one from step #6,
ignoring updates which have happened since step #3)
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)