Dan Berindei created ISPN-5460:
----------------------------------
Summary: Prepare commands sent before the target became a member should be
rejected
Key: ISPN-5460
URL:
https://issues.jboss.org/browse/ISPN-5460
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 7.2.1.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Critical
Fix For: 8.0.0.Alpha1
Since ISPN-4198 was fixed, joiners ignore commands that were sent in a topology in which
they were not members. But the joiner will still send a {{null}} response, which is valid,
so the tx originator will not retry the command.
The owner may have already sent the list of transactions to the joiner before the prepare,
so this sequence of events can happen:
1. A new topology is installed, which includes the joiner (B) in the write consistent
hash. B requests the transactions from A, but doesn't receive anything because the
affected keys of {{GlobalTransaction:<NodeA-40680>:99974:local}} haven't been
updated yet.
{noformat}
19:14:11,027 TRACE (remote-thread-NodeB-p35288-t2:) [StateConsumerImpl] Received new
topology for cache resultCache2, isRebalance = false, isMember = false, topology =
CacheTopology{id=0, rebalanceId=0, currentCH=ReplicatedConsistentHash{ns = 1, owners =
(1)[NodeA-40680: 1]}, pendingCH=null, unionCH=null, actualMembers=[NodeA-40680]}
19:14:11,039 TRACE (transport-thread-NodeA-p35281-t6:) [StateTransferInterceptor]
handleTxCommand for command PrepareCommand {modifications=[PutKeyValueCommand{key=rules,
value=1, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false,
gtx=GlobalTransaction:<NodeA-40680>:99974:local, cacheName='resultCache2',
topologyId=-1}, origin null
19:14:11,044 TRACE (transport-thread-NodeA-p35281-t1:) [StateConsumerImpl] Received new
topology for cache resultCache2, isRebalance = true, isMember = true, topology =
CacheTopology{id=1, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 1, owners =
(1)[NodeA-40680: 1]}, pendingCH=ReplicatedConsistentHash{ns = 1, owners = (2)[NodeA-40680:
1, NodeB-64486: 0]}, unionCH=ReplicatedConsistentHash{ns = 1, owners = (2)[NodeA-40680: 1,
NodeB-64486: 0]}, actualMembers=[NodeA-40680, NodeB-64486]}
19:14:11,045 TRACE (remote-thread-NodeA-p35282-t4:) [StateProviderImpl] Received request
for transactions from node NodeB-64486 for segments [0] of cache resultCache2 with
topology id 1
19:14:11,045 TRACE (remote-thread-NodeA-p35282-t4:) [StateProviderImpl] Skipping
transaction NodeB-64486 because the state requestor doesn't own any key
19:14:11,045 TRACE (remote-thread-NodeA-p35282-t4:) [CommandAwareRpcDispatcher] About to
send back response SuccessfulResponse{responseValue=[]} for command
StateRequestCommand{cache=resultCache2, origin=NodeB-64486, type=GET_TRANSACTIONS,
topologyId=1, segments=[0]}
19:14:11,045 DEBUG (transport-thread-NodeB-p35287-t3:) [StateConsumerImpl] Applying 0
transactions for cache resultCache2 transferred from node NodeA-40680
{noformat}
2. Node A adds the affected key and sends the prepare command to B with topology 0. B
ignores the command.
{noformat}
19:14:11,039 TRACE (transport-thread-NodeA-p35281-t6:) [AbstractCacheTransaction]
Registering locked key: rules
19:14:11,047 TRACE (transport-thread-NodeA-p35281-t6:) [JGroupsTransport] dests=null,
command=PrepareCommand {modifications=[PutKeyValueCommand{key=rules, value=1, flags=null,
putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null},
successful=true}], onePhaseCommit=false,
gtx=GlobalTransaction:<NodeA-40680>:99974:local, cacheName='resultCache2',
topologyId=0}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=15000
19:14:11,048 TRACE (OOB-4,NodeB-64486:) [GlobalInboundInvocationHandler] Attempting to
execute CacheRpcCommand: PrepareCommand {modifications=[PutKeyValueCommand{key=rules,
value=1, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false,
gtx=GlobalTransaction:<NodeA-40680>:99974:local, cacheName='resultCache2',
topologyId=0} [sender=NodeA-40680]
19:14:11,048 TRACE (remote-thread-NodeB-p35288-t6:)
[NonTotalOrderPerCacheInboundInvocationHandler] Ignoring command sent before the local
node was a member (command topology id is 0)
19:14:11,048 TRACE (remote-thread-NodeB-p35288-t6:) [CommandAwareRpcDispatcher] About to
send back response null for command PrepareCommand
{modifications=[PutKeyValueCommand{key=rules, value=1, flags=null, putIfAbsent=false,
valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}],
onePhaseCommit=false, gtx=GlobalTransaction:<NodeA-40680>:99974:remote,
cacheName='resultCache2', topologyId=0}
19:14:11,058 TRACE (transport-thread-NodeA-p35281-t6:) [RpcManagerImpl] Response(s) to
PrepareCommand {modifications=[PutKeyValueCommand{key=rules, value=1, flags=null,
putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null},
successful=true}], onePhaseCommit=false,
gtx=GlobalTransaction:<NodeA-40680>:99974:local, cacheName='resultCache2',
topologyId=0} is {}
{noformat}
3. A tries to commit the tx, but B throws an exception because it can't find the
remote tx.
{noformat}
19:14:11,066 TRACE (transport-thread-NodeA-p35281-t6:) [LocalTransaction] Adding remote
locks on [NodeA-40680, NodeB-64486]. Remote locks are null
19:14:11,066 TRACE (transport-thread-NodeA-p35281-t6:) [TransactionCoordinator] Committing
transaction GlobalTransaction:<NodeA-40680>:99974:local
19:14:11,066 TRACE (transport-thread-NodeA-p35281-t6:) [RpcManagerImpl] NodeA-40680
invoking CommitCommand {gtx=GlobalTransaction:<NodeA-40680>:99974:local,
cacheName='resultCache2', topologyId=2} to recipient list null with options
RpcOptions{timeout=15000, unit=MILLISECONDS, deliverOrder=NONE, responseFilter=null,
responseMode=SYNCHRONOUS_IGNORE_LEAVERS, skipReplicationQueue=false}
19:14:11,066 TRACE (OOB-5,NodeB-64486:) [GlobalInboundInvocationHandler] Attempting to
execute CacheRpcCommand: CommitCommand
{gtx=GlobalTransaction:<NodeA-40680>:99974:local, cacheName='resultCache2',
topologyId=2} [sender=NodeA-40680]
19:14:11,066 TRACE (remote-thread-NodeB-p35288-t6:)
[NonTotalOrderPerCacheInboundInvocationHandler] Calling perform() on CommitCommand
{gtx=GlobalTransaction:<NodeA-40680>:99974:remote, cacheName='resultCache2',
topologyId=2}
19:14:11,066 TRACE (remote-thread-NodeB-p35288-t6:) [AbstractTransactionBoundaryCommand]
Did not find a RemoteTransaction for GlobalTransaction:<NodeA-40680>:99974:remote
19:14:11,067 WARN (remote-thread-NodeB-p35288-t6:)
[NonTotalOrderPerCacheInboundInvocationHandler] ISPN000071: Caught exception when handling
command CommitCommand {gtx=GlobalTransaction:<NodeA-40680>:99974:remote,
cacheName='resultCache2', topologyId=2}
java.lang.IllegalStateException: Remote transaction not found:
GlobalTransaction:<NodeA-40680>:99974:remote
at
org.infinispan.commands.tx.CommitCommand.invalidRemoteTxReturnValue(CommitCommand.java:54)
at
org.infinispan.commands.tx.AbstractTransactionBoundaryCommand.perform(AbstractTransactionBoundaryCommand.java:89)
at
org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokePerform(BasePerCacheInboundInvocationHandler.java:84)
at
org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:31)
{noformat}
This is causing random failures in the map/reduce tests (e.g.
DistributedSharedCacheTwoNodesMapReduceTest), because map/reduce doesn't wait for all
the nodes to join before inserting in the cache.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)