Dan Berindei created ISPN-2410:
----------------------------------
Summary: A PrepareCommand forwarded back to the originator can time out waiting on a key already locked by itself
Key: ISPN-2410
URL: https://issues.jboss.org/browse/ISPN-2410
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.2.0.Beta2
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Critical
Fix For: 5.2.0.CR1
If a rebalance happens while a prepare command is executing on a remote node, and the
originator has become an owner of the affected keys, it makes sense to forward the command
back to the originator so that it locks the keys (or just adds them to its backup locks list).
However, we don't keep the old consistent hashes around, so we can't tell whether the
originator became an owner after the remote command was invoked or was already an owner
before. So whenever the topology has changed, we forward the prepare back to the originator.
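A minimal Java sketch of the forwarding decision described above. All names (`ForwardDecisionSketch`, `forwardWithoutOldHash`, `forwardWithOldHash`) are hypothetical and are not Infinispan's API. It contrasts what a remote node can decide when it only knows the current owners with what it could decide if the previous owners were still available (an assumption; the issue states the old consistent hashes are not kept).

```java
import java.util.Set;

// Hypothetical sketch; these names do not correspond to Infinispan classes.
public class ForwardDecisionSketch {

    // Only the current owners are known: any topology change with the originator
    // among the current owners leads to a forward, even if it was already an owner.
    static boolean forwardWithoutOldHash(int commandTopologyId, int currentTopologyId,
                                         Set<String> currentOwners, String originator) {
        return commandTopologyId < currentTopologyId && currentOwners.contains(originator);
    }

    // Assumes the previous owners were retained: forward only if the originator
    // is an owner now but was not one in the topology the command started in.
    static boolean forwardWithOldHash(int commandTopologyId, int currentTopologyId,
                                      Set<String> previousOwners, Set<String> currentOwners,
                                      String originator) {
        return commandTopologyId < currentTopologyId
                && currentOwners.contains(originator)
                && !previousOwners.contains(originator);
    }

    public static void main(String[] args) {
        Set<String> owners = Set.of("NodeA", "NodeB");
        // NodeA was already an owner before the topology change.
        System.out.println(forwardWithoutOldHash(1, 2, owners, "NodeA"));      // true
        System.out.println(forwardWithOldHash(1, 2, owners, owners, "NodeA")); // false
    }
}
```

Without the old hash, the first check cannot distinguish "became an owner" from "was already an owner", which is why the prepare ends up forwarded to an originator that may already hold the lock.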
Back on the originator, minTxTopologyId < currentTopologyId, so the forwarded prepare command
has to wait for the backup locks of all pending transactions to be released. The problem is
that we also wait for the current transaction, whose local copy already locks the key, so the
prepare waits on itself until the RPC times out, effectively a deadlock.
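The self-wait can be illustrated with a small Java sketch. Everything in it is hypothetical (`PendingLockWaitSketch`, `PendingTx`, `transactionsToWaitFor` and the `skipSelf` switch); it models each pending transaction as a global transaction id plus the keys it holds backup locks on. Skipping the entry for the prepare's own transaction is shown only as one possible way to break the cycle, not as the actual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the wait described above; not Infinispan's actual code.
public class PendingLockWaitSketch {

    // A pending transaction: its global id and the keys it holds backup locks on.
    static final class PendingTx {
        final String gtxId;
        final Set<String> backupLockedKeys;

        PendingTx(String gtxId, Set<String> backupLockedKeys) {
            this.gtxId = gtxId;
            this.backupLockedKeys = backupLockedKeys;
        }
    }

    // Returns the ids of pending transactions a prepare must wait for before
    // locking `keys`. With skipSelf == false the prepare's own transaction is
    // included, so it would wait on itself until the timeout fires.
    static List<String> transactionsToWaitFor(String preparingGtxId, Set<String> keys,
                                              List<PendingTx> pending, boolean skipSelf) {
        List<String> result = new ArrayList<>();
        for (PendingTx tx : pending) {
            if (skipSelf && tx.gtxId.equals(preparingGtxId)) {
                continue; // our own (local) copy of the transaction cannot block us
            }
            for (String key : keys) {
                if (tx.backupLockedKeys.contains(key)) {
                    result.add(tx.gtxId);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The local copy of gtx 4353 already locks key0, as in the log.
        List<PendingTx> pending = List.of(new PendingTx("4353", Set.of("key0")));
        System.out.println(transactionsToWaitFor("4353", Set.of("key0"), pending, false)); // [4353]
        System.out.println(transactionsToWaitFor("4353", Set.of("key0"), pending, true));  // []
    }
}
```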
Seen in OnePhaseXATest:
{noformat}
18:07:46,873 TRACE (testng-OnePhaseXATest:TestCache) [RpcManagerImpl] NodeA-46125 broadcasting call PrepareCommand {modifications=[PutKeyValueCommand{key=key0, value=value, flags=null, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}], onePhaseCommit=false, gtx=GlobalTransaction:<NodeA-46125>:4353:local, cacheName='TestCache', topologyId=-1} to recipient list null
18:07:46,873 DEBUG (transport-thread-2,NodeA:TestCache) [LocalTopologyManagerImpl] Updating local consistent hash(es) for cache TestCache: new topology = CacheTopology{id=2, currentCH=ReplicatedConsistentHash{members=[NodeA-46125, NodeB-49450]}, pendingCH=null}
18:07:46,894 TRACE (OOB-1,ISPN,NodeB-49450:TestCache) [StateTransferManagerImpl] Forwarding command PrepareCommand {modifications=[PutKeyValueCommand{key=key0, value=value, flags=null, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}], onePhaseCommit=false, gtx=GlobalTransaction:<NodeA-46125>:4353:remote, cacheName='TestCache', topologyId=2} to new targets [NodeA-46125]
18:07:46,935 TRACE (OOB-3,ISPN,NodeA-46125:TestCache) [StateTransferInterceptor] handleTopologyAffectedCommand for command PrepareCommand {modifications=[PutKeyValueCommand{key=key0, value=value, flags=null, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}], onePhaseCommit=false, gtx=GlobalTransaction:<NodeA-46125>:4353:remote, cacheName='TestCache', topologyId=2}, originLocal=false
18:07:46,935 TRACE (OOB-3,ISPN,NodeA-46125:TestCache) [AbstractCacheTransaction] Transaction gtx=GlobalTransaction:<NodeA-46125>:4353:local potentially locks key key0? true
18:08:16,874 TRACE (testng-OnePhaseXATest:TestCache) [RpcManagerImpl] replication exception:
org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to NodeB-49450
{noformat}