[infinispan-issues] [JBoss JIRA] (ISPN-2410) A PrepareCommand forwarded back to the originator can time out waiting on a key already locked by itself

Tuesday, 30 October 2012

    [ https://issues.jboss.org/browse/ISPN-2410?page=com.atlassian.jira.plugin.... ]

Dan Berindei commented on ISPN-2410:
------------------------------------

If the originator became the primary owner, it means that either the originator was a
backup owner already (and so had the backup locks) or all the previous owners left the
cache - in which case I don't think we need to worry *too* much about consistency.

The problem, I think, is with keys for which the originator just became a backup owner. The
commit will assume that the transaction already has those keys in its lookup table, but it
doesn't: the prepare command that was forwarded back created its own
{{RemoteTransaction}} object, which is isolated from the original {{LocalTransaction}}.

I'm considering renaming {{RemoteTransaction}} to {{CacheTransactionImpl}} and
changing {{LocalTransaction}} to wrap a {{CacheTransaction}} instead of implementing it.
This would allow the local command and the remote one to access the same transaction
object, but it might involve changing a lot of transaction-related code. What do you
think, Mircea?
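
To make the proposal concrete, here is a minimal sketch of the "wrap instead of implement" idea. Everything in it is a simplified, hypothetical stand-in: the interface methods, the {{TransactionTable}} registry and the string transaction ids are assumptions for illustration, not the real Infinispan classes or signatures.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-ins: these are NOT the real Infinispan classes.
final class SharedTransactionSketch {

    /** Transaction state that both the local and the forwarded command need to see. */
    interface CacheTransaction {
        String globalTxId();
        void recordLookedUpKey(Object key);
        boolean hasLookedUpKey(Object key);
    }

    /** Plays the role proposed for a renamed RemoteTransaction. */
    static final class CacheTransactionImpl implements CacheTransaction {
        private final String globalTxId;
        private final Set<Object> lookedUpKeys = ConcurrentHashMap.newKeySet();

        CacheTransactionImpl(String globalTxId) { this.globalTxId = globalTxId; }

        @Override public String globalTxId() { return globalTxId; }
        @Override public void recordLookedUpKey(Object key) { lookedUpKeys.add(key); }
        @Override public boolean hasLookedUpKey(Object key) { return lookedUpKeys.contains(key); }
    }

    /** Wraps the shared state instead of implementing CacheTransaction itself. */
    static final class LocalTransaction {
        private final CacheTransaction delegate;

        LocalTransaction(CacheTransaction delegate) { this.delegate = delegate; }

        boolean hasLookedUpKey(Object key) { return delegate.hasLookedUpKey(key); }
        CacheTransaction shared() { return delegate; }
    }

    /** Assumed registry: one CacheTransaction per global transaction id on each node. */
    static final class TransactionTable {
        private final Map<String, CacheTransaction> byId = new ConcurrentHashMap<>();

        CacheTransaction getOrCreate(String globalTxId) {
            return byId.computeIfAbsent(globalTxId, CacheTransactionImpl::new);
        }
    }

    public static void main(String[] args) {
        TransactionTable table = new TransactionTable();
        String txId = "NodeA-46125:4353"; // illustrative id only

        LocalTransaction local = new LocalTransaction(table.getOrCreate(txId));

        // The prepare forwarded back resolves the same id to the same object...
        CacheTransaction seenByForwardedPrepare = table.getOrCreate(txId);
        seenByForwardedPrepare.recordLookedUpKey("key0");

        // ...so the later commit, going through the local transaction, finds the key.
        System.out.println(local.hasLookedUpKey("key0"));
    }
}
```

In this sketch the forwarded prepare and the local commit share one state object, so a key recorded by one is visible to the other. It says nothing about how much of the existing transaction code would have to change.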

...
 A PrepareCommand forwarded back to the originator can time out
waiting on a key already locked by itself

--------------------------------------------------------------------------------------------------------

                 Key: ISPN-2410
                 URL: https://issues.jboss.org/browse/ISPN-2410
             Project: Infinispan
          Issue Type: Bug
          Components: State transfer
    Affects Versions: 5.2.0.Beta2
            Reporter: Dan Berindei
            Assignee: Dan Berindei
            Priority: Critical
             Fix For: 5.2.0.Beta3

 If a rebalance happens while a prepare command is executing on a remote node, and the
originator has become an owner, it makes sense to forward the command back to the
originator to lock the keys (or just add them to the backup locks list).
 However, we don't keep the old consistent hashes around, so we don't know if the
originator became an owner after invoking the remote command or was already an owner. So
if the topology changed, we always forward the prepare back to the originator.
 Back on the originator, minTxTopologyId < currentTopologyId, so the prepare command
has to wait for all the backup locks from pending transactions to be released. The problem
is that we wait for the current transaction as well, causing a deadlock.
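
 The self-wait can be illustrated with a small model. This is only a sketch under assumptions: {{PendingTx}}, {{blockers}} and the {{skipSelf}} flag are hypothetical names, the topology ids are made-up values, and excluding the caller is just one conceivable way to break the cycle, not necessarily the fix that was applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical model of the wait described above; not Infinispan's actual code.
final class PendingLockWaitSketch {

    /** A transaction that started in some topology and holds backup locks on some keys. */
    static final class PendingTx {
        final String globalTxId;
        final int topologyId;
        final Set<Object> backupLockedKeys;

        PendingTx(String globalTxId, int topologyId, Set<Object> backupLockedKeys) {
            this.globalTxId = globalTxId;
            this.topologyId = topologyId;
            this.backupLockedKeys = backupLockedKeys;
        }
    }

    /**
     * Returns the transactions a prepare must wait for before locking the key:
     * those from an older topology that hold a backup lock on it.
     * With skipSelf == false the caller's own transaction is included,
     * which is the self-wait described in this issue.
     */
    static List<PendingTx> blockers(Collection<PendingTx> pending, String callerTxId,
                                    int currentTopologyId, Object key, boolean skipSelf) {
        List<PendingTx> result = new ArrayList<>();
        for (PendingTx tx : pending) {
            boolean older = tx.topologyId < currentTopologyId;
            boolean holdsKey = tx.backupLockedKeys.contains(key);
            boolean isSelf = tx.globalTxId.equals(callerTxId);
            if (older && holdsKey && !(skipSelf && isSelf)) {
                result.add(tx);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String self = "NodeA-46125:4353"; // illustrative id only
        List<PendingTx> pending = Arrays.asList(
                new PendingTx(self, 1, Collections.<Object>singleton("key0")));

        // Without the exclusion the transaction waits on itself until the timeout.
        System.out.println(blockers(pending, self, 2, "key0", false).size());
        // With the exclusion nothing blocks the forwarded prepare.
        System.out.println(blockers(pending, self, 2, "key0", true).size());
    }
}
```

 Matching on the global transaction id is what lets the forwarded (remote) prepare recognise the original local transaction as itself in this model.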
 Seen in OnePhaseXATest:
 {noformat}
 18:07:46,873 TRACE (testng-OnePhaseXATest:TestCache) [RpcManagerImpl] NodeA-46125
broadcasting call PrepareCommand {modifications=[PutKeyValueCommand{key=key0, value=value,
flags=null, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}],
onePhaseCommit=false, gtx=GlobalTransaction:<NodeA-46125>:4353:local,
cacheName='TestCache', topologyId=-1} to recipient list null
 18:07:46,873 DEBUG (transport-thread-2,NodeA:TestCache) [LocalTopologyManagerImpl]
Updating local consistent hash(es) for cache TestCache: new topology = CacheTopology{id=2,
currentCH=ReplicatedConsistentHash{members=[NodeA-46125, NodeB-49450]}, pendingCH=null}
 18:07:46,894 TRACE (OOB-1,ISPN,NodeB-49450:TestCache) [StateTransferManagerImpl]
Forwarding command PrepareCommand {modifications=[PutKeyValueCommand{key=key0,
value=value, flags=null, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}],
onePhaseCommit=false, gtx=GlobalTransaction:<NodeA-46125>:4353:remote,
cacheName='TestCache', topologyId=2} to new targets [NodeA-46125]
 18:07:46,935 TRACE (OOB-3,ISPN,NodeA-46125:TestCache) [StateTransferInterceptor]
handleTopologyAffectedCommand for command PrepareCommand
{modifications=[PutKeyValueCommand{key=key0, value=value, flags=null, putIfAbsent=false,
lifespanMillis=-1, maxIdleTimeMillis=-1}], onePhaseCommit=false,
gtx=GlobalTransaction:<NodeA-46125>:4353:remote, cacheName='TestCache',
topologyId=2}, originLocal=false
 18:07:46,935 TRACE (OOB-3,ISPN,NodeA-46125:TestCache) [AbstractCacheTransaction]
Transaction gtx=GlobalTransaction:<NodeA-46125>:4353:local potentially locks key
key0? true
 18:08:16,874 TRACE (testng-OnePhaseXATest:TestCache) [RpcManagerImpl] replication
exception: 
 org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to
NodeB-49450
 {noformat} 
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
