[ https://issues.jboss.org/browse/ISPN-2291?page=com.atlassian.jira.plugin.... ]
Erik Salter commented on ISPN-2291:
-----------------------------------
So here's the root cause: the StaleTransactionCleanupService can remove transactions from the tx table while there are pending locks. I've seen 3 cases of this during a rehash:
1. A RemoteTransaction was registered and a lock was being acquired for it. The tx is cleaned up by the StaleTransactionCleanupService in another thread because no locks had been acquired yet. Afterwards, the invocation context acquires the lock and registers it anyway.
2. Similar to the above, but the invocation context is waiting in the interceptor on the pending locks (lockKeyAndCheckOwnership). By the time it finishes waiting on the pending locks, the tx has already been cleaned up.
3. I've seen the tx get cleaned up AS SOON as the RemoteTransaction is registered with the txTable.
..
2012-10-06 18:15:14,249 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (OOB-19,erm-cluster,phl-dg3-8896(phl)) Calling perform() on LockControlCommand{cache=serviceGroup, keys=[ServiceGroupKey[edgeDeviceId=4,serviceGroupNo=401]], flags=null, unlock=false}
2012-10-06 18:15:14,249 TRACE [org.infinispan.transaction.TransactionTable] (OOB-19,erm-cluster,phl-dg3-8896(phl)) Created and registered remote transaction RemoteTransaction{modifications=[], lookedUpEntries={}, lockedKeys= null, backupKeyLocks null, missingLookedUpEntries false, tx=GlobalTransaction:<phl-dg2-51640(phl)>:8900:remote}
...
2012-10-06 18:15:14,250 TRACE [org.infinispan.transaction.StaleTransactionCleanupService] (transport-thread-20) Killing remote transaction without any local keys GlobalTransaction:<phl-dg2-51640(phl)>:8900:remote
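The interleavings above can be replayed deterministically in a small model. This is a hypothetical, single-threaded Java sketch, not Infinispan code: StaleCleanupRace, RemoteTx, beginLock, finishLock and the pendingLockAcquisitions counter are invented names, and the key "sg-401" and transaction id "gtx-8900" are illustrative values loosely borrowed from the log. It shows how a cleanup that runs before the lock is taken leaves the key owned by a transaction that rollback can no longer find. It also includes one possible guard (skip transactions with a lock acquisition in flight, and re-check the table after acquiring); the guard is an assumption for illustration, not the fix that shipped for this issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical single-threaded model of the cleanup race. Steps that run on
// separate threads in the real system are called here in a fixed order.
// None of these names are Infinispan APIs.
public class StaleCleanupRace {

    static final class RemoteTx {
        final String gtx;
        final Set<String> lockedKeys = new HashSet<>();
        int pendingLockAcquisitions = 0; // consulted only by the guarded cleanup
        RemoteTx(String gtx) { this.gtx = gtx; }
    }

    final Map<String, RemoteTx> txTable = new HashMap<>();
    final Map<String, String> lockOwners = new HashMap<>(); // key -> owning gtx
    final boolean guarded;

    StaleCleanupRace(boolean guarded) { this.guarded = guarded; }

    RemoteTx register(String gtx) {
        RemoteTx tx = new RemoteTx(gtx);
        txTable.put(gtx, tx);
        return tx;
    }

    void beginLock(RemoteTx tx) { tx.pendingLockAcquisitions++; }

    // Stale-tx cleanup: removes remote transactions that hold no keys.
    // The guarded variant also skips transactions with a lock in flight.
    void cleanup() {
        txTable.values().removeIf(tx -> tx.lockedKeys.isEmpty()
                && (!guarded || tx.pendingLockAcquisitions == 0));
    }

    // Completes the lock acquisition; returns false if the lock was given up.
    boolean finishLock(RemoteTx tx, String key) {
        lockOwners.put(key, tx.gtx);
        tx.lockedKeys.add(key);
        tx.pendingLockAcquisitions--;
        if (guarded && txTable.get(tx.gtx) != tx) {
            // The tx vanished from the table before the lock was taken:
            // release the lock instead of leaving it orphaned.
            lockOwners.remove(key);
            tx.lockedKeys.remove(key);
            return false;
        }
        return true;
    }

    // Rollback can only release locks of transactions it can still find.
    void rollback(String gtx) {
        RemoteTx tx = txTable.remove(gtx);
        if (tx != null) {
            tx.lockedKeys.forEach(lockOwners::remove);
        }
    }

    // cleanupBeforeBegin = false models case 1, true models case 3.
    static Map<String, String> run(boolean guarded, boolean cleanupBeforeBegin) {
        StaleCleanupRace node = new StaleCleanupRace(guarded);
        RemoteTx tx = node.register("gtx-8900");
        if (cleanupBeforeBegin) node.cleanup();
        node.beginLock(tx);
        if (!cleanupBeforeBegin) node.cleanup();
        node.finishLock(tx, "sg-401");
        node.rollback("gtx-8900"); // the originator's rollback
        return node.lockOwners;
    }

    public static void main(String[] args) {
        System.out.println("case 1, unguarded: " + run(false, false)); // {sg-401=gtx-8900}
        System.out.println("case 1, guarded:   " + run(true, false));  // {}
        System.out.println("case 3, unguarded: " + run(false, true));  // {sg-401=gtx-8900}
        System.out.println("case 3, guarded:   " + run(true, true));   // {}
    }
}
```

In this model the in-flight counter plausibly covers cases 1 and 2, where the tx is removed while a lock acquisition is under way; the post-acquisition re-check covers case 3, where cleanup runs before the acquisition even starts. Returning false from finishLock is meant to let the caller fail the lock command instead of reporting a lock it no longer holds.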
Tx rollback during state transfer has stale locks
-------------------------------------------------
Key: ISPN-2291
URL: https://issues.jboss.org/browse/ISPN-2291
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.2.0.Alpha3, 5.2.0.Beta1
Reporter: Erik Salter
Assignee: Adrian Nistor
Priority: Critical
Fix For: 5.2.0.CR1
Stale locks remain after a transaction fails because of a replication timeout to its peer node. The transaction contained grouped keys that were submitted to the primary owner. While one such transaction was executing, a state transfer changed the ownership of these keys.
Here's an example flow:
The task executed and started a transaction. Because the node running it was not the primary owner, it sent a LockControlCommand to the new owner. This call timed out, and a rollback was issued. The local transaction completed, but subsequent attempts to lock this key fail.
The logs can be found here:
http://dl.dropbox.com/u/50401510/5.2.0.ALPHA3/lock/10.30.12.83/server.log.gz
http://dl.dropbox.com/u/50401510/5.2.0.ALPHA3/lock/10.30.12.84/server.log.gz
http://dl.dropbox.com/u/50401510/5.2.0.ALPHA3/lock/10.30.12.85/server.log.gz
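Under the root cause described in the comment above, a stale lock is a key whose recorded owner no longer has an entry in the transaction table, so no commit or rollback for that owner will release it and later lock attempts on the key fail. A minimal diagnostic sketch, assuming lock ownership can be viewed as a key-to-owner map and the tx table as a set of transaction ids: StaleLockCheck and staleLocks are hypothetical names, not Infinispan APIs, and the ids are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical diagnostic: not part of Infinispan.
public class StaleLockCheck {

    // Returns, sorted, the keys whose lock owner is not a registered transaction.
    static List<String> staleLocks(Map<String, String> lockOwners, Set<String> txTable) {
        return lockOwners.entrySet().stream()
                .filter(e -> !txTable.contains(e.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // gtx-8900 was killed by the cleanup but still owns sg-401.
        Map<String, String> owners = Map.of("sg-401", "gtx-8900", "sg-402", "gtx-8901");
        Set<String> txTable = Set.of("gtx-8901");
        System.out.println(staleLocks(owners, txTable)); // [sg-401]
    }
}
```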
--
This message is automatically generated by JIRA.