[infinispan-issues] [JBoss JIRA] (ISPN-2316) Distributed deadlock in StateTransferInterceptor
Dan Berindei (JIRA)
jira-events at lists.jboss.org
Wed Sep 26 07:04:35 EDT 2012
[ https://issues.jboss.org/browse/ISPN-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Berindei updated ISPN-2316:
-------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/1342
Reorganize the StateTransferLock:
Remove the transactions lock and the commands lock.
Split waitForTopology() into waitForTopology() and waitForTransactionData().
Introduce a new read-write topology lock to replace the commands lock.
It's ok to remove the transactions lock because any transaction
modification (after the transactions table snapshot was taken)
is guaranteed to be forwarded to the new owners.
waitForTopology() is now used only to ensure that when joiners start requesting
state, every other node already knows about them.
waitForTransactionData() is used to block commands until the local node
has received transaction data for all the new segments.
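
The split between the two wait methods can be pictured with a minimal sketch. This is not the code from the pull request: the class name, the notify methods, the integer topology ids and the timeout parameters are all assumptions made for illustration. Only the names waitForTopology() and waitForTransactionData() come from the description above.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, NOT the actual Infinispan StateTransferLock.
// Everything except the two waitFor* method names is an assumption.
class TopologyWaitSketch {
    private final Object monitor = new Object();
    private int topologyId = 0;                 // latest topology installed locally
    private int transactionDataTopologyId = 0;  // latest topology whose transaction data has arrived

    // Called once the local node has installed the given topology.
    void notifyTopologyInstalled(int id) {
        synchronized (monitor) {
            topologyId = Math.max(topologyId, id);
            monitor.notifyAll();
        }
    }

    // Called once transaction data for all new segments of the given topology has arrived.
    void notifyTransactionDataReceived(int id) {
        synchronized (monitor) {
            transactionDataTopologyId = Math.max(transactionDataTopologyId, id);
            monitor.notifyAll();
        }
    }

    // Blocks until the local node knows about the expected topology; false on timeout.
    boolean waitForTopology(int expectedTopologyId, long timeoutMillis) {
        return await(false, expectedTopologyId, timeoutMillis);
    }

    // Blocks until transaction data for the expected topology has arrived; false on timeout.
    boolean waitForTransactionData(int expectedTopologyId, long timeoutMillis) {
        return await(true, expectedTopologyId, timeoutMillis);
    }

    private boolean await(boolean transactionData, int expected, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (monitor) {
            try {
                while ((transactionData ? transactionDataTopologyId : topologyId) < expected) {
                    long remaining = deadline - System.nanoTime();
                    if (remaining <= 0) {
                        return false; // timed out before the condition became true
                    }
                    TimeUnit.NANOSECONDS.timedWait(monitor, remaining);
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }
}
```

In this sketch the two conditions advance independently, so a caller that only needs the topology to be known is not held up waiting for transaction data.
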
The new read-write topology lock is held exclusively during topology changes
and non-exclusively during commits to the data container - so that data not
owned by the local node is not written to the data container.
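
As a rough illustration of the read-write topology lock, the sketch below uses java.util.concurrent.locks.ReentrantReadWriteLock. The class name, the segment set, the map standing in for the data container, and the method names are hypothetical; the actual change may be structured differently.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, NOT the actual Infinispan implementation.
class TopologyLockSketch {
    private final ReentrantReadWriteLock topologyLock = new ReentrantReadWriteLock();
    private final Map<String, String> dataContainer = new ConcurrentHashMap<>();
    private Set<Integer> ownedSegments = new HashSet<>(); // guarded by topologyLock

    // Topology change: held exclusively, so no commit can observe a half-installed topology.
    void installTopology(Set<Integer> newOwnedSegments) {
        topologyLock.writeLock().lock();
        try {
            ownedSegments = new HashSet<>(newOwnedSegments);
        } finally {
            topologyLock.writeLock().unlock();
        }
    }

    // Commit: held non-exclusively, so commits run concurrently with each other
    // but never concurrently with a topology change.
    boolean commit(int segment, String key, String value) {
        topologyLock.readLock().lock();
        try {
            if (!ownedSegments.contains(segment)) {
                return false; // not owned locally: do not write to the data container
            }
            dataContainer.put(key, value);
            return true;
        } finally {
            topologyLock.readLock().unlock();
        }
    }

    String get(String key) {
        return dataContainer.get(key);
    }
}
```

Because the ownership check and the write happen under the same shared lock, a topology change cannot slip in between them in this sketch.
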
> Distributed deadlock in StateTransferInterceptor
> ------------------------------------------------
>
> Key: ISPN-2316
> URL: https://issues.jboss.org/browse/ISPN-2316
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer, Transactions
> Affects Versions: 5.2.0.Alpha3
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.Beta1
>
>
> When using transactions, a distributed deadlock may occur when a node is joining under these circumstances:
> 1) the new node requests transactions using GET_TRANSACTIONS
> 2) the old node tries to commit a transaction, broadcasting PrepareCommand - in StateTransferInterceptor it acquires the transactionLock in shared mode
> 3) the GET_TRANSACTIONS request arrives on the new node, and the node waits for the transactionLock (it requires it exclusively)
> 4) the transaction commit on the new node is waiting for the commandsLock (it requires it in shared mode), but that lock is held exclusively by onTopologyUpdate - addTransfer - requestTransactions (= synchronous GET_TRANSACTIONS).
> Found in some traces, but not required:
> After the transaction commit times out on the old node and releases the lock, the GET_TRANSACTIONS request may continue, but the state transfer itself can also time out if its timeout is not set long enough.
> The transaction commit continues on the new node after the state transfer times out, until it is found invalid (rolled back).
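
One way to read the four steps in the report is as a wait-for cycle between four activities. The sketch below encodes that reading as a small graph and follows the edges to find the cycle. The activity labels are invented for illustration, and the edge from the prepare sender to the remote prepare handler assumes that the PrepareCommand broadcast is synchronous, which the report implies but does not state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the reported scenario; labels are illustrative only.
class WaitForGraphSketch {

    // Each activity waits for exactly one other activity (or none).
    static Map<String, String> reportedScenario() {
        Map<String, String> waitsFor = new LinkedHashMap<>();
        // Holds transactionLock (shared) and waits for the remote prepare (assumed synchronous).
        waitsFor.put("prepare sender (old node)", "prepare handler (new node)");
        // Needs commandsLock (shared), which onTopologyUpdate holds exclusively.
        waitsFor.put("prepare handler (new node)", "onTopologyUpdate (new node)");
        // Holds commandsLock (exclusive) and waits for the synchronous GET_TRANSACTIONS reply.
        waitsFor.put("onTopologyUpdate (new node)", "GET_TRANSACTIONS handler");
        // Needs transactionLock (exclusive), which the prepare sender holds in shared mode.
        waitsFor.put("GET_TRANSACTIONS handler", "prepare sender (old node)");
        return waitsFor;
    }

    // Follows wait-for edges from start; returns the cycle reached, or an empty list.
    static List<String> findCycle(Map<String, String> waitsFor, String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = waitsFor.get(current);
        }
        if (current == null) {
            return Collections.emptyList(); // the chain ended: nobody is stuck forever
        }
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }
}
```

In this model, removing the edge that depends on the transactionLock leaves a chain that ends, which matches the reasoning that the cycle needs both locks to close.
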