[infinispan-issues] [JBoss JIRA] (ISPN-2316) Distributed deadlock in StateTransferInterceptor
Radim Vansa (JIRA)
jira-events at lists.jboss.org
Wed Sep 19 11:08:35 EDT 2012
[ https://issues.jboss.org/browse/ISPN-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719962#comment-12719962 ]
Radim Vansa commented on ISPN-2316:
-----------------------------------
Another weird situation, this time in https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/ispn-52-radargun-resilience-dist-tx/18/ :
* at 08:07:48,271, thread OOB-11 on slave4 requests transactions from slave2
* the request is handled at 08:07:48,275 by thread OOB-45 on slave2, which then waits for the transactionLock
* nevertheless, OOB-11 on slave4 gets a response at 08:08:09,313 (about 21 seconds later), and that response in fact reports a failure to get the transactions. StateTransferTimeout is set to 10 minutes, so a timed-out connection cannot be the cause
* shortly after that, at 08:08:09,713, slave4 receives a new cluster view containing only itself
> Distributed deadlock in StateTransferInterceptor
> ------------------------------------------------
>
> Key: ISPN-2316
> URL: https://issues.jboss.org/browse/ISPN-2316
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer, Transactions
> Affects Versions: 5.2.0.Alpha3
> Reporter: Radim Vansa
> Assignee: Mircea Markus
> Priority: Critical
>
> When using transactions, a distributed deadlock may occur when a node is joining under these circumstances:
> 1) the new node requests transactions using GET_TRANSACTIONS
> 2) the old node tries to commit a transaction, broadcasting PrepareCommand - in StateTransferInterceptor it acquires the transactionLock in shared mode
> 3) the GET_TRANSACTIONS request arrives on the old node, where it waits for the transactionLock (which it requires exclusively)
> 4) the transaction commit on the new node waits for the commandsLock (which it requires in shared mode), but that lock is held exclusively by onTopologyUpdate - addTransfer - requestTransactions ( = synchronous GET_TRANSACTIONS).
> Found in some traces, but not required:
> After the transaction commit times out on the old node and releases the lock, the GET_TRANSACTIONS request may continue, but the state transfer itself can also time out if its timeout is not set sufficiently long.
> The transaction commit continues on the new node after the state transfer times out, until it is found to be invalid (rolled back).
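
The lock cycle in steps 1-4 can be sketched in plain Java. This is only a rough single-JVM model, not Infinispan code: the class DeadlockSketch, the method run and both timeout parameters are hypothetical names, and each synchronous remote call is modelled as a direct lock acquisition with a timeout. Only the lock names and their shared/exclusive modes are taken from the description above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the two nodes inside one JVM; not Infinispan code.
class DeadlockSketch {

    /** Returns {prepareSucceeded, getTransactionsSucceeded}. */
    static boolean[] run(long commitTimeoutMs, long stateTransferTimeoutMs) {
        // Old node: shared for PrepareCommand, exclusive for GET_TRANSACTIONS.
        final ReentrantReadWriteLock transactionLock = new ReentrantReadWriteLock();
        // New node: shared for incoming commands, exclusive in onTopologyUpdate.
        final ReentrantReadWriteLock commandsLock = new ReentrantReadWriteLock();
        // Ensures both sides hold their first lock before asking for the second.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Step 2 and 4: commit on the old node holds transactionLock (shared),
            // then the prepare needs commandsLock (shared) on the new node.
            Future<Boolean> prepare = pool.submit(() -> {
                transactionLock.readLock().lock();
                try {
                    bothHoldFirstLock.countDown();
                    bothHoldFirstLock.await();
                    boolean ok = commandsLock.readLock()
                            .tryLock(commitTimeoutMs, TimeUnit.MILLISECONDS);
                    if (ok) {
                        commandsLock.readLock().unlock();
                    }
                    return ok;
                } finally {
                    transactionLock.readLock().unlock();
                }
            });

            // Step 1 and 3: topology update on the new node holds commandsLock
            // (exclusive), then GET_TRANSACTIONS needs transactionLock
            // (exclusive) on the old node.
            Future<Boolean> getTransactions = pool.submit(() -> {
                commandsLock.writeLock().lock();
                try {
                    bothHoldFirstLock.countDown();
                    bothHoldFirstLock.await();
                    boolean ok = transactionLock.writeLock()
                            .tryLock(stateTransferTimeoutMs, TimeUnit.MILLISECONDS);
                    if (ok) {
                        transactionLock.writeLock().unlock();
                    }
                    return ok;
                } finally {
                    commandsLock.writeLock().unlock();
                }
            });

            return new boolean[] { prepare.get(), getTransactions.get() };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Commit timeout shorter than the state transfer timeout.
        boolean[] result = run(200, 2000);
        System.out.println("prepare succeeded: " + result[0]);
        System.out.println("GET_TRANSACTIONS succeeded: " + result[1]);
    }
}
```

Without the timeouts, both threads would wait on each other indefinitely. With a commit timeout shorter than the state transfer timeout, the prepare in this sketch should give up first and release the transactionLock, after which GET_TRANSACTIONS can proceed, which matches the behaviour described under "Found in some traces".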