[infinispan-issues] [JBoss JIRA] (ISPN-2316) Distributed deadlock in StateTransferInterceptor

Radim Vansa (JIRA) jira-events at lists.jboss.org
Wed Sep 19 11:08:35 EDT 2012


    [ https://issues.jboss.org/browse/ISPN-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719962#comment-12719962 ] 

Radim Vansa commented on ISPN-2316:
-----------------------------------

Another weird situation, this time in https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/ispn-52-radargun-resilience-dist-tx/18/ :
* at 08:07:48,271 thread OOB-11 on slave4 requests transactions from slave2
* the request is handled at 08:07:48,275 by thread OOB-45 on slave2, which then waits for the transactionLock
* nevertheless, OOB-11 on slave4 receives a response at 08:08:09,313, and the request fails to obtain the transactions. StateTransferTimeout is set to 10 minutes, so the request cannot have timed out
* shortly after that, at 08:08:09,713, slave4 receives a new cluster view containing only itself
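A quick arithmetic check of the timeline above, as a minimal sketch. It uses only the slave4 timestamps, because slave2's clock may not be synchronized with slave4's. The class and method names are illustrative, not part of Infinispan or RadarGun. The response arrives roughly 21 s after the request, far short of the configured 10-minute StateTransferTimeout.

```java
import java.time.Duration;
import java.time.LocalTime;

public class TimelineCheck {
    // Timestamps taken from the slave4 log lines in the comment (HH:mm:ss,SSS).
    static final LocalTime REQUEST_SENT = LocalTime.parse("08:07:48.271");
    static final LocalTime RESPONSE_RECEIVED = LocalTime.parse("08:08:09.313");
    static final LocalTime NEW_VIEW = LocalTime.parse("08:08:09.713");
    // StateTransferTimeout as configured for this test run.
    static final Duration STATE_TRANSFER_TIMEOUT = Duration.ofMinutes(10);

    // How long OOB-11 waited for the GET_TRANSACTIONS response.
    static Duration waitedForResponse() {
        return Duration.between(REQUEST_SENT, RESPONSE_RECEIVED);
    }

    // Gap between the failed response and the single-member cluster view.
    static Duration responseToViewChange() {
        return Duration.between(RESPONSE_RECEIVED, NEW_VIEW);
    }

    public static void main(String[] args) {
        Duration waited = waitedForResponse();
        System.out.println("waited " + waited.toMillis() + " ms");
        System.out.println("timeout reached: "
                + (waited.compareTo(STATE_TRANSFER_TIMEOUT) >= 0));
        System.out.println("new view after " + responseToViewChange().toMillis() + " ms");
    }
}
```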

                
> Distributed deadlock in StateTransferInterceptor
> ------------------------------------------------
>
>                 Key: ISPN-2316
>                 URL: https://issues.jboss.org/browse/ISPN-2316
>             Project: Infinispan
>          Issue Type: Bug
>          Components: State transfer, Transactions
>    Affects Versions: 5.2.0.Alpha3
>            Reporter: Radim Vansa
>            Assignee: Mircea Markus
>            Priority: Critical
>
> With transactions enabled, a distributed deadlock may occur while a node is joining, as follows:
> 1) the new node requests transactions using GET_TRANSACTIONS
> 2) the old node tries to commit a transaction, broadcasting a PrepareCommand; in StateTransferInterceptor it acquires the transactionLock in shared mode
> 3) the GET_TRANSACTIONS request arrives at the old node, where it waits for the transactionLock (which it requires exclusively)
> 4) the transaction commit on the new node waits for the commandsLock (required in shared mode), but that lock is held exclusively by onTopologyUpdate - addTransfer - requestTransactions (= the synchronous GET_TRANSACTIONS).
> Seen in some traces, but not required for the deadlock:
> After the transaction commit times out on the old node and releases the lock, the GET_TRANSACTIONS request may proceed, but the state transfer itself may also time out unless its timeout is set long enough.
> The transaction commit then continues on the new node after the state transfer has timed out, until it is found to be invalid (rolled back).
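The four steps of the description form a cycle in a wait-for graph. The sketch below models each blocked task as a node that waits on exactly one other task, either a lock holder or an RPC responder. It assumes the old node's commit blocks on the PrepareCommand completing on the new node (a synchronous prepare). The task labels are hypothetical and do not correspond to real thread or class names. Removing the commit's wait models the commit timing out on the old node, which breaks the cycle.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WaitForGraph {
    // waiter -> the single task it is blocked on (lock holder or RPC responder)
    private final Map<String, String> waitsFor = new LinkedHashMap<>();

    void addWait(String waiter, String blocker) {
        waitsFor.put(waiter, blocker);
    }

    // Models a wait ending, e.g. because a timeout fired and the task gave up.
    void removeWait(String waiter) {
        waitsFor.remove(waiter);
    }

    // Follows wait edges from start. Since every task waits on at most one other
    // task, reaching an already visited task means a cycle (a deadlock), which is
    // returned; an empty list means the chain ends at a task that is not blocked.
    List<String> findCycle(String start) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null && seen.add(current)) {
            path.add(current);
            current = waitsFor.get(current);
        }
        if (current == null) {
            return List.of();
        }
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }

    // The waits from the issue description, using hypothetical task labels.
    static WaitForGraph ispn2316() {
        WaitForGraph g = new WaitForGraph();
        // new node: onTopologyUpdate holds commandsLock exclusively and waits
        // for the synchronous GET_TRANSACTIONS response (steps 1 and 4)
        g.addWait("new:onTopologyUpdate", "old:GET_TRANSACTIONS");
        // old node: GET_TRANSACTIONS needs transactionLock exclusively (step 3)
        g.addWait("old:GET_TRANSACTIONS", "old:commit");
        // old node: the commit holds transactionLock shared and (assumed) waits
        // for its PrepareCommand to complete on the new node (step 2)
        g.addWait("old:commit", "new:PrepareCommand");
        // new node: PrepareCommand needs commandsLock in shared mode (step 4)
        g.addWait("new:PrepareCommand", "new:onTopologyUpdate");
        return g;
    }

    public static void main(String[] args) {
        WaitForGraph g = ispn2316();
        System.out.println("deadlock: " + g.findCycle("new:onTopologyUpdate"));
        // The commit times out on the old node and releases transactionLock.
        g.removeWait("old:commit");
        System.out.println("after commit timeout: " + g.findCycle("new:onTopologyUpdate"));
    }
}
```

In this model any edge can break the cycle, which matches the traces: the commit timing out lets GET_TRANSACTIONS proceed, but the state transfer can still time out first if its own timeout is too short.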

--
This message is automatically generated by JIRA.