[infinispan-issues] [JBoss JIRA] (ISPN-2316) Distributed deadlock in StateTransferInterceptor

Radim Vansa (JIRA) jira-events at lists.jboss.org
Tue Sep 18 08:21:35 EDT 2012


     [ https://issues.jboss.org/browse/ISPN-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Vansa updated ISPN-2316:
------------------------------

    Description: 
When using transactions, a distributed deadlock may occur while a node is joining, under these circumstances:

1) the new node requests transactions using GET_TRANSACTIONS
2) the old node tries to commit a transaction, broadcasting PrepareCommand; in StateTransferInterceptor it acquires the transactionLock in shared mode
3) the GET_TRANSACTIONS request arrives on the new node, which then waits for the transactionLock (it requires it exclusively)
4) the transaction commit on the new node is somehow delayed (for one minute, until the state transfer (ST) times out) in the interceptor chain between InvocationContextInterceptor and OptimisticLockingInterceptor. The only place where it could be waiting appears to be the StateTransferInterceptor transactionLock (shared); however, I cannot find any trace of anyone holding it.

After the transaction commit times out on the old node and releases the lock, the GET_TRANSACTIONS request may continue, but the state transfer itself can also time out unless its timeout is set sufficiently longer.
Note that the transaction commit continues on the new node after the ST times out, until it is found invalid (rolled back).
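
The shared/exclusive interaction described in the steps above can be sketched in isolation. This is only an illustration under stated assumptions, not the Infinispan implementation: it uses a plain java.util.concurrent ReentrantReadWriteLock as a hypothetical stand-in for the transactionLock, a thread that plays the role of the commit holding the lock in shared mode, and a caller that plays the role of the GET_TRANSACTIONS handling asking for it exclusively with a timeout. The class and method names are made up for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TransactionLockSketch {

    // Hypothetical stand-in for the transactionLock mentioned in the report.
    static final ReentrantReadWriteLock transactionLock = new ReentrantReadWriteLock();

    /** Tries to take the lock exclusively and releases it again right away on success. */
    static boolean tryExclusive(long timeoutMillis) throws InterruptedException {
        if (transactionLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            transactionLock.writeLock().unlock();
            return true;
        }
        return false;
    }

    /** Reports whether an exclusive request succeeds while another thread holds the lock in shared mode. */
    static boolean exclusiveWhileSharedHeld(long timeoutMillis) throws InterruptedException {
        CountDownLatch sharedHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Plays the role of the in-flight commit: holds the shared lock until told to release it.
        Thread commit = new Thread(() -> {
            transactionLock.readLock().lock();
            try {
                sharedHeld.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                transactionLock.readLock().unlock();
            }
        });
        commit.start();
        sharedHeld.await();

        try {
            // Plays the role of the exclusive request: it cannot proceed while the shared lock is held.
            return tryExclusive(timeoutMillis);
        } finally {
            release.countDown();
            commit.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("exclusive while shared held: " + exclusiveWhileSharedHeld(200));
        System.out.println("exclusive after release: " + tryExclusive(200));
    }
}
```

In this sketch the exclusive request only goes through once the shared holder has released the lock, which mirrors the observation that the GET_TRANSACTIONS request can continue only after the commit times out. It does not reproduce the distributed part of the problem, where the holder and the waiter sit on different nodes.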

  was:
When using transactions, a distributed deadlock may occur when a node is joining under these circumstances:

1) the new node requests transactions using GET_TRANSACTIONS
2) the old node tries to commit a transaction, broadcasting PrepareCommand - in StateTransferInterceptor it locks the transactionLock in shared way
3) the request GET_TRANSACTIONS comes on the new node, the node is waiting for the transactionLock (it requires it exclusively)
4) transaction commit on new node is somehow delayed (for one minute) in the interceptor chain between InvocationContextInterceptor and OptimisticLockingInterceptor (it looks like the only place is waiting for StateTransferInterceptor transactionLock (shared), however, I cannot find any trace that it is held by anyone)

After the transaction commit times out on old node releasing the lock, the GET_TRANSACTION request may continue, but the state transfer itself can also timeout if not set properly longer.
Note that the transaction commit continues on the new node after the ST times out, until it is found invalid (rolled back).


    
> Distributed deadlock in StateTransferInterceptor
> ------------------------------------------------
>
>                 Key: ISPN-2316
>                 URL: https://issues.jboss.org/browse/ISPN-2316
>             Project: Infinispan
>          Issue Type: Feature Request
>          Components: State transfer, Transactions
>    Affects Versions: 5.2.0.Alpha3
>            Reporter: Radim Vansa
>            Assignee: Mircea Markus
>
