[
https://issues.jboss.org/browse/ISPN-2316?page=com.atlassian.jira.plugin....
]
Radim Vansa updated ISPN-2316:
------------------------------
Description:
When using transactions, a distributed deadlock may occur while a node is joining, under
these circumstances:
1) the new node requests transactions using GET_TRANSACTIONS
2) the old node tries to commit a transaction, broadcasting PrepareCommand - in
StateTransferInterceptor it acquires the transactionLock in shared mode
3) the GET_TRANSACTIONS request arrives on the new node, where it waits for the
transactionLock (it requires it exclusively)
4) the transaction commit on the new node waits for the commandsLock (it requires it in
shared mode), but that lock is held exclusively by onTopologyUpdate - addTransfer -
requestTransactions ( = synchronous GET_TRANSACTIONS).
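The lock cycle in steps 1-4 can be sketched in plain Java. This is only a minimal model, not Infinispan code: the class DeadlockSketch, the methods holdThenTry and run, and the use of ReentrantReadWriteLock are assumptions made for illustration, and the remote calls are simulated by having each thread try the other lock directly. Timed tryLock is used instead of lock() so the sketch reports the cycle rather than hanging.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class DeadlockSketch {
    // Hypothetical stand-ins for the two locks named in the report.
    static final ReentrantReadWriteLock transactionLock = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock commandsLock = new ReentrantReadWriteLock();

    // Holds 'held', then tries 'wanted' for at most timeoutMs milliseconds.
    // Returns true only if 'wanted' could be acquired while 'held' was still held.
    static boolean holdThenTry(Lock held, Lock wanted, CyclicBarrier barrier, long timeoutMs) {
        held.lock();
        try {
            barrier.await(); // both threads now hold their first lock
            boolean acquired = wanted.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                wanted.unlock();
            }
            barrier.await(); // keep the first lock until both attempts are over
            return acquired;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            held.unlock();
        }
    }

    // Runs both sides of the cycle; result[0] is the commit side, result[1] the topology side.
    static boolean[] run(long timeoutMs) {
        CyclicBarrier barrier = new CyclicBarrier(2);
        boolean[] acquired = new boolean[2];
        // Commit path: transactionLock (shared) is held, commandsLock (shared) is needed.
        Thread commit = new Thread(() -> acquired[0] = holdThenTry(
                transactionLock.readLock(), commandsLock.readLock(), barrier, timeoutMs));
        // Topology update path: commandsLock (exclusive) is held while the synchronous
        // GET_TRANSACTIONS needs transactionLock (exclusive).
        Thread topology = new Thread(() -> acquired[1] = holdThenTry(
                commandsLock.writeLock(), transactionLock.writeLock(), barrier, timeoutMs));
        commit.start();
        topology.start();
        try {
            commit.join();
            topology.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return acquired;
    }

    public static void main(String[] args) {
        boolean[] r = run(200);
        System.out.println("commit got commandsLock: " + r[0]);        // false
        System.out.println("topology got transactionLock: " + r[1]);   // false
    }
}
```

Neither side can make progress until the other gives up its first lock, which in the real system happens only when one of the timeouts fires.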
Found in some traces, but not required:
After the transaction commit times out on the old node and releases the lock, the
GET_TRANSACTIONS request may continue, but the state transfer itself can also time out if
its timeout is not set sufficiently long.
The transaction commit continues on the new node after the ST times out, until it is found
invalid (rolled back).
was:
When using transactions, a distributed deadlock may occur while a node is joining, under
these circumstances:
1) the new node requests transactions using GET_TRANSACTIONS
2) the old node tries to commit a transaction, broadcasting PrepareCommand - in
StateTransferInterceptor it acquires the transactionLock in shared mode
3) the GET_TRANSACTIONS request arrives on the new node, where it waits for the
transactionLock (it requires it exclusively)
4) the transaction commit on the new node is somehow delayed (for one minute - until the ST
times out) in the interceptor chain between InvocationContextInterceptor and
OptimisticLockingInterceptor (the only place this seems possible is the wait for the
StateTransferInterceptor transactionLock (shared); however, I cannot find any trace showing
that anyone holds it)
After the transaction commit times out on the old node and releases the lock, the
GET_TRANSACTIONS request may continue, but the state transfer itself can also time out if
its timeout is not set sufficiently long.
Note that the transaction commit continues on the new node after the ST times out, until
it is found invalid (rolled back).
Distributed deadlock in StateTransferInterceptor
------------------------------------------------
Key: ISPN-2316
URL: https://issues.jboss.org/browse/ISPN-2316
Project: Infinispan
Issue Type: Feature Request
Components: State transfer, Transactions
Affects Versions: 5.2.0.Alpha3
Reporter: Radim Vansa
Assignee: Mircea Markus