[
https://issues.jboss.org/browse/ISPN-2316?page=com.atlassian.jira.plugin....
]
Dan Berindei updated ISPN-2316:
-------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request:
https://github.com/infinispan/infinispan/pull/1342
Reorganize the StateTransferLock:
Remove the transactions lock and the commands lock.
Split waitForTopology() into waitForTopology() and waitForTransactionData().
Introduce a new read-write topology lock to replace the commands lock.
It's ok to remove the transactions lock because any transaction
modification (after the transactions table snapshot was taken)
is guaranteed to be forwarded to the new owners.
waitForTopology() is now used only to ensure that when joiners start requesting
state, every other node already knows about them.
waitForTransactionData() is used to block commands until the local node
has received transaction data for all the new segments.
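The split between the two wait methods can be sketched roughly as follows. This is only an illustration of the idea, not the code from the pull request: the class name TopologyWaitSketch, the two integer counters, the notification methods and the timeout parameters are all assumptions made for the example.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not the Infinispan StateTransferLock implementation.
final class TopologyWaitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private int topologyId = -1;        // assumed: newest topology known locally
    private int transactionDataId = -1; // assumed: newest topology whose transaction data arrived

    // Called once the local node has learned about a topology.
    void topologyInstalled(int id) {
        lock.lock();
        try {
            topologyId = Math.max(topologyId, id);
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Called once transaction data for all new segments of a topology has arrived.
    void transactionDataReceived(int id) {
        lock.lock();
        try {
            transactionDataId = Math.max(transactionDataId, id);
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Returns true if the topology became known before the timeout expired.
    boolean waitForTopology(int expectedId, long timeout, TimeUnit unit) {
        return await(true, expectedId, unit.toNanos(timeout));
    }

    // Returns true if the transaction data arrived before the timeout expired.
    boolean waitForTransactionData(int expectedId, long timeout, TimeUnit unit) {
        return await(false, expectedId, unit.toNanos(timeout));
    }

    private boolean await(boolean topology, int expectedId, long nanos) {
        lock.lock();
        try {
            while ((topology ? topologyId : transactionDataId) < expectedId) {
                if (nanos <= 0L) {
                    return false; // timed out
                }
                nanos = changed.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch the two conditions advance independently, so a command can be held back until transaction data is present even though the topology itself is already known.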
The new read-write topology lock is held exclusively during topology changes
and non-exclusively during commits to the data container - so that data not
owned by the local node is not written to the data container.
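A minimal sketch of how such a read-write topology lock could be used is shown below. It is an assumption-laden illustration, not the actual change: the class TopologyLockSketch, the hash-based segmentOf() mapping and the String-keyed container are invented for the example. Only the locking pattern (exclusive for topology changes, shared for commits) comes from the description above.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not the Infinispan implementation.
final class TopologyLockSketch {
    private final ReentrantReadWriteLock topologyLock = new ReentrantReadWriteLock();
    // Concurrent map, because several committers may hold the shared lock at once.
    private final Map<String, String> dataContainer = new ConcurrentHashMap<>();
    private final int numSegments;
    private Set<Integer> ownedSegments; // guarded by topologyLock

    TopologyLockSketch(int numSegments, Set<Integer> ownedSegments) {
        this.numSegments = numSegments;
        this.ownedSegments = new HashSet<>(ownedSegments);
    }

    // Assumed key-to-segment mapping, for illustration only.
    private int segmentOf(String key) {
        return Math.floorMod(key.hashCode(), numSegments);
    }

    // Topology change: held exclusively, so no commit can interleave with it.
    void installTopology(Set<Integer> newOwnedSegments) {
        topologyLock.writeLock().lock();
        try {
            ownedSegments = new HashSet<>(newOwnedSegments);
        } finally {
            topologyLock.writeLock().unlock();
        }
    }

    // Commit: held non-exclusively; writes only keys the local node owns.
    boolean commit(String key, String value) {
        topologyLock.readLock().lock();
        try {
            if (!ownedSegments.contains(segmentOf(key))) {
                return false; // not an owner in the current topology, skip the write
            }
            dataContainer.put(key, value);
            return true;
        } finally {
            topologyLock.readLock().unlock();
        }
    }

    boolean contains(String key) {
        return dataContainer.containsKey(key);
    }
}
```

Because the ownership check and the write happen under the same shared lock, a topology change cannot slip in between them in this sketch.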
Distributed deadlock in StateTransferInterceptor
------------------------------------------------
Key: ISPN-2316
URL:
https://issues.jboss.org/browse/ISPN-2316
Project: Infinispan
Issue Type: Bug
Components: State transfer, Transactions
Affects Versions: 5.2.0.Alpha3
Reporter: Radim Vansa
Assignee: Dan Berindei
Priority: Blocker
Fix For: 5.2.0.Beta1
When using transactions, a distributed deadlock may occur when a node is joining under
these circumstances:
1) the new node requests transactions using GET_TRANSACTIONS
2) the old node tries to commit a transaction, broadcasting PrepareCommand - in
StateTransferInterceptor it acquires the transactionLock in shared mode
3) the GET_TRANSACTIONS request arrives on the new node, where it waits for the
transactionLock (which it requires exclusively)
4) the transaction commit on the new node waits for the commandsLock (which it
requires in shared mode), but that lock is held exclusively by onTopologyUpdate -
addTransfer - requestTransactions ( = synchronous GET_TRANSACTIONS).
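The circular wait in the steps above can be made explicit with a small wait-for graph. This is a hedged illustration only: the class WaitForGraphSketch and the node labels are invented for the example, and the first edge assumes that the PrepareCommand broadcast is synchronous, which the report implies but does not state.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch for reasoning about the report; not Infinispan code.
final class WaitForGraphSketch {
    // Each party blocks on at most one other party here, so a map is enough.
    private final Map<String, String> waitsFor = new LinkedHashMap<>();

    void addWait(String waiter, String blocker) {
        waitsFor.put(waiter, blocker);
    }

    // True if following the wait chain from 'start' revisits a party.
    boolean hasCycleFrom(String start) {
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null) {
            if (!seen.add(current)) {
                return true; // revisited: circular wait
            }
            current = waitsFor.get(current);
        }
        return false;
    }

    // The four steps of the report expressed as wait-for edges.
    static WaitForGraphSketch reportedScenario() {
        WaitForGraphSketch g = new WaitForGraphSketch();
        // Assumption: the prepare broadcast waits for the new node's response.
        g.addWait("PrepareCommand on old node", "PrepareCommand on new node");
        // Step 4: needs commandsLock (shared), held exclusively by onTopologyUpdate.
        g.addWait("PrepareCommand on new node", "onTopologyUpdate on new node");
        // Step 4: onTopologyUpdate blocks in the synchronous GET_TRANSACTIONS call.
        g.addWait("onTopologyUpdate on new node", "GET_TRANSACTIONS handler");
        // Steps 2 and 3: handler needs transactionLock exclusively, held shared by the prepare.
        g.addWait("GET_TRANSACTIONS handler", "PrepareCommand on old node");
        return g;
    }
}
```

Removing any one edge breaks the cycle, which matches the fix direction of dropping the transactions lock and the commands lock.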
Also found in some traces, but not required for the deadlock:
After the transaction commit times out on the old node and releases the lock, the
GET_TRANSACTIONS request may continue, but the state transfer itself can also time out
if its timeout is not set sufficiently long.
The transaction commit continues on the new node after the state transfer times out,
until the transaction is found to be invalid (rolled back).