[ https://issues.jboss.org/browse/ISPN-6997?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-6997:
------------------------------------
I started investigating this again when I stumbled across some new failures on the
ISPN-6997 branch. My previous comment was only partially correct: the
{{LockControlCommand}} is not delivered during the partition, but immediately before (or
possibly in parallel with) the view message.
The reason is that {{UNICAST3}} connections are not closed immediately when the peer
disappears from the view, and messages sent during the partition stay in the
connection's send table and may be delivered after the merge. But the caller already
got a {{CacheNotFoundResponse}} when the partition view was installed, so it committed the
transaction locally with a {{PrepareCommand(1PC)}}. Crucially, the 1PC command is *not*
sent to all the nodes that got the {{LockControlCommand}}, but only to those that are still
in the cluster. So the {{LockControlCommand}} is retransmitted after the merge, but the
{{PrepareCommand(1PC)}}/{{RollbackCommand}} isn't, and the remote transaction is only
cleaned up after the remote transaction timeout expires (60 seconds by default).
Bela added a method in {{UNICAST3}} to manually close a connection to a peer, but I've
decided instead to change our code so that the targets of {{RollbackCommand}} and
{{PrepareCommand}} include the targets of any previous {{LockControlCommand}}.
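To illustrate the idea only (this is a sketch, not the actual Infinispan change): the completion command goes to the union of its normal targets and every node that was sent an earlier {{LockControlCommand}}. The class {{TxTargets}}, the method {{completionTargets}}, and the use of plain strings for node addresses are all hypothetical names chosen for the example.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not Infinispan code: node addresses are plain strings here.
final class TxTargets {
    private TxTargets() {}

    /**
     * Targets for the final PrepareCommand(1PC)/RollbackCommand: the targets
     * computed from the current view, plus every node that was sent an
     * earlier LockControlCommand for the same transaction.
     */
    static Set<String> completionTargets(Collection<String> currentTargets,
                                         Collection<String> lockedNodes) {
        // LinkedHashSet removes duplicates and keeps a stable iteration order.
        Set<String> targets = new LinkedHashSet<>(currentTargets);
        targets.addAll(lockedNodes);
        return targets;
    }
}
```

With current targets {{[A, B]}} and earlier lock targets {{[B, C]}}, the completion command would be addressed to {{A}}, {{B}} and {{C}}, so a node that later receives the delayed lock command also receives the command that cleans it up.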
PessimisticTxPartitionAndMergeDuringRuntimeTest.testOriginatorIsolatedPartition random
failures
-----------------------------------------------------------------------------------------------
Key: ISPN-6997
URL: https://issues.jboss.org/browse/ISPN-6997
Project: Infinispan
Issue Type: Bug
Components: Core, Test Suite - Core
Affects Versions: 9.0.0.Alpha4
Reporter: Dan Berindei
Assignee: Dan Berindei
Labels: testsuite_stability
Fix For: 9.1.0.Final
The test starts with a cluster of 4 nodes, and splits it into 2 partitions while a
transaction is trying to lock a key. After the transaction fails, the test checks that the
transaction has been cleaned up properly.
On one of the owners, {{transactionTable.cleanupLeaverTransactions}} is being called only
before the split and after the merge, never with the list of members during the split.
That means it never sees the transaction as an orphan, and doesn't remove it.
{noformat}
15:16:18,893 TRACE (testng-PTPAMDRT:[]) [PTPAMDRT] Local tx=[], remote tx=[GlobalTx:PTPAMDRT-NodeI-3337:28616], for cache PTPAMDRT-NodeJ-27814
15:16:18,893 ERROR (testng-PTPAMDRT:[]) [TestSuiteProgress] Test failed: org.infinispan.partitionhandling.PTPAMDRT.testOriginatorIsolatedPartition
java.lang.AssertionError: There are pending transactions!
at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.8.8.jar:?]
at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24) ~[testng-6.8.8.jar:?]
at org.infinispan.test.AbstractInfinispanTest.eventually(AbstractInfinispanTest.java:223) ~[test-classes/:?]
at org.infinispan.test.AbstractInfinispanTest.eventually(AbstractInfinispanTest.java:519) ~[test-classes/:?]
at org.infinispan.test.MultipleCacheManagersTest.assertNoTransactions(MultipleCacheManagersTest.java:794) ~[test-classes/:?]
at org.infinispan.partitionhandling.BaseTxPartitionAndMergeTest.finalAsserts(BaseTxPartitionAndMergeTest.java:96) ~[test-classes/:?]
at org.infinispan.partitionhandling.BasePessimisticTxPartitionAndMergeTest.doTest(BasePessimisticTxPartitionAndMergeTest.java:82) ~[test-classes/:?]
at org.infinispan.partitionhandling.PessimisticTxPartitionAndMergeDuringRuntimeTest.testOriginatorIsolatedPartition(PessimisticTxPartitionAndMergeDuringRuntimeTest.java:33) ~[test-classes/:?]
{noformat}
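The missed cleanup described above can be shown with a simplified model. This is a sketch under assumptions, not the real {{TransactionTable}} logic: {{LeaverCleanup}}, {{orphans}}, and the map from transaction id to originator are hypothetical, and node names are plain strings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical model, not the real TransactionTable:
// maps a remote transaction id to the node that originated it.
final class LeaverCleanup {
    private LeaverCleanup() {}

    /** Returns the ids of remote transactions whose originator is not in the given member list. */
    static List<String> orphans(Map<String, String> originatorByTx, Collection<String> members) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : originatorByTx.entrySet()) {
            if (!members.contains(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

In this model, a transaction originated on a node in the other partition is reported as an orphan only when the check runs with the member list of the split. If the check runs only with the full pre-split and post-merge member lists, the originator is present both times and nothing is removed.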
{{OptimisticTxPartitionAndMergeDuringCommitTest.testPrimaryOwnerIsolatedPartition}} has
similar random failures.
--
This message was sent by Atlassian JIRA (v7.2.3#72005)