[JBoss JIRA] (ISPN-4546) Possible stale lock when the primary owner leaves during rebalance
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4546?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration updated ISPN-4546:
------------------------------------------
Bugzilla Update: Perform
Bugzilla References: https://bugzilla.redhat.com/show_bug.cgi?id=1163727
> Possible stale lock when the primary owner leaves during rebalance
> ------------------------------------------------------------------
>
> Key: ISPN-4546
> URL: https://issues.jboss.org/browse/ISPN-4546
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Critical
> Fix For: 7.1.0.Alpha1
>
>
> Topology T: coordinator = A, owners(k) = [C, D], pending_owners(k) = null
> B sends prepareCommand(tx1, put(k, v)) to C, D
> D adds backup locks and replies
> C acquires lock, ready to send reply to B
> A starts installing topology T+1: owners(k) = [C, D], pending_owners(k) = [C, E]
> A, C and E install topology T+1, B and D do not
> E requests and receives tx data from C, including tx1
> C leaves
> B sees a SuspectException, sends rollbackCommand(tx1) to C, D
> D removes tx1
> C has left, but is ignored
> B reports to the user that the tx has been rolled back
> B and D install topology T+1 (optional)
> A starts installing topology T+2: owners(k) = [D], pending_owners(k) = [E]
> A, B, D, E all install topology T+2
> E requests and receives state from D, but it does not remove tx1
> A starts installing topology T+3: owners(k) = [E], pending_owners(k) = null
> E now has a stale backup lock on k
> It seems very hard to reproduce in production: C would have to leave soon enough so that B and D haven't received the T+1 topology yet, but late enough for it to send its transaction data to E.
> A possible solution would be to catch any SuspectException during prepare/commit/rollback (without ignoring leavers), wait for a new topology, and replicate the command again on the new owners. Obviously, this wouldn't work with asynchronous prepare/commit/rollback.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 5 months
[JBoss JIRA] (ISPN-4286) Two concurrent putIfAbsent operations can both return null during rebalance
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4286?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration updated ISPN-4286:
------------------------------------------
Bugzilla Update: Perform
Bugzilla References: https://bugzilla.redhat.com/show_bug.cgi?id=1163725
> Two concurrent putIfAbsent operations can both return null during rebalance
> ---------------------------------------------------------------------------
>
> Key: ISPN-4286
> URL: https://issues.jboss.org/browse/ISPN-4286
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Priority: Critical
> Fix For: 7.1.0.Alpha1
>
>
> If the cache topology changes while executing a putIfAbsent operation, the old primary owner will throw an OutdatedTopologyException, and the originator will retry on the new owner.
> When retrying the PutKeyValueCommand on the new primary owner, we compare the current value with the command's new value. If they are equal, we assume that the initial command wrote the old value, and we return {{null}}.
> However, the value might have been written by another putIfAbsent operation. So we could have two {{putIfAbsent(k, v)}} operations, both returning {{null}}.
> {code}
> A is the originator, B is the primary owner, k = null
> A -> B: putIfAbsent(k, v1)
> B dies before writing v, C is now primary owner
> D -> C: putIfAbsent(k, v1) // another put operation from D, with the same value
> C -> D: null // correct
> A -> C: retry_putIfAbsent(k, v1)
> C -> A: null // C assumes A is overwriting its own value, so it's also returning null
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 5 months
[JBoss JIRA] (ISPN-4029) NotifyingFutureTest.testExceptionOtherThread1 && testDoneOtherThread2 random failures
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4029?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4029:
-----------------------------------------------
Roman Macor <rmacor(a)redhat.com> changed the Status of [bug 1074457|https://bugzilla.redhat.com/show_bug.cgi?id=1074457] from ON_QA to VERIFIED
> NotifyingFutureTest.testExceptionOtherThread1 && testDoneOtherThread2 random failures
> -------------------------------------------------------------------------------------
>
> Key: ISPN-4029
> URL: https://issues.jboss.org/browse/ISPN-4029
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 6.0.1.Final
> Reporter: Dan Berindei
> Assignee: Sebastian Łaskawiec
> Labels: testsuite_stability
> Fix For: 7.0.0.Final
>
>
> {noformat}
> 16:58:33,402 ERROR (testng-NotifyingFutureTest:) [UnitTestTestNGListener] Test testExceptionOtherThread1(org.infinispan.commons.util.concurrent.NotifyingFutureTest) failed.
> java.lang.AssertionError: expected [true] but found [false]
> at org.testng.Assert.fail(Assert.java:94)
> at org.testng.Assert.failNotEquals(Assert.java:494)
> at org.testng.Assert.assertTrue(Assert.java:42)
> at org.testng.Assert.assertTrue(Assert.java:52)
> at org.infinispan.commons.util.concurrent.NotifyingFutureTest.testException(NotifyingFutureTest.java:150)
> at org.infinispan.commons.util.concurrent.NotifyingFutureTest.testExceptionOtherThread(NotifyingFutureTest.java:67)
> at org.infinispan.commons.util.concurrent.NotifyingFutureTest.testExceptionOtherThread1(NotifyingFutureTest.java:46)
> {noformat}
> The test submits a task to a ThreadPoolExecutor that notifies a NotifyingFutureImpl and assumes that when the NotifyingFutureImpl is done, the original Future is also done. That's clearly not happening some of the time.
> * Jenkins
> ** RHEL, Oracle JDK7
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/FUNC/job/e...
> ** RHEL, OpenJDK7
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/FUNC/job/e...
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 5 months
[JBoss JIRA] (ISPN-4286) Two concurrent putIfAbsent operations can both return null during rebalance
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4286?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4286:
-------------------------------
Priority: Critical (was: Major)
> Two concurrent putIfAbsent operations can both return null during rebalance
> ---------------------------------------------------------------------------
>
> Key: ISPN-4286
> URL: https://issues.jboss.org/browse/ISPN-4286
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Priority: Critical
> Fix For: 7.1.0.Alpha1
>
>
> If the cache topology changes while executing a putIfAbsent operation, the old primary owner will throw an OutdatedTopologyException, and the originator will retry on the new owner.
> When retrying the PutKeyValueCommand on the new primary owner, we compare the current value with the command's new value. If they are equal, we assume that the initial command wrote the old value, and we return {{null}}.
> However, the value might have been written by another putIfAbsent operation. So we could have two {{putIfAbsent(k, v)}} operations, both returning {{null}}.
> {code}
> A is the originator, B is the primary owner, k = null
> A -> B: putIfAbsent(k, v1)
> B dies before writing v, C is now primary owner
> D -> C: putIfAbsent(k, v1) // another put operation from D, with the same value
> C -> D: null // correct
> A -> C: retry_putIfAbsent(k, v1)
> C -> A: null // C assumes A is overwriting its own value, so it's also returning null
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 5 months
[JBoss JIRA] (ISPN-4546) Possible stale lock when the primary owner leaves during rebalance
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4546?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4546:
-------------------------------
Priority: Critical (was: Major)
> Possible stale lock when the primary owner leaves during rebalance
> ------------------------------------------------------------------
>
> Key: ISPN-4546
> URL: https://issues.jboss.org/browse/ISPN-4546
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Critical
> Fix For: 7.1.0.Alpha1
>
>
> Topology T: coordinator = A, owners(k) = [C, D], pending_owners(k) = null
> B sends prepareCommand(tx1, put(k, v)) to C, D
> D adds backup locks and replies
> C acquires lock, ready to send reply to B
> A starts installing topology T+1: owners(k) = [C, D], pending_owners(k) = [C, E]
> A, C and E install topology T+1, B and D do not
> E requests and receives tx data from C, including tx1
> C leaves
> B sees a SuspectException, sends rollbackCommand(tx1) to C, D
> D removes tx1
> C has left, but is ignored
> B reports to the user that the tx has been rolled back
> B and D install topology T+1 (optional)
> A starts installing topology T+2: owners(k) = [D], pending_owners(k) = [E]
> A, B, D, E all install topology T+2
> E requests and receives state from D, but it does not remove tx1
> A starts installing topology T+3: owners(k) = [E], pending_owners(k) = null
> E now has a stale backup lock on k
> It seems very hard to reproduce in production: C would have to leave soon enough so that B and D haven't received the T+1 topology yet, but late enough for it to send its transaction data to E.
> A possible solution would be to catch any SuspectException during prepare/commit/rollback (without ignoring leavers), wait for a new topology, and replicate the command again on the new owners. Obviously, this wouldn't work with asynchronous prepare/commit/rollback.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 5 months
[JBoss JIRA] (ISPN-4370) CacheNotifierImplInitialTransferDistTest random failures
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4370?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4370:
-----------------------------------------------
Roman Macor <rmacor(a)redhat.com> changed the Status of [bug 1148564|https://bugzilla.redhat.com/show_bug.cgi?id=1148564] from ON_QA to VERIFIED
> CacheNotifierImplInitialTransferDistTest random failures
> --------------------------------------------------------
>
> Key: ISPN-4370
> URL: https://issues.jboss.org/browse/ISPN-4370
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 7.0.0.Alpha4
> Reporter: Dan Berindei
> Assignee: William Burns
> Labels: testsuite_stability
> Fix For: 7.0.0.Alpha5
>
> Attachments: CacheNotifierImplInitialTransferDistTest_master_20140612.log.gz, CacheNotifierImplInitialTransferDistTest_t_ISPN-4118_20140609.log.gz
>
>
> Random failures in {{testModificationAfterIterationBeganAndSegmentNotCompleteValueOwnerClustered}} and {{testRemoveAfterIterationBeganAndSegmentNotCompleteValueOwnerClustered}}.
> {noformat}
> java.lang.AssertionError: There was no matching create event for key key-1 expected [true] but found [false]
> at org.testng.Assert.fail(Assert.java:94)
> at org.testng.Assert.failNotEquals(Assert.java:494)
> at org.testng.Assert.assertTrue(Assert.java:42)
> at org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest.testIterationBeganAndSegmentNotComplete(CacheNotifierImplInitialTransferDistTest.java:542)
> at org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest.testRemoveAfterIterationBeganAndSegmentNotCompleteValueOwnerClustered(CacheNotifierImplInitialTransferDistTest.java:655)
> {noformat}
> {noformat}
> java.lang.AssertionError: There was no matching create event for key key-1 expected [true] but found [false]
> at org.testng.Assert.fail(Assert.java:94)
> at org.testng.Assert.failNotEquals(Assert.java:494)
> at org.testng.Assert.assertTrue(Assert.java:42)
> at org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest.testIterationBeganAndSegmentNotComplete(CacheNotifierImplInitialTransferDistTest.java:542)
> at org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest.testRemoveAfterIterationBeganAndSegmentNotCompleteValueOwnerClustered(CacheNotifierImplInitialTransferDistTest.java:655)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 5 months
[JBoss JIRA] (ISPN-4384) CacheNotifierImplInitialTransferDistTest test instability
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4384?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4384:
-----------------------------------------------
Roman Macor <rmacor(a)redhat.com> changed the Status of [bug 1148564|https://bugzilla.redhat.com/show_bug.cgi?id=1148564] from ON_QA to VERIFIED
> CacheNotifierImplInitialTransferDistTest test instability
> ---------------------------------------------------------
>
> Key: ISPN-4384
> URL: https://issues.jboss.org/browse/ISPN-4384
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Reporter: William Burns
> Assignee: William Burns
> Fix For: 7.0.0.Alpha5
>
>
> There was an issue that was just recently fixed for
> However there are some other inconsistent failures as seen here
> http://ci.infinispan.org/viewLog.html?buildId=8820&tab=buildResultsDiv&bu...
> CacheNotifierImplInitialTransferDistTest.testCreateAfterIterationBeganAndSegmentNotCompleteValueNonOwnerClustered
> {code}
> java.lang.AssertionError: expected [11] but found [6]
> at org.testng.Assert.fail(Assert.java:94)
> at org.testng.Assert.failNotEquals(Assert.java:494)
> at org.testng.Assert.assertEquals(Assert.java:123)
> at org.testng.Assert.assertEquals(Assert.java:370)
> at org.testng.Assert.assertEquals(Assert.java:380)
> at org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest.testIterationBeganAndSegmentNotComplete(CacheNotifierImplInitialTransferDistTest.java:510)
> at org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest.testCreateAfterIterationBeganAndSegmentNotCompleteValueNonOwnerClustered(CacheNotifierImplInitialTransferDistTest.java:600)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:714)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
> at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:37)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:368)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {code}
> and CacheNotifierImplInitialTransferDistTest.testCreateAfterIterationBeganAndSegmentNotCompleteValueOwnerClustered has the same stack trace.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 5 months
[JBoss JIRA] (ISPN-3918) Inconsistent view of the cache with putIfAbsent in a non-tx cache during state transfer
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-3918?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-3918:
-----------------------------------
Possible workaround is to use FORCE_WRITE_LOCK flag for the get() operation.
> Inconsistent view of the cache with putIfAbsent in a non-tx cache during state transfer
> ---------------------------------------------------------------------------------------
>
> Key: ISPN-3918
> URL: https://issues.jboss.org/browse/ISPN-3918
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 6.0.0.Final
> Reporter: Dan Berindei
> Fix For: 7.1.0.Alpha1
>
> Attachments: ntpiadjst.log.gz
>
>
> In a non-tx cache, sometimes it's possible for a {{get(k)}} to return {{null}} even though a previous {{putIfAbsent(k, v)}} returned a non-null value and the only concurrent operations on the cache are concurrent putIfAbsent calls.
> Say \[B, A, C] are the owners of k (C just joined)
> 1. A starts a {{putIfAbsent(k, v1)}} command, sends it to B
> 2. B forwards the command to A and C
> 3. C writes {{k=v1}}
> 4. C becomes the primary owner of k (owners are now \[C, A])
> 5. A/B see the new topology before committing and throw an outdatedTopologyException
> 6. A retries the command, sends it to C
> 7. C forwards the command to A, which writes {{k=v1}}
> 8. C doesn't have to update the entry, returns null
> If, between steps 3 and 7, another thread on A starts a {{putIfAbsent(k, v2)}} command, the command will fail and return {{v1}} (because the primary owner already has a value). However, a subsequent {{get(k)}} command will return {{null}}, because A is an owner and doesn't have the value.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 5 months
[JBoss JIRA] (ISPN-4404) The coordinator leaving the cluster should not cancel state transfer
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4404?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4404:
-------------------------------
Issue Type: Enhancement (was: Bug)
> The coordinator leaving the cluster should not cancel state transfer
> --------------------------------------------------------------------
>
> Key: ISPN-4404
> URL: https://issues.jboss.org/browse/ISPN-4404
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core, State Transfer
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Fix For: 7.1.0.Alpha1
>
>
> When the cluster coordinator changes or a merge view is received, the new coordinator tries to reconcile the cache topologies of the different partitions and cancels any state transfer in progress.
> But most of the time, the coordinator changing doesn't mean that there are multiple partitions, just that the old coordinator died. In that case there is a single cache topology, so there is nothing to reconcile and no reason to cancel the pending ST.
> We can't rely on the new coordinator receiving a MergeView, because the old coordinator might have received a MergeView and died soon afterwards.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 5 months