[JBoss JIRA] (ISPN-4286) Two concurrent putIfAbsent operations can both return null during rebalance
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4286?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4286:
------------------------------
Fix Version/s: 7.1.0.Beta1
(was: 7.1.0.Alpha1)
> Two concurrent putIfAbsent operations can both return null during rebalance
> ---------------------------------------------------------------------------
>
> Key: ISPN-4286
> URL: https://issues.jboss.org/browse/ISPN-4286
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.1.0.Beta1
>
>
> If the cache topology changes while executing a putIfAbsent operation, the old primary owner will throw an OutdatedTopologyException, and the originator will retry on the new owner.
> When retrying the PutKeyValueCommand on the new primary owner, we compare the current value with the command's new value. If they are equal, we assume that the initial attempt of this command already wrote that value, and we return {{null}}.
> However, the value might have been written by another putIfAbsent operation. So we could have two {{putIfAbsent(k, v)}} operations, both returning {{null}}.
> {code}
> A is the originator, B is the primary owner, k = null
> A -> B: putIfAbsent(k, v1)
> B dies before writing v1, C is now primary owner
> D -> C: putIfAbsent(k, v1) // another put operation from D, with the same value
> C -> D: null // correct
> A -> C: retry_putIfAbsent(k, v1)
> C -> A: null // C assumes A is overwriting its own value, so it's also returning null
> {code}
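
The sequence above can be replayed with a small, self-contained Java sketch. It is only a model of the retry check as the description states it: the class and method names (RetryPutIfAbsentSketch, retryPutIfAbsent) are hypothetical and are not Infinispan's actual PutKeyValueCommand code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a primary owner; not Infinispan's real classes.
class RetryPutIfAbsentSketch {
    private final Map<String, String> data = new HashMap<>();

    /** First attempt: returns the previous value, or null if the key was absent. */
    String putIfAbsent(String key, String value) {
        return data.putIfAbsent(key, value);
    }

    /**
     * Retry after a topology change, as described above: if the stored value
     * equals the command's value, assume the first attempt wrote it and return null.
     */
    String retryPutIfAbsent(String key, String value) {
        String current = data.get(key);
        if (value.equals(current)) {
            return null; // cannot tell whether this command or another one wrote it
        }
        return data.putIfAbsent(key, value);
    }

    public static void main(String[] args) {
        RetryPutIfAbsentSketch c = new RetryPutIfAbsentSketch(); // C, the new primary owner
        String resultForD = c.putIfAbsent("k", "v1");      // D's operation
        String resultForA = c.retryPutIfAbsent("k", "v1"); // A's retry; B died before writing
        System.out.println("D got " + resultForD + ", A got " + resultForA);
    }
}
```

Both callers see null because the equality check alone carries no information about which command wrote the value.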
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-4340) Automatically setup shared indexes when indexing is enabled
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4340?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4340:
------------------------------
Fix Version/s: 7.1.0.Beta1
(was: 7.1.0.Alpha1)
> Automatically setup shared indexes when indexing is enabled
> -----------------------------------------------------------
>
> Key: ISPN-4340
> URL: https://issues.jboss.org/browse/ISPN-4340
> Project: Infinispan
> Issue Type: Feature Request
> Components: Embedded Querying
> Reporter: Sanne Grinovero
> Assignee: Gustavo Fernandes
> Labels: 64QueryBlockers
> Fix For: 7.1.0.Beta1
>
>
> - on replicated Caches, we should create a default index on a FSDirectory and provide some appropriate default tuning, for example enabling NRT.
> - distributed Caches will need the Infinispan Directory (shared) and a master/slave backend (Infinispan IndexManager; NRT is not compatible in this case)
> We want to keep the properties configuration structure as an "advanced tuning" mechanism that can override the default choices.
> Some of the more common options, like sync/async indexing, should probably be promoted so that they are controlled explicitly by the XML elements and the configuration DSL.
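
A minimal sketch of the "defaults plus overrides" idea follows, using plain java.util.Properties. The helper class, the enum and the method name are hypothetical. The property keys and values are assumed to follow the Hibernate Search style commonly used with Infinispan Query and should be checked against the version in use.

```java
import java.util.Properties;

// Hypothetical helper; names are illustrative only, not part of Infinispan.
class DefaultIndexingSketch {
    enum Mode { REPLICATED, DISTRIBUTED }

    /** Picks default index properties for the cache mode, then applies user overrides. */
    static Properties indexingProperties(Mode mode, Properties overrides) {
        Properties p = new Properties();
        if (mode == Mode.REPLICATED) {
            // Assumed keys/values: local filesystem index with near-real-time tuning.
            p.setProperty("default.directory_provider", "filesystem");
            p.setProperty("default.indexmanager", "near-real-time");
        } else {
            // Assumed keys/values: shared Infinispan Directory with the
            // master/slave index manager; NRT is not used here.
            p.setProperty("default.directory_provider", "infinispan");
            p.setProperty("default.indexmanager",
                    "org.infinispan.query.indexmanager.InfinispanIndexManager");
        }
        p.putAll(overrides); // explicit properties always win over the defaults
        return p;
    }
}
```

Applying the overrides last keeps the existing properties structure usable as the advanced tuning layer.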
--
[JBoss JIRA] (ISPN-4560) FD_SOCK client socket connection timeout in the test suite
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4560?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4560:
------------------------------
Fix Version/s: 7.1.0.Beta1
(was: 7.1.0.Alpha1)
> FD_SOCK client socket connection timeout in the test suite
> ----------------------------------------------------------
>
> Key: ISPN-4560
> URL: https://issues.jboss.org/browse/ISPN-4560
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.1.0.Beta1
>
>
> At least some of the {{createBeforeMethod}} failures in the test suite seem to be caused by FD_SOCK, which is not able to connect to its peer:
> {noformat}
> 08:28:08,144 DEBUG (testng-L1StateTransferOverwriteTest:) [FD_SOCK] L1StateTransferOverwriteTest-NodeBC-2827: VIEW_CHANGE received: [L1StateTransferOverwriteTest-NodeBC-2827]
> 08:28:12,558 DEBUG (Incoming-1,L1StateTransferOverwriteTest-NodeBC-2827:) [FD_SOCK] L1StateTransferOverwriteTest-NodeBC-2827: VIEW_CHANGE received: [L1StateTransferOverwriteTest-NodeBC-2827, L1StateTransferOverwriteTest-NodeBD-12942]
> 08:28:12,631 DEBUG (FD_SOCK pinger,L1StateTransferOverwriteTest-NodeBC-2827:) [FD_SOCK] L1StateTransferOverwriteTest-NodeBC-2827: ping_dest is L1StateTransferOverwriteTest-NodeBD-12942, pingable_mbrs=[L1StateTransferOverwriteTest-NodeBC-2827, L1StateTransferOverwriteTest-NodeBD-12942]
> 08:28:12,716 DEBUG (testng-L1StateTransferOverwriteTest:) [FD_SOCK] L1StateTransferOverwriteTest-NodeBD-12942: VIEW_CHANGE received: [L1StateTransferOverwriteTest-NodeBC-2827, L1StateTransferOverwriteTest-NodeBD-12942]
> 08:28:12,719 DEBUG (ViewHandler,NodeBC-2827:) [STABLE] resuming message garbage collection
> 08:28:20,213 WARN (FD_SOCK pinger,L1StateTransferOverwriteTest-NodeBC-2827:) [FD_SOCK] L1StateTransferOverwriteTest-NodeBC-2827: creating the client socket failed: java.net.SocketTimeoutException
> 08:28:20,230 DEBUG (FD_SOCK pinger,L1StateTransferOverwriteTest-NodeBC-2827:) [FD_SOCK] L1StateTransferOverwriteTest-NodeBC-2827: could not create socket to L1StateTransferOverwriteTest-NodeBD-12942 (pinger thread is running)
> 08:28:20,230 DEBUG (FD_SOCK pinger,L1StateTransferOverwriteTest-NodeBC-2827:) [FD_SOCK] L1StateTransferOverwriteTest-NodeBC-2827: suspecting L1StateTransferOverwriteTest-NodeBD-12942
> 08:28:20,230 DEBUG (FD_SOCK pinger,L1StateTransferOverwriteTest-NodeBC-2827:) [FD_SOCK] L1StateTransferOverwriteTest-NodeBC-2827: ping_dest is null, pingable_mbrs=[L1StateTransferOverwriteTest-NodeBC-2827]
> 08:28:20,232 DEBUG (INT-1,L1StateTransferOverwriteTest-NodeBC-2827:) [FD_SOCK] L1StateTransferOverwriteTest-NodeBC-2827: suspecting [L1StateTransferOverwriteTest-NodeBD-12942]
> 08:28:20,241 DEBUG (Incoming-1,L1StateTransferOverwriteTest-NodeBC-2827:) [FD_SOCK] L1StateTransferOverwriteTest-NodeBC-2827: VIEW_CHANGE received: [L1StateTransferOverwriteTest-NodeBC-2827]
> 08:28:21,442 DEBUG (FD_SOCK pinger,L1StateTransferOverwriteTest-NodeBD-12942:) [FD_SOCK] L1StateTransferOverwriteTest-NodeBD-12942: ping_dest is L1StateTransferOverwriteTest-NodeBC-2827, pingable_mbrs=[L1StateTransferOverwriteTest-NodeBC-2827, L1StateTransferOverwriteTest-NodeBD-12942]
> 08:28:21,442 DEBUG (FD_SOCK pinger,NodeBD-12942:) [FD_SOCK] NodeBD-12942: ping_dest is NodeBC-2827, pingable_mbrs=[NodeBC-2827, NodeBD-12942]
> {noformat}
> There is no message in the log for about 8 seconds (at least for this test), so the timeout could be caused by a GC and/or StateTransferFunctionalTest using too much CPU.
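
The silent period can be measured directly from the timestamps in the excerpt. The sketch below is a hypothetical helper (LogGapSketch is not part of the test suite) and assumes the timestamps all fall on the same day.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for spotting silent periods in a log excerpt.
class LogGapSketch {
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Returns the largest gap in milliseconds between consecutive timestamps. */
    static long largestGapMillis(List<String> timestamps) {
        long largest = 0;
        for (int i = 1; i < timestamps.size(); i++) {
            LocalTime prev = LocalTime.parse(timestamps.get(i - 1), TS);
            LocalTime next = LocalTime.parse(timestamps.get(i), TS);
            largest = Math.max(largest, Duration.between(prev, next).toMillis());
        }
        return largest;
    }

    public static void main(String[] args) {
        // Timestamps copied from the log excerpt above.
        List<String> ts = Arrays.asList(
                "08:28:12,631", "08:28:12,716", "08:28:12,719", "08:28:20,213", "08:28:20,230");
        System.out.println(largestGapMillis(ts) + " ms"); // prints "7494 ms"
    }
}
```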
--
[JBoss JIRA] (ISPN-4546) Possible stale lock when the primary owner leaves during rebalance
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4546?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4546:
------------------------------
Fix Version/s: 7.1.0.Beta1
(was: 7.1.0.Alpha1)
> Possible stale lock when the primary owner leaves during rebalance
> ------------------------------------------------------------------
>
> Key: ISPN-4546
> URL: https://issues.jboss.org/browse/ISPN-4546
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.1.0.Beta1
>
>
> Topology T: coordinator = A, owners(k) = [C, D], pending_owners(k) = null
> B sends prepareCommand(tx1, put(k, v)) to C, D
> D adds backup locks and replies
> C acquires lock, ready to send reply to B
> A starts installing topology T+1: owners(k) = [C, D], pending_owners(k) = [C, E]
> A, C and E install topology T+1, B and D do not
> E requests and receives tx data from C, including tx1
> C leaves
> B sees a SuspectException, sends rollbackCommand(tx1) to C, D
> D removes tx1
> C has left, but is ignored
> B reports to the user that the tx has been rolled back
> B and D install topology T+1 (optional)
> A starts installing topology T+2: owners(k) = [D], pending_owners(k) = [E]
> A, B, D, E all install topology T+2
> E requests and receives state from D, but it does not remove tx1
> A starts installing topology T+3: owners(k) = [E], pending_owners(k) = null
> E now has a stale backup lock on k
> It seems very hard to reproduce in production: C would have to leave soon enough so that B and D haven't received the T+1 topology yet, but late enough for it to send its transaction data to E.
> A possible solution would be to catch any SuspectException during prepare/commit/rollback (without ignoring leavers), wait for a new topology, and replicate the command again on the new owners. Obviously, this wouldn't work with asynchronous prepare/commit/rollback.
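
The scenario can be replayed with a toy model that only tracks which node holds a lock entry for tx1. Everything in it (StaleBackupLockSketch and its methods) is hypothetical and greatly simplified compared to the real transaction table and state transfer code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model: node name -> transactions holding a (backup) lock on k.
class StaleBackupLockSketch {
    private final Map<String, Set<String>> locks = new HashMap<>();

    void addLock(String node, String tx) {
        locks.computeIfAbsent(node, n -> new HashSet<>()).add(tx);
    }

    void nodeLeaves(String node) {
        locks.remove(node);
    }

    /** The rollback only reaches the listed targets that are still members. */
    void rollback(String tx, List<String> targets) {
        for (String node : targets) {
            Set<String> held = locks.get(node);
            if (held != null) {
                held.remove(tx);
            }
        }
    }

    boolean holdsLock(String node, String tx) {
        return locks.getOrDefault(node, Collections.<String>emptySet()).contains(tx);
    }

    /** Replays the steps from the description up to the rollback. */
    static StaleBackupLockSketch replay() {
        StaleBackupLockSketch s = new StaleBackupLockSketch();
        s.addLock("D", "tx1"); // D adds backup locks
        s.addLock("C", "tx1"); // C acquires the lock
        s.addLock("E", "tx1"); // E receives tx data from C in topology T+1
        s.nodeLeaves("C");     // C leaves
        s.rollback("tx1", Arrays.asList("C", "D")); // B still targets the owners from topology T
        return s;
    }

    public static void main(String[] args) {
        StaleBackupLockSketch s = replay();
        System.out.println("D holds tx1: " + s.holdsLock("D", "tx1")); // false
        System.out.println("E holds tx1: " + s.holdsLock("E", "tx1")); // true
    }
}
```

E keeps the entry because the rollback was addressed to the owners B knew about, and E was not among them.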
--
[JBoss JIRA] (ISPN-4491) Cluster Listener Event Batching
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4491?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4491:
------------------------------
Fix Version/s: 7.1.0.Beta1
(was: 7.1.0.Alpha1)
> Cluster Listener Event Batching
> -------------------------------
>
> Key: ISPN-4491
> URL: https://issues.jboss.org/browse/ISPN-4491
> Project: Infinispan
> Issue Type: Enhancement
> Components: Listeners
> Affects Versions: 7.0.0.Alpha4
> Reporter: William Burns
> Assignee: William Burns
> Fix For: 7.1.0.Beta1
>
>
> Currently, when a local listener that was installed for a cluster listener finds an event to send back to the parent, it sends 1 message per listener. It might be more beneficial to batch these so that it doesn't send 1 message per listener.
> I can think of 2 cases that would benefit from this:
> # When the underlying transport is UDP. In this case we can send just 1 message to all the nodes for the event instead of N unicasts.
> # When a node has more than 1 cluster listener installed, we could send a single message to notify more than 1 listener.
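
The second case could look roughly like the sketch below, which groups listeners by the node where they were registered so that one message per node can carry the event for all of them. The class and method names are hypothetical, and listener and node identifiers are plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the actual cluster listener implementation.
class ListenerBatchingSketch {
    /**
     * Groups listener ids by origin node.
     * Input: listenerId -> node where the cluster listener was registered.
     * Output: node -> ids of all listeners to notify with a single message.
     */
    static Map<String, List<String>> batchByOrigin(Map<String, String> listenerOrigins) {
        Map<String, List<String>> batches = new TreeMap<>();
        for (Map.Entry<String, String> e : listenerOrigins.entrySet()) {
            batches.computeIfAbsent(e.getValue(), n -> new ArrayList<>()).add(e.getKey());
        }
        return batches;
    }
}
```

With three listeners, two of them registered on the same node, the event goes out in 2 messages instead of 3.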
--
[JBoss JIRA] (ISPN-4463) AsyncAPITest.testAsyncMethodWithLifespanAndMaxIdle fails randomly
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4463?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4463:
------------------------------
Fix Version/s: 7.1.0.Beta1
(was: 7.1.0.Alpha1)
> AsyncAPITest.testAsyncMethodWithLifespanAndMaxIdle fails randomly
> -----------------------------------------------------------------
>
> Key: ISPN-4463
> URL: https://issues.jboss.org/browse/ISPN-4463
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.0.0.Alpha4
> Reporter: Vitalii Chepeliuk
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.1.0.Beta1
>
> Attachments: AsyncAPITest.log
>
>
> {noformat}
> java.lang.AssertionError: Entry evicted too soon!
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.infinispan.api.AsyncAPITest.verifyEviction(AsyncAPITest.java:356)
> at org.infinispan.api.AsyncAPITest.testAsyncMethodWithLifespanAndMaxIdle(AsyncAPITest.java:279)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:714)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
> at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:37)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:368)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask.run(FutureTask.java:273)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1176)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
> at java.lang.Thread.run(Thread.java:853)
> {noformat}
> Jenkins failure here:
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/FUNC/job/e...
--
[JBoss JIRA] (ISPN-4587) Re-add old owners in the pending CH when a node leaves during rebalance
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4587?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4587:
------------------------------
Fix Version/s: 7.1.0.Beta1
(was: 7.1.0.Alpha1)
> Re-add old owners in the pending CH when a node leaves during rebalance
> -----------------------------------------------------------------------
>
> Key: ISPN-4587
> URL: https://issues.jboss.org/browse/ISPN-4587
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Minor
> Fix For: 7.1.0.Beta1
>
>
> Say we have a distributed cache \[A, B\] with {{numSegments = 1}} and {{numOwners = 2}}. The initial topology is _T_: currentCH = \{0: A B\}, pendingCH = null
> C joins, and A starts a rebalance. The topology is now _T + 1_: currentCH = \{0: A B\}, pendingCH = \{0: A C\}
> C now leaves, A updates the consistent hashes to remove it with a new topology _T + 2_: currentCH = \{0: A B\}, pendingCH = \{0: A\}
> A doesn't need to receive any data, so the rebalance ends and the pending CH is installed as the current CH in topology _T + 3_: currentCH = \{0: A\}, pendingCH = null
> This algorithm is relatively easy to follow and implement, but it does result in reduced availability of the cache data. It would be better if topology _T + 2_ could re-add B as an owner in the pending CH.
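
One way to picture the proposed behaviour for a single segment is sketched below. It is an assumption about how the re-adding could work, not the actual consistent hash factory code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for one segment; not Infinispan's ConsistentHashFactory.
class PendingOwnersSketch {
    /**
     * Removes the leaver from the pending owners, then re-adds owners from the
     * current CH (in order) until numOwners is reached again.
     */
    static List<String> pendingAfterLeave(List<String> current, List<String> pending,
                                          String leaver, int numOwners) {
        List<String> result = new ArrayList<>(pending);
        result.remove(leaver);
        for (String owner : current) {
            if (result.size() >= numOwners) {
                break;
            }
            if (!owner.equals(leaver) && !result.contains(owner)) {
                result.add(owner);
            }
        }
        return result;
    }
}
```

For the example above, currentCH = \{0: A B\}, pendingCH = \{0: A C\} and leaver C give a pending owner list of A B rather than just A.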
--
[JBoss JIRA] (ISPN-4586) Too many OutdatedTopologyExceptions in non-transactional caches
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4586?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4586:
------------------------------
Fix Version/s: 7.1.0.Beta1
(was: 7.1.0.Alpha1)
> Too many OutdatedTopologyExceptions in non-transactional caches
> ---------------------------------------------------------------
>
> Key: ISPN-4586
> URL: https://issues.jboss.org/browse/ISPN-4586
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Labels: performance
> Fix For: 7.1.0.Beta1
>
>
> In a non-tx cache, when the topology id is incremented, owners (both primary and backup) receiving a write command with a lower topology id throw an OutdatedTopologyException so that the originator retries the command on the new owners.
> But the originator needs to retry the command only if the owners of the key changed in any way. During a join or a leave, most of the keys should not change owners, so throwing an OutdatedTopologyException is not necessary most of the time.
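
The check suggested here could be sketched as follows: retry only when the owner list of the key differs between the command's topology and the current one. The class, the method and the map-based lookup are hypothetical simplifications; a real implementation would consult the consistent hash of each topology.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; the owner lookup is a plain map for illustration.
class OwnerChangeCheckSketch {
    /**
     * Decides whether a write command must be retried.
     * ownersByTopology maps topologyId -> owners of one key, primary first.
     */
    static boolean mustRetry(Map<Integer, List<String>> ownersByTopology,
                             int commandTopologyId, int currentTopologyId) {
        if (commandTopologyId >= currentTopologyId) {
            return false; // the command is not outdated
        }
        List<String> before = ownersByTopology.get(commandTopologyId);
        List<String> now = ownersByTopology.get(currentTopologyId);
        // List equality is order-sensitive, so a primary/backup swap also counts as a change.
        return !Objects.equals(before, now);
    }
}
```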
--