[JBoss JIRA] (ISPN-4286) Two concurrent putIfAbsent operations can both return null during rebalance
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4286?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4286:
----------------------------------
Fix Version/s: 7.1.0.CR1
(was: 7.1.0.Beta1)
> Two concurrent putIfAbsent operations can both return null during rebalance
> ---------------------------------------------------------------------------
>
> Key: ISPN-4286
> URL: https://issues.jboss.org/browse/ISPN-4286
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.1.0.CR1
>
>
> If the cache topology changes while executing a putIfAbsent operation, the old primary owner will throw an OutdatedTopologyException, and the originator will retry on the new owner.
> When retrying the PutKeyValueCommand on the new primary owner, we compare the current value with the command's new value. If they are equal, we assume that the initial command wrote the old value, and we return {{null}}.
> However, the value might have been written by another putIfAbsent operation. So we could have two {{putIfAbsent(k, v)}} operations, both returning {{null}}.
> {code}
> A is the originator, B is the primary owner, k = null
> A -> B: putIfAbsent(k, v1)
> B dies before writing v, C is now primary owner
> D -> C: putIfAbsent(k, v1) // another put operation from D, with the same value
> C -> D: null // correct
> A -> C: retry_putIfAbsent(k, v1)
> C -> A: null // C assumes A is overwriting its own value, so it's also returning null
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 3 months
[JBoss JIRA] (ISPN-4351) Clarify the behaviour of putForExternalRead in clustered caches
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4351?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4351:
----------------------------------
Fix Version/s: 7.1.0.CR1
(was: 7.1.0.Beta1)
> Clarify the behaviour of putForExternalRead in clustered caches
> ---------------------------------------------------------------
>
> Key: ISPN-4351
> URL: https://issues.jboss.org/browse/ISPN-4351
> Project: Infinispan
> Issue Type: Task
> Components: Core, Documentation-Core
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Fix For: 7.1.0.CR1
>
>
> The {{putForExternal}} [documentation|http://infinispan.org/docs/7.0.x/user_guide/user_guide.html...] says
> {quote}
> putForExternalRead is consider to be a fast operation because regardless of whether it’s successful or not, it doesn’t wait for any locks, and so returns to the caller promptly.
> {quote}
> But the documentation doesn't say how {{putForExternalRead}} should behave in a clustered cache. Currently, the command is replicated to all the owners in distributed mode, and to all the cluster members in replicated mode. So the command may be delayed while waiting for the responses from those other nodes. The exception is invalidation mode, which doesn't replicate {{putForExternalRead}} commands - only regular {{put}}s generate invalidations.
> The documentation also doesn't mention the interaction of {{putForExternalRead}} operations with transactions. Currently, {{PUT_FOR_EXTERNAL_READ}}-flagged commands are executed as non-transactional commands even when the cache is transactional, presumably to avoid the overhead of transactions. But this undermines the argument for wrapping regular write operations in an implicit transaction, when running in a transactional cache.
> I propose changing {{putForExternalRead}} to only write on the local node (as it already does in invalidation mode), and documenting it as such. In distribution mode, it should be a no-op when the local node is not an owner.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 3 months
[JBoss JIRA] (ISPN-4404) The coordinator leaving the cluster should not cancel state transfer
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4404?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4404:
----------------------------------
Fix Version/s: 7.1.0.CR1
(was: 7.1.0.Beta1)
> The coordinator leaving the cluster should not cancel state transfer
> --------------------------------------------------------------------
>
> Key: ISPN-4404
> URL: https://issues.jboss.org/browse/ISPN-4404
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core, State Transfer
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Fix For: 7.1.0.CR1
>
>
> When the cluster coordinator changes or a merge view is received, the new coordinator tries to reconcile the cache topologies of the different partitions and cancels any state transfer in progress.
> But most of the time, the coordinator changing doesn't mean that there are multiple partitions, just that the old coordinator died. In that case there is a single cache topology, so there is nothing to reconcile and no reason to cancel the pending ST.
> We can't rely on the new coordinator receiving a MergeView, because the old coordinator might have received a MergeView and died soon afterwards.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 3 months
[JBoss JIRA] (ISPN-4340) Automatically setup shared indexes when indexing is enabled
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4340?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4340:
----------------------------------
Fix Version/s: 7.1.0.CR1
(was: 7.1.0.Beta1)
> Automatically setup shared indexes when indexing is enabled
> -----------------------------------------------------------
>
> Key: ISPN-4340
> URL: https://issues.jboss.org/browse/ISPN-4340
> Project: Infinispan
> Issue Type: Feature Request
> Components: Embedded Querying
> Reporter: Sanne Grinovero
> Assignee: Gustavo Fernandes
> Labels: 64QueryBlockers
> Fix For: 7.1.0.CR1
>
>
> - on replicated Caches, we should create a default index on a FSDirectory and provide some appropriate default tuning, for example enabling NRT.
> - distributed Caches will need the Infinispan Directory (shared) and a master/slave backend (Infinispan IndexManager, while NRT is not compatible in this case)
> We want to keep the properties configuration structure as well as an "advanced tuning" and override capabilities of the default choices.
> Some more common options like sync/async indexing should probably be promoted to be controlled by the XML elements and configuration DSL excplicitly.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 3 months
[JBoss JIRA] (ISPN-4262) Rolling back a transaction after commit timeout should release locks in two phases
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4262?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4262:
----------------------------------
Fix Version/s: 7.1.0.CR1
(was: 7.1.0.Beta1)
> Rolling back a transaction after commit timeout should release locks in two phases
> ----------------------------------------------------------------------------------
>
> Key: ISPN-4262
> URL: https://issues.jboss.org/browse/ISPN-4262
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Transactions
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Fix For: 7.1.0.CR1
>
>
> This is related to ISPN-4137.
> When a commit command times out, we send a rollback command to release locks. The problem is that the node that timed out might still execute commit command, before the rollback command reaching the backup but after it released the locks on the primary owner.
> The solution is to not release any locks during the rollback command's execution, but send an asynchronous TxCompletionNotification afterwards - just like for commit commands.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 3 months
[JBoss JIRA] (ISPN-4568) DistSyncL1RepeatableReadFuncTest.testNoEntryInL1MultipleConcurrentGetsWithInvalidation random failures
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4568?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4568:
----------------------------------
Fix Version/s: 7.1.0.CR1
(was: 7.1.0.Beta1)
> DistSyncL1RepeatableReadFuncTest.testNoEntryInL1MultipleConcurrentGetsWithInvalidation random failures
> ------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4568
> URL: https://issues.jboss.org/browse/ISPN-4568
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.1.0.CR1
>
>
> Very likely related to ISPN-4564, as there seem to be 2 unjustified pauses ~ 3s and some log messages also appear to be delayed:
> {noformat}
> 08:23:48,443 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAN-p28720-t1:) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@e9a3538]
> 08:23:48,470 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAN-p28720-t1:) [JGroupsTransport] dests=[DistSyncL1RepeatableReadFuncTest-NodeAN-7764, DistSyncL1RepeatableReadFuncTest-NodeAM-739], command=SingleRpcCommand{cacheName='dist', command=PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}}, mode=SYNCHRONOUS, timeout=60000
> 08:23:50,953 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.impl.NonTxInvocationContext@62801f8c]
> 08:23:50,953 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [L1ManagerImpl] Invalidating keys [key-to-the-cache] on nodes [DistSyncL1RepeatableReadFuncTest-NodeAK-9309]. Use multicast? false
> 08:23:51,060 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [JGroupsTransport] dests=[DistSyncL1RepeatableReadFuncTest-NodeAK-9309], command=SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=60000
> 08:23:51,062 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAK-p28661-t5:) [BaseRpcInvokingCommand] Invoking command InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}, with originLocal flag set to false
> 08:23:50,972 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [CallInterceptor] Executing command: PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}.
> 08:23:51,786 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAK-p28661-t5:) [InboundInvocationHandlerImpl] About to send back response null for command SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}}
> 08:23:51,796 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [CommandAwareRpcDispatcher] Responses: [sender=DistSyncL1RepeatableReadFuncTest-NodeAK-9309, received=true, suspected=false]
> 08:23:54,561 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [RpcManagerImpl] Response(s) to SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}} is {}
> 08:23:56,955 ERROR (testng-DistSyncL1RepeatableReadFuncTest:) [UnitTestTestNGListener] Test testNoEntryInL1MultipleConcurrentGetsWithInvalidation(org.infinispan.distribution.DistSyncL1RepeatableReadFuncTest) failed.
> java.util.concurrent.TimeoutException
> at java.util.concurrent.FutureTask.get(FutureTask.java:201)
> at org.infinispan.commons.util.concurrent.NotifyingFutureImpl.get(NotifyingFutureImpl.java:84)
> at org.infinispan.distribution.BaseDistSyncL1Test.testNoEntryInL1MultipleConcurrentGetsWithInvalidation(BaseDistSyncL1Test.java:217)
> 08:23:54,578 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [L1NonTxInterceptor] Allowing entry to commit as local node is owner
> 08:23:57,861 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [EntryWrappingInterceptor] About to commit entry RepeatableReadEntry(499752d9){key=key-to-the-cache, value=second-put, oldValue=first-put, isCreated=false, isChanged=true, isRemoved=false, isValid=true, skipRemoteGet=false, metadata=EmbeddedMetadata{version=null}}
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 3 months
[JBoss JIRA] (ISPN-4585) Prioritize commands in the remote executor
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4585?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4585:
----------------------------------
Fix Version/s: 7.1.0.CR1
(was: 7.1.0.Beta1)
> Prioritize commands in the remote executor
> ------------------------------------------
>
> Key: ISPN-4585
> URL: https://issues.jboss.org/browse/ISPN-4585
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Fix For: 7.1.0.CR1
>
>
> The remote executor currently has an unlimited queue of blocked task, but the underlying executor cannot use a queue. With a queue, we wouldn't need to overflow remote commands to the OOB threads, and the OOB threads would be free to process response messages.
> The problem is that {{ThreadPoolExecutor}} executes tasks in the order they are in the queue. If a node has a remote executor thread pool of 100 threads and receives a prepare(tx1, put(k, v1) comand, then 1000 prepare(tx_i, put(k, v_i)) commands, and finally a commit(tx1) command, the commit(tx1) command will block until all but 99 of the the prepare(tx_i, put(k, v_i)) commands have timed out.
> I think we could help this by using a {{PriorityBlockingQueue}} for the underlying executor, with commands ordered so that state transfer commands < commit/tx completion notification < prepare/lock. The commit command would still have to wait for one of the prepare commands currently running to time out, but it wouldn't have to wait for all of them.
> The current code, without a queue, would fill the remote executor and OOB thread pools, and it would discard the commit message (along with most of the prepare commands). The time it would take to process the commit successfully would depend on the timing of the retransmitted messages.
> Another possible improvement would be to keep track of the commands currently being executed, and always keep some threads free for commands with higher priority. But I'm not sure how easy it would be to do that on top of an injected {{ExecutorService}}.
> I believe there is also a problem with {{BlockingTaskAwareExecutorServiceImpl.checkForReadyTasks()}} after a topology change. Commands with the new topology id are all unblocked by submitting them to the underlying executor in FIFO order, on a single thread, so {{CallerRunsPolicy}} is not a valid rejection policy here.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 3 months
[JBoss JIRA] (ISPN-4586) Too many OutdatedTopologyExceptions in non-transactional caches
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4586?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4586:
----------------------------------
Fix Version/s: 7.1.0.CR1
(was: 7.1.0.Beta1)
> Too many OutdatedTopologyExceptions in non-transactional caches
> ---------------------------------------------------------------
>
> Key: ISPN-4586
> URL: https://issues.jboss.org/browse/ISPN-4586
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Labels: 7.0, performance
> Fix For: 7.1.0.CR1
>
>
> In a non-tx cache, when the topology id is incremented, owners (both primary and backup) receiving a write command with a lower topology id throw an OutdatedTopologyException so that the originator retries the command on the new owners.
> But the originator needs to retry the command only if the owners of the key changed in any way. During a join or a leave, most of the keys should not change owners, so throwing an OutdatedTopologyException is not necessary most of the time.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 3 months