[JBoss JIRA] (ISPN-9988) ScatteredStateConsumerImpl can leak the exclusive topology lock
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9988?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9988:
----------------------------------
Fix Version/s: 10.0.0.Beta3
(was: 10.0.0.Beta2)
> ScatteredStateConsumerImpl can leak the exclusive topology lock
> ---------------------------------------------------------------
>
> Key: ISPN-9988
> URL: https://issues.jboss.org/browse/ISPN-9988
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.4.7.Final, 10.0.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 10.0.0.Beta3
>
>
> When an exception happens in {{ScatteredStateConsumerImpl.beforeTopologyInstalled}}, the exclusive topology lock is not released in {{StateConsumerImpl.onTopologyUpdate}}:
> {noformat}
> 15:21:54,783 ERROR (transport-thread-FunctionalScatteredInMemoryTest-NodeA-p43135-t5:[Topology-scattered]) [LocalTopologyManagerImpl] ISPN000230: Failed to start rebalance for cache scattered
> java.lang.IllegalArgumentException: The task is already cancelled.
> at org.infinispan.statetransfer.InboundTransferTask.cancelSegments(InboundTransferTask.java:172) ~[classes/:?]
> at org.infinispan.statetransfer.StateConsumerImpl.cancelTransfers(StateConsumerImpl.java:959) ~[classes/:?]
> at org.infinispan.scattered.impl.ScatteredStateConsumerImpl.beforeTopologyInstalled(ScatteredStateConsumerImpl.java:115) ~[classes/:?]
> at org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:292) ~[classes/:?]
> at org.infinispan.scattered.impl.ScatteredStateConsumerImpl.onTopologyUpdate(ScatteredStateConsumerImpl.java:102) ~[classes/:?]
> at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:200) ~[classes/:?]
> {noformat}
> Because the exclusive topology lock is not released, threads that try to apply a new topology update block forever. This causes random failures with the ISPN-9863 thread leak checker:
> {noformat}
> 15:26:25,922 WARN (testng-RehashClusterPublisherManagerTest:[]) [ThreadLeakChecker] Possible leaked thread:
> "transport-thread-FunctionalScatteredInMemoryTest-NodeA-p43135-t3" daemon prio=5 tid=0x236fd nid=NA waiting
> java.lang.Thread.State: WAITING
> java.base(a)11/jdk.internal.misc.Unsafe.park(Native Method)
> java.base@11/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
> java.base@11/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
> java.base@11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
> java.base@11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
> java.base@11/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:959)
> app//org.infinispan.statetransfer.StateTransferLockImpl.acquireExclusiveTopologyLock(StateTransferLockImpl.java:42)
> app//org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:291)
> app//org.infinispan.scattered.impl.ScatteredStateConsumerImpl.onTopologyUpdate(ScatteredStateConsumerImpl.java:102)
> app//org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:200)
> app//org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:57)
> app//org.infinispan.statetransfer.StateTransferManagerImpl$1.updateConsistentHash(StateTransferManagerImpl.java:113)
> app//org.infinispan.topology.LocalTopologyManagerImpl.doHandleTopologyUpdate(LocalTopologyManagerImpl.java:353)
> app//org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleTopologyUpdate$1(LocalTopologyManagerImpl.java:275)
> 15:26:25,923 ERROR (testng-RehashClusterPublisherManagerTest:[]) [TestSuiteProgress] Test configuration failed: org.infinispan.reactive.publisher.impl.RehashClusterPublisherManagerTest.testClassFinished
> java.lang.AssertionError: Leaked threads:
> {transport-thread-FunctionalScatteredInMemoryTest-NodeA-p43135-t3: possible sources [org.infinispan.functional.FunctionalScatteredInMemoryTest[bias=ON_WRITE], org.infinispan.statetransfer.ClusterTopologyManagerTest[SCATTERED_SYNC, tx=false], org.infinispan.functional.FunctionalCachestoreTest[passivation=true], org.infinispan.functional.distribution.rehash.FunctionalNonTxBackupOwnerBecomingPrimaryOwnerTest, org.infinispan.functional.distribution.rehash.FunctionalNonTxJoinerBecomingBackupOwnerTest, org.infinispan.api.mvcc.PutForExternalReadTest[REPL_SYNC, tx=false], org.infinispan.functional.distribution.rehash.FunctionalTxTest, org.infinispan.functional.FunctionalEncodingTypeTest[tx=true]]}
> at org.infinispan.commons.test.ThreadLeakChecker.performCheck(ThreadLeakChecker.java:148) ~[infinispan-commons-test-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT]
> at org.infinispan.commons.test.ThreadLeakChecker.testFinished(ThreadLeakChecker.java:109) ~[infinispan-commons-test-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT]
> at org.infinispan.test.fwk.TestResourceTracker.testFinished(TestResourceTracker.java:112) ~[test-classes/:?]
> at org.infinispan.test.AbstractInfinispanTest.testClassFinished(AbstractInfinispanTest.java:142) ~[test-classes/:?]
> {noformat}
> The fix should address both the exclusive topology lock itself, by releasing it in a finally block, and the {{IllegalArgumentException}}, either by ignoring already cancelled transfers or by only cancelling transfers while holding {{transferMapsLock}}.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 1 month
[JBoss JIRA] (ISPN-3918) Inconsistent view of the cache with putIfAbsent in a non-tx cache during state transfer
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-3918?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-3918:
----------------------------------
Fix Version/s: 10.0.0.Final
(was: 9.4.8.Final)
> Inconsistent view of the cache with putIfAbsent in a non-tx cache during state transfer
> ---------------------------------------------------------------------------------------
>
> Key: ISPN-3918
> URL: https://issues.jboss.org/browse/ISPN-3918
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 6.0.0.Final
> Reporter: Dan Berindei
> Priority: Major
> Labels: consistency
> Fix For: 10.0.0.Final
>
> Attachments: NonTxPutIfAbsentDuringLeaveStressTest.testNodeLeavingDuringPutIfAbsent_8.log.gz, NonTxPutIfAbsentDuringRebalanceStressTest.testPutIfAbsentDuringJoin_1.log.gz, ntpiadjst.log.gz
>
>
> In a non-tx cache, sometimes it's possible for a {{get(k)}} to return {{null}} even though a previous {{putIfAbsent(k, v)}} returned a non-null value and the only concurrent operations on the cache are concurrent putIfAbsent calls.
> Say \[B, A, C] are the owners of k (C just joined)
> 1. A starts a {{putIfAbsent(k, v1)}} command, sends it to B
> 2. B forwards the command to A and C
> 3. C writes {{k=v1}}
> 4. C becomes the primary owner of k (owners are now \[C, A])
> 5. A/B see the new topology before committing and throw an outdatedTopologyException
> 6. A retries the command, sends it to C
> 7. C forwards the command to A, which writes {{k=v1}}
> 8. C doesn't have to update the entry, returns null
> If, between steps 3 and 7, another thread on A starts a {{putIfAbsent(k, v2)}} command, the command will fail and return {{v1}} (because the primary owner already has a value). However, a subsequent {{get(k)}} command will return {{null}}, because A is an owner and doesn't have the value.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 1 month
[JBoss JIRA] (ISPN-3835) Index Update command is processed before the registry listener is triggered
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-3835?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-3835:
----------------------------------
Fix Version/s: 10.0.0.Final
(was: 9.4.8.Final)
> Index Update command is processed before the registry listener is triggered
> ---------------------------------------------------------------------------
>
> Key: ISPN-3835
> URL: https://issues.jboss.org/browse/ISPN-3835
> Project: Infinispan
> Issue Type: Bug
> Components: Embedded Querying
> Affects Versions: 6.0.0.Final
> Reporter: Sanne Grinovero
> Priority: Critical
> Labels: 64QueryBlockers
> Fix For: 10.0.0.Final
>
>
> When using the InfinispanIndexManager backend the master node might receive an index update command about an index which it hasn't defined yet.
> Index definitions are triggered by the type registry, which in turn is driven by the ClusterRegistry and an event listener on the ClusterRegistry. It looks like slaves are sending update requests before the master has processed the configuration event.
> This leads to index update commands to be lost (with a stacktrace logged)
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 1 month
[JBoss JIRA] (ISPN-4817) Cluster Listener replication of listener to remote dist nodes needs privilege action
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-4817?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4817:
----------------------------------
Fix Version/s: 10.0.0.Final
(was: 9.4.8.Final)
> Cluster Listener replication of listener to remote dist nodes needs privilege action
> ------------------------------------------------------------------------------------
>
> Key: ISPN-4817
> URL: https://issues.jboss.org/browse/ISPN-4817
> Project: Infinispan
> Issue Type: Bug
> Components: Loaders and Stores
> Affects Versions: 7.0.0.CR1
> Reporter: William Burns
> Priority: Critical
> Fix For: 10.0.0.Final
>
>
> The ClusterListenerReplicateCallable needs to add the listener in a privileged block so that it can work properly.
> I was also thinking if we need to change all of our messages to send along the current Subject so it can be properly applied to remote invocations as well. In this case we wouldn't need to wrap the listener invocation.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 1 month
[JBoss JIRA] (ISPN-4624) Allow custom partition handling strategy
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-4624?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4624:
----------------------------------
Fix Version/s: 10.0.0.Final
(was: 9.4.8.Final)
> Allow custom partition handling strategy
> ----------------------------------------
>
> Key: ISPN-4624
> URL: https://issues.jboss.org/browse/ISPN-4624
> Project: Infinispan
> Issue Type: Feature Request
> Components: Core
> Affects Versions: 7.0.0.Beta1
> Reporter: Dan Berindei
> Priority: Major
> Labels: partition_handling
> Fix For: 10.0.0.Final
>
>
> Users should be able to configure a custom PartitionHandlingStrategy. It should be able to specify a behaviour on merge as well.
> We might want to merge this with the RebalancePolicy, which was also supposed to be configurable.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 1 month
[JBoss JIRA] (ISPN-4286) Two concurrent putIfAbsent operations can both return null during rebalance
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-4286?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4286:
----------------------------------
Fix Version/s: 10.0.0.Final
(was: 9.4.8.Final)
> Two concurrent putIfAbsent operations can both return null during rebalance
> ---------------------------------------------------------------------------
>
> Key: ISPN-4286
> URL: https://issues.jboss.org/browse/ISPN-4286
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Labels: consistency
> Fix For: 10.0.0.Final
>
>
> If the cache topology changes while executing a putIfAbsent operation, the old primary owner will throw an OutdatedTopologyException, and the originator will retry on the new owner.
> When retrying the PutKeyValueCommand on the new primary owner, we compare the current value with the command's new value. If they are equal, we assume that the initial command wrote the old value, and we return {{null}}.
> However, the value might have been written by another putIfAbsent operation. So we could have two {{putIfAbsent(k, v)}} operations, both returning {{null}}.
> {code}
> A is the originator, B is the primary owner, k = null
> A -> B: putIfAbsent(k, v1)
> B dies before writing v, C is now primary owner
> D -> C: putIfAbsent(k, v1) // another put operation from D, with the same value
> C -> D: null // correct
> A -> C: retry_putIfAbsent(k, v1)
> C -> A: null // C assumes A is overwriting its own value, so it's also returning null
> {code}
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 1 month