[JBoss JIRA] (ISPN-10239) FineGrainedAtomicMap.clear() concurrency issues
by Dan Berindei (Jira)
Dan Berindei created ISPN-10239:
-----------------------------------
Summary: FineGrainedAtomicMap.clear() concurrency issues
Key: ISPN-10239
URL: https://issues.jboss.org/browse/ISPN-10239
Project: Infinispan
Issue Type: Bug
Components: Core, Test Suite - Core
Affects Versions: 9.4.14.Final, 10.0.0.Beta3
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 10.0.0.Beta4
ISPN-7889 fixed the distribution interceptors to always access the invocation context from a single thread at a time. However, this only covers a single command invocation: it is still possible for an application to start multiple asynchronous operations in the same transaction that all access the same {{AbstractCacheTransaction.lookedUpEntries}} map.
{{FineGrainedAtomicMap.clear()}} does exactly that:
{code}
Cache<Object, Object> noReturn = cache.getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES);
for (K key : keys) {
   cfs.add(noReturn.removeAsync(new AtomicKeySetImpl.Key<>(group, key)));
}
{code}
This causes random failures in {{DistFineGrainedAtomicMapAPITest.testReplicationPutAndClearCommit}} when two commands insert entries into the {{lookedUpEntries}} {{HashMap}} in parallel and one of the entries is lost:
{noformat}
13:08:49,676 TRACE (testng-Test:[]) [InvocationContextInterceptor] Invoked with command RemoveCommand{key=AtomicKeySetImpl.Key{group=map, key=existing}, value=null, metadata=null, flags=[IGNORE_RETURN_VALUES], commandInvocationId=, valueMatcher=MATCH_ALWAYS, topologyId=-1} and InvocationContext [org.infinispan.context.impl.LocalTxInvocationContext@7ce5d574]
13:08:49,676 TRACE (testng-Test:[]) [JGroupsTransport] Test-NodeB-49901 sending request 102 to [Test-NodeA-29107]: LockControlCommand{cache=atomic, keys=[AtomicKeySetImpl.Key{group=map, key=existing}], flags=[IGNORE_RETURN_VALUES], unlock=false, gtx=GlobalTx:Test-NodeB-49901:55098}
13:08:49,677 TRACE (testng-Test:[]) [InvocationContextInterceptor] Invoked with command RemoveCommand{key=AtomicKeySetImpl.Key{group=map, key=blah}, value=null, metadata=null, flags=[IGNORE_RETURN_VALUES], commandInvocationId=, valueMatcher=MATCH_ALWAYS, topologyId=-1} and InvocationContext [org.infinispan.context.impl.LocalTxInvocationContext@11d7f886]
13:08:49,677 TRACE (testng-Test:[]) [JGroupsTransport] Test-NodeB-49901 sending request 103 to [Test-NodeA-29107]: LockControlCommand{cache=atomic, keys=[AtomicKeySetImpl.Key{group=map, key=blah}], flags=[IGNORE_RETURN_VALUES], unlock=false, gtx=GlobalTx:Test-NodeB-49901:55098}
13:08:49,678 TRACE (jgroups-8,Test-NodeB-49901:[]) [JGroupsTransport] Test-NodeB-49901 received response for request 103 from Test-NodeA-29107: SuccessfulResponse(true)
13:08:49,678 TRACE (jgroups-4,Test-NodeB-49901:[]) [JGroupsTransport] Test-NodeB-49901 received response for request 102 from Test-NodeA-29107: SuccessfulResponse(true)
13:08:49,678 TRACE (jgroups-4,Test-NodeB-49901:[]) [QueueAsyncInvocationStage] Resuming invocation of command LockControlCommand{cache=atomic, keys=[AtomicKeySetImpl.Key{group=map, key=existing}], flags=[IGNORE_RETURN_VALUES], unlock=false, gtx=GlobalTx:Test-NodeB-49901:55098} with 1 handlers
13:08:49,678 TRACE (jgroups-8,Test-NodeB-49901:[]) [QueueAsyncInvocationStage] Resuming invocation of command LockControlCommand{cache=atomic, keys=[AtomicKeySetImpl.Key{group=map, key=blah}], flags=[IGNORE_RETURN_VALUES], unlock=false, gtx=GlobalTx:Test-NodeB-49901:55098} with 1 handlers
13:08:49,678 TRACE (jgroups-8,Test-NodeB-49901:[]) [EntryFactoryImpl] Updated context entry null -> RepeatableReadEntry(68ce9a19){key=AtomicKeySetImpl.Key{group=map, key=blah}, value=null, isCreated=false, isChanged=false, isRemoved=false, isExpired=false, skipLookup=false, metadata=null}
13:08:49,678 TRACE (jgroups-4,Test-NodeB-49901:[]) [EntryFactoryImpl] Updated context entry null -> RepeatableReadEntry(69aa0b31){key=AtomicKeySetImpl.Key{group=map, key=existing}, value=null, isCreated=false, isChanged=false, isRemoved=false, isExpired=false, skipLookup=false, metadata=null}
13:08:49,678 TRACE (jgroups-4,Test-NodeB-49901:[]) [CallInterceptor] Invoking: RemoveCommand
13:08:49,678 TRACE (jgroups-8,Test-NodeB-49901:[]) [CallInterceptor] Invoking: RemoveCommand
13:08:49,678 TRACE (jgroups-8,Test-NodeB-49901:[]) [EntryWrappingInterceptor] The return value is null
13:08:49,679 ERROR (jgroups-4,Test-NodeB-49901:[]) [InvocationContextInterceptor] ISPN000136: Error executing command RemoveCommand on Cache 'atomic', writing keys [AtomicKeySetImpl.Key{group=map, key=existing}]
java.lang.NullPointerException: null
13:08:49,681 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.atomic.DistFineGrainedAtomicMapAPITest.testReplicationPutAndClearCommit
java.lang.NullPointerException: null
at org.infinispan.commands.write.RemoveCommand.perform(RemoveCommand.java:75) ~[classes/:?]
at org.infinispan.interceptors.impl.CallInterceptor.visitCommand(CallInterceptor.java:29) ~[classes/:?]
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndFinally(BaseAsyncInterceptor.java:152) ~[classes/:?]
at org.infinispan.interceptors.distribution.TxDistributionInterceptor.handleTxWriteCommand(TxDistributionInterceptor.java:427) ~[classes/:?]
at org.infinispan.interceptors.distribution.TxDistributionInterceptor.visitRemoveCommand(TxDistributionInterceptor.java:138) ~[classes/:?]
at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:64) ~[classes/:?]
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenAccept(BaseAsyncInterceptor.java:98) ~[classes/:?]
at org.infinispan.interceptors.impl.EntryWrappingInterceptor.setSkipRemoteGetsAndInvokeNextForDataCommand(EntryWrappingInterceptor.java:671) ~[classes/:?]
at org.infinispan.interceptors.impl.EntryWrappingInterceptor.visitRemoveCommand(EntryWrappingInterceptor.java:356) ~[classes/:?]
at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:64) ~[classes/:?]
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[classes/:?]
at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) ~[classes/:?]
at org.infinispan.interceptors.DDAsyncInterceptor.visitRemoveCommand(DDAsyncInterceptor.java:65) ~[classes/:?]
at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:64) ~[classes/:?]
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[classes/:?]
at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) ~[classes/:?]
at org.infinispan.interceptors.DDAsyncInterceptor.visitRemoveCommand(DDAsyncInterceptor.java:65) ~[classes/:?]
at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:64) ~[classes/:?]
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[classes/:?]
at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lambda$new$0(AbstractTxLockingInterceptor.java:34) ~[classes/:?]
at org.infinispan.interceptors.InvocationSuccessFunction.apply(InvocationSuccessFunction.java:25) ~[classes/:?]
at org.infinispan.interceptors.SyncInvocationStage.addCallback(SyncInvocationStage.java:42) ~[classes/:?]
at org.infinispan.interceptors.InvocationStage.thenApply(InvocationStage.java:45) ~[classes/:?]
at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.acquireLocalLockAndInvokeNext(PessimisticLockingInterceptor.java:283) ~[classes/:?]
at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.lambda$visitDataWriteCommand$3(PessimisticLockingInterceptor.java:154) ~[classes/:?]
at org.infinispan.interceptors.InvocationSuccessFunction.apply(InvocationSuccessFunction.java:25) ~[classes/:?]
at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.invokeQueuedHandlers(QueueAsyncInvocationStage.java:118) ~[classes/:?]
{noformat}
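One possible mitigation (a sketch only, not necessarily the planned fix) is to chain the removals so that only one command accesses the transaction context at a time, at the cost of losing the parallelism:
{code}
// Sketch: serialize the async removals so two commands never touch
// lookedUpEntries concurrently. Uses the same names as the clear()
// snippet above; "cfs", "keys" and "group" are assumed to be in scope.
Cache<Object, Object> noReturn = cache.getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES);
CompletableFuture<Object> chain = CompletableFuture.completedFuture(null);
for (K key : keys) {
   chain = chain.thenCompose(prev -> noReturn.removeAsync(new AtomicKeySetImpl.Key<>(group, key)));
}
cfs.add(chain);
{code}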
[JBoss JIRA] (ISPN-10236) Hotrod client error releasing channel after server cache stop
by Dan Berindei (Jira)
[ https://issues.jboss.org/browse/ISPN-10236?page=com.atlassian.jira.plugin... ]
Dan Berindei updated ISPN-10236:
--------------------------------
Summary: Hotrod client error releasing channel after server cache stop (was: Hotrod client error releasing channel after cache stop)
> Hotrod client error releasing channel after server cache stop
> -------------------------------------------------------------
>
> Key: ISPN-10236
> URL: https://issues.jboss.org/browse/ISPN-10236
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Server
> Affects Versions: 10.0.0.Beta3
> Reporter: Dan Berindei
> Priority: Major
> Labels: testsuite_stability
> Fix For: 10.0.0.Beta4
>
> Attachments: ISPN-10137_package_private_scope_20190524-1732_ServerFailureRetryTest-infinispan-client-hotrod.log.gz
>
>
> {{ServerFailureRetryTest.testRetryCacheStopped}} fails randomly because an assert statement in {{ChannelPool.release()}} fires:
> {noformat}
> 17:37:36,562 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.client.hotrod.retry.ServerFailureRetryTest.testRetryCacheStopped
> java.lang.AssertionError: Error releasing [id: 0x5d9755e6, L:/127.0.0.1:42472 ! R:127.0.0.1/127.0.0.1:44865]
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelPool.release(ChannelPool.java:170) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.releaseChannel(ChannelFactory.java:309) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.operations.HotRodOperation.releaseChannel(HotRodOperation.java:105) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.invoke(RetryOnFailureOperation.java:80) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelPool.activateChannel(ChannelPool.java:217) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelPool.acquire(ChannelPool.java:86) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:259) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:297) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.operations.AbstractKeyOperation.fetchChannelAndInvoke(AbstractKeyOperation.java:41) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:61) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.RemoteCacheImpl.putAsync(RemoteCacheImpl.java:366) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.RemoteCacheImpl.put(RemoteCacheImpl.java:334) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.RemoteCacheSupport.put(RemoteCacheSupport.java:79) ~[classes/:?]
> at org.infinispan.client.hotrod.retry.ServerFailureRetryTest.testRetryCacheStopped(ServerFailureRetryTest.java:63) ~[test-classes/:?]
> {noformat}
> I investigated a bit and couldn't find an obvious mistake in the way {{ChannelPool.created}} is incremented and decremented, but I think it would help if access to it and to {{ChannelPool.active}} were centralized in a smaller number of methods, as in the sketch below.
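> A sketch of what that centralization could look like (hypothetical helper names; the {{AtomicInteger}} fields stand in for the existing counters):
> {code}
> // Sketch only: funnel every update of the two counters through a pair of
> // helper methods so the invariants can be asserted in one place.
> // "created" and "active" mirror the existing ChannelPool fields;
> // channelCreated()/channelReleased() are hypothetical helpers.
> private final AtomicInteger created = new AtomicInteger();
> private final AtomicInteger active = new AtomicInteger();
>
> private void channelCreated() {
>    created.incrementAndGet();
>    active.incrementAndGet();
> }
>
> private void channelReleased() {
>    int remainingActive = active.decrementAndGet();
>    assert remainingActive >= 0 : "Released more channels than were activated";
> }
> {code}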
[JBoss JIRA] (ISPN-10238) RemoteCacheManager.stop() hangs if a client thread is waiting for a server response
by Dan Berindei (Jira)
Dan Berindei created ISPN-10238:
-----------------------------------
Summary: RemoteCacheManager.stop() hangs if a client thread is waiting for a server response
Key: ISPN-10238
URL: https://issues.jboss.org/browse/ISPN-10238
Project: Infinispan
Issue Type: Bug
Components: Server, Test Suite - Server
Affects Versions: 9.4.14.Final, 10.0.0.Beta3
Reporter: Dan Berindei
Fix For: 10.0.0.Beta4
One of our integration tests performs a blocking {{RemoteCache.size()}} operation on the thread where another asynchronous operation was completed (a {{HotRod-client-async-pool}} thread):
{code:title=EvictionIT}
CompletableFuture<Void> res = rc.putAllAsync(entries);
res.thenRun(() -> assertEquals(3, rc.size()));
{code}
The test then finishes without stopping the {{RemoteCacheManager}}. When I changed it to stop the {{RemoteCacheManager}}, the test started hanging:
{noformat}
"HotRod-client-async-pool-139-1" #2880 daemon prio=5 os_prio=0 cpu=434.56ms elapsed=1621.24s tid=0x00007f43a6b99800 nid=0x19c0 waiting on condition [0x00007f42ec9fd000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.3/Native Method)
- parking to wait for <0x00000000d3321350> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.3/LockSupport.java:234)
at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.3/CompletableFuture.java:1798)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.3/ForkJoinPool.java:3128)
at java.util.concurrent.CompletableFuture.timedGet(java.base@11.0.3/CompletableFuture.java:1868)
at java.util.concurrent.CompletableFuture.get(java.base@11.0.3/CompletableFuture.java:2021)
at org.infinispan.client.hotrod.impl.Util.await(Util.java:46)
at org.infinispan.client.hotrod.impl.RemoteCacheImpl.size(RemoteCacheImpl.java:307)
at org.infinispan.server.test.eviction.EvictionIT.lambda$testPutAllAsyncEviction$0(EvictionIT.java:73)
at org.infinispan.server.test.eviction.EvictionIT$$Lambda$347/0x000000010074a440.run(Unknown Source)
at java.util.concurrent.CompletableFuture$UniRun.tryFire(java.base@11.0.3/CompletableFuture.java:783)
at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.3/CompletableFuture.java:506)
at java.util.concurrent.CompletableFuture.complete(java.base@11.0.3/CompletableFuture.java:2073)
at org.infinispan.client.hotrod.impl.operations.HotRodOperation.complete(HotRodOperation.java:162)
at org.infinispan.client.hotrod.impl.operations.PutAllOperation.acceptResponse(PutAllOperation.java:83)
at org.infinispan.client.hotrod.impl.transport.netty.HeaderDecoder.decode(HeaderDecoder.java:144)
at org.infinispan.client.hotrod.impl.transport.netty.HintedReplayingDecoder.callDecode(HintedReplayingDecoder.java:94)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:421)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:321)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.3/Thread.java:834)
Locked ownable synchronizers:
- <0x00000000ca248c30> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"main" #1 prio=5 os_prio=0 cpu=37300.10ms elapsed=2911.99s tid=0x00007f43a4023000 nid=0x37f7 in Object.wait() [0x00007f43a9c21000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base@11.0.3/Native Method)
- waiting on <no object reference available>
at java.lang.Object.wait(java.base@11.0.3/Object.java:328)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:231)
- waiting to re-lock in wait() <0x00000000ca174af8> (a io.netty.util.concurrent.DefaultPromise)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:33)
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:32)
at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.destroy(ChannelFactory.java:216)
at org.infinispan.client.hotrod.RemoteCacheManager.stop(RemoteCacheManager.java:365)
at org.infinispan.client.hotrod.RemoteCacheManager.close(RemoteCacheManager.java:513)
at org.infinispan.commons.junit.ClassResource.lambda$new$0(ClassResource.java:24)
at org.infinispan.commons.junit.ClassResource$$Lambda$286/0x0000000100573040.accept(Unknown Source)
at org.infinispan.commons.junit.ClassResource.after(ClassResource.java:41)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:50)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.jboss.arquillian.junit.Arquillian.run(Arquillian.java:167)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137)
at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:107)
at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:83)
at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75)
at org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:158)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
Locked ownable synchronizers:
- None
{noformat}
{{HotRod-client-async-pool}} threads are never appropriate for blocking cache operations, but we need to do more than just fix the test:
* We need an asynchronous {{RemoteCache.size()}} alternative
* Blocking operations like {{size()}} currently wait up to 1 day for a server response; they should wait for a much smaller (and configurable) timeout.
* {{RemoteCacheManager.stop()}} should have a timeout as well, but more importantly it should cancel any pending operation.
* We should consider running all application code on a separate thread pool (see the sketch after this list for how the test could do this today).
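For the test itself, a minimal sketch of keeping the blocking call off the {{HotRod-client-async-pool}} thread ({{testExecutor}} is a hypothetical dedicated executor, not part of the client API):
{code}
// Sketch: run the blocking size() call on a dedicated executor instead of
// completing it on the HotRod client's async pool thread.
ExecutorService testExecutor = Executors.newSingleThreadExecutor();
CompletableFuture<Void> res = rc.putAllAsync(entries);
res.thenRunAsync(() -> assertEquals(3, rc.size()), testExecutor).join();
testExecutor.shutdown();
{code}
With this shape, {{RemoteCacheManager.stop()}} no longer has to wait for an event-loop thread that is itself blocked waiting on a server response.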