[JBoss JIRA] (ISPN-8889) Data race in NonTxInvocationContext
by Peter Levart (JIRA)
[ https://issues.jboss.org/browse/ISPN-8889?page=com.atlassian.jira.plugin.... ]
Peter Levart edited comment on ISPN-8889 at 2/28/18 8:26 AM:
-------------------------------------------------------------
So you think that only responses to remoteGet may be concurrent among themselves for same context and that they never run concurrently with other possibly only read-only accesses to the same context?
The instrumented experiment that I tried (above) shows that PutMapCommand processing for example, accesses the context in read-only fashion (lookupEntry):
{noformat}
java.lang.IllegalThreadStateException: Thread[remote-thread--p2-t4,5,main]: data race detected with writer of state: 271f7814a414 - see suppressed for writer thread stack trace
at org.infinispan.context.impl.DataRacer.detect(DataRacer.java:37) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.context.impl.NonTxInvocationContext.lookupEntry(NonTxInvocationContext.java:39) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.addRemoteGet(NonTxDistributionInterceptor.java:405) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleRemoteReadWriteManyCommand(NonTxDistributionInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleReadWriteManyCommand(NonTxDistributionInterceptor.java:303) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.visitPutMapCommand(NonTxDistributionInterceptor.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
...
{noformat}
While the response to to remoteGet is making a read-write access to the same context:
{noformat}
Suppressed: org.infinispan.context.impl.DataRacer$StackTrace: Thread[jgroups-13,tbd2-40757,5,main], state: 271f7814a414
at org.infinispan.context.impl.DataRacer.write(DataRacer.java:59) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.context.impl.DataRacer.detectAndWrite(DataRacer.java:49) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:51) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
...
{noformat}
The synchronization you suggested won't help this case. If there is a possibility of a concurrent write, then not only concurrent writes must be synchronized, but also all reads that are possible to run concurrently with writes. Perhaps with a ReadWriteLock lock (or preferably with StampedLock if there's no possibility of re-entrance from same thread).
I think that read-write accessess (such as EntryFactory.wrapXXX) should be protected with a write lock, while read only accesses and traversals of entries with a read lock.
Unless the logic can tolerate inconsistent traversals (all calls to context.getLookedUpEntries() that iterate or stream entries) in which case, plain HashMap(s) could simply be replaced with ConcurrentHashMap(s) in various InvocationContext(s) (read accesses to CHM are on par with HM and space overhead of CHM is on par with HM too). Read-write accesses could then either be partially synchronized as in your proposed patch (just those reads-writes that may run concurrently) so that atomicity of read-compute-write operations (such as EntryFactory.wrapXXX) can be maintained or perhaps the InvocationContext (EntryLookup) API could be changed a bit.
Instead of EntryLookup.putLookedUpEntry method, there could be a method like:
CacheEntry computeLookedUpEntry(Object key, BiFunction<? super Object, ? super CacheEntry, ? extends CacheEntry> remappingFunction)
which would be implemented in various InvocationContext(s) in terms of ConcurrentHashMap.compute(). This method would them be used for all read-compute-write accesses to lookup entries in InvocationContext(s). Trivial implementations are possible on an InvocationContext basis that don't use synchronization where there's no need for it.
What do you think?
was (Author: plevart_mar):
So you think that only responses to remoteGet may be concurrent among themselves for same context and that they never run concurrently with other possibly only read-only accesses to the same context?
The instrumented experiment that I tried (above) shows that PutMapCommand processing for example, accesses the context in read-only fashion (lookupEntry):
{noformat}
java.lang.IllegalThreadStateException: Thread[remote-thread--p2-t4,5,main]: data race detected with writer of state: 271f7814a414 - see suppressed for writer thread stack trace
at org.infinispan.context.impl.DataRacer.detect(DataRacer.java:37) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.context.impl.NonTxInvocationContext.lookupEntry(NonTxInvocationContext.java:39) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.addRemoteGet(NonTxDistributionInterceptor.java:405) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleRemoteReadWriteManyCommand(NonTxDistributionInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleReadWriteManyCommand(NonTxDistributionInterceptor.java:303) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.visitPutMapCommand(NonTxDistributionInterceptor.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
...
{noformat}
While the response to to remoteGet is making a read-write access to the same context:
{noformat}
Suppressed: org.infinispan.context.impl.DataRacer$StackTrace: Thread[jgroups-13,tbd2-40757,5,main], state: 271f7814a414
at org.infinispan.context.impl.DataRacer.write(DataRacer.java:59) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.context.impl.DataRacer.detectAndWrite(DataRacer.java:49) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:51) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
...
{noformat}
The synchronization you suggested won't help this case. If there is a possibility of a concurrent write, then not only concurrent writes must be synchronized, but also all reads that are possible to run concurrently with writes. Perhaps with a ReadWriteLock lock (or preferably with StampedLock if there's no possibility of re-entrance from same thread).
I think that read-write accessess (such as EntryFactory.wrapXXX) should be protected with a write lock, while read only accesses and traversals of entries with a read lock.
Unless the logic can tolerate inconsistent traversals (all calls to context.getLookedUpEntries() that iterate or stream entries) in which case, plain HashMap(s) could simply be replaced with ConcurrentHashMap(s) in various InvocationContext(s) (read accesses to CHM are on par with HM and space overhead of CHM is on par with HM too). Read-write accesses could then either be partially synchronized as in your proposed patch (just those reads-writes that may run concurrently) so that atomicity of read-compute-write operations (such as EntryFactory.wrapXXX) can be maintained or perhaps the InvocationContext (EntryLookup) API could be changed a bit.
Instead of EntryLookup.putLookedUpEntry method, there could be a method like:
void computeLookedUpEntry(Object key, BiFunction<? super Object, ? super CacheEntry, ? extends CacheEntry> remappingFunction)
which would be implemented in various InvocationContext(s) in terms of ConcurrentHashMap.compute(). This method would them be used for all read-compute-write accesses to lookup entries in InvocationContext(s). Trivial implementations are possible on an InvocationContext basis that don't use synchronization where there's no need for it.
What do you think?
> Data race in NonTxInvocationContext
> -----------------------------------
>
> Key: ISPN-8889
> URL: https://issues.jboss.org/browse/ISPN-8889
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.0.CR2, 9.1.6.Final
> Environment: Java 8 (Oracle JDK 8), Solaris SPARC
> Reporter: Peter Levart
> Assignee: Radim Vansa
> Labels: data-race
> Attachments: DataRacer.java, DataRacer.java
>
>
> Got the following exceptions starting up an Infinispan node while joining the cluster:
> {noformat}
> 17:10:59.012 [remote-thread--p2-t8] ERROR org.infinispan.interceptors.impl.InvocationContextInterceptor - ISPN000136: Error executing command PutMapCommand, writing keys [11906696627, 11906696626, 11906696625, 11906696624, 11906696631, 11906696630, 11906696629, 11906696628...<9992 other elements>]
> java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
> at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1832) ~[?:1.8.0_162]
> at java.util.HashMap$TreeNode.treeify(HashMap.java:1949) ~[?:1.8.0_162]
> at java.util.HashMap$TreeNode.split(HashMap.java:2175) ~[?:1.8.0_162]
> at java.util.HashMap.resize(HashMap.java:714) ~[?:1.8.0_162]
> at java.util.HashMap.putVal(HashMap.java:663) ~[?:1.8.0_162]
> at java.util.HashMap.put(HashMap.java:612) ~[?:1.8.0_162]
> at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:48) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
> at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.StaggeredRequest.onResponse(StaggeredRequest.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1302) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1205) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1340) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.jgroups.JChannel.up(JChannel.java:819) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.Discovery.up(Discovery.java:267) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> {noformat}
> Immediately suspected that plain HashMap is being modified from multiple threads without proper synchronization. But since I'm new to Infinispan, I can't know whether the usage of plain HashMap in NonTxInvocationContext is intentional and Infinispan code is relying on some external synchronization to be performed by other parts of system that calls NonTxInvocationContext.[lookupEntry|removeLookedUpEntry|putLookedUpEntry|getLookedUpEntries]. In any way, if this external synchronization is taking place, it has a bug and allows calling these methods concurrently without proper synchronization.
> I did some experiments to prove that and also to obtain stack traces of at least two of involved threads, but trivial things that I tried instrumenting those methods all involved some kind of synchronization and the symptom disappeared. I had to be extra smart to obtain the stack traces of two threads without inter-thread synchronization so that I could detect the data race. Here's how I instrumented NonTxInvocationContext code:
> {noformat}
> Index: core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> --- core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java (revision 6014e751c20daba1f00e23168281e02afb209905)
> +++ core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java (date 1519744397000)
> @@ -22,6 +22,7 @@
> private Set<Object> lockedKeys;
> private Object lockOwner;
>
> + private final DataRacer dataRacer = new DataRacer();
>
> public NonTxInvocationContext(int numEntries, Address origin) {
> super(origin);
> @@ -35,22 +36,26 @@
>
> @Override
> public CacheEntry lookupEntry(Object k) {
> + dataRacer.detect();
> return lookedUpEntries.get(k);
> }
>
> @Override
> public void removeLookedUpEntry(Object key) {
> + dataRacer.detectAndWrite();
> lookedUpEntries.remove(key);
> }
>
> @Override
> public void putLookedUpEntry(Object key, CacheEntry e) {
> + dataRacer.detectAndWrite();
> lookedUpEntries.put(key, e);
> }
>
> @Override
> @SuppressWarnings("unchecked")
> public Map<Object, CacheEntry> getLookedUpEntries() {
> + dataRacer.detect();
> return (Map<Object, CacheEntry>)
> (lookedUpEntries == null ?
> Collections.emptyMap() : lookedUpEntries);
> {noformat}
> Attached, you will find the DataRacer source. While running with such patched Infinispan, I got some of the following:
> {noformat}
> java.lang.IllegalThreadStateException: Thread[remote-thread--p2-t4,5,main]: data race detected with writer of state: 271f7814a414 - see suppressed for writer thread stack trace
> at org.infinispan.context.impl.DataRacer.detect(DataRacer.java:37) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.NonTxInvocationContext.lookupEntry(NonTxInvocationContext.java:39) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.addRemoteGet(NonTxDistributionInterceptor.java:405) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleRemoteReadWriteManyCommand(NonTxDistributionInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleReadWriteManyCommand(NonTxDistributionInterceptor.java:303) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.visitPutMapCommand(NonTxDistributionInterceptor.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenAccept(BaseAsyncInterceptor.java:98) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.setSkipRemoteGetsAndInvokeNextForManyEntriesCommand(EntryWrappingInterceptor.java:614) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.visitPutMapCommand(EntryWrappingInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.handleWriteManyCommand(NonTransactionalLockingInterceptor.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitPutMapCommand(AbstractLockingInterceptor.java:179) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:306) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleWriteCommand(StateTransferInterceptor.java:252) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.visitPutMapCommand(StateTransferInterceptor.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.CacheMgmtInterceptor.visitPutMapCommand(CacheMgmtInterceptor.java:154) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.visitPutMapCommand(DDAsyncInterceptor.java:90) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommandAsync(BaseRpcInvokingCommand.java:63) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.remote.SingleRpcCommand.invokeAsync(SingleRpcCommand.java:57) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> Suppressed: org.infinispan.context.impl.DataRacer$StackTrace: Thread[jgroups-13,tbd2-40757,5,main], state: 271f7814a414
> at org.infinispan.context.impl.DataRacer.write(DataRacer.java:59) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.DataRacer.detectAndWrite(DataRacer.java:49) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:51) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
> at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.StaggeredRequest.onResponse(StaggeredRequest.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1302) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1205) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1340) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.jgroups.JChannel.up(JChannel.java:819) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.Discovery.up(Discovery.java:267) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> {noformat}
> So here you have it. Either using plain HashMap in NonTxInvocationContext is wrong and ConcurrentHashMap should be used instead or some external synchronization has a bug and is inadequate. I hope this helps to fix it.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 4 months
[JBoss JIRA] (ISPN-8889) Data race in NonTxInvocationContext
by Peter Levart (JIRA)
[ https://issues.jboss.org/browse/ISPN-8889?page=com.atlassian.jira.plugin.... ]
Peter Levart commented on ISPN-8889:
------------------------------------
So you think that only responses to remoteGet may be concurrent among themselves for same context and that they never run concurrently with other possibly only read-only accesses to the same context?
The instrumented experiment that I tried (above) shows that PutMapCommand processing for example, accesses the context in read-only fashion (lookupEntry):
{noformat}
java.lang.IllegalThreadStateException: Thread[remote-thread--p2-t4,5,main]: data race detected with writer of state: 271f7814a414 - see suppressed for writer thread stack trace
at org.infinispan.context.impl.DataRacer.detect(DataRacer.java:37) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.context.impl.NonTxInvocationContext.lookupEntry(NonTxInvocationContext.java:39) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.addRemoteGet(NonTxDistributionInterceptor.java:405) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleRemoteReadWriteManyCommand(NonTxDistributionInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleReadWriteManyCommand(NonTxDistributionInterceptor.java:303) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.visitPutMapCommand(NonTxDistributionInterceptor.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
...
{noformat}
While the response to to remoteGet is making a read-write access to the same context:
{noformat}
Suppressed: org.infinispan.context.impl.DataRacer$StackTrace: Thread[jgroups-13,tbd2-40757,5,main], state: 271f7814a414
at org.infinispan.context.impl.DataRacer.write(DataRacer.java:59) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.context.impl.DataRacer.detectAndWrite(DataRacer.java:49) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:51) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
...
{noformat}
The synchronization you suggested won't help this case. If there is a possibility of a concurrent write, then not only concurrent writes must be synchronized, but also all reads that are possible to run concurrently with writes. Perhaps with a ReadWriteLock lock (or preferably with StampedLock if there's no possibility of re-entrance from same thread).
I think that read-write accessess (such as EntryFactory.wrapXXX) should be protected with a write lock, while read only accesses and traversals of entries with a read lock.
Unless the logic can tolerate inconsistent traversals (all calls to context.getLookedUpEntries() that iterate or stream entries) in which case, plain HashMap(s) could simply be replaced with ConcurrentHashMap(s) in various InvocationContext(s) (read accesses to CHM are on par with HM and space overhead of CHM is on par with HM too). Read-write accesses could then either be partially synchronized as in your proposed patch (just those reads-writes that may run concurrently) so that atomicity of read-compute-write operations (such as EntryFactory.wrapXXX) can be maintained or perhaps the InvocationContext (EntryLookup) API could be changed a bit.
Instead of EntryLookup.putLookedUpEntry method, there could be a method like:
void computeLookedUpEntry(Object key, BiFunction<? super Object, ? super CacheEntry, ? extends CacheEntry> remappingFunction)
which would be implemented in various InvocationContext(s) in terms of ConcurrentHashMap.compute(). This method would them be used for all read-compute-write accesses to lookup entries in InvocationContext(s). Trivial implementations are possible on an InvocationContext basis that don't use synchronization where there's no need for it.
What do you think?
> Data race in NonTxInvocationContext
> -----------------------------------
>
> Key: ISPN-8889
> URL: https://issues.jboss.org/browse/ISPN-8889
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.0.CR2, 9.1.6.Final
> Environment: Java 8 (Oracle JDK 8), Solaris SPARC
> Reporter: Peter Levart
> Assignee: Radim Vansa
> Labels: data-race
> Attachments: DataRacer.java, DataRacer.java
>
>
> Got the following exceptions starting up an Infinispan node while joining the cluster:
> {noformat}
> 17:10:59.012 [remote-thread--p2-t8] ERROR org.infinispan.interceptors.impl.InvocationContextInterceptor - ISPN000136: Error executing command PutMapCommand, writing keys [11906696627, 11906696626, 11906696625, 11906696624, 11906696631, 11906696630, 11906696629, 11906696628...<9992 other elements>]
> java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
> at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1832) ~[?:1.8.0_162]
> at java.util.HashMap$TreeNode.treeify(HashMap.java:1949) ~[?:1.8.0_162]
> at java.util.HashMap$TreeNode.split(HashMap.java:2175) ~[?:1.8.0_162]
> at java.util.HashMap.resize(HashMap.java:714) ~[?:1.8.0_162]
> at java.util.HashMap.putVal(HashMap.java:663) ~[?:1.8.0_162]
> at java.util.HashMap.put(HashMap.java:612) ~[?:1.8.0_162]
> at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:48) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
> at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.StaggeredRequest.onResponse(StaggeredRequest.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1302) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1205) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1340) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.jgroups.JChannel.up(JChannel.java:819) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.Discovery.up(Discovery.java:267) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> {noformat}
> Immediately suspected that plain HashMap is being modified from multiple threads without proper synchronization. But since I'm new to Infinispan, I can't know whether the usage of plain HashMap in NonTxInvocationContext is intentional and Infinispan code is relying on some external synchronization to be performed by other parts of system that calls NonTxInvocationContext.[lookupEntry|removeLookedUpEntry|putLookedUpEntry|getLookedUpEntries]. In any way, if this external synchronization is taking place, it has a bug and allows calling these methods concurrently without proper synchronization.
> I did some experiments to prove that and also to obtain stack traces of at least two of involved threads, but trivial things that I tried instrumenting those methods all involved some kind of synchronization and the symptom disappeared. I had to be extra smart to obtain the stack traces of two threads without inter-thread synchronization so that I could detect the data race. Here's how I instrumented NonTxInvocationContext code:
> {noformat}
> Index: core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> --- core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java (revision 6014e751c20daba1f00e23168281e02afb209905)
> +++ core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java (date 1519744397000)
> @@ -22,6 +22,7 @@
> private Set<Object> lockedKeys;
> private Object lockOwner;
>
> + private final DataRacer dataRacer = new DataRacer();
>
> public NonTxInvocationContext(int numEntries, Address origin) {
> super(origin);
> @@ -35,22 +36,26 @@
>
> @Override
> public CacheEntry lookupEntry(Object k) {
> + dataRacer.detect();
> return lookedUpEntries.get(k);
> }
>
> @Override
> public void removeLookedUpEntry(Object key) {
> + dataRacer.detectAndWrite();
> lookedUpEntries.remove(key);
> }
>
> @Override
> public void putLookedUpEntry(Object key, CacheEntry e) {
> + dataRacer.detectAndWrite();
> lookedUpEntries.put(key, e);
> }
>
> @Override
> @SuppressWarnings("unchecked")
> public Map<Object, CacheEntry> getLookedUpEntries() {
> + dataRacer.detect();
> return (Map<Object, CacheEntry>)
> (lookedUpEntries == null ?
> Collections.emptyMap() : lookedUpEntries);
> {noformat}
> Attached, you will find the DataRacer source. While running with such patched Infinispan, I got some of the following:
> {noformat}
> java.lang.IllegalThreadStateException: Thread[remote-thread--p2-t4,5,main]: data race detected with writer of state: 271f7814a414 - see suppressed for writer thread stack trace
> at org.infinispan.context.impl.DataRacer.detect(DataRacer.java:37) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.NonTxInvocationContext.lookupEntry(NonTxInvocationContext.java:39) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.addRemoteGet(NonTxDistributionInterceptor.java:405) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleRemoteReadWriteManyCommand(NonTxDistributionInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleReadWriteManyCommand(NonTxDistributionInterceptor.java:303) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.visitPutMapCommand(NonTxDistributionInterceptor.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenAccept(BaseAsyncInterceptor.java:98) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.setSkipRemoteGetsAndInvokeNextForManyEntriesCommand(EntryWrappingInterceptor.java:614) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.visitPutMapCommand(EntryWrappingInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.handleWriteManyCommand(NonTransactionalLockingInterceptor.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitPutMapCommand(AbstractLockingInterceptor.java:179) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:306) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleWriteCommand(StateTransferInterceptor.java:252) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.visitPutMapCommand(StateTransferInterceptor.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.CacheMgmtInterceptor.visitPutMapCommand(CacheMgmtInterceptor.java:154) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.visitPutMapCommand(DDAsyncInterceptor.java:90) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommandAsync(BaseRpcInvokingCommand.java:63) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.remote.SingleRpcCommand.invokeAsync(SingleRpcCommand.java:57) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> Suppressed: org.infinispan.context.impl.DataRacer$StackTrace: Thread[jgroups-13,tbd2-40757,5,main], state: 271f7814a414
> at org.infinispan.context.impl.DataRacer.write(DataRacer.java:59) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.DataRacer.detectAndWrite(DataRacer.java:49) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:51) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
> at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.StaggeredRequest.onResponse(StaggeredRequest.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1302) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1205) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1340) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.jgroups.JChannel.up(JChannel.java:819) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.Discovery.up(Discovery.java:267) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> {noformat}
> So here you have it. Either using plain HashMap in NonTxInvocationContext is wrong and ConcurrentHashMap should be used instead or some external synchronization has a bug and is inadequate. I hope this helps to fix it.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 4 months
[JBoss JIRA] (ISPN-7793) Expose Infinispan cluster deployed in OpenShift
by Sebastian Łaskawiec (JIRA)
[ https://issues.jboss.org/browse/ISPN-7793?page=com.atlassian.jira.plugin.... ]
Sebastian Łaskawiec commented on ISPN-7793:
-------------------------------------------
{quote}
Each datagrid node should get assigned a resolvable hostname which is used as external-host in the hot rod server configuration.
{quote}
This is actually pretty hard thing to do. Whenever you scale up your deployment, a new LB Service needs to be created. This LB Service is vendor dependent and it takes quite a while to create it (5 min at most on GCP). After the service gets its External Address (IP or a address), you need to update Infinispan server configuration and restart it. Note, that sometimes, users might embed the configuration directly into their container, so sometimes it just can not be done.
So as you see, the main problem is feeding the information about the External IP (or address) back to the server.
The only way I can see this working is having an external discovery mechanism - w custom container with a customized DNS Server (let's call it {{Discovery Service}} for now). Each server could publish its hostname (and internal address - the one that is not routable?) to this service. In the next step the Discovery Service would scan Kubernetes API and create a matching LB Service. Once it's up, it could expose it via DNS.
Once we have internal hostname (and internal IP address) and external IP mapping, we are ready to do the translation.
{quote}
Client nodes will receive topology information through Hot Rod containing these host names.
Client nodes which have direct visibility of the datagrid nodes will resolve their addresses directly.
Client nodes which do not have direct visibility should have their DNS configured to remap the datagrid node names to the proxy addresses.
{quote}
This could work. Although, I'm not sure if need a project (namespace) prefix or not. Also, the client code would need to be modified to tolerate non-existing servers.
> Expose Infinispan cluster deployed in OpenShift
> -----------------------------------------------
>
> Key: ISPN-7793
> URL: https://issues.jboss.org/browse/ISPN-7793
> Project: Infinispan
> Issue Type: Feature Request
> Components: Cloud
> Reporter: Sebastian Łaskawiec
> Assignee: Sebastian Łaskawiec
>
> Find a way to expose an existing cluster deployed on Kubernetes to the outside world.
> Links:
> * https://github.com/slaskawi/external-ip-proxy
> * https://blog.fabric8.io/additional-kubernetes-controllers-from-fabric8-yo...
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 4 months
[JBoss JIRA] (ISPN-8889) Data race in NonTxInvocationContext
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-8889?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant reassigned ISPN-8889:
-------------------------------------
Assignee: Radim Vansa
> Data race in NonTxInvocationContext
> -----------------------------------
>
> Key: ISPN-8889
> URL: https://issues.jboss.org/browse/ISPN-8889
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.0.CR2, 9.1.6.Final
> Environment: Java 8 (Oracle JDK 8), Solaris SPARC
> Reporter: Peter Levart
> Assignee: Radim Vansa
> Labels: data-race
> Attachments: DataRacer.java, DataRacer.java
>
>
> Got the following exceptions starting up an Infinispan node while joining the cluster:
> {noformat}
> 17:10:59.012 [remote-thread--p2-t8] ERROR org.infinispan.interceptors.impl.InvocationContextInterceptor - ISPN000136: Error executing command PutMapCommand, writing keys [11906696627, 11906696626, 11906696625, 11906696624, 11906696631, 11906696630, 11906696629, 11906696628...<9992 other elements>]
> java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
> at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1832) ~[?:1.8.0_162]
> at java.util.HashMap$TreeNode.treeify(HashMap.java:1949) ~[?:1.8.0_162]
> at java.util.HashMap$TreeNode.split(HashMap.java:2175) ~[?:1.8.0_162]
> at java.util.HashMap.resize(HashMap.java:714) ~[?:1.8.0_162]
> at java.util.HashMap.putVal(HashMap.java:663) ~[?:1.8.0_162]
> at java.util.HashMap.put(HashMap.java:612) ~[?:1.8.0_162]
> at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:48) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
> at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.StaggeredRequest.onResponse(StaggeredRequest.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1302) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1205) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1340) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.jgroups.JChannel.up(JChannel.java:819) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.Discovery.up(Discovery.java:267) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> {noformat}
> Immediately suspected that plain HashMap is being modified from multiple threads without proper synchronization. But since I'm new to Infinispan, I can't know whether the usage of plain HashMap in NonTxInvocationContext is intentional and Infinispan code is relying on some external synchronization to be performed by other parts of system that calls NonTxInvocationContext.[lookupEntry|removeLookedUpEntry|putLookedUpEntry|getLookedUpEntries]. In any way, if this external synchronization is taking place, it has a bug and allows calling these methods concurrently without proper synchronization.
> I did some experiments to prove that and also to obtain stack traces of at least two of involved threads, but trivial things that I tried instrumenting those methods all involved some kind of synchronization and the symptom disappeared. I had to be extra smart to obtain the stack traces of two threads without inter-thread synchronization so that I could detect the data race. Here's how I instrumented NonTxInvocationContext code:
> {noformat}
> Index: core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> --- core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java (revision 6014e751c20daba1f00e23168281e02afb209905)
> +++ core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java (date 1519744397000)
> @@ -22,6 +22,7 @@
> private Set<Object> lockedKeys;
> private Object lockOwner;
>
> + private final DataRacer dataRacer = new DataRacer();
>
> public NonTxInvocationContext(int numEntries, Address origin) {
> super(origin);
> @@ -35,22 +36,26 @@
>
> @Override
> public CacheEntry lookupEntry(Object k) {
> + dataRacer.detect();
> return lookedUpEntries.get(k);
> }
>
> @Override
> public void removeLookedUpEntry(Object key) {
> + dataRacer.detectAndWrite();
> lookedUpEntries.remove(key);
> }
>
> @Override
> public void putLookedUpEntry(Object key, CacheEntry e) {
> + dataRacer.detectAndWrite();
> lookedUpEntries.put(key, e);
> }
>
> @Override
> @SuppressWarnings("unchecked")
> public Map<Object, CacheEntry> getLookedUpEntries() {
> + dataRacer.detect();
> return (Map<Object, CacheEntry>)
> (lookedUpEntries == null ?
> Collections.emptyMap() : lookedUpEntries);
> {noformat}
> Attached, you will find the DataRacer source. While running with such patched Infinispan, I got some of the following:
> {noformat}
> java.lang.IllegalThreadStateException: Thread[remote-thread--p2-t4,5,main]: data race detected with writer of state: 271f7814a414 - see suppressed for writer thread stack trace
> at org.infinispan.context.impl.DataRacer.detect(DataRacer.java:37) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.NonTxInvocationContext.lookupEntry(NonTxInvocationContext.java:39) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.addRemoteGet(NonTxDistributionInterceptor.java:405) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleRemoteReadWriteManyCommand(NonTxDistributionInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleReadWriteManyCommand(NonTxDistributionInterceptor.java:303) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.visitPutMapCommand(NonTxDistributionInterceptor.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenAccept(BaseAsyncInterceptor.java:98) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.setSkipRemoteGetsAndInvokeNextForManyEntriesCommand(EntryWrappingInterceptor.java:614) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.visitPutMapCommand(EntryWrappingInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.handleWriteManyCommand(NonTransactionalLockingInterceptor.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitPutMapCommand(AbstractLockingInterceptor.java:179) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:306) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleWriteCommand(StateTransferInterceptor.java:252) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.visitPutMapCommand(StateTransferInterceptor.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.CacheMgmtInterceptor.visitPutMapCommand(CacheMgmtInterceptor.java:154) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.visitPutMapCommand(DDAsyncInterceptor.java:90) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommandAsync(BaseRpcInvokingCommand.java:63) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.remote.SingleRpcCommand.invokeAsync(SingleRpcCommand.java:57) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> Suppressed: org.infinispan.context.impl.DataRacer$StackTrace: Thread[jgroups-13,tbd2-40757,5,main], state: 271f7814a414
> at org.infinispan.context.impl.DataRacer.write(DataRacer.java:59) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.DataRacer.detectAndWrite(DataRacer.java:49) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:51) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
> at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.StaggeredRequest.onResponse(StaggeredRequest.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1302) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1205) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1340) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.jgroups.JChannel.up(JChannel.java:819) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.Discovery.up(Discovery.java:267) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> {noformat}
> So here you have it. Either using plain HashMap in NonTxInvocationContext is wrong and ConcurrentHashMap should be used instead or some external synchronization has a bug and is inadequate. I hope this helps to fix it.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 4 months
[JBoss JIRA] (ISPN-8889) Data race in NonTxInvocationContext
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-8889?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-8889:
-----------------------------------
Yes, in most cases the InvocationContext is accessed by single thread only at any time; you've found an exception to that rule. Synchronizing the collection everywhere would be overkill IMO.
Your approach is correct, putAll should give you somewhat better throughput if you're loading the cache from single thread using sync operations. An alternative would be using putAsync(), throttling the number of concurrent requests to reasonable amounts, or (worse) loading from multiple threads to cover for the network latency.
Quick fix should be this: https://github.com/rvansa/infinispan/tree/ISPN-8889 - I need to write a reproducer for the issue in order to provide a full patch.
> Data race in NonTxInvocationContext
> -----------------------------------
>
> Key: ISPN-8889
> URL: https://issues.jboss.org/browse/ISPN-8889
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.0.CR2, 9.1.6.Final
> Environment: Java 8 (Oracle JDK 8), Solaris SPARC
> Reporter: Peter Levart
> Labels: data-race
> Attachments: DataRacer.java, DataRacer.java
>
>
> Got the following exceptions starting up an Infinispan node while joining the cluster:
> {noformat}
> 17:10:59.012 [remote-thread--p2-t8] ERROR org.infinispan.interceptors.impl.InvocationContextInterceptor - ISPN000136: Error executing command PutMapCommand, writing keys [11906696627, 11906696626, 11906696625, 11906696624, 11906696631, 11906696630, 11906696629, 11906696628...<9992 other elements>]
> java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
> at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1832) ~[?:1.8.0_162]
> at java.util.HashMap$TreeNode.treeify(HashMap.java:1949) ~[?:1.8.0_162]
> at java.util.HashMap$TreeNode.split(HashMap.java:2175) ~[?:1.8.0_162]
> at java.util.HashMap.resize(HashMap.java:714) ~[?:1.8.0_162]
> at java.util.HashMap.putVal(HashMap.java:663) ~[?:1.8.0_162]
> at java.util.HashMap.put(HashMap.java:612) ~[?:1.8.0_162]
> at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:48) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
> at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.StaggeredRequest.onResponse(StaggeredRequest.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1302) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1205) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1340) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.jgroups.JChannel.up(JChannel.java:819) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.Discovery.up(Discovery.java:267) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> {noformat}
> Immediately suspected that plain HashMap is being modified from multiple threads without proper synchronization. But since I'm new to Infinispan, I can't know whether the usage of plain HashMap in NonTxInvocationContext is intentional and Infinispan code is relying on some external synchronization to be performed by other parts of system that calls NonTxInvocationContext.[lookupEntry|removeLookedUpEntry|putLookedUpEntry|getLookedUpEntries]. In any way, if this external synchronization is taking place, it has a bug and allows calling these methods concurrently without proper synchronization.
> I did some experiments to prove that and also to obtain stack traces of at least two of involved threads, but trivial things that I tried instrumenting those methods all involved some kind of synchronization and the symptom disappeared. I had to be extra smart to obtain the stack traces of two threads without inter-thread synchronization so that I could detect the data race. Here's how I instrumented NonTxInvocationContext code:
> {noformat}
> Index: core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> --- core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java (revision 6014e751c20daba1f00e23168281e02afb209905)
> +++ core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java (date 1519744397000)
> @@ -22,6 +22,7 @@
> private Set<Object> lockedKeys;
> private Object lockOwner;
>
> + private final DataRacer dataRacer = new DataRacer();
>
> public NonTxInvocationContext(int numEntries, Address origin) {
> super(origin);
> @@ -35,22 +36,26 @@
>
> @Override
> public CacheEntry lookupEntry(Object k) {
> + dataRacer.detect();
> return lookedUpEntries.get(k);
> }
>
> @Override
> public void removeLookedUpEntry(Object key) {
> + dataRacer.detectAndWrite();
> lookedUpEntries.remove(key);
> }
>
> @Override
> public void putLookedUpEntry(Object key, CacheEntry e) {
> + dataRacer.detectAndWrite();
> lookedUpEntries.put(key, e);
> }
>
> @Override
> @SuppressWarnings("unchecked")
> public Map<Object, CacheEntry> getLookedUpEntries() {
> + dataRacer.detect();
> return (Map<Object, CacheEntry>)
> (lookedUpEntries == null ?
> Collections.emptyMap() : lookedUpEntries);
> {noformat}
> Attached, you will find the DataRacer source. While running with such patched Infinispan, I got some of the following:
> {noformat}
> java.lang.IllegalThreadStateException: Thread[remote-thread--p2-t4,5,main]: data race detected with writer of state: 271f7814a414 - see suppressed for writer thread stack trace
> at org.infinispan.context.impl.DataRacer.detect(DataRacer.java:37) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.NonTxInvocationContext.lookupEntry(NonTxInvocationContext.java:39) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.addRemoteGet(NonTxDistributionInterceptor.java:405) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleRemoteReadWriteManyCommand(NonTxDistributionInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleReadWriteManyCommand(NonTxDistributionInterceptor.java:303) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.visitPutMapCommand(NonTxDistributionInterceptor.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenAccept(BaseAsyncInterceptor.java:98) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.setSkipRemoteGetsAndInvokeNextForManyEntriesCommand(EntryWrappingInterceptor.java:614) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.visitPutMapCommand(EntryWrappingInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.handleWriteManyCommand(NonTransactionalLockingInterceptor.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitPutMapCommand(AbstractLockingInterceptor.java:179) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:306) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleWriteCommand(StateTransferInterceptor.java:252) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.visitPutMapCommand(StateTransferInterceptor.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.CacheMgmtInterceptor.visitPutMapCommand(CacheMgmtInterceptor.java:154) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.visitPutMapCommand(DDAsyncInterceptor.java:90) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommandAsync(BaseRpcInvokingCommand.java:63) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.remote.SingleRpcCommand.invokeAsync(SingleRpcCommand.java:57) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> Suppressed: org.infinispan.context.impl.DataRacer$StackTrace: Thread[jgroups-13,tbd2-40757,5,main], state: 271f7814a414
> at org.infinispan.context.impl.DataRacer.write(DataRacer.java:59) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.DataRacer.detectAndWrite(DataRacer.java:49) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:51) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
> at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.StaggeredRequest.onResponse(StaggeredRequest.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1302) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1205) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1340) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.jgroups.JChannel.up(JChannel.java:819) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.Discovery.up(Discovery.java:267) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> {noformat}
> So here you have it. Either using plain HashMap in NonTxInvocationContext is wrong and ConcurrentHashMap should be used instead or some external synchronization has a bug and is inadequate. I hope this helps to fix it.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 4 months
[JBoss JIRA] (ISPN-8889) Data race in NonTxInvocationContext
by Peter Levart (JIRA)
[ https://issues.jboss.org/browse/ISPN-8889?page=com.atlassian.jira.plugin.... ]
Peter Levart commented on ISPN-8889:
------------------------------------
Hi,
I won't pretend that I understand all that. I'm new to Infinispan. I'm migrating an application that uses Oracle Coherence to Infinispan. For pre-loading entries into a cache, I have been using Map.putAll(chunk) with chunks of entries because that was a quicker way to put entries into a cache on Coherence than putting individual key/value pairs. I suspect this holds for Infinispan too, since it uses special command for that.
I don't quite understand your last sentence: "Luckily, adding synchronization on the context for the remoteGet handling should be easy."
Are you thinking of putting-in some synchronization just for remoteGet ? Where would you put it ? Would puting synchronization on NonTxInvocationContext.[lookupEntry|removeLookedUpEntry|putLookedUpEntry|getLookedUpEntries] be to strong and not needed?
Regards, Peter
> Data race in NonTxInvocationContext
> -----------------------------------
>
> Key: ISPN-8889
> URL: https://issues.jboss.org/browse/ISPN-8889
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.0.CR2, 9.1.6.Final
> Environment: Java 8 (Oracle JDK 8), Solaris SPARC
> Reporter: Peter Levart
> Labels: data-race
> Attachments: DataRacer.java, DataRacer.java
>
>
> Got the following exceptions starting up an Infinispan node while joining the cluster:
> {noformat}
> 17:10:59.012 [remote-thread--p2-t8] ERROR org.infinispan.interceptors.impl.InvocationContextInterceptor - ISPN000136: Error executing command PutMapCommand, writing keys [11906696627, 11906696626, 11906696625, 11906696624, 11906696631, 11906696630, 11906696629, 11906696628...<9992 other elements>]
> java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
> at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1832) ~[?:1.8.0_162]
> at java.util.HashMap$TreeNode.treeify(HashMap.java:1949) ~[?:1.8.0_162]
> at java.util.HashMap$TreeNode.split(HashMap.java:2175) ~[?:1.8.0_162]
> at java.util.HashMap.resize(HashMap.java:714) ~[?:1.8.0_162]
> at java.util.HashMap.putVal(HashMap.java:663) ~[?:1.8.0_162]
> at java.util.HashMap.put(HashMap.java:612) ~[?:1.8.0_162]
> at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:48) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
> at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.StaggeredRequest.onResponse(StaggeredRequest.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1302) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1205) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1340) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.jgroups.JChannel.up(JChannel.java:819) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.Discovery.up(Discovery.java:267) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> {noformat}
> Immediately suspected that plain HashMap is being modified from multiple threads without proper synchronization. But since I'm new to Infinispan, I can't know whether the usage of plain HashMap in NonTxInvocationContext is intentional and Infinispan code is relying on some external synchronization to be performed by other parts of system that calls NonTxInvocationContext.[lookupEntry|removeLookedUpEntry|putLookedUpEntry|getLookedUpEntries]. In any way, if this external synchronization is taking place, it has a bug and allows calling these methods concurrently without proper synchronization.
> I did some experiments to prove that and also to obtain stack traces of at least two of involved threads, but trivial things that I tried instrumenting those methods all involved some kind of synchronization and the symptom disappeared. I had to be extra smart to obtain the stack traces of two threads without inter-thread synchronization so that I could detect the data race. Here's how I instrumented NonTxInvocationContext code:
> {noformat}
> Index: core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> --- core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java (revision 6014e751c20daba1f00e23168281e02afb209905)
> +++ core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java (date 1519744397000)
> @@ -22,6 +22,7 @@
> private Set<Object> lockedKeys;
> private Object lockOwner;
>
> + private final DataRacer dataRacer = new DataRacer();
>
> public NonTxInvocationContext(int numEntries, Address origin) {
> super(origin);
> @@ -35,22 +36,26 @@
>
> @Override
> public CacheEntry lookupEntry(Object k) {
> + dataRacer.detect();
> return lookedUpEntries.get(k);
> }
>
> @Override
> public void removeLookedUpEntry(Object key) {
> + dataRacer.detectAndWrite();
> lookedUpEntries.remove(key);
> }
>
> @Override
> public void putLookedUpEntry(Object key, CacheEntry e) {
> + dataRacer.detectAndWrite();
> lookedUpEntries.put(key, e);
> }
>
> @Override
> @SuppressWarnings("unchecked")
> public Map<Object, CacheEntry> getLookedUpEntries() {
> + dataRacer.detect();
> return (Map<Object, CacheEntry>)
> (lookedUpEntries == null ?
> Collections.emptyMap() : lookedUpEntries);
> {noformat}
> Attached, you will find the DataRacer source. While running with such patched Infinispan, I got some of the following:
> {noformat}
> java.lang.IllegalThreadStateException: Thread[remote-thread--p2-t4,5,main]: data race detected with writer of state: 271f7814a414 - see suppressed for writer thread stack trace
> at org.infinispan.context.impl.DataRacer.detect(DataRacer.java:37) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.NonTxInvocationContext.lookupEntry(NonTxInvocationContext.java:39) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.addRemoteGet(NonTxDistributionInterceptor.java:405) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleRemoteReadWriteManyCommand(NonTxDistributionInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleReadWriteManyCommand(NonTxDistributionInterceptor.java:303) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.visitPutMapCommand(NonTxDistributionInterceptor.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenAccept(BaseAsyncInterceptor.java:98) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.setSkipRemoteGetsAndInvokeNextForManyEntriesCommand(EntryWrappingInterceptor.java:614) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.visitPutMapCommand(EntryWrappingInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.handleWriteManyCommand(NonTransactionalLockingInterceptor.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitPutMapCommand(AbstractLockingInterceptor.java:179) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:306) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleWriteCommand(StateTransferInterceptor.java:252) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.visitPutMapCommand(StateTransferInterceptor.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.CacheMgmtInterceptor.visitPutMapCommand(CacheMgmtInterceptor.java:154) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.visitPutMapCommand(DDAsyncInterceptor.java:90) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommandAsync(BaseRpcInvokingCommand.java:63) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.remote.SingleRpcCommand.invokeAsync(SingleRpcCommand.java:57) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> Suppressed: org.infinispan.context.impl.DataRacer$StackTrace: Thread[jgroups-13,tbd2-40757,5,main], state: 271f7814a414
> at org.infinispan.context.impl.DataRacer.write(DataRacer.java:59) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.DataRacer.detectAndWrite(DataRacer.java:49) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:51) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
> at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.StaggeredRequest.onResponse(StaggeredRequest.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1302) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1205) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1340) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.jgroups.JChannel.up(JChannel.java:819) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.Discovery.up(Discovery.java:267) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> {noformat}
> So here you have it. Either using plain HashMap in NonTxInvocationContext is wrong and ConcurrentHashMap should be used instead or some external synchronization has a bug and is inadequate. I hope this helps to fix it.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 4 months
[JBoss JIRA] (ISPN-8890) TransactionXaAdapter does not implement Geronimo's NamedXaResource
by Dan Berindei (JIRA)
Dan Berindei created ISPN-8890:
----------------------------------
Summary: TransactionXaAdapter does not implement Geronimo's NamedXaResource
Key: ISPN-8890
URL: https://issues.jboss.org/browse/ISPN-8890
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.2.0.CR2
Reporter: Dan Berindei
Fix For: 9.3.0.Final
Geronimo uses its own interface for logging transaction resources, {{NamedXAResource}}. Our {{TransactionXaAdapter}} doesn't implement it, so this error is logged for each transaction in Karaf:
{noformat}
2018-02-28T09:49:05,931 | ERROR | testng-TransactionsSpanningReplicatedCachesTest-9 | Transaction | 68 - org.apache.aries.transaction.manager - 1.3.3 | Please correct the integration and supply a NamedXAResource
java.lang.IllegalStateException: Cannot log transactions as TransactionXaAdapter{localTransaction=LocalXaTransaction{xid=[Xid:globalId=ffffff96ffffff9561ffffffdb611006f72672e6170616368652e61726965732e7472616e73616374696f6e0000000000000000000000000000,length=64,branchId=1000ffffff84ffffff9561ffffffdb611006170616368652e61726965732e7472616e73616374696f6e0000000000000000000000000000,length=64]} LocalTransaction{remoteLockedNodes=[TransactionsSpanningReplicatedCachesTest-NodeS-28818, TransactionsSpanningReplicatedCachesTest-NodeT-4724], isMarkedForRollback=false, lockedKeys=[], backupKeyLocks=[a], topologyId=5, stateTransferFlag=null} org.infinispan.transaction.xa.LocalXaTransaction@1f} is not a NamedXAResource.
at org.apache.geronimo.transaction.manager.TransactionImpl$TransactionBranch.getResourceName(TransactionImpl.java:781) ~[?:?]
at org.apache.geronimo.transaction.log.HOWLLog.prepare(HOWLLog.java:287) ~[?:?]
at org.apache.geronimo.transaction.manager.TransactionImpl.internalPrepare(TransactionImpl.java:467) ~[?:?]
at org.apache.geronimo.transaction.manager.TransactionImpl.commit(TransactionImpl.java:312) ~[?:?]
at org.apache.geronimo.transaction.manager.TransactionManagerImpl.commit(TransactionManagerImpl.java:252) ~[?:?]
at Proxy325c2276_ef50_4396_9e7d_5deaa002f72c.commit(Unknown Source) ~[?:?]
at org.infinispan.tx.TransactionsSpanningReplicatedCachesTest.runTest(TransactionsSpanningReplicatedCachesTest.java:260) ~[?:?]
at org.infinispan.tx.TransactionsSpanningReplicatedCachesTest.testDefaultCacheAndNamedCacheSameNode(TransactionsSpanningReplicatedCachesTest.java:240) ~[?:?]
at org.infinispan.it.osgi.tx.TransactionsSpanningReplicatedCachesTest.testDefaultCacheAndNamedCacheSameNode(TransactionsSpanningReplicatedCachesTest.java:91) ~[?:?]
{noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 4 months
[JBoss JIRA] (ISPN-8885) Multiple primary owners during conflict resolution
by Ryan Emerson (JIRA)
[ https://issues.jboss.org/browse/ISPN-8885?page=com.atlassian.jira.plugin.... ]
Ryan Emerson updated ISPN-8885:
-------------------------------
Status: Open (was: New)
> Multiple primary owners during conflict resolution
> --------------------------------------------------
>
> Key: ISPN-8885
> URL: https://issues.jboss.org/browse/ISPN-8885
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.0.CR3
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Fix For: 9.2.0.Final
>
>
> During the CONFLICT_RESOLUTION topology phase the pending CH is incorrectly set which results in the TriangleDistributionInterceptor (and other distributor interceptors) on multiple nodes thinking they are the primary owner of a key. Consequently, cache write commands during the CONFLICT_RESOLUTION phase are committed multiple times.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 4 months