[JBoss JIRA] (ISPN-8452) KUBE_PING doesn't work with JDG 7.2 OpenShift image
by Sebastian Łaskawiec (JIRA)
[ https://issues.jboss.org/browse/ISPN-8452?page=com.atlassian.jira.plugin.... ]
Sebastian Łaskawiec updated ISPN-8452:
--------------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/jboss-container-images/jboss-datagrid-7-openshift-imag...
> KUBE_PING doesn't work with JDG 7.2 OpenShift image
> ---------------------------------------------------
>
> Key: ISPN-8452
> URL: https://issues.jboss.org/browse/ISPN-8452
> Project: Infinispan
> Issue Type: Bug
> Components: Cloud Integrations
> Reporter: Sebastian Łaskawiec
> Assignee: Sebastian Łaskawiec
> Priority: Blocker
>
> It seems that KUBE_PING doesn't work with the latest version of the JDG 7.2 OpenShift image. The reason is that JDG 7.2 ships with the upstream KUBE_PING, which conflicts with the one provided in the openshift layer.
> The easiest solution is to reverse the layer ordering in the {{layers.conf}} file:
> {code}
> echo "layers=base,openshift" > /opt/datagrid/modules/layers.conf
> {code}
> This solution has an additional advantage: we can fall back to the previous KUBE_PING version (the one provided by the CE Team) just by changing this single line.
> However, if the upstream KUBE_PING works fine and we are happy with it, we should remove the old version from the server. I think that's the solution we should take for JDG 8, or maybe sooner, in 7.3.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-8452) KUBE_PING doesn't work with JDG 7.2 OpenShift image
by Sebastian Łaskawiec (JIRA)
[ https://issues.jboss.org/browse/ISPN-8452?page=com.atlassian.jira.plugin.... ]
Sebastian Łaskawiec updated ISPN-8452:
--------------------------------------
Status: Open (was: New)
> KUBE_PING doesn't work with JDG 7.2 OpenShift image
> ---------------------------------------------------
>
> Key: ISPN-8452
> URL: https://issues.jboss.org/browse/ISPN-8452
> Project: Infinispan
> Issue Type: Bug
> Components: Cloud Integrations
> Reporter: Sebastian Łaskawiec
> Assignee: Sebastian Łaskawiec
> Priority: Blocker
>
> It seems that KUBE_PING doesn't work with the latest version of the JDG 7.2 OpenShift image. The reason is that JDG 7.2 ships with the upstream KUBE_PING, which conflicts with the one provided in the openshift layer.
> The easiest solution is to reverse the layer ordering in the {{layers.conf}} file:
> {code}
> echo "layers=base,openshift" > /opt/datagrid/modules/layers.conf
> {code}
> This solution has an additional advantage: we can fall back to the previous KUBE_PING version (the one provided by the CE Team) just by changing this single line.
> However, if the upstream KUBE_PING works fine and we are happy with it, we should remove the old version from the server. I think that's the solution we should take for JDG 8, or maybe sooner, in 7.3.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-7070) CacheNotifierImplInitialTransferDistTest.testIterationBeganAndSegmentNotComplete random failures
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-7070?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-7070:
-------------------------------------
So the log shows that the REMOVE event was sent back to the node where the listener lives, but for some reason the listener was never notified:
{code}
12:54:48,687 TRACE (remote-thread-Test-NodeA-p49731-t6:[]) [ClusterEventCallable] Received cluster event(s) [ClusterEvent {type=CACHE_ENTRY_REMOVED, cache=Cache 'DistInitialTransferListener'@Test-NodeA-2391, key=key-to-change-0, value=null, oldValue=initial-value, transaction=null, retryCommand=false, origin=Test-NodeB-8079}], notifying cluster listener with id a9ff1f8a-10de-4eab-a545-93da4e734e1a
12:54:48,688 TRACE (remote-thread-Test-NodeA-p49731-t6:[]) [JGroupsTransport] Test-NodeA-2391 sending response for request 9 to Test-NodeB-8079: SuccessfulResponse(null)
{code}
This shows that the remove command was enqueued into the listener as it should have been.
Also, looking at the trace I see that a large subset of segments was not completed before the test failed. I am not quite sure how this was possible, though, as all the segments should have been completed. This could be a timing issue with the test or a bug in the segment notifications. I will have to look further.
I also found a bug in the test itself. The test is designed to have the operation performed after the listener has been notified of the given entry, but somehow the operation can fire before this notification, thus providing only 10 events instead of 12. In my testing this failure occurs much more frequently and shouldn't be that hard to fix.
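For reference, the kind of clustered listener this test exercises looks roughly like the sketch below, assuming the standard {{@Listener}} API; the class and field names are illustrative, not the actual test listener.
{code}
// Rough sketch only: assumes the standard Infinispan clustered-listener API;
// RecordingClusterListener and its field are illustrative names, not the real test code.
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.infinispan.Cache;
import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachelistener.annotation.CacheEntryCreated;
import org.infinispan.notifications.cachelistener.annotation.CacheEntryRemoved;
import org.infinispan.notifications.cachelistener.event.CacheEntryEvent;

@Listener(clustered = true, includeCurrentState = true)
public class RecordingClusterListener {

   public final Queue<CacheEntryEvent<?, ?>> events = new ConcurrentLinkedQueue<>();

   @CacheEntryCreated
   @CacheEntryRemoved
   public void onEvent(CacheEntryEvent<Object, Object> event) {
      // Both the initial-state CREATED events and the remote REMOVED event
      // discussed above should arrive here exactly once each.
      events.add(event);
   }

   public static RecordingClusterListener register(Cache<Object, Object> cache) {
      // Registering the listener triggers the initial state transfer.
      RecordingClusterListener listener = new RecordingClusterListener();
      cache.addListener(listener);
      return listener;
   }
}
{code}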
> CacheNotifierImplInitialTransferDistTest.testIterationBeganAndSegmentNotComplete random failures
> ------------------------------------------------------------------------------------------------
>
> Key: ISPN-7070
> URL: https://issues.jboss.org/browse/ISPN-7070
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 9.0.0.Alpha4
> Reporter: Dan Berindei
> Assignee: William Burns
> Priority: Critical
> Labels: testsuite_stability
> Fix For: 9.0.0.CR4
>
> Attachments: CacheNotifierImplInitialTransferDistTest_ISPN-5467_CompletableFuture-like_API_20161003.log, CacheNotifierImplInitialTransferDistTest_ISPN-7919_RpcManager_ResponseCollector_20171107.log.gz
>
>
> Looks like sometimes the originator listener ignores the CACHE_ENTRY_REMOVED notification:
> {noformat}
> 09:58:57,877 INFO (testng-Test:[]) [TestSuiteProgress] Test starting: org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest.testRemoveAfterIterationBeganAndSegmentNotCompleteValueNonOwnerClustered
> 09:58:58,913 DEBUG (testng-Test:[]) [Test] Found key key-to-change-0 with primary owner != Test-NodeA-7051, segment 60
> 09:58:58,930 TRACE (remote-thread-Test-NodeA-p23726-t6:[]) [ReadCommittedEntry] Updating entry (key=key-to-change-0 removed=false valid=true changed=true created=true value=initial-value metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, providedMetadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null})
> 09:58:58,972 TRACE (remote-thread-Test-NodeC-p23887-t6:[]) [ReadCommittedEntry] Updating entry (key=key-to-change-0 removed=false valid=true changed=true created=true value=initial-value metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, providedMetadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null})
> 09:58:59,014 TRACE (ForkThread-1,Test:[]) [Test] Adding segment listener 2a41ec3f-c79f-4b7d-bd0e-8754170d8fc1 : org.infinispan.notifications.cachelistener.DistributedQueueingSegmentListener@68007f8
> 09:58:59,014 TRACE (ForkThread-1,Test:[]) [CacheNotifierImpl] Replicating cluster listener to other nodes [Test-NodeA-7051, Test-NodeB-10318, Test-NodeC-52778] for cluster listener with id 2a41ec3f-c79f-4b7d-bd0e-8754170d8fc1
> 09:58:58,998 TRACE (testng-Test:[]) [CheckPoint] Waiting for event pre_complete_segment_invoked * 1
> 09:58:59,063 TRACE (remote-thread-Test-NodeC-p23887-t6:[]) [ClusterListenerReplicateCallable] Registered local cluster listener for remote cluster listener from origin Test-NodeA-7051 with id 2a41ec3f-c79f-4b7d-bd0e-8754170d8fc1
> 09:58:59,098 TRACE (remote-thread-Test-NodeB-p23775-t6:[]) [ClusterListenerReplicateCallable] Registered local cluster listener for remote cluster listener from origin Test-NodeA-7051 with id 2a41ec3f-c79f-4b7d-bd0e-8754170d8fc1
> 09:58:59,145 TRACE (ForkThread-1,Test:[]) [CacheNotifierImpl] Listener 2a41ec3f-c79f-4b7d-bd0e-8754170d8fc1 requests initial state for cache
> 09:58:59,373 TRACE (ForkThread-1,Test:[]) [Test] Completed segments [48, 2, 229, 118, 71, 72, 233, 60]
> 09:58:59,373 TRACE (ForkThread-1,Test:[]) [Test] Listener received event EventImpl{type=CACHE_ENTRY_CREATED, pre=false, cache=Cache 'DistInitialTransferListener'@Test-NodeA-7051, key=key-to-change-0, value=initial-value, oldValue=null, transaction=null, originLocal=true, transactionSuccessful=false, entries=null, created=false}
> 09:58:59,373 TRACE (ForkThread-1,Test:[]) [Test] Notified for key key-to-change-0
> 09:58:59,373 TRACE (ForkThread-1,Test:[]) [CheckPoint] Triggering event pre_complete_segment_invoked * 1 (available = 1, total = 1)
> 09:58:59,373 TRACE (ForkThread-1,Test:[]) [CheckPoint] Waiting for event pre_complete_segment_released * 1
> 09:58:59,373 TRACE (testng-Test:[]) [CheckPoint] Received event pre_complete_segment_invoked * 1 (available = 0, total = 1)
> 09:58:59,577 TRACE (remote-thread-Test-NodeA-p23726-t6:[]) [ReadCommittedEntry] Updating entry (key=key-to-change-0 removed=true valid=false changed=true created=false value=initial-value metadata=EmbeddedMetadata{version=null}, providedMetadata=null)
> 09:58:59,661 TRACE (remote-thread-Test-NodeC-p23887-t5:[]) [ReadCommittedEntry] Updating entry (key=key-to-change-0 removed=true valid=false changed=true created=false value=initial-value metadata=EmbeddedMetadata{version=null}, providedMetadata=null)
> 09:58:59,661 TRACE (remote-thread-Test-NodeC-p23887-t5:[]) [RequestCorrelator] Test-NodeC-52778: invoking unicast RPC [req-id=91126] on Test-NodeA-7051
> 09:58:59,684 TRACE (OOB-2,Test-NodeA-7051:[]) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 91126
> 09:58:59,779 TRACE (remote-thread-Test-NodeA-p23726-t6:[]) [NonTotalOrderPerCacheInboundInvocationHandler] Calling perform() on DistributedExecuteCommand [cache=Cache 'DistInitialTransferListener'@Test-NodeA-7051, keys=[], callable=org.infinispan.notifications.cachelistener.cluster.ClusterEventCallable@640b0234]
> 09:58:59,780 TRACE (remote-thread-Test-NodeA-p23726-t6:[]) [ClusterEventCallable] Received cluster event(s) [ClusterEvent {type=CACHE_ENTRY_REMOVED, cache=Cache 'DistInitialTransferListener'@Test-NodeA-7051, key=key-to-change-0, value=null, oldValue=initial-value, transaction=null, retryCommand=false, origin=Test-NodeC-52778}], notifying cluster listener with id 2a41ec3f-c79f-4b7d-bd0e-8754170d8fc1
> 09:58:59,837 TRACE (testng-Test:[]) [CheckPoint] Triggering event pre_complete_segment_released * 999999999 (available = 999999999, total = 999999999)
> 09:58:59,837 TRACE (ForkThread-1,Test:[]) [CheckPoint] Received event pre_complete_segment_released * 1 (available = 999999998, total = 999999999)
> 09:58:59,898 TRACE (ForkThread-1,Test:[]) [CacheNotifierImpl] Listener 2a41ec3f-c79f-4b7d-bd0e-8754170d8fc1 initial state for cache completed
> 09:58:59,916 TRACE (remote-thread-Test-NodeB-p23775-t6:[]) [ClusterListenerRemoveCallable] Removing local cluster listener due to parent cluster listener was removed : 2a41ec3f-c79f-4b7d-bd0e-8754170d8fc1
> 09:58:59,921 TRACE (remote-thread-Test-NodeC-p23887-t5:[]) [ClusterListenerRemoveCallable] Removing local cluster listener due to parent cluster listener was removed : 2a41ec3f-c79f-4b7d-bd0e-8754170d8fc1
> 09:58:59,943 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest.testRemoveAfterIterationBeganAndSegmentNotCompleteValueNonOwnerClustered
> java.lang.AssertionError: expected [12] but found [11]
> {noformat}
> {{CacheNotifierImplInitialTransferDistTest}} also had a different problem: it replaced the {{CacheNotifier}} component with a clone before adding the listener, but swapped the original back in before removing the listener. Because the listener's uuid was only registered in the clone, {{removeListener}} couldn't find the uuid and didn't try to remove the listener on the other nodes. This meant the listener was leaked, resulting in logs like this:
> {noformat}
> 20:12:04,098 TRACE (remote-thread-Test-NodeB-p24127-t6:[]) [ReadCommittedEntry] Updating entry (key=key-to-change-0 removed=true valid=false changed=true created=false value=initial-value metadata=EmbeddedMetadata{version=null}, providedMetadata=null)
> 20:12:04,099 TRACE (remote-thread-Test-NodeB-p24127-t6:[]) [RemoteClusterListener] Passing Event to manager EventImpl{type=CACHE_ENTRY_REMOVED, pre=false, cache=Cache 'DistInitialTransferListener'@Test-NodeB-56142, key=key-to-change-0, value=null, oldValue=initial-value, transaction=null, originLocal=false, transactionSuccessful=false, entries=null, created=false} to send to Test-NodeA-38047
> 20:12:04,100 TRACE (remote-thread-Test-NodeB-p24127-t6:[]) [RemoteClusterListener] Passing Event to manager EventImpl{type=CACHE_ENTRY_REMOVED, pre=false, cache=Cache 'DistInitialTransferListener'@Test-NodeB-56142, key=key-to-change-0, value=null, oldValue=initial-value, transaction=null, originLocal=false, transactionSuccessful=false, entries=null, created=false} to send to Test-NodeA-38047
> 20:12:04,100 TRACE (remote-thread-Test-NodeB-p24127-t6:[]) [RemoteClusterListener] Passing Event to manager EventImpl{type=CACHE_ENTRY_REMOVED, pre=false, cache=Cache 'DistInitialTransferListener'@Test-NodeB-56142, key=key-to-change-0, value=null, oldValue=initial-value, transaction=null, originLocal=false, transactionSuccessful=false, entries=null, created=false} to send to Test-NodeA-38047
> 20:12:04,100 TRACE (remote-thread-Test-NodeB-p24127-t6:[]) [RemoteClusterListener] Passing Event to manager EventImpl{type=CACHE_ENTRY_REMOVED, pre=false, cache=Cache 'DistInitialTransferListener'@Test-NodeB-56142, key=key-to-change-0, value=null, oldValue=initial-value, transaction=null, originLocal=false, transactionSuccessful=false, entries=null, created=false} to send to Test-NodeA-38047
> 20:12:04,100 TRACE (remote-thread-Test-NodeB-p24127-t6:[]) [RemoteClusterListener] Passing Event to manager EventImpl{type=CACHE_ENTRY_REMOVED, pre=false, cache=Cache 'DistInitialTransferListener'@Test-NodeB-56142, key=key-to-change-0, value=null, oldValue=initial-value, transaction=null, originLocal=false, transactionSuccessful=false, entries=null, created=false} to send to Test-NodeA-38047
> 20:12:04,106 TRACE (remote-thread-Test-NodeA-p24103-t6:[]) [MultiClusterEventCallable] Received multiple cluster event(s) {86f77630-b1bb-4a21-97bb-b03efeae39d6=[ClusterEvent {type=CACHE_ENTRY_REMOVED, cache=Cache 'DistInitialTransferListener'@Test-NodeA-38047, key=key-to-change-0, value=null, oldValue=initial-value, transaction=null, retryCommand=false, origin=Test-NodeB-56142}], 91d05a50-4a35-4e37-8c5c-bffbabc6154b=[ClusterEvent {type=CACHE_ENTRY_REMOVED, cache=Cache 'DistInitialTransferListener'@Test-NodeA-38047, key=key-to-change-0, value=null, oldValue=initial-value, transaction=null, retryCommand=false, origin=Test-NodeB-56142}], ba09480f-c698-431b-8fe6-213f1bdefc7b=[ClusterEvent {type=CACHE_ENTRY_REMOVED, cache=Cache 'DistInitialTransferListener'@Test-NodeA-38047, key=key-to-change-0, value=null, oldValue=initial-value, transaction=null, retryCommand=false, origin=Test-NodeB-56142}], 992da014-0516-4f2a-8093-50ba3b1f358a=[ClusterEvent {type=CACHE_ENTRY_REMOVED, cache=Cache 'DistInitialTransferListener'@Test-NodeA-38047, key=key-to-change-0, value=null, oldValue=initial-value, transaction=null, retryCommand=false, origin=Test-NodeB-56142}], b4eeb9b3-85c4-4ff9-865e-076b7cf93faa=[ClusterEvent {type=CACHE_ENTRY_REMOVED, cache=Cache 'DistInitialTransferListener'@Test-NodeA-38047, key=key-to-change-0, value=null, oldValue=initial-value, transaction=null, retryCommand=false, origin=Test-NodeB-56142}]}
> {noformat}
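> In code, the problematic sequence boils down to something like the hypothetical sketch below; only {{TestingUtil.extractComponent}}/{{TestingUtil.replaceComponent}} are real test-framework helpers, the rest of the names are illustrative:
> {code}
> // Hypothetical sketch, not the actual test code.
> void problematicListenerLifecycle(Cache<Object, Object> cache, Object clusterListener) {
>    CacheNotifier original = TestingUtil.extractComponent(cache, CacheNotifier.class);
>    CacheNotifier clone = Mockito.spy(original);
>
>    // The listener's uuid is registered only in the clone...
>    TestingUtil.replaceComponent(cache, CacheNotifier.class, clone, true);
>    cache.addListener(clusterListener);
>
>    // ...but the original notifier is put back before removal, so removeListener()
>    // cannot find the uuid and never removes the listener from the other nodes.
>    TestingUtil.replaceComponent(cache, CacheNotifier.class, original, true);
>    cache.removeListener(clusterListener);
> }
> {code}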
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-8540) Investigate off-heap and eviction memory usage
by Sebastian Łaskawiec (JIRA)
Sebastian Łaskawiec created ISPN-8540:
-----------------------------------------
Summary: Investigate off-heap and eviction memory usage
Key: ISPN-8540
URL: https://issues.jboss.org/browse/ISPN-8540
Project: Infinispan
Issue Type: Task
Components: Cloud Integrations
Reporter: Sebastian Łaskawiec
Assignee: William Burns
With the [latest changes|https://github.com/jboss-container-images/jboss-dataservices-imag...], online services calibrate eviction as follows (a worked example follows this list):
* Set {{Xmx}} to 100 MB
* Set the additional JVM overhead to 200
* Calculate {{-XX:MaxRAM}} as the sum of the above
* Add an overhead of 30% of the container capacity
* Calculate the eviction size as: container size - Xmx - additional JVM overhead - 30% of container capacity.
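As a sanity check of that arithmetic, here is a minimal sketch for a hypothetical 512 MB container, assuming the additional JVM overhead above is expressed in MB; the exact rounding used by the image scripts may differ:
{code}
// Illustrative only: the container size and rounding are assumptions, not the
// values produced by the online-services image scripts.
public class EvictionCalibrationSketch {
   public static void main(String[] args) {
      long containerMb = 512;                          // hypothetical container size
      long xmxMb = 100;                                // -Xmx
      long jvmOverheadMb = 200;                        // additional JVM overhead (assumed MB)
      long maxRamMb = xmxMb + jvmOverheadMb;           // -XX:MaxRAM = 300 MB
      double capacityOverheadMb = 0.30 * containerMb;  // 30% of container capacity ~= 153.6 MB
      double evictionMb = containerMb - xmxMb - jvmOverheadMb - capacityOverheadMb; // ~= 58.4 MB
      long offHeapSizeBytes = (long) (evictionMb * 1024 * 1024); // value for <off-heap size="..."/>
      System.out.printf("MaxRAM=%dm, off-heap size=%d bytes (~%.1f MB)%n",
            maxRamMb, offHeapSizeBytes, evictionMb);
   }
}
{code}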
This renders the following configuration (this is just an example):
{code}
<subsystem xmlns="urn:infinispan:server:core:8.5">
    <cache-container name="clustered" default-cache="default" statistics="true">
        <transport lock-timeout="60000"/>
        <global-state/>
        <async-operations-thread-pool min-threads="1"/>
        <transport-thread-pool min-threads="1"/>
        <distributed-cache-configuration name="caching-service" owners="1">
            <memory>
                <off-heap size="65011712" eviction="MEMORY"/>
            </memory>
            <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="REMOVE_ALL"/>
        </distributed-cache-configuration>
        <distributed-cache name="default" configuration="caching-service"/>
        <distributed-cache name="memcachedCache" configuration="caching-service"/>
        <distributed-cache-configuration name="default"/>
        <distributed-cache-configuration name="memcachedCache"/>
    </cache-container>
</subsystem>
{code}
The configuration above doesn't seem to work very well. The results spreadsheet can be found [here|https://docs.google.com/spreadsheets/d/1HbOhwze_L4EQi06R1sq9brK5zlom...]. In particular, when calibrated for a 512 MB container it leaves 22 MB of free space in the container, and a similar configuration for a 430 MB container leaves 33 MB. This trend gets even worse for smaller containers.
What's even more surprising is that the value calculated for the total overhead (Xmx + JVM overhead + 30% of container capacity) doesn't seem to be linear; it looks roughly quadratic. This results in a very low amount of data we can actually store in the grid.
Those figures seem quite low, and before investing more time into calibrating this, I would like to make sure there is no bug and that eviction works as expected.
This can also be reproduced using a local standalone Infinispan/JDG installation. All you need to do is calculate the values manually and figure out the Xmx/MaxRAM parameters (the full list of parameters used by the container can be found [here|https://gist.github.com/slaskawi/5c270d1ba3ec3e1752faf18bfd00a70c#fi...]). After booting up the server, run a 1K key/value load against it (JMH works quite nicely). In a second terminal window, check the {{pmem -x}} output. The second column is RSS, which is quite similar to what cgroups see. Depending on what you're testing, going beyond 512 MB/1 GB at any point in time would result in the container being killed in the real world.
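A minimal load-generator sketch for that manual reproduction follows; the issue suggests JMH, but a plain Hot Rod client loop like this hypothetical one serves the same purpose (host, port and cache name are assumptions):
{code}
import java.util.concurrent.ThreadLocalRandom;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class OffHeapLoadSketch {
   public static void main(String[] args) {
      int entries = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222); // assumed local Hot Rod endpoint
      RemoteCacheManager remoteCacheManager = new RemoteCacheManager(builder.build());
      RemoteCache<String, byte[]> cache = remoteCacheManager.getCache("default");
      byte[] value = new byte[1024]; // ~1 KB values, as in the description
      for (int i = 0; i < entries; i++) {
         ThreadLocalRandom.current().nextBytes(value);
         cache.put("key-" + i, value);
      }
      remoteCacheManager.stop();
   }
}
{code}
While the load runs, watch the RSS of the server process from the second terminal as described above.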
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-8537) Off-heap tx tests random failures
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-8537?page=com.atlassian.jira.plugin.... ]
William Burns updated ISPN-8537:
--------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> Off-heap tx tests random failures
> ---------------------------------
>
> Key: ISPN-8537
> URL: https://issues.jboss.org/browse/ISPN-8537
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.2.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 9.2.0.Beta2
>
>
> The new tests added for ISPN-8428 define a factory method, but do not use it correctly: tests extending {{MultipleCacheManagersTest}} are supposed to override {{factory()}} without adding a {{@Factory}} annotation, and each subclass is supposed to have its own override.
> Because of this, the subclasses actually create new instances of the base class. {{TestResourceTracker}} doesn't like it when multiple tests with the same name run in parallel, so this causes random failures:
> {noformat}
> 15:45:04,513 ERROR (testng-Test:[]) [TestSuiteProgress] Test setup failed: org.infinispan.tx.ContextAffectsTransactionReadCommittedTest.testClassStarted
> java.lang.IllegalStateException: Two tests with the same name running in parallel: org.infinispan.tx.Test
> at org.infinispan.test.fwk.TestResourceTracker.testStarted(TestResourceTracker.java:93) ~[test-classes/:?]
> at org.infinispan.test.AbstractInfinispanTest.testClassStarted(AbstractInfinispanTest.java:112) ~[test-classes/:?]
> {noformat}
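> For reference, the intended pattern looks roughly like the hypothetical sketch below (the test names are illustrative, not the actual ISPN-8428 tests):
> {code}
> import org.infinispan.test.MultipleCacheManagersTest;
>
> class BaseOffHeapTxTest extends MultipleCacheManagersTest {
>    // No @Factory annotation here: the base MultipleCacheManagersTest already
>    // declares it, so repeating it makes TestNG instantiate this class directly.
>    @Override
>    public Object[] factory() {
>       return new Object[] { new BaseOffHeapTxTest() };
>    }
>
>    @Override
>    protected void createCacheManagers() throws Throwable {
>       // create the clustered cache managers used by the tests
>    }
> }
>
> class PessimisticOffHeapTxTest extends BaseOffHeapTxTest {
>    // Each subclass provides its own override, so the factory produces
>    // instances of the subclass instead of new instances of the base class.
>    @Override
>    public Object[] factory() {
>       return new Object[] { new PessimisticOffHeapTxTest() };
>    }
> }
> {code}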
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-8539) Off Heap Expiration doesn't work with non EmbeddedMetadata
by William Burns (JIRA)
William Burns created ISPN-8539:
-----------------------------------
Summary: Off Heap Expiration doesn't work with non EmbeddedMetadata
Key: ISPN-8539
URL: https://issues.jboss.org/browse/ISPN-8539
Project: Infinispan
Issue Type: Bug
Components: Off Heap
Affects Versions: 9.1.3.Final
Reporter: William Burns
Currently, off-heap just writes the custom metadata as serialized bytes. Unfortunately this isn't sufficient, as it would also have to write the time the entry was accessed or created when using either expiration mode.
We may also want to find out whether we even need to persist all custom metadata. For example, memcached metadata seems fine to just treat as embedded.
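For illustration, a minimal embedded configuration that hits this case (off-heap storage combined with both expiration modes) might look like the sketch below; the cache name is an assumption:
{code}
import java.util.concurrent.TimeUnit;

import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.cache.StorageType;
import org.infinispan.manager.DefaultCacheManager;

public class OffHeapExpirationSketch {
   public static void main(String[] args) {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.memory().storageType(StorageType.OFF_HEAP);
      // Both lifespan and max-idle need the creation/last-access timestamps
      // that the serialized custom metadata currently does not carry off heap.
      builder.expiration()
             .lifespan(10, TimeUnit.SECONDS)
             .maxIdle(5, TimeUnit.SECONDS);

      DefaultCacheManager cacheManager = new DefaultCacheManager();
      cacheManager.defineConfiguration("offHeapExpiring", builder.build());
      cacheManager.getCache("offHeapExpiring").put("k", "v");
      cacheManager.stop();
   }
}
{code}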
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-8538) TwoWaySplitAndMerge[DIST_SYNC] deadlock in conflict resolution
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-8538?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-8538:
-------------------------------
Status: Open (was: New)
> TwoWaySplitAndMerge[DIST_SYNC] deadlock in conflict resolution
> --------------------------------------------------------------
>
> Key: ISPN-8538
> URL: https://issues.jboss.org/browse/ISPN-8538
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 9.2.0.Beta1
> Reporter: Dan Berindei
> Assignee: Ryan Emerson
> Labels: testsuite_stability
> Fix For: 9.2.0.Beta2
>
> Attachments: TwoWaySplitAndMergeTest_ISPN-factory_problems_20171115.log.gz, stack_20171115_1505.txt
>
>
> This might be related to me having trace logging enabled. The main thread is stuck waiting for the {{ClusterCacheStatus}} monitor in {{doLeave}}:
> Main thread:
> {noformat}
> "testng-TwoWaySplitAndMergeTest[DIST_SYNC]" #27 prio=5 os_prio=0 tid=0x00007f52a4faa000 nid=0x3f93 waiting for monitor entry [0x00007f51feaf2000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.infinispan.topology.ClusterCacheStatus.doLeave(ClusterCacheStatus.java:723)
> - waiting to lock <0x00000000e0d4c1a8> (a org.infinispan.topology.ClusterCacheStatus)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleLeave(ClusterTopologyManagerImpl.java:250)
> at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:186)
> at org.infinispan.topology.CacheTopologyControlCommand.invokeAsync(CacheTopologyControlCommand.java:166)
> at org.infinispan.commands.ReplicableCommand.invoke(ReplicableCommand.java:44)
> at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinator(LocalTopologyManagerImpl.java:692)
> at org.infinispan.topology.LocalTopologyManagerImpl.leave(LocalTopologyManagerImpl.java:197)
> at org.infinispan.statetransfer.StateTransferManagerImpl.stop(StateTransferManagerImpl.java:264)
> at sun.reflect.GeneratedMethodAccessor222.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.infinispan.commons.util.SecurityActions.lambda$invokeAccessibly$0(SecurityActions.java:91)
> at org.infinispan.commons.util.SecurityActions$$Lambda$159/273460605.run(Unknown Source)
> at org.infinispan.commons.util.SecurityActions.doPrivileged(SecurityActions.java:83)
> at org.infinispan.commons.util.SecurityActions.invokeAccessibly(SecurityActions.java:88)
> at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:165)
> at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:883)
> at org.infinispan.factories.AbstractComponentRegistry.internalStop(AbstractComponentRegistry.java:684)
> at org.infinispan.factories.AbstractComponentRegistry.stop(AbstractComponentRegistry.java:583)
> - locked <0x00000000c83a6ad8> (a org.infinispan.factories.ComponentRegistry)
> at org.infinispan.factories.ComponentRegistry.stop(ComponentRegistry.java:259)
> at org.infinispan.cache.impl.CacheImpl.performImmediateShutdown(CacheImpl.java:1046)
> at org.infinispan.cache.impl.CacheImpl.stop(CacheImpl.java:1010)
> at org.infinispan.cache.impl.AbstractDelegatingCache.stop(AbstractDelegatingCache.java:420)
> at org.infinispan.manager.DefaultCacheManager.terminate(DefaultCacheManager.java:687)
> at org.infinispan.manager.DefaultCacheManager.stopCaches(DefaultCacheManager.java:727)
> at org.infinispan.manager.DefaultCacheManager.stop(DefaultCacheManager.java:704)
> at org.infinispan.test.TestingUtil.killCacheManagers(TestingUtil.java:774)
> at org.infinispan.test.MultipleCacheManagersTest.clearContent(MultipleCacheManagersTest.java:146)
> Locked ownable synchronizers:
> - <0x00000000c464a9d8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> - <0x00000000e12f5a60> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> {noformat}
> Two other threads are involved:
> {noformat}
> "transport-thread-TwoWaySplitAndMergeTest[DIST_SYNC]-NodeA-p26871-t1" #93288 daemon prio=5 os_prio=0 tid=0x00007f51cc03d800 nid=0x4ab0 waiting for monitor entry [0x00007f515dd70000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest.cancel(StateReceiverImpl.java:213)
> - waiting to lock <0x00000000c7476a38> (a org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest)
> at org.infinispan.conflict.impl.StateReceiverImpl.stop(StateReceiverImpl.java:76)
> - locked <0x00000000e0d60fe8> (a org.infinispan.conflict.impl.StateReceiverImpl)
> at org.infinispan.conflict.impl.DefaultConflictManager$ReplicaSpliterator.tryAdvance(DefaultConflictManager.java:463)
> at java.util.Spliterator.forEachRemaining(Spliterator.java:326)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at org.infinispan.conflict.impl.DefaultConflictManager.doResolveConflicts(DefaultConflictManager.java:265)
> at org.infinispan.conflict.impl.DefaultConflictManager.resolveConflicts(DefaultConflictManager.java:252)
> at org.infinispan.topology.ClusterCacheStatus.updateTopologiesAfterMerge(ClusterCacheStatus.java:193)
> - locked <0x00000000e0d4c1a8> (a org.infinispan.topology.ClusterCacheStatus)
> at org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.onPartitionMerge(PreferAvailabilityStrategy.java:197)
> at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:597)
> - locked <0x00000000e0d4c1a8> (a org.infinispan.topology.ClusterCacheStatus)
> at org.infinispan.topology.ClusterTopologyManagerImpl.lambda$recoverClusterStatus$6(ClusterTopologyManagerImpl.java:528)
> at org.infinispan.topology.ClusterTopologyManagerImpl$$Lambda$753/1411554646.run(Unknown Source)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - <0x00000000cef073e8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> {noformat}
> "remote-thread-TwoWaySplitAndMergeTest[DIST_SYNC]-NodeA-p26869-t6" #93412 daemon prio=5 os_prio=0 tid=0x00007f5238071000 nid=0x4b3c waiting for monitor entry [0x00007f51870ee000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest.clear(StateReceiverImpl.java:182)
> - waiting to lock <0x00000000e0d60fe8> (a org.infinispan.conflict.impl.StateReceiverImpl)
> - locked <0x00000000c7476a38> (a org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest)
> at org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest.cancel(StateReceiverImpl.java:220)
> - locked <0x00000000c7476a38> (a org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest)
> at org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest.lambda$requestState$1(StateReceiverImpl.java:164)
> at org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest$$Lambda$1997/1053695950.apply(Unknown Source)
> at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
> at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at org.infinispan.statetransfer.InboundTransferTask.notifyCompletion(InboundTransferTask.java:245)
> at org.infinispan.statetransfer.InboundTransferTask.onStateReceived(InboundTransferTask.java:238)
> at org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest.receiveState(StateReceiverImpl.java:205)
> - locked <0x00000000c7476a38> (a org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest)
> at org.infinispan.conflict.impl.StateReceiverImpl.receiveState(StateReceiverImpl.java:108)
> at org.infinispan.statetransfer.StateResponseCommand.invokeAsync(StateResponseCommand.java:90)
> at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:102)
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99)
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71)
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - <0x00000000cf105ce8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-8537) Off-heap tx tests random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-8537?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-8537:
-------------------------------
Status: Open (was: New)
> Off-heap tx tests random failures
> ---------------------------------
>
> Key: ISPN-8537
> URL: https://issues.jboss.org/browse/ISPN-8537
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.2.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 9.2.0.Beta2
>
>
> The new tests added for ISPN-8428 define a factory method, but do not use it correctly: tests extending {{MultipleCacheManagersTest}} are supposed to override {{factory()}} without adding a {{@Factory}} annotation, and each subclass is supposed to have its own override.
> Because of this, the subclasses actually create new instances of the base class. {{TestResourceTracker}} doesn't like it when multiple tests with the same name run in parallel, so this causes random failures:
> {noformat}
> 15:45:04,513 ERROR (testng-Test:[]) [TestSuiteProgress] Test setup failed: org.infinispan.tx.ContextAffectsTransactionReadCommittedTest.testClassStarted
> java.lang.IllegalStateException: Two tests with the same name running in parallel: org.infinispan.tx.Test
> at org.infinispan.test.fwk.TestResourceTracker.testStarted(TestResourceTracker.java:93) ~[test-classes/:?]
> at org.infinispan.test.AbstractInfinispanTest.testClassStarted(AbstractInfinispanTest.java:112) ~[test-classes/:?]
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-8537) Off-heap tx tests random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-8537?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-8537:
-------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/5591
> Off-heap tx tests random failures
> ---------------------------------
>
> Key: ISPN-8537
> URL: https://issues.jboss.org/browse/ISPN-8537
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.2.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 9.2.0.Beta2
>
>
> The new tests added for ISPN-8428 define a factory method, but do not use it correctly: tests extending {{MultipleCacheManagersTest}} are supposed to override {{factory()}} without adding a {{@Factory}} annotation, and each subclass is supposed to have its own override.
> Because of this, the subclasses actually create new instances of the base class. {{TestResourceTracker}} doesn't like it when multiple tests with the same name run in parallel, so this causes random failures:
> {noformat}
> 15:45:04,513 ERROR (testng-Test:[]) [TestSuiteProgress] Test setup failed: org.infinispan.tx.ContextAffectsTransactionReadCommittedTest.testClassStarted
> java.lang.IllegalStateException: Two tests with the same name running in parallel: org.infinispan.tx.Test
> at org.infinispan.test.fwk.TestResourceTracker.testStarted(TestResourceTracker.java:93) ~[test-classes/:?]
> at org.infinispan.test.AbstractInfinispanTest.testClassStarted(AbstractInfinispanTest.java:112) ~[test-classes/:?]
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)