[JBoss JIRA] (ISPN-11679) Bad uneven segment distributions can happen after the rebalancing with SyncConsistentHashFactory
by Masafumi Miura (Jira)
[ https://issues.redhat.com/browse/ISPN-11679?page=com.atlassian.jira.plugi... ]
Masafumi Miura commented on ISPN-11679:
---------------------------------------
Here is the example result of the unit test with 22 nodes:
{code}
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.infinispan.distribution.ch.SyncConsistentHashFactoryKeyDistributionTest
[OK: 0, KO: 0, SKIP: 0] Test starting: org.infinispan.distribution.ch.SyncConsistentHashFactoryKeyDistributionTest.testRebalanceDistribution
Distribution for 22 nodes (relative to the average)
===
Segments = 200 220 400 440 600 660 720 800 880 1000 1100
Any owner: % < 0.60 = 1.46% 0.00% 0.12% 0.00% 0.03% 0.02% 0.01% 0.00% 0.00% 0.00% 0.00%
Any owner: % < 0.70 = 4.54% 0.00% 0.91% 0.00% 0.33% 0.13% 0.12% 0.07% 0.07% 0.03% 0.01%
Any owner: % < 0.80 = 11.69% 0.02% 3.12% 0.00% 2.85% 1.73% 1.66% 1.19% 0.81% 1.03% 0.70%
Any owner: % < 0.90 = 23.57% 3.80% 11.91% 0.45% 12.28% 10.06% 8.65% 8.26% 6.84% 10.01% 8.64%
Any owner: % > 1.10 = 42.02% 0.72% 7.25% 0.15% 7.82% 0.43% 1.80% 1.55% 0.06% 2.04% 0.17%
Any owner: % > 1.15 = 14.77% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Any owner: % > 1.30 = 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Any owner: % > 1.50 = 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Any owner: 99.90% percentile = 1.167 1.150 1.139 1.125 1.130 1.117 1.123 1.125 1.100 1.122 1.110
Any owner: max = 1.167 1.150 1.139 1.125 1.130 1.117 1.123 1.125 1.113 1.122 1.110
Any owner: min = 0.222 0.800 0.444 0.850 0.500 0.550 0.523 0.639 0.650 0.622 0.700
Primary: % < 0.60 = 1.68% 0.00% 0.10% 0.00% 0.03% 0.02% 0.01% 0.00% 0.00% 0.00% 0.00%
Primary: % < 0.70 = 5.32% 0.00% 0.70% 0.00% 0.23% 0.17% 0.11% 0.10% 0.05% 0.04% 0.01%
Primary: % < 0.80 = 13.24% 0.00% 3.56% 0.00% 2.42% 1.96% 1.02% 1.01% 0.89% 1.20% 0.75%
Primary: % < 0.90 = 26.48% 0.01% 13.25% 0.01% 13.74% 11.12% 5.67% 8.96% 7.39% 9.02% 9.20%
Primary: % > 1.10 = 56.32% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Primary: % > 1.15 = 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Primary: % > 1.30 = 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Primary: % > 1.50 = 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Primary: 99.90% percentile = 1.111 1.000 1.056 1.000 1.074 1.033 1.063 1.056 1.025 1.067 1.040
Primary: max = 1.111 1.000 1.056 1.000 1.074 1.033 1.063 1.056 1.025 1.067 1.040
Primary: min = 0.222 1.000 0.444 1.000 0.519 0.567 0.531 0.611 0.675 0.622 0.700
Segments per node - max/min ratio = 5.250 1.438 2.500 1.294 2.222 1.970 2.088 1.717 1.673 1.804 1.549
[OK: 1, KO: 0, SKIP: 0] Test succeeded: org.infinispan.distribution.ch.SyncConsistentHashFactoryKeyDistributionTest.testRebalanceDistribution
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 71.962 s - in org.infinispan.distribution.ch.SyncConsistentHashFactoryKeyDistributionTest
{code}
If the number of segments is configured to be exactly "20 * number of nodes" (= 440 in the example above), the test result looks noticeably better than in the other cases. Is this an expected result?
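For reference, the "Segments per node - max/min ratio" metric reported above can be illustrated with a small self-contained sketch. This is a toy ring simulation, not the actual SyncConsistentHashFactory algorithm; the hashing scheme, virtual-node count, and class name here are illustrative assumptions only:

```java
import java.util.TreeMap;

/**
 * Toy illustration of the "Segments per node - max/min ratio" metric:
 * hash several virtual positions per node onto a ring, assign each
 * segment to the next node position on the ring, then compare the
 * most-loaded node against the least-loaded one.
 */
public class SegmentRatioSketch {

    static double maxMinRatio(int numNodes, int numSegments, int virtualNodes) {
        // Ring: hash position -> node index, several virtual positions per node.
        TreeMap<Integer, Integer> ring = new TreeMap<>();
        for (int node = 0; node < numNodes; node++) {
            for (int v = 0; v < virtualNodes; v++) {
                ring.put(("node-" + node + "#" + v).hashCode(), node);
            }
        }
        int[] owned = new int[numNodes];
        for (int segment = 0; segment < numSegments; segment++) {
            int pos = segment * 0x9E3779B9;       // spread segment ids over the ring
            Integer key = ring.ceilingKey(pos);
            if (key == null) key = ring.firstKey(); // wrap around the ring
            owned[ring.get(key)]++;
        }
        int max = 0, min = Integer.MAX_VALUE;
        for (int c : owned) { max = Math.max(max, c); min = Math.min(min, c); }
        return min == 0 ? Double.POSITIVE_INFINITY : (double) max / min;
    }

    public static void main(String[] args) {
        int nodes = 22;
        for (int segments : new int[] {200, 440, 1100}) {
            System.out.printf("nodes=%d segments=%d max/min ratio=%.3f%n",
                    nodes, segments, maxMinRatio(nodes, segments, 10));
        }
    }
}
```

Even in this simplified model the ratio stays well above 1.0, which is the same kind of imbalance the test output measures.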
> Bad uneven segment distributions can happen after the rebalancing with SyncConsistentHashFactory
> ------------------------------------------------------------------------------------------------
>
> Key: ISPN-11679
> URL: https://issues.redhat.com/browse/ISPN-11679
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.4.19.Final, 10.1.6.Final
> Reporter: Masafumi Miura
> Priority: Major
>
> Uneven segment distribution can happen after rebalancing with SyncConsistentHashFactory.
> Even if more segments than "20 * number of nodes" are configured, some nodes end up owning only around 50-60% as many segments as other nodes after a rebalance in the worst case.
> I understand that the documentation (quoted below) mentions slightly uneven distribution as a potential drawback of SyncConsistentHashFactory.
> [Infinispan 10 configuration guide|https://infinispan.org/docs/stable/titles/configuring/configuring.h...] says:
> {quote}
> This implementation does have some negative points in that the load distribution is slightly uneven. It also moves more segments than strictly necessary on a join or leave.
> {quote}
> [JDG 7 Developer Guide|https://access.redhat.com/documentation/en-us/red_hat_data_grid/7.2...] says:
> {quote}
> It has a potential drawback in that it can move a greater number of segments than necessary during re-balancing. This can be mitigated by using a larger number of segments.
> Another potential drawback is that the segments are not distributed as evenly as possible, and actually using a very large number of segments can make the distribution of segments worse.
> {quote}
> However, a node owning only half as many segments as others is not a slight unevenness.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-11679) Bad uneven segment distributions can happen after the rebalancing with SyncConsistentHashFactory
by Masafumi Miura (Jira)
[ https://issues.redhat.com/browse/ISPN-11679?page=com.atlassian.jira.plugi... ]
Masafumi Miura updated ISPN-11679:
----------------------------------
Steps to Reproduce:
- Add a unit test "testRebalanceDistribution" to SyncConsistentHashFactoryKeyDistributionTest:
https://github.com/infinispan/infinispan/compare/master...msfm:master_Syn...
- Execute unit test
{code}
$ mvn -pl core test -Dtest=distribution.ch.SyncConsistentHashFactoryKeyDistributionTest#testRebalanceDistribution
{code}
- Check the test result, especially "Primary: max", "Primary: min", and "Segments per node - max/min ratio". "Primary: min" drops to around half of the average (around 0.500-0.600), and "Segments per node - max/min ratio" reaches around two (around 1.7-2.0).
[JBoss JIRA] (ISPN-11679) Bad uneven segment distributions can happen after the rebalancing with SyncConsistentHashFactory
by Masafumi Miura (Jira)
Masafumi Miura created ISPN-11679:
-------------------------------------
Summary: Bad uneven segment distributions can happen after the rebalancing with SyncConsistentHashFactory
Key: ISPN-11679
URL: https://issues.redhat.com/browse/ISPN-11679
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 10.1.6.Final, 9.4.19.Final
Reporter: Masafumi Miura
Uneven segment distribution can happen after rebalancing with SyncConsistentHashFactory.
Even if more segments than "20 * number of nodes" are configured, some nodes end up owning only around 50-60% as many segments as other nodes after a rebalance in the worst case.
I understand that the documentation (quoted below) mentions slightly uneven distribution as a potential drawback of SyncConsistentHashFactory.
[Infinispan 10 configuration guide|https://infinispan.org/docs/stable/titles/configuring/configuring.h...] says:
{quote}
This implementation does have some negative points in that the load distribution is slightly uneven. It also moves more segments than strictly necessary on a join or leave.
{quote}
[JDG 7 Developer Guide|https://access.redhat.com/documentation/en-us/red_hat_data_grid/7.2...] says:
{quote}
It has a potential drawback in that it can move a greater number of segments than necessary during re-balancing. This can be mitigated by using a larger number of segments.
Another potential drawback is that the segments are not distributed as evenly as possible, and actually using a very large number of segments can make the distribution of segments worse.
{quote}
However, a node owning only half as many segments as others is not a slight unevenness.
[JBoss JIRA] (ISPN-11677) SharedStoreInvalidationDuringRehashTest[SCATTERED] random failure
by Will Burns (Jira)
Will Burns created ISPN-11677:
---------------------------------
Summary: SharedStoreInvalidationDuringRehashTest[SCATTERED] random failure
Key: ISPN-11677
URL: https://issues.redhat.com/browse/ISPN-11677
Project: Infinispan
Issue Type: Bug
Reporter: Will Burns
The failure can be found at https://ci.infinispan.org/job/Infinispan/job/PR-8217/1/testReport/junit/o...
This is probably caused by scattered state transfer still blocking in some parts.
Stack trace, in case the failure report gets deleted:
{code}
java.lang.AssertionError: Blocking call! jdk.internal.misc.Unsafe#park on thread Thread[non-blocking-thread-SharedStoreInvalidationDuringRehashTest-NodeC-p6713-t4,5,ISPN-non-blocking-thread-group]
at org.infinispan.util.CoreTestBlockHoundIntegration.lambda$applyTo$0(CoreTestBlockHoundIntegration.java:43)
at reactor.blockhound.BlockHound$Builder.lambda$install$8(BlockHound.java:383)
at reactor.blockhound.BlockHoundRuntime.checkBlocking(BlockHoundRuntime.java:89)
at java.base/jdk.internal.misc.Unsafe.park(Unsafe.java)
at java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
at java.base/java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1798)
at java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3128)
at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1868)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:125)
at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.get(SimpleAsyncInvocationStage.java:36)
at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:246)
at org.infinispan.scattered.impl.ScatteredStateConsumerImpl.applyValues(ScatteredStateConsumerImpl.java:512)
at org.infinispan.scattered.impl.ScatteredStateConsumerImpl.lambda$getValuesAndApply$10(ScatteredStateConsumerImpl.java:471)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883)
at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2251)
at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:143)
at org.infinispan.scattered.impl.ScatteredStateConsumerImpl.getValuesAndApply(ScatteredStateConsumerImpl.java:466)
at org.infinispan.scattered.impl.ScatteredStateConsumerImpl.onTaskCompletion(ScatteredStateConsumerImpl.java:330)
at org.infinispan.scattered.impl.ScatteredStateConsumerImpl.lambda$requestKeyTransfer$1(ScatteredStateConsumerImpl.java:204)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at org.infinispan.statetransfer.InboundTransferTask.notifyCompletion(InboundTransferTask.java:267)
at org.infinispan.statetransfer.InboundTransferTask.onStateReceived(InboundTransferTask.java:260)
at org.infinispan.statetransfer.StateConsumerImpl.lambda$applyChunk$10(StateConsumerImpl.java:645)
at java.base/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:753)
at java.base/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:731)
at java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2108)
at java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:143)
at org.infinispan.statetransfer.StateConsumerImpl.applyChunk(StateConsumerImpl.java:644)
at org.infinispan.statetransfer.StateConsumerImpl.applyStateIteration(StateConsumerImpl.java:617)
at org.infinispan.statetransfer.StateConsumerImpl.lambda$applyStateIteration$8(StateConsumerImpl.java:623)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at org.infinispan.util.concurrent.CompletionStages$AbstractAggregateCompletionStage.complete(CompletionStages.java:290)
at org.infinispan.util.concurrent.CompletionStages$AbstractAggregateCompletionStage.accept(CompletionStages.java:258)
at org.infinispan.util.concurrent.CompletionStages$AbstractAggregateCompletionStage.accept(CompletionStages.java:242)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.invokeQueuedHandlers(QueueAsyncInvocationStage.java:113)
at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.accept(QueueAsyncInvocationStage.java:88)
at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.accept(QueueAsyncInvocationStage.java:33)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:840)
at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
{code}
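The pattern the trace flags (a CompletableFuture completion callback that itself blocks on another future via get(), landing in jdk.internal.misc.Unsafe#park) can be reproduced with the JDK alone. The class, thread, and future names below are illustrative, not the actual Infinispan code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Minimal reproduction of the pattern in the stack trace above: a
 * whenComplete callback runs on a thread that is supposed to stay
 * non-blocking, yet inside the callback we block on another future
 * with get() - the Unsafe#park call that BlockHound reports.
 */
public class BlockingCallbackDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService nonBlockingPool =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "non-blocking-thread"));

        CompletableFuture<String> stateApplied = new CompletableFuture<>();

        CompletableFuture<Void> chunkReceived =
                CompletableFuture.runAsync(() -> { }, nonBlockingPool)
                        .whenComplete((v, t) -> {
                            try {
                                // Blocking call on the non-blocking thread: parks
                                // the thread until stateApplied completes.
                                stateApplied.get(1, TimeUnit.SECONDS);
                            } catch (Exception e) {
                                throw new RuntimeException(e);
                            }
                        });

        stateApplied.complete("done");          // unblock the parked callback
        chunkReceived.get(5, TimeUnit.SECONDS); // wait for the chain to finish
        nonBlockingPool.shutdown();
        System.out.println("callback finished after blocking on a non-blocking thread");
    }
}
```

With BlockHound installed and the pool's threads marked non-blocking, the get() inside the callback would raise the same "Blocking call!" assertion seen above.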
[JBoss JIRA] (ISPN-10373) Store/Loader Non blocking SPI
by Will Burns (Jira)
[ https://issues.redhat.com/browse/ISPN-10373?page=com.atlassian.jira.plugi... ]
Will Burns updated ISPN-10373:
------------------------------
Status: Open (was: New)
> Store/Loader Non blocking SPI
> -----------------------------
>
> Key: ISPN-10373
> URL: https://issues.redhat.com/browse/ISPN-10373
> Project: Infinispan
> Issue Type: Feature Request
> Components: Loaders and Stores
> Reporter: Will Burns
> Priority: Major
>
> We need to add and use a non-blocking SPI internally for our stores/loaders. We added ISPN-9722, which was a good step and refactored all of our internal code to use "non blocking" stores. However, the stores themselves are still inherently sync, even when the underlying store could be non-blocking. We would have to add a new SPI interface to allow for such non-blocking operations. We would then remove all the explicit threading added in ISPN-9722 and move it into a wrapper around the currently sync loaders instead. This way an invoking thread doesn't need a context switch when it invokes a genuinely non-blocking store operation.
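A minimal sketch of the wrapper idea described above, assuming a hypothetical CompletionStage-based SPI. NonBlockingStore, BlockingStore, and adapt are illustrative names, not the actual Infinispan interfaces:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NonBlockingSpiSketch {

    /** Hypothetical non-blocking store SPI: every operation returns a stage. */
    interface NonBlockingStore<K, V> {
        CompletionStage<V> load(K key);
        CompletionStage<Void> write(K key, V value);
    }

    /** Hypothetical legacy sync SPI, inherently blocking. */
    interface BlockingStore<K, V> {
        V load(K key);
        void write(K key, V value);
    }

    /**
     * Confines the blocking calls of a legacy store to a dedicated executor,
     * so invoking threads never block. A truly non-blocking store would
     * implement NonBlockingStore directly and skip this context switch.
     */
    static <K, V> NonBlockingStore<K, V> adapt(BlockingStore<K, V> store,
                                               ExecutorService blockingPool) {
        return new NonBlockingStore<K, V>() {
            public CompletionStage<V> load(K key) {
                return CompletableFuture.supplyAsync(() -> store.load(key), blockingPool);
            }
            public CompletionStage<Void> write(K key, V value) {
                return CompletableFuture.runAsync(() -> store.write(key, value), blockingPool);
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> backing = new ConcurrentHashMap<>();
        BlockingStore<String, String> sync = new BlockingStore<String, String>() {
            public String load(String key) { return backing.get(key); }
            public void write(String key, String value) { backing.put(key, value); }
        };
        ExecutorService blockingPool = Executors.newFixedThreadPool(2);
        NonBlockingStore<String, String> store = adapt(sync, blockingPool);

        store.write("k", "v")
             .thenCompose(ignored -> store.load("k"))
             .thenAccept(v -> System.out.println("loaded: " + v))
             .toCompletableFuture().join();
        blockingPool.shutdown();
    }
}
```

The point of the issue is precisely that this adapter is only a fallback: a store that is non-blocking by nature would return already-asynchronous stages and avoid the executor hop entirely.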
[JBoss JIRA] (ISPN-10373) Store/Loader Non blocking SPI
by Will Burns (Jira)
[ https://issues.redhat.com/browse/ISPN-10373?page=com.atlassian.jira.plugi... ]
Work on ISPN-10373 started by Will Burns.
-----------------------------------------
> Store/Loader Non blocking SPI
> -----------------------------
>
> Key: ISPN-10373
> URL: https://issues.redhat.com/browse/ISPN-10373
> Project: Infinispan
> Issue Type: Feature Request
> Components: Loaders and Stores
> Reporter: Will Burns
> Assignee: Will Burns
> Priority: Major
>
> We need to add and use a non-blocking SPI internally for our stores/loaders. We added ISPN-9722, which was a good step and refactored all of our internal code to use "non blocking" stores. However, the stores themselves are still inherently sync, even when the underlying store could be non-blocking. We would have to add a new SPI interface to allow for such non-blocking operations. We would then remove all the explicit threading added in ISPN-9722 and move it into a wrapper around the currently sync loaders instead. This way an invoking thread doesn't need a context switch when it invokes a genuinely non-blocking store operation.