[JBoss JIRA] (ISPN-2581) StateTransferManagerImpl.waitForInitialStateTransferToComplete() returns too soon
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-2581?page=com.atlassian.jira.plugin.... ]
Dan Berindei reassigned ISPN-2581:
----------------------------------
Assignee: Dan Berindei (was: Adrian Nistor)
> StateTransferManagerImpl.waitForInitialStateTransferToComplete() returns too soon
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2581
> URL: https://issues.jboss.org/browse/ISPN-2581
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.0.Beta5
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 5.2.0.Final
>
>
> StateTransferManagerImpl.waitForInitialStateTransferToComplete() returns as soon as a joining node has confirmed to the coordinator that it received all the data it needed (see STMI.notifyEndOfTopologyUpdate()).
> It should return only after the coordinator has confirmed the end of the rebalance with a new topology update (see STMI.doTopologyUpdate()).
> This should make it more likely for the test suite clusters to be in a stable state by the time a test starts, and should help with the random state transfer-related failures in non-state transfer tests.
> Instead we should make sure that we do have tests that check forwarding behaviour explicitly.
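One way to picture the proposed behaviour is a gate that is released only by the coordinator's final topology update, not by the joiner's own confirmation. The sketch below is a hypothetical stand-in and not the actual StateTransferManagerImpl code: the class name, every method other than waitForInitialStateTransferToComplete, and the boolean rebalanceInProgress flag are assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the real Infinispan implementation.
final class InitialStateTransferGate {
    // Released once the coordinator announces a topology with no rebalance pending.
    private final CountDownLatch rebalanceConfirmed = new CountDownLatch(1);

    // Joiner has received its data and told the coordinator.
    // Deliberately does NOT release waiters: the rebalance is not over yet.
    void onLocalStateReceived() {
        // intentionally empty
    }

    // Invoked for every topology update received from the coordinator.
    // rebalanceInProgress == false is assumed to mean "rebalance ended".
    void onTopologyUpdate(boolean rebalanceInProgress) {
        if (!rebalanceInProgress) {
            rebalanceConfirmed.countDown();
        }
    }

    // Returns true only if the coordinator confirmed the end of the rebalance in time.
    boolean waitForInitialStateTransferToComplete(long timeoutMillis) {
        try {
            return rebalanceConfirmed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        InitialStateTransferGate gate = new InitialStateTransferGate();
        gate.onLocalStateReceived();
        // Still waiting: only the local confirmation has happened.
        System.out.println(gate.waitForInitialStateTransferToComplete(50));
        gate.onTopologyUpdate(false);
        // Released by the coordinator's topology update.
        System.out.println(gate.waitForInitialStateTransferToComplete(50));
    }
}
```

With this shape, a test that starts right after the wait returns would see the post-rebalance topology rather than an intermediate one.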
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 11 months
[JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-2697:
------------------------------------
@Radim, I don't know the STABLE code very well, but I think STABILITY messages are sent in response to STABLE_GOSSIP messages, with a fixed (well, random, but with a fixed upper limit) delay. So if the STABLE_GOSSIP rate stays constant, the STABILITY rate will stay constant as well.
I did overlook the STABLE.stability_delay setting, so we should probably require/advise that sync.replTimeout > 2 * STABLE.desired_avg_gossip + STABLE.stability_delay.
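As a rough sketch of that rule, the check below computes the lower bound from the two STABLE settings and compares a replication timeout against it. The class and method names are hypothetical, and the millisecond values in main are illustrative only, not JGroups or Infinispan defaults.

```java
// Hypothetical helper; encodes only the inequality suggested in the comment above.
final class ReplTimeoutCheck {
    // Lower bound: 2 * STABLE.desired_avg_gossip + STABLE.stability_delay (all in ms).
    static long minReplTimeoutMillis(long desiredAvgGossipMillis, long stabilityDelayMillis) {
        return 2 * desiredAvgGossipMillis + stabilityDelayMillis;
    }

    // The rule is a strict inequality: replTimeout must exceed the bound.
    static boolean isReplTimeoutSafe(long replTimeoutMillis,
                                     long desiredAvgGossipMillis,
                                     long stabilityDelayMillis) {
        return replTimeoutMillis > minReplTimeoutMillis(desiredAvgGossipMillis, stabilityDelayMillis);
    }

    public static void main(String[] args) {
        long gossip = 5_000;   // illustrative value, not a default
        long delay = 1_000;    // illustrative value, not a default
        System.out.println(minReplTimeoutMillis(gossip, delay));       // 11000
        System.out.println(isReplTimeoutSafe(15_000, gossip, delay));  // true
        System.out.println(isReplTimeoutSafe(11_000, gossip, delay));  // false
    }
}
```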
> HotRodServer startup fails when its record cannot be inserted into topology cache
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2697
> URL: https://issues.jboss.org/browse/ISPN-2697
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.Beta6
> Reporter: Radim Vansa
> Assignee: Galder Zamarreño
> Priority: Critical
> Fix For: 5.2.0.Final
>
>
> When the HotRodServer starts it inserts its record to __hotRodTopologyCache ({{HotRodServer.addSelfToTopologyView(...)}}).
> However, this put may very easily fail: the command is broadcast using the NAKACK2 protocol, so if the message gets lost and no other broadcast message follows it, the message will not be retransmitted and the put operation times out (Replication timeout). This fails the whole HotRodServer startup, all because of one lost UDP message.
11 years, 11 months
[JBoss JIRA] (ISPN-2714) org.infinispan.distexec.mapreduce.TopologyAwareTwoNodesMapReduceTest.testInvokeMapperCancellation test fails randomly
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-2714?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño updated ISPN-2714:
-----------------------------------
Status: Resolved (was: Pull Request Sent)
Fix Version/s: 5.2.0.Final
Resolution: Done
> org.infinispan.distexec.mapreduce.TopologyAwareTwoNodesMapReduceTest.testInvokeMapperCancellation test fails randomly
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-2714
> URL: https://issues.jboss.org/browse/ISPN-2714
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 5.2.0.CR1
> Reporter: Anna Manukyan
> Assignee: Anna Manukyan
> Labels: testsuite_stability
> Fix For: 5.2.0.Final
>
>
> The test org.infinispan.distexec.mapreduce.TopologyAwareTwoNodesMapReduceTest.testInvokeMapperCancellation fails randomly on all environments.
> The error log is:
> {code}
> Error Message
> Expected exception java.util.concurrent.CancellationException but got java.lang.AssertionError: Mapper not cancelled, root cause org.jgroups.TimeoutException: timeout sending message to TopologyAwareTwoNodesMapReduceTest-NodeB-22523(test2)
> Stacktrace
> org.testng.TestException:
> Expected exception java.util.concurrent.CancellationException but got java.lang.AssertionError: Mapper not cancelled, root cause org.jgroups.TimeoutException: timeout sending message to TopologyAwareTwoNodesMapReduceTest-NodeB-22523(test2)
> at org.testng.internal.Invoker.handleInvocationResults(Invoker.java:1503)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:764)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:907)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1237)
> at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:37)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:368)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.AssertionError: Mapper not cancelled, root cause org.jgroups.TimeoutException: timeout sending message to TopologyAwareTwoNodesMapReduceTest-NodeB-22523(test2)
> at org.infinispan.distexec.mapreduce.SimpleTwoNodesMapReduceTest.testInvokeMapperCancellation(SimpleTwoNodesMapReduceTest.java:106)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:715)
> ... 15 more
> {code}
11 years, 11 months
[JBoss JIRA] (ISPN-2344) StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusy
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-2344?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño resolved ISPN-2344.
------------------------------------
Fix Version/s: (was: 5.2.0.Final)
Resolution: Cannot Reproduce Bug
This is an old failure from September, when the state transfer code was changing in order to accommodate non-blocking state transfer.
> StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusy
> ---------------------------------------------------------------------------
>
> Key: ISPN-2344
> URL: https://issues.jboss.org/browse/ISPN-2344
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Attachments: testStateTransferWithNodeRestartedAndBusy-0.tgz
>
>
> {code}java.lang.AssertionError
> at org.infinispan.statetransfer.StateTransferReplicationQueueTest.thirdWritingCacheTest(StateTransferReplicationQueueTest.java:146)
> at org.infinispan.statetransfer.StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusy(StateTransferReplicationQueueTest.java:108){code}
11 years, 11 months
[JBoss JIRA] (ISPN-2750) Uneven request balancing via hotrod
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-2750?page=com.atlassian.jira.plugin.... ]
Dan Berindei resolved ISPN-2750.
--------------------------------
Resolution: Won't Fix
Looks like a configuration problem again: numSegments is only 40, and there are 32 nodes, which means the segments are not evenly divided between the cache members.
Here's an ASCII "graph" that shows how many segments are owned by each node in a sample consistent hash ('=' means it's a primary owner, '+' means it's a backup owner):
{noformat}
+ + + + + + + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
= + + = = = = + = + + + + = + + + + + + + + = + + + + + + + + +
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
{noformat}
You can see from the graph that before the ISPN-2643 fix, when the HotRod client would contact a random owner, the load was balanced more evenly. After that fix, however, the HotRod client only contacts the primary owner, and there is a clear difference in load between the nodes that are primary owners of 2 segments and the nodes that are primary owners of only 1 segment.
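The arithmetic behind the imbalance can be sketched as follows. This is not Infinispan's consistent hash algorithm, only the best-case division of primary-owned segments among nodes; the class and method names are hypothetical.

```java
// Hypothetical helper: best-case spread of primary-owned segments across nodes.
final class SegmentSpread {
    // Returns { nodesOwningMax, minPrimaryPerNode, maxPrimaryPerNode }.
    static int[] primarySpread(int numSegments, int numNodes) {
        int min = numSegments / numNodes;        // every node gets at least this many
        int leftover = numSegments % numNodes;   // this many nodes get one extra segment
        int max = leftover == 0 ? min : min + 1;
        int nodesOwningMax = leftover == 0 ? numNodes : leftover;
        return new int[] { nodesOwningMax, min, max };
    }

    public static void main(String[] args) {
        int[] s = primarySpread(40, 32);
        // 40 segments over 32 nodes: 8 nodes own 2 primaries, the other 24 own 1.
        System.out.println(s[0] + " nodes own " + s[2] + ", the rest own " + s[1]);
    }
}
```

With numSegments = 40 and 32 nodes, the busiest primary owners handle twice as many segments as the rest, which matches the 8 '=' marks in the upper row of the graph. A numSegments value that is much larger than the node count shrinks that ratio; for example, 1000 segments over 32 nodes gives 32 versus 31 per node in this best-case calculation.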
> Uneven request balancing via hotrod
> -----------------------------------
>
> Key: ISPN-2750
> URL: https://issues.jboss.org/browse/ISPN-2750
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 5.2.0.CR2
> Reporter: Michal Linhard
> Assignee: Dan Berindei
> Fix For: 5.2.0.Final
>
>
> The load sent to the servers in the cluster isn't balanced.
> Observed in 32-node resilience tests:
> http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0035-resi-3...
> http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0036-resi-3...
> This differs from ISPN-2632 in that the load is unbalanced from the beginning of the test.
11 years, 11 months