January 2013 - infinispan-issues - Jboss List Archives

[JBoss JIRA] (ISPN-2632) Uneven request balancing after node crash

by RH Bugzilla Integration (JIRA)

[ https://issues.jboss.org/browse/ISPN-2632?page=com.atlassian.jira.plugin.... ]

RH Bugzilla Integration commented on ISPN-2632:
-----------------------------------------------

Michal Linhard <mlinhard(a)redhat.com> made a comment on [bug 886549|https://bugzilla.redhat.com/show_bug.cgi?id=886549]

I'll run all the DIST mode elasticity/resilience tests with proper numSegments and mark this as verified if they're clean.

> Uneven request balancing after node crash
> -----------------------------------------
>
> Key: ISPN-2632
> URL: https://issues.jboss.org/browse/ISPN-2632
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.CR1
> Reporter: Michal Linhard
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.CR2, 5.2.0.Final
>
>
> This is a new manifestation of ISPN-1995, but in this case it happens after killing only one node: the hot rod requests aren't very well balanced.
> These runs also still manifest ISPN-2550, which may be the cause of this bug.
> The uneven balancing of requests can be seen here:
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/view/EDG-REPOR...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[JBoss JIRA] (ISPN-2581) StateTransferManagerImpl.waitForInitialStateTransferToComplete() returns too soon

by Dan Berindei (JIRA)

[ https://issues.jboss.org/browse/ISPN-2581?page=com.atlassian.jira.plugin.... ]

Dan Berindei reassigned ISPN-2581:
----------------------------------

Assignee: Dan Berindei (was: Adrian Nistor)

> StateTransferManagerImpl.waitForInitialStateTransferToComplete() returns too soon
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2581
> URL: https://issues.jboss.org/browse/ISPN-2581
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.0.Beta5
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 5.2.0.Final
>
>
> StateTransferManagerImpl.waitForInitialStateTransferToComplete() returns as soon as a joining node has confirmed to the coordinator that it received all the data it needed (see STMI.notifyEndOfTopologyUpdate()).
> It should return only after the coordinator has confirmed the end of the rebalance with a new topology update (see STMI.doTopologyUpdate()).
> This should make it more likely for the test suite clusters to be in a stable state by the time the test starts, and should help with the random state transfer-related failures in non-state transfer tests.
> Instead we should make sure that we do have tests that check forwarding behaviour explicitly.
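The ISPN-2581 description distinguishes two events: the joiner reporting that it has received its data, and the coordinator confirming the end of the rebalance with a new topology update. A minimal sketch of a wait that is released only by the second event might look like the following. The class and method names here are hypothetical and chosen for illustration; this is not the code of StateTransferManagerImpl.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not Infinispan's StateTransferManagerImpl. */
final class InitialStateTransferGate {
    // Released only when the coordinator's end-of-rebalance topology is installed.
    private final CountDownLatch rebalanceConfirmed = new CountDownLatch(1);
    private volatile boolean localStateReceived;

    /** The joiner has received all the data it needed. On its own this does not release waiters. */
    void onLocalStateReceived() {
        localStateReceived = true;
    }

    /** The coordinator has confirmed the end of the rebalance with a new topology update. */
    void onRebalanceEndTopologyInstalled() {
        rebalanceConfirmed.countDown();
    }

    boolean isLocalStateReceived() {
        return localStateReceived;
    }

    /** Returns true only if the coordinator's confirmation arrived within the timeout. */
    boolean waitForInitialStateTransferToComplete(long timeoutMillis) {
        try {
            return rebalanceConfirmed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        InitialStateTransferGate gate = new InitialStateTransferGate();
        gate.onLocalStateReceived();
        System.out.println(gate.waitForInitialStateTransferToComplete(20)); // false
        gate.onRebalanceEndTopologyInstalled();
        System.out.println(gate.waitForInitialStateTransferToComplete(20)); // true
    }
}
```

In a test suite, a gate like this would keep a test from starting while the cluster is still rebalancing, which is the stability benefit the issue describes.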


[JBoss JIRA] (ISPN-2750) Uneven request balancing via hotrod

by Michal Linhard (JIRA)

[ https://issues.jboss.org/browse/ISPN-2750?page=com.atlassian.jira.plugin.... ]

Michal Linhard commented on ISPN-2750:
--------------------------------------

http://www.qa.jboss.com/~mlinhard/hyperion3/run0041-resi-32-31-32-ER9/rep...

with numSegments=320 the results are much nicer. Sorry for false alarm.

> Uneven request balancing via hotrod
> -----------------------------------
>
> Key: ISPN-2750
> URL: https://issues.jboss.org/browse/ISPN-2750
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 5.2.0.CR2
> Reporter: Michal Linhard
> Assignee: Dan Berindei
> Fix For: 5.2.0.Final
>
>
> The load sent to servers in the cluster isn't balanced
> tried in 32 node resilience tests:
> http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0035-resi-3...
> http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0036-resi-3...
> this differs from ISPN-2632 in that the load is unbalanced from the beginning of the test.


[JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache

by Dan Berindei (JIRA)

[ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.... ]

Dan Berindei commented on ISPN-2697:
------------------------------------

@Radim, I don't know the STABLE code very well, but I think STABILITY messages are sent in response to STABLE_GOSSIP messages, with a fixed (well, random, but with a fixed upper limit) delay. So if the STABLE_GOSSIP rate stays constant, the STABILITY rate will stay constant as well.

I did overlook the STABLE.stability_delay setting, so we should probably require/advise that sync.replTimeout > 2 * STABLE.desired_avg_gossip + STABLE.stability_delay.

> HotRodServer startup fails when its record cannot be inserted into topology cache
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2697
> URL: https://issues.jboss.org/browse/ISPN-2697
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.Beta6
> Reporter: Radim Vansa
> Assignee: Galder Zamarreño
> Priority: Critical
> Fix For: 5.2.0.Final
>
>
> When the HotRodServer starts it inserts its record into __hotRodTopologyCache ({{HotRodServer.addSelfToTopologyView(...)}}).
> However, this put may very easily fail: the command is broadcast using the NAKACK2 protocol, so if the message gets lost and no further broadcast message follows, the message will not be retransmitted and the put operation times out (Replication timeout), which fails the whole HotRodServer startup, all because of one lost UDP message.
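The rule of thumb proposed in the comment (sync.replTimeout > 2 * STABLE.desired_avg_gossip + STABLE.stability_delay) can be checked with simple arithmetic. The sketch below assumes all three settings are expressed in milliseconds; the class name, method names, and the sample values are hypothetical and are not taken from any shipped Infinispan or JGroups configuration.

```java
/** Hypothetical helper illustrating the timeout rule from the ISPN-2697 comment. */
final class ReplTimeoutCheck {

    /** Lower bound that replTimeout must strictly exceed, in milliseconds. */
    static long minReplTimeoutMillis(long desiredAvgGossipMillis, long stabilityDelayMillis) {
        return 2 * desiredAvgGossipMillis + stabilityDelayMillis;
    }

    /** True when replTimeout is strictly greater than the bound. */
    static boolean isSafe(long replTimeoutMillis, long desiredAvgGossipMillis, long stabilityDelayMillis) {
        return replTimeoutMillis > minReplTimeoutMillis(desiredAvgGossipMillis, stabilityDelayMillis);
    }

    public static void main(String[] args) {
        // Illustrative values only (assumptions, not defaults).
        long desiredAvgGossip = 20_000;
        long stabilityDelay = 6_000;
        System.out.println(minReplTimeoutMillis(desiredAvgGossip, stabilityDelay)); // 46000
        System.out.println(isSafe(15_000, desiredAvgGossip, stabilityDelay));       // false
        System.out.println(isSafe(60_000, desiredAvgGossip, stabilityDelay));       // true
    }
}
```

With these sample numbers, a replication timeout of 15 seconds would expire before the lost message could be retransmitted, while 60 seconds leaves room for it.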


[JBoss JIRA] (ISPN-2738) Joining node ignored by hotrod clients in REPL clustering mode

by Dan Berindei (JIRA)

[ https://issues.jboss.org/browse/ISPN-2738?page=com.atlassian.jira.plugin.... ]

Dan Berindei updated ISPN-2738:
-------------------------------

Status: Pull Request Sent (was: Coding In Progress)
Git Pull Request: https://github.com/infinispan/infinispan/pull/1604

Skip the topology update if the cache members aren't all in the address cache. Do the check in AbstractEncoder1x.generateTopologyResponse, so that it works for all topology types (i.e. also for replicated caches).

I added a new replicated-mode test, but it still doesn't cover this case.

> Joining node ignored by hotrod clients in REPL clustering mode
> --------------------------------------------------------------
>
> Key: ISPN-2738
> URL: https://issues.jboss.org/browse/ISPN-2738
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.0.CR2
> Reporter: Michal Linhard
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.Final
>
>
> resilience 4-3-4 REPL mode for JDG 6.1.0.ER9 (infinispan 5.2.0.CR2):
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/view/EDG-REPOR...
> after rejoin of killed node the load is not redistributed to all three nodes again
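The check described in the update (skip the topology update unless every cache member is already in the address cache) amounts to a set-containment test. The following is a minimal sketch of that idea only. The class name, method name, and the use of plain strings for addresses are assumptions for illustration; the actual change lives in AbstractEncoder1x.generateTopologyResponse and is not reproduced here.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the "all members known" guard; not the Infinispan encoder. */
final class TopologyUpdateGuard {

    /**
     * Returns true only when every cache member has an entry in the address cache,
     * so a client never receives a topology that omits a joining server.
     */
    static <A> boolean shouldSendTopologyUpdate(Collection<A> cacheMembers, Map<A, ?> addressCache) {
        return addressCache.keySet().containsAll(cacheMembers);
    }

    public static void main(String[] args) {
        // Node "C" has joined the cache but has not yet registered its endpoint (made-up data).
        List<String> members = List.of("A", "B", "C");
        Map<String, String> known = Map.of("A", "hostA:11222", "B", "hostB:11222");
        System.out.println(shouldSendTopologyUpdate(members, known)); // false
    }
}
```

Deferring the update in this way means the client keeps its old view a little longer and picks up the complete member list on a later response.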


[JBoss JIRA] (ISPN-2714) org.infinispan.distexec.mapreduce.TopologyAwareTwoNodesMapReduceTest.testInvokeMapperCancellation test fails randomly

by Galder Zamarreño (JIRA)

[ https://issues.jboss.org/browse/ISPN-2714?page=com.atlassian.jira.plugin.... ]

Galder Zamarreño updated ISPN-2714:
-----------------------------------

Status: Resolved (was: Pull Request Sent)
Fix Version/s: 5.2.0.Final
Resolution: Done

> org.infinispan.distexec.mapreduce.TopologyAwareTwoNodesMapReduceTest.testInvokeMapperCancellation test fails randomly
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-2714
> URL: https://issues.jboss.org/browse/ISPN-2714
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 5.2.0.CR1
> Reporter: Anna Manukyan
> Assignee: Anna Manukyan
> Labels: testsuite_stability
> Fix For: 5.2.0.Final
>
>
> The test org.infinispan.distexec.mapreduce.TopologyAwareTwoNodesMapReduceTest.testInvokeMapperCancellation fails randomly on all environments.
> The error log is:
> {code}
> Error Message
> Expected exception java.util.concurrent.CancellationException but got java.lang.AssertionError: Mapper not cancelled, root cause org.jgroups.TimeoutException: timeout sending message to TopologyAwareTwoNodesMapReduceTest-NodeB-22523(test2)
> Stacktrace
> org.testng.TestException:
> Expected exception java.util.concurrent.CancellationException but got java.lang.AssertionError: Mapper not cancelled, root cause org.jgroups.TimeoutException: timeout sending message to TopologyAwareTwoNodesMapReduceTest-NodeB-22523(test2)
> at org.testng.internal.Invoker.handleInvocationResults(Invoker.java:1503)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:764)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:907)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1237)
> at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:37)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:368)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.AssertionError: Mapper not cancelled, root cause org.jgroups.TimeoutException: timeout sending message to TopologyAwareTwoNodesMapReduceTest-NodeB-22523(test2)
> at org.infinispan.distexec.mapreduce.SimpleTwoNodesMapReduceTest.testInvokeMapperCancellation(SimpleTwoNodesMapReduceTest.java:106)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:715)
> ... 15 more
> {code}


[JBoss JIRA] (ISPN-2745) RHQ plugin is missing several operations (e.g. RecoveryAdmin) and parameter names

by Galder Zamarreño (JIRA)

[ https://issues.jboss.org/browse/ISPN-2745?page=com.atlassian.jira.plugin.... ]

Galder Zamarreño updated ISPN-2745:
-----------------------------------

Fix Version/s: 5.2.0.Final

> RHQ plugin is missing several operations (e.g. RecoveryAdmin) and parameter names
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2745
> URL: https://issues.jboss.org/browse/ISPN-2745
> Project: Infinispan
> Issue Type: Bug
> Components: JMX, reporting and management
> Affects Versions: 5.2.0.CR2
> Reporter: Tristan Tarrant
> Assignee: Tristan Tarrant
> Fix For: 5.2.0.Final


[JBoss JIRA] (ISPN-2750) Uneven request balancing via hotrod

by Michal Linhard (JIRA)

[ https://issues.jboss.org/browse/ISPN-2750?page=com.atlassian.jira.plugin.... ]

Michal Linhard commented on ISPN-2750:
--------------------------------------

Damn, right. I'll try to run the 32-31-32 test with numSegments=320

> Uneven request balancing via hotrod
> -----------------------------------
>
> Key: ISPN-2750
> URL: https://issues.jboss.org/browse/ISPN-2750
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 5.2.0.CR2
> Reporter: Michal Linhard
> Assignee: Dan Berindei
> Fix For: 5.2.0.Final
>
>
> The load sent to servers in the cluster isn't balanced
> tried in 32 node resilience tests:
> http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0035-resi-3...
> http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0036-resi-3...
> this differs from ISPN-2632 in that the load is unbalanced from the beginning of the test.


[JBoss JIRA] (ISPN-2344) StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusy

by Galder Zamarreño (JIRA)

[ https://issues.jboss.org/browse/ISPN-2344?page=com.atlassian.jira.plugin.... ]

Galder Zamarreño resolved ISPN-2344.
------------------------------------

Fix Version/s: (was: 5.2.0.Final)
Resolution: Cannot Reproduce Bug

This is an old failure from around September, when the state transfer code was changing in order to accommodate non-blocking state transfer.

> StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusy
> ---------------------------------------------------------------------------
>
> Key: ISPN-2344
> URL: https://issues.jboss.org/browse/ISPN-2344
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Attachments: testStateTransferWithNodeRestartedAndBusy-0.tgz
>
>
> {code}java.lang.AssertionError
> at org.infinispan.statetransfer.StateTransferReplicationQueueTest.thirdWritingCacheTest(StateTransferReplicationQueueTest.java:146)
> at org.infinispan.statetransfer.StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusy(StateTransferReplicationQueueTest.java:108){code}


[JBoss JIRA] (ISPN-2750) Uneven request balancing via hotrod

by Dan Berindei (JIRA)

[ https://issues.jboss.org/browse/ISPN-2750?page=com.atlassian.jira.plugin.... ]

Dan Berindei resolved ISPN-2750.
--------------------------------

Resolution: Won't Fix

Looks like a configuration problem again: numSegments is only 40, and there are 32 nodes, which means the segments are not evenly divided between the cache members.

Here's an ASCII "graph" that shows how many segments are owned by each node in a sample consistent hash ('=' means it's a primary owner, '+' means it's a backup owner):

{noformat}
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + = + + = = = = + = + + + + = + + + + + + + + = + + + + + + + + + = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
{noformat}

You can see from the graph that before the ISPN-2643 fix, when the HotRod client would contact a random owner, the load was balanced more evenly. After that fix, however, the HotRod client only contacts the primary owner, and there is a clear difference in load between the nodes that primary-own 2 segments and the nodes that primary-own only 1 segment.

> Uneven request balancing via hotrod
> -----------------------------------
>
> Key: ISPN-2750
> URL: https://issues.jboss.org/browse/ISPN-2750
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 5.2.0.CR2
> Reporter: Michal Linhard
> Assignee: Dan Berindei
> Fix For: 5.2.0.Final
>
>
> The load sent to servers in the cluster isn't balanced
> tried in 32 node resilience tests:
> http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0035-resi-3...
> http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0036-resi-3...
> this differs from ISPN-2632 in that the load is unbalanced from the beginning of the test.
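The arithmetic behind this resolution can be sketched in a few lines. The example assumes the best case, where primary ownership is dealt out as evenly as the segment count allows, that keys are spread uniformly over segments, and that clients send each request to the primary owner only. The class and method names are hypothetical, and this is not Infinispan's consistent hash implementation.

```java
import java.util.Arrays;

/** Hypothetical arithmetic sketch; not Infinispan's consistent hash factory. */
final class PrimarySegmentBalance {

    /** Primary-owned segment count per node when segments are dealt out as evenly as possible. */
    static int[] evenPrimaryCounts(int numSegments, int numNodes) {
        int[] counts = new int[numNodes];
        for (int node = 0; node < numNodes; node++) {
            // The first (numSegments % numNodes) nodes get one extra segment.
            counts[node] = numSegments / numNodes + (node < numSegments % numNodes ? 1 : 0);
        }
        return counts;
    }

    /** Ratio between the busiest and the least busy node under the stated assumptions. */
    static double maxToMinLoadRatio(int numSegments, int numNodes) {
        int[] counts = evenPrimaryCounts(numSegments, numNodes);
        int max = Arrays.stream(counts).max().getAsInt();
        int min = Arrays.stream(counts).min().getAsInt();
        return (double) max / min;
    }

    public static void main(String[] args) {
        System.out.println(maxToMinLoadRatio(40, 32));  // 2.0: 8 nodes own 2 segments, 24 own 1
        System.out.println(maxToMinLoadRatio(320, 32)); // 1.0: every node owns 10 segments
    }
}
```

With numSegments=40 on 32 nodes, eight nodes end up as primary owner of two segments and receive roughly twice the requests of the others. With numSegments=320, the segments divide evenly, which is consistent with the improved results reported for the 32-31-32 run.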

