[infinispan-issues] [JBoss JIRA] (ISPN-2632) Uneven request balancing after node crash

Mon Jan 28 11:01:47 EST 2013

    [ https://issues.jboss.org/browse/ISPN-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750812#comment-12750812 ] 

RH Bugzilla Integration commented on ISPN-2632:
-----------------------------------------------

Dan Berindei <dberinde at redhat.com> made a comment on [bug 886549|https://bugzilla.redhat.com/show_bug.cgi?id=886549]

Michal, the throughput can vary a lot during the test even if the distribution of the entries stays the same. E.g. between minutes 9 and 13 in http://www.qa.jboss.com/~mlinhard/hyperion3/run0043-resi-8-6-8-ER9/report/stats-throughput.png, the entry distribution doesn't change, but the throughput seems to vary by ~5%.

So I think it would be fairer to compare the average throughput of each node over a longer period. By this measure, I would expect differences of ~10% (as there are ~10 segments per node, and while there are 6/7 nodes alive some of the nodes will have 1 extra "primary" segment).
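
The ~10% estimate above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes that primary segments are spread as evenly as possible across the live nodes and that each node's request load is proportional to its number of primary segments. The class name, the method name and the total of 60 segments are hypothetical values chosen for the example, not taken from the test configuration.

```java
public class SegmentImbalance {

    /**
     * Relative gap between the most loaded and the least loaded node,
     * assuming primaries are distributed as evenly as possible
     * (every node owns either floor(S/N) or floor(S/N) + 1 segments).
     */
    public static double relativeGap(int numSegments, int numNodes) {
        if (numNodes <= 0 || numSegments < numNodes) {
            throw new IllegalArgumentException("need at least one segment per node");
        }
        int min = numSegments / numNodes;
        int max = (numSegments % numNodes == 0) ? min : min + 1;
        return (double) (max - min) / min;
    }

    public static void main(String[] args) {
        int numSegments = 60; // hypothetical total, not taken from the test run
        for (int nodes = 6; nodes <= 8; nodes++) {
            System.out.printf("%d nodes: gap %.1f%%%n",
                    nodes, 100 * relativeGap(numSegments, nodes));
        }
    }
}
```

Under these assumptions, 60 segments on 7 nodes give nodes with 8 or 9 primary segments, i.e. a gap of 12.5%, which is in the same range as the ~10% mentioned above.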

You could also try to extract the current ConsistentHash from a client/server and log the primary owner of each segment (or rather the number of segments/hash space points primary-owned by each node). This is still a proxy measurement, but the hashing function is pretty uniform so it should be more accurate than the throughput number. On the other hand, it would be more complicated, and I'm not sure it would bring any new information compared to the throughput graph.
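
As a rough sketch of the counting step described above: once the primary owner of each segment has been read from the ConsistentHash, tallying them per node is straightforward. The code below deliberately does not call any Infinispan API; it assumes the owners have already been extracted into a plain list indexed by segment, and the class name, method name and node names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PrimaryOwnerCount {

    /**
     * Counts how many segments each node primary-owns.
     * The input list holds one entry per segment: the name of its primary owner.
     */
    public static Map<String, Integer> countPrimaries(List<String> primaryOwnerBySegment) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by node name for stable log output
        for (String owner : primaryOwnerBySegment) {
            counts.merge(owner, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical data: segment i is primary-owned by the i-th entry.
        List<String> owners = Arrays.asList(
                "node1", "node2", "node3", "node1", "node2", "node1");
        for (Map.Entry<String, Integer> e : countPrimaries(owners).entrySet()) {
            System.out.println(e.getKey() + " primary-owns " + e.getValue() + " segment(s)");
        }
    }
}
```

Logging these counts before and after the node is killed would show directly whether some nodes hold more primary segments than others, independent of throughput noise.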

> Uneven request balancing after node crash
> -----------------------------------------
>
>                 Key: ISPN-2632
>                 URL: https://issues.jboss.org/browse/ISPN-2632
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Remote protocols
>    Affects Versions: 5.2.0.CR1
>            Reporter: Michal Linhard
>            Assignee: Dan Berindei
>            Priority: Blocker
>             Fix For: 5.2.0.CR2, 5.2.0.Final
>
>
> This is a new manifestation of ISPN-1995, but in this case it happens after killing only one node: the Hot Rod requests aren't very well balanced.
> These runs also still manifest ISPN-2550, which may be the cause of this bug.
> The uneven balancing of requests can be seen here:
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-resilience-dist-4-3/59/artifact/report/stats-throughput.png
