[infinispan-issues] [JBoss JIRA] (ISPN-2750) Uneven request balancing via hotrod

Michal Linhard (JIRA) jira-events at lists.jboss.org
Thu Jan 24 04:53:47 EST 2013


    [ https://issues.jboss.org/browse/ISPN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750145#comment-12750145 ] 

Michal Linhard commented on ISPN-2750:
--------------------------------------

I've traced resilience test runs 8-7-8 and 16-15-16 with 10 clients, but the chart doesn't show the same pattern as in the 32-31-32 test,
at least I can't be sure with such small absolute values (under 20 ops/sec):
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0039-resi-08-ER9-trace/report/stats-throughput.png
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0040-resi-16-ER9-trace/report/stats-throughput.png
the original chart that showed the problem:
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0036-resi-32-31-32-ER9/report/stats-throughput.png
had higher throughput values (around 250 ops/sec per node).

In all cases I can see the topology info updated in each client:
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0039-resi-08-ER9-trace/report/loganalysis/client-topology-info/
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0040-resi-16-ER9-trace/report/loganalysis/client-topology-info/
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0036-resi-32-31-32-ER9/report/loganalysis/client-topology-info/
The categories "INFO New topology received Full Before" and "INFO New topology received Full After" have an entry for each client thread.

In the 32-31-32 run where the problem manifests, all threads received the same topology id=62 before the crash and id=67 after the rejoin.

Hmm, I just checked the entry distribution in the failing test:
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0036-resi-32-31-32-ER9/report/cache_entries.png

versus the traced runs:
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0039-resi-08-ER9-trace/report/cache_entries.png
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0040-resi-16-ER9-trace/report/cache_entries.png

So it seems the HotRod servers really are following the cache topology distribution, and it's the cache topology itself that's skewed.
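A quick way to sanity-check whether entries are spread evenly is to count keys per owning node and compare the extremes. The sketch below is hypothetical: it uses a naive hash-mod placement rather than Infinispan's actual consistent hash, and the class and method names are invented for illustration. It only shows the kind of per-node count comparison that reveals a skewed distribution like the one in the cache_entries.png charts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Infinispan's ConsistentHash implementation:
// counts how many of N keys each node would own under a naive
// hash-mod placement, to illustrate checking entry-distribution evenness.
public class DistributionCheck {

    // Assign a key to one of numNodes buckets by its hash code.
    static int ownerOf(String key, int numNodes) {
        return Math.floorMod(key.hashCode(), numNodes);
    }

    // Count entries per node for keys "key0".."key{numKeys-1}".
    static Map<Integer, Integer> entriesPerNode(int numKeys, int numNodes) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i < numKeys; i++) {
            counts.merge(ownerOf("key" + i, numNodes), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = entriesPerNode(10_000, 32);
        int min = counts.values().stream().min(Integer::compare).orElse(0);
        int max = counts.values().stream().max(Integer::compare).orElse(0);
        // On a healthy topology max/min stays close to 1; a skewed
        // topology (as suspected here) shows a large spread.
        System.out.println("min=" + min + " max=" + max);
    }
}
```

Running the same kind of count against the real cache topology (per segment owner) would confirm whether the skew originates in the consistent hash rather than in the HotRod request balancing.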

The trace logs for the 8-7-8 and 16-15-16 runs can be found here:
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0039-resi-08-ER9-trace/report/clientlogs.zip
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0039-resi-08-ER9-trace/report/serverlogs.zip
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0040-resi-16-ER9-trace/report/clientlogs.zip
http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0040-resi-16-ER9-trace/report/serverlogs.zip
                
> Uneven request balancing via hotrod
> -----------------------------------
>
>                 Key: ISPN-2750
>                 URL: https://issues.jboss.org/browse/ISPN-2750
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 5.2.0.CR2
>            Reporter: Michal Linhard
>            Assignee: Dan Berindei
>             Fix For: 5.2.0.Final
>
>
> The load sent to servers in the cluster isn't balanced.
> Tried in 32-node resilience tests:
> http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0035-resi-32-28-32-ER9/report/stats-throughput.png
> http://dev39.mw.lab.eng.bos.redhat.com/~mlinhard/hyperion3/run0036-resi-32-31-32-ER9/report/stats-throughput.png
> This differs from ISPN-2632 in that the load is unbalanced from the beginning of the test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

