[infinispan-issues] [JBoss JIRA] (ISPN-1995) Uneven request balancing after node restore

Fri May 4 12:38:18 EDT 2012

    [ https://issues.jboss.org/browse/ISPN-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690495#comment-12690495 ] 

RH Bugzilla Integration commented on ISPN-1995:
-----------------------------------------------

Michal Linhard <mlinhard at redhat.com> made a comment on [bug 809631|https://bugzilla.redhat.com/show_bug.cgi?id=809631]

OK. I've rerun the test in hyperion and reproduced the issue again:
http://www.qa.jboss.com/~mlinhard/hyperion/run85-resi-dist-16/report/stats-throughput.png

this time with some log analysis:
http://www.qa.jboss.com/~mlinhard/hyperion/run85-resi-dist-16/report/loganalysis/client/

Config: 16 nodes, DIST mode, numOwners 3, crashing 2 nodes

The steps of the resilience test are as follows:

1. Start complete cluster node0001 - node0016, wait till it forms (View 6 around 02:44:32,594)
2. Wait 5 min
3. Kill node0002, node0003, wait till survivor cluster forms (View 7 around 02:50:54,200)
4. Wait 5 min
5. Restore node0002, node0003, wait till complete cluster forms again (View 9 around 02:56:10,164)
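To put the view timestamps above in perspective, here is a minimal sketch that computes the time between consecutive views. The dictionary labels and the helper names `parse` and `gaps` are made up for illustration; only the three timestamps come from the steps above.

```python
from datetime import datetime

# Timestamps copied from the steps above (time of day, comma before milliseconds).
VIEW_TIMES = {
    "View 6 (complete cluster)": "02:44:32,594",
    "View 7 (survivor cluster)": "02:50:54,200",
    "View 9 (complete cluster again)": "02:56:10,164",
}

def parse(ts):
    # %f accepts 1 to 6 fractional digits, so "594" is read as 594 ms.
    return datetime.strptime(ts, "%H:%M:%S,%f")

def gaps(view_times):
    """Return seconds elapsed between consecutive views, in insertion order."""
    stamps = [parse(ts) for ts in view_times.values()]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

for label, seconds in zip(list(VIEW_TIMES)[1:], gaps(VIEW_TIMES)):
    print(f"{label}: {seconds:.3f} s after the previous view")
```

If each 5 min wait started right when the previous view formed (an assumption, the steps don't say exactly), then whatever exceeds 300 s in each gap is roughly the time taken by the kill or restore plus the view change.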

These views are created:
http://www.qa.jboss.com/~mlinhard/hyperion/run85-resi-dist-16/report/loganalysis/views.html

There are two anomalies during the test:

Two clients (333 and 459) behave oddly when the nodes are killed: they remove the killed nodes from their topology, add them back, and remove them again within a few seconds. (That's why we're seeing 502 node adds and removes in the client logs even though there are only 500 clients.)

After the two nodes are restored, the clients start to obtain the updated topology information,
first about node0003 (starting around 02:55:57,302) and then about node0002 (starting around 02:56:03,581).

However, in 165 cases the clients don't obtain the information about node0002 being added. That is 33% of the clients,
which corresponds to the load (throughput) of node0002 being approximately 33% lower than that of the other nodes (in the throughput chart).
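As a rough sanity check of the 33% figure, here is a small sketch of the expected load. It assumes every client sends requests at the same rate and spreads them evenly over the servers it knows about. That is a simplification, since real clients may route by key, and the function name `relative_load` is made up for illustration.

```python
from fractions import Fraction

def relative_load(total_clients, unaware_clients, total_nodes):
    """Load on the restored node relative to a node that every client knows.

    Assumption: each client sends one unit of load, split evenly over the
    servers in its own topology view.
    """
    aware = total_clients - unaware_clients
    # Aware clients spread their load over all nodes, including the restored one.
    load_restored = Fraction(aware, total_nodes)
    # Unaware clients spread their load over one node fewer, skipping the restored one.
    load_other = load_restored + Fraction(unaware_clients, total_nodes - 1)
    return load_restored / load_other

ratio = relative_load(total_clients=500, unaware_clients=165, total_nodes=16)
print(f"clients unaware of node0002: {165 / 500:.0%}")
print(f"node0002 load relative to the other nodes: {float(ratio):.1%}")
```

Under these assumptions node0002 ends up with about 66% of the load of the other nodes, i.e. roughly a third lower, which is in line with what the throughput chart shows.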

> Uneven request balancing after node restore
> -------------------------------------------
>
>                 Key: ISPN-1995
>                 URL: https://issues.jboss.org/browse/ISPN-1995
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Cache Server
>    Affects Versions: 5.1.4.CR1
>            Reporter: Tristan Tarrant
>            Assignee: Galder Zamarreño
>             Fix For: 5.1.x, 5.2.0.ALPHA1, 5.2.0.FINAL
>
>
> After a node crashes and rejoins the cluster, it does not receive client load at the same level as the other nodes.
> This issue does not affect data integrity and distribution in the cluster.
