[jboss-jira] [JBoss JIRA] (JGRP-1570) STABLE: desired_avg_gossip leads to long intervals between reception of STABILITY messages in large clusters

Tue Mar 5 07:06:56 EST 2013

    [ https://issues.jboss.org/browse/JGRP-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758449#comment-12758449 ] 

Bela Ban commented on JGRP-1570:
--------------------------------

Here are some numbers for a 4-node cluster running MPerf, with every node sending 1 million 1K messages.
Old = existing implementation, new = JGRP-1570 implemented (no scaling)

||Node||STABLE sent||STABLE received||STABILITY sent||STABILITY received||
|A (old)|96|236|11|64|
|A (new)|436|484|27|130|
|B (old)|97|236|15|64|
|B (new)|434|484|32|130|
|C (old)|95|236|17|64|
|C (new)|437|484|34|130|
|D (old)|96|235|21|64|
|D (new)|431|484|37|130|

The numbers show that, without scaling, every node sends roughly 4.5 times as many STABLE messages (about 435 vs. about 96) and receives about twice as many (484 vs. 236). Also, about twice as many STABILITY messages are sent and received (130 vs. 64 received per node).
However, this is not necessarily a bad thing, as more STABILITY messages mean a quicker purging of messages in the sender's (and possibly receiver's) caches, leading to better memory use.
Performance was about the same between old and new.

> STABLE: desired_avg_gossip leads to long intervals between reception of STABILITY messages in large clusters
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: JGRP-1570
>                 URL: https://issues.jboss.org/browse/JGRP-1570
>             Project: JGroups
>          Issue Type: Feature Request
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.3
>
>
> The time computed for the sending of STABLE is desired_avg_gossip * cluster-size * 2. While this is OK for small clusters, it may be too long for large clusters.
> On the other hand, if every member simply multicasts a STABLE message every (say) 30 seconds on average, then the number of messages sent grows with increasing cluster size.
> Investigate a way to set a lower and upper limit for the making and delivery of *STABILITY* messages, e.g. the goal is to receive 1 stability message every 60s.
> Besides increased traffic, however, this requires everyone to have a TCP connection to everybody else in the cluster in case of a TCP transport.
> A better solution might be to have only a dedicated member (the coord) periodically multicast a STABLE message. Everyone replies with a (unicast) STABLE message, and when the coord has received STABLE replies from everyone, it multicasts a STABILITY message. This would only require a multicast from the coord to everyone, establishing TCP connections from the coord to everyone (these usually already exist because of the VIEW-CHANGE multicast), and everyone would reuse the same TCP connection to send the reply.
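
To make the arithmetic in the description concrete, here is a rough sketch (not JGroups code). It evaluates desired_avg_gossip * cluster-size * 2 for a small and a large cluster, applies a hypothetical lower/upper bound of the kind the issue asks to investigate, and models the coord-side bookkeeping for the proposed STABLE/STABILITY round. All class and method names, the 20 s gossip value, and the 10 s / 60 s bounds are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; none of these names exist in JGroups. */
public class StableSketch {

    /** Time computed for sending STABLE, as stated in the issue:
     *  desired_avg_gossip * cluster-size * 2 (all values in ms). */
    public static long stableInterval(long desiredAvgGossipMs, int clusterSize) {
        return desiredAvgGossipMs * clusterSize * 2;
    }

    /** Hypothetical bound: keep the interval within [minMs, maxMs]. */
    public static long clamp(long intervalMs, long minMs, long maxMs) {
        return Math.max(minMs, Math.min(maxMs, intervalMs));
    }

    /** Coord-side state for one round: who still owes a STABLE reply. */
    public static final class Round {
        private final Set<String> pending;

        public Round(Set<String> membersExpectedToReply) {
            this.pending = new HashSet<>(membersExpectedToReply);
        }

        /** Records a unicast STABLE reply. Returns true once every member
         *  has replied, i.e. when the coord would multicast STABILITY. */
        public boolean onStableReply(String member) {
            pending.remove(member);
            return pending.isEmpty();
        }
    }

    public static void main(String[] args) {
        long gossipMs = 20_000; // assumed example value

        System.out.println(stableInterval(gossipMs, 4));    // 160000 (160 s)
        System.out.println(stableInterval(gossipMs, 1000)); // 40000000 (about 11 hours)

        // With assumed bounds of 10 s and 60 s, the large-cluster value is capped.
        System.out.println(clamp(stableInterval(gossipMs, 1000), 10_000, 60_000)); // 60000

        // Coord A waits for replies from B, C and D.
        Round round = new Round(new HashSet<>(Arrays.asList("B", "C", "D")));
        System.out.println(round.onStableReply("B")); // false
        System.out.println(round.onStableReply("C")); // false
        System.out.println(round.onStableReply("D")); // true
    }
}
```

The point of the sketch is only that the unbounded formula grows linearly with cluster size, while in the coord-driven round the trigger for STABILITY is "all replies received" rather than a per-member timer.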
