[
https://issues.jboss.org/browse/JGRP-2167?page=com.atlassian.jira.plugin....
]
Bela Ban commented on JGRP-2167:
--------------------------------
Setting {{resend_last_seqno_max_times}} to a high value optimizes for a case that almost
never happens, and in most cases causes unnecessary traffic and thread activity.
The lost last view will eventually be delivered when either (1) the view sender sends
another multicast or (2) STABLE kicks in. However, (1) might never happen and (2) takes
time, depending on STABLE's configuration.
There are ways to improve this, but I'm not sure I like any of them:
1. Have the last message sender task get acks for its highest seqno from all cluster
members
2. Let the receiver continue asking the sender for retransmission until it gets that last
seqno, or until higher seqnos from the sender are seen
#1 causes additional traffic that's a function of the cluster size and the frequency
of sending. E.g. if a sender sends a multicast every 2 seconds, this most likely
(depending on the xmit_interval config) causes another multicast to be sent (last-seqno),
plus N unicast acks to be received.
This also duplicates part of the functionality of STABLE.
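
To put rough numbers on the cost of #1, here is a back-of-the-envelope sketch. It is not JGroups code: the class and method names are made up for illustration, and it assumes the worst case in which every regular multicast is followed by one last-seqno multicast that each of the N members acks with a unicast.

```java
/** Rough traffic estimate for option #1. Hypothetical helper, not part of JGroups. */
final class AckTrafficEstimate {
    private AckTrafficEstimate() {}

    /** Number of regular multicasts sent within the window. */
    static long sendsIn(long windowMillis, long sendIntervalMillis) {
        if (sendIntervalMillis <= 0)
            throw new IllegalArgumentException("sendIntervalMillis must be > 0");
        return windowMillis / sendIntervalMillis;
    }

    /** Assumption: each regular multicast triggers one extra last-seqno multicast. */
    static long extraMulticasts(long sends) {
        return sends;
    }

    /** Assumption: every member acks every last-seqno multicast with one unicast. */
    static long extraAcks(long sends, int members) {
        if (members < 0)
            throw new IllegalArgumentException("members must be >= 0");
        return sends * members;
    }
}
```

With one multicast every 2 seconds, a one-minute window gives {{sendsIn(60_000, 2_000)}} = 30 sends, so for an assumed cluster of 100 members the sender would see up to 30 extra multicasts and 3000 unicast acks per minute.
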
#2 If the last-seqno message is lost, this won't help. Also, it leads to unnecessary
(unicast) traffic.
I think the best solution in such an edge case is to reduce the timeouts in STABLE itself
and let it run its course.
Highest seqno is not resent nor recorded on receivers
-----------------------------------------------------
Key: JGRP-2167
URL: https://issues.jboss.org/browse/JGRP-2167
Project: JGroups
Issue Type: Bug
Affects Versions: 4.0.1
Reporter: Radim Vansa
Assignee: Bela Ban
Priority: Minor
Fix For: 4.0.3
I am investigating an issue in a stress test in which, in a TCP-based configuration, a
{{GMS[VIEW]}} is broadcast to all nodes but is not received by some of them. Soon after
that there's a {{NAKACK2.HIGHEST_SEQNO}} that causes the node that is missing the last
seqno to request a retransmission, but the retransmit is not received either. There are
no further retries, and generally no NAKACK2 activity until about 30 seconds later (when
another node leaves after some timeout in the test).
The receiver does not keep asking for retransmission until it gets the message; it seems
that {{NAKACK2.handleHighestSeqno}} doesn't update {{Table.hr}} (not sure whether setting
highest received to a non-received message would be legal, though).
The sender uses the default value {{NAKACK2.resend_last_seqno_max_times=1}}, and as there
are no further mcast messages, the highest sent seqno does not change on the sender.
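
To illustrate what "recording" the highest seqno on the receiver could look like, here is a toy model. It is not the NAKACK2 or {{Table}} implementation: the class and its fields are hypothetical, it only accepts in-order messages and does not buffer, and it keeps the advertised seqno in a separate field rather than touching "highest received", since it is unclear whether the latter would be legal.

```java
/** Toy receiver-side bookkeeping. Hypothetical sketch, not JGroups code. */
final class ToyReceiverWindow {
    private long highestDelivered;   // highest seqno delivered in order
    private long highestAdvertised;  // highest seqno the sender claims to have sent

    ToyReceiverWindow(long initialSeqno) {
        this.highestDelivered = initialSeqno;
        this.highestAdvertised = initialSeqno;
    }

    /** Regular message: delivered only if it is the next expected seqno. */
    boolean onMessage(long seqno) {
        highestAdvertised = Math.max(highestAdvertised, seqno);
        if (seqno == highestDelivered + 1) {
            highestDelivered = seqno;
            return true;
        }
        return false; // duplicate or out of order; this toy does not buffer
    }

    /** Highest-seqno announcement: remember it so the gap stays visible. */
    void onHighestSeqno(long seqno) {
        highestAdvertised = Math.max(highestAdvertised, seqno);
    }

    /** Inclusive range {from, to} still missing, or an empty array if up to date. */
    long[] missingRange() {
        if (highestDelivered < highestAdvertised)
            return new long[] {highestDelivered + 1, highestAdvertised};
        return new long[0];
    }
}
```

In this model a periodic retransmit task could keep requesting {{missingRange()}} from the sender until it comes back empty, so a single lost retransmit would not end the recovery. Whether the extra unicast traffic this generates is acceptable is a separate question.
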