[jboss-jira] [JBoss JIRA] (JGRP-2167) Highest seqno is not resent nor recorded on receivers

Radim Vansa (JIRA) issues at jboss.org
Wed May 10 09:52:00 EDT 2017


    [ https://issues.jboss.org/browse/JGRP-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404397#comment-13404397 ] 

Radim Vansa commented on JGRP-2167:
-----------------------------------

I am not asking to 'optimize' something, I just need a reliable multicast framework. Reliable means that the messages/events will reach the nodes in a reasonable time.

The eventual delivery won't happen, because
1) The application is in a 'blocked' state (it waits until all nodes perform some action based on the new view). You cannot assume that the application will send mcasts all the time.
2) IIUC, STABLE is designed to not increase the number of messages in a cluster by downscaling the period on each node as the cluster scales up. In this case, STABLE was set to 5 seconds, but the situation was not resolved within 30 seconds. I could lower the period even further, but that wouldn't solve the issue in a bigger cluster due to the downscaling.
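
To put rough numbers on point 2, here is a back-of-the-envelope sketch in Java. It assumes that "downscaling" means each node's average STABLE period grows in proportion to the cluster size, so the cluster-wide message rate stays roughly constant; that model, the class name and the method name are illustrative assumptions, not taken from the JGroups source.

```java
// Sketch only: models the ASSUMPTION that each node's average STABLE period
// grows linearly with cluster size, keeping the cluster-wide rate constant.
final class StableGossipEstimate {

    /** Average time between STABLE messages sent by a single node, in ms (assumed model). */
    static long perNodeAvgPeriodMs(long configuredPeriodMs, int clusterSize) {
        if (clusterSize < 1) {
            throw new IllegalArgumentException("clusterSize must be >= 1");
        }
        return configuredPeriodMs * clusterSize;
    }

    public static void main(String[] args) {
        long configured = 5_000; // the 5 seconds mentioned above
        for (int nodes : new int[] {2, 8, 32}) {
            System.out.printf("nodes=%d -> per-node average period %d ms%n",
                    nodes, perNodeAvgPeriodMs(configured, nodes));
        }
    }
}
```

Under that assumption, halving the configured period only buys back a factor of two, which the next doubling of the cluster size cancels out.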

I don't recall the issue in detail now, but I perceive it as a flaw that a node already has the information about a missing message but does not try hard enough to get it.
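
What I mean by "trying hard enough" could look roughly like the sketch below: once the receiver has learned the sender's highest seqno, it remembers it and reports the gap on every retransmit tick until the gap is filled. This is a hypothetical, standalone illustration; `GapTrackingReceiver` and its methods are made up for this example and are not JGroups classes or NAKACK2 internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch, not JGroups code: a receiver that records the highest
// seqno advertised by a sender and keeps reporting the gap until it is filled.
final class GapTrackingReceiver {
    private final TreeSet<Long> outOfOrder = new TreeSet<>(); // received, but not yet contiguous
    private long highestDelivered;   // every seqno <= this value has been received
    private long highestAdvertised;  // highest seqno we know the sender has sent

    /** A regular (or retransmitted) message arrived. */
    void onMessage(long seqno) {
        highestAdvertised = Math.max(highestAdvertised, seqno);
        if (seqno > highestDelivered) {
            outOfOrder.add(seqno);
        }
        // Advance the contiguous prefix as far as possible.
        while (outOfOrder.remove(highestDelivered + 1)) {
            highestDelivered++;
        }
    }

    /** The sender announced its highest seqno; record it even though the message itself is missing. */
    void onHighestSeqno(long seqno) {
        highestAdvertised = Math.max(highestAdvertised, seqno);
    }

    /** Called on every retransmit tick; stays non-empty for as long as a gap exists. */
    List<Long> missing() {
        List<Long> result = new ArrayList<>();
        for (long s = highestDelivered + 1; s <= highestAdvertised; s++) {
            if (!outOfOrder.contains(s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        GapTrackingReceiver receiver = new GapTrackingReceiver();
        receiver.onMessage(1);
        receiver.onMessage(2);
        receiver.onHighestSeqno(3);      // message 3 (e.g. the view) was lost
        System.out.println("tick 1: " + receiver.missing());
        System.out.println("tick 2: " + receiver.missing()); // a lost retransmit does not end the retries
        receiver.onMessage(3);
        System.out.println("tick 3: " + receiver.missing());
    }
}
```

The point of the design is that the retry is driven by the receiver's own recorded state rather than by further traffic from the sender, so a single lost retransmission cannot stall delivery while the application is blocked.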

> Highest seqno is not resent nor recorded on receivers
> -----------------------------------------------------
>
>                 Key: JGRP-2167
>                 URL: https://issues.jboss.org/browse/JGRP-2167
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.0.1
>            Reporter: Radim Vansa
>            Assignee: Bela Ban
>            Priority: Minor
>             Fix For: 4.0.4
>
>
> I am investigating an issue in a stress test which leads me to a situation where in a TCP-based configuration a {{GMS[VIEW]}} is broadcast to all nodes, but it is not received by some of them. Soon after that there's a {{NAKACK2.HIGHEST_SEQNO}} that causes the node that is missing the last seqno to resend it, but the retransmit is not received either. There are no further retries, and generally no NAKACK2 activity until about 30 seconds later (when another node leaves after some timeout in the test).
> The receiver does not keep asking for retransmissions until it gets them, but it seems that {{NAKACK2.handleHighestSeqno}} doesn't update {{Table.hr}} (not sure if having highest received set to non-received msg would be legal, though).
> The sender uses the default value {{NAKACK2.resend_last_seqno_max_times=1}}, and as there are no further mcast messages, the highest sent seqno does not change on the sender.



--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
