[jboss-jira] [JBoss JIRA] (JGRP-2474) Messages about dropped queued message when using IpAddressUUID

Thu May 7 12:15:01 EDT 2020

    [ https://issues.redhat.com/browse/JGRP-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082718#comment-14082718 ] 

Mirko Streckenbach commented on JGRP-2474:
------------------------------------------

I only mentioned that in the last paragraph; I did not open a separate issue. I was unsure whether this is really an issue; it is just something I observed.

The sequence is as follows (using a slightly modified version of the attached program, with UUID addresses):

1. The first channel is created and becomes coordinator.
2. The second channel is created and calls connect().
3. viewAccepted is called on both channels with a view containing both channels.
4. After a few seconds, disconnect() is called on the second channel.
5. viewAccepted is called on the first channel with a new view containing only the first channel.
6. disconnect() returns.

After that, the coordinator still tries to send messages to the second channel. The log of the first channel contains several messages like these:

FINER: i1:1609 --> i2:1709: resending(#2)
Mai 07, 2020 6:02:21 PM org.jgroups.protocols.TP down
FINER: i1:1609: sending msg to i2:1709, src=i1:1609, headers are GMS: GmsHeader[LEAVE_RSP], UNICAST3: DATA, seqno=2, TP: [cluster=cluster]
Mai 07, 2020 6:02:21 PM org.jgroups.protocols.UNICAST3 retransmit

This goes on for some time.

With 4.1.2 this did not happen: after disconnect() on the second channel, no further messages were sent. Starting with 4.1.3, these retransmits occur.
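
For illustration only, the log pattern above is what one would expect from any ACK-driven retransmission scheme when an ACK is never processed. The issue description quoted below notes that 4.1.3 requires an ACK for LEAVE_RSP while 4.1.2 does not. The following is a toy model of that mechanism, not JGroups code: the class `RetransmitSketch`, its nested `Sender`, and the `simulate` helper are hypothetical names, and the real UNICAST3 implementation may differ in every detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of ACK-driven retransmission. Hypothetical; not the JGroups implementation. */
public class RetransmitSketch {

    /** Sender side: keeps every message until its ACK has been processed. */
    static final class Sender {
        private final TreeMap<Long, String> unacked = new TreeMap<>();
        private long nextSeqno = 1;
        final List<String> wire = new ArrayList<>(); // record of everything sent

        long send(String payload) {
            long seqno = nextSeqno++;
            unacked.put(seqno, payload);
            wire.add("sending #" + seqno + " " + payload);
            return seqno;
        }

        /** Cumulative ACK: forget all messages with seqno <= acked. */
        void onAck(long acked) {
            unacked.headMap(acked, true).clear();
        }

        /** One tick of the retransmit timer: resend whatever is still unacknowledged. */
        int retransmit() {
            unacked.forEach((seqno, payload) -> wire.add("resending #" + seqno + " " + payload));
            return unacked.size();
        }
    }

    /** Sends one LEAVE_RSP, runs `ticks` timer rounds, and returns the number of resends. */
    public static int simulate(boolean ackProcessed, int ticks) {
        Sender coordinator = new Sender();
        long seqno = coordinator.send("LEAVE_RSP");
        if (ackProcessed) {
            coordinator.onAck(seqno); // the leaving member's ACK arrives and is handled
        }
        int resends = 0;
        for (int i = 0; i < ticks; i++) {
            resends += coordinator.retransmit();
        }
        return resends;
    }

    public static void main(String[] args) {
        System.out.println("ACK processed: " + simulate(true, 5) + " resends");
        System.out.println("ACK lost:      " + simulate(false, 5) + " resends");
    }
}
```

In this model, a processed ACK yields 0 resends over five timer ticks, while a lost or ignored ACK yields one resend per tick (5 in total), which matches the shape of the repeated "resending(#2)" lines without claiming to explain their cause.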

I've attached the modified program (JGR2.java) and the output (log2.txt).

> Messages about dropped queued message when using IpAddressUUID
> --------------------------------------------------------------
>
>                 Key: JGRP-2474
>                 URL: https://issues.redhat.com/browse/JGRP-2474
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.2.3
>            Reporter: Mirko Streckenbach
>            Assignee: Bela Ban
>            Priority: Major
>         Attachments: JGR.java, JGR2.java, log-fail-1.txt, log2.txt
>
>
> We upgraded from 4.0.14 to 4.1.8, and ever since then we have been seeing messages like
> {code}
> Apr 27, 2020 10:30:54 AM org.jgroups.protocols.UNICAST3 addQueuedMessages
> WARNING: i2:1709: dropped queued message i1:1609#2 as its conn_id (0) did not match (entry.conn_id=1)
> {code}
> whenever an application is restarted. Our setup is as follows (mostly due to network restrictions):
> * Fixed port numbers
> * JDBC_PING
> * We use IpAddressUUID in order to have "readable" information in the jgroupsping table
> I could track this down to 4.1.2 / 4.1.3: 4.1.2 works as expected; from 4.1.3 on, I see the effect described above.
> I attached a simple example that demonstrates the problem: it starts two stacks, shuts down the second (non-coordinator), and starts it again after a couple of seconds. With 4.1.2 this works as expected (no warnings), but 4.1.3 and more recent versions (including 4.2.3) produce warnings. The exact behavior is not completely consistent: in most cases, starting the second app again results in some timeouts, the second app becomes a coordinator itself, and a merge view is established later (log attached). In some cases it only produces the warnings shown above (this is what we observe in our real application), and in some cases everything works fine.
> I don't get any warnings in the log if I don't set an AddressGenerator, but I'd like to avoid removing it.
> While running this at higher debug levels, I observed the following: 4.1.2 does not require an ACK for the LEAVE_RSP message; 4.1.3 does. The second app sends the ACK, but the coordinator does not seem to receive or process it properly and retransmits the LEAVE_RSP message again and again. This is independent of the AddressGenerator used.
