[jboss-jira] [JBoss JIRA] (JGRP-2162) Failed to send broadcast when opening the connection

Radim Vansa (JIRA) issues at jboss.org
Wed May 10 08:58:01 EDT 2017


    [ https://issues.jboss.org/browse/JGRP-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404326#comment-13404326 ] 

Radim Vansa commented on JGRP-2162:
-----------------------------------

[~belaban]:
1. may not be an option (and the documentation does not suggest that you need all nodes in the initial hosts list for proper functionality)
2. works (at least with the attached reproducer; I haven't tried the flaky tests), though it requires modifying the configuration.
3. ? If TCPPING doesn't work, we shouldn't recommend its use. I would prefer to keep it, i.e. fix it.

Could you set {{send_cache_on_join}} to true by default for those static protocols?
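
As a sketch of what setting it explicitly would look like, assuming the attribute goes on the TCPPING element from the reproducer's configuration (the other attributes are copied from that configuration, not a recommendation):
{code:xml}
<!-- Assumption: send_cache_on_join is set directly on the static discovery protocol -->
<TCPPING async_discovery="true"
         initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800]}"
         port_range="0"
         send_cache_on_join="true"/>
{code}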

> Failed to send broadcast when opening the connection
> ----------------------------------------------------
>
>                 Key: JGRP-2162
>                 URL: https://issues.jboss.org/browse/JGRP-2162
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Radim Vansa
>            Assignee: Bela Ban
>             Fix For: 4.0.4
>
>         Attachments: TcpNio2McastTest.java, infinispan_2.log.gz
>
>
> IRC discussion:
> {quote}
> <rvansa> bela_: Hi Bela, I have a weird failure in one test that seems to be rooted in JGroups. TCP_NIO2 is in charge, and there's a broadcast message to all nodes, but it seems it's not received on the other side.
> <bela_> rvansa: reproducible?
> <rvansa> bela_: it happens when the connection to a node is just being opened: I have added some trace logs and just a moment before writing to the NioConnection.send_buf it was in state "connection pending"
> <rvansa> bela_: sort of, after tens of runs of that test (on my machine) - and I've seen it first time in CI, so it could be
> <bela_> rvansa: NioConnection buffers writes up to a certain extent, then discards anything over the buffer limit
> <bela_> rvansa: max_send_buffers (default: 10). But retransmission should fix this, unless you don’t wait long enough
> <rvansa> bela_: I don't think it should go over the limit
> <rvansa> bela_: the test is not doing anything else, just sending CommitCommand (that should be couple hundred bytes at most) and then waiting
> <rvansa> bela_: according to the traces I've added, Buffers.write returned false when writing the local address, and then true when writing the actual message
> {quote}
> I have been trying to write a reproducer and found that the problem is related to the failing test using a custom (fake) discovery protocol that doesn't open the connection during startup. In my ~reproducer I had to modify tcp-nio.xml to use TCPPING with only the first node in the hosts list (localhost[7800]):
> {code:xml}
> <TCPPING async_discovery="true" initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800]}" port_range="0"/>
> {code}
> As a result, discovery does not open the physical connection. However, the reproducer has an (always reproducible) flaw: it does not send the message to the third node at all (and the test therefore fails).
> Note that increasing the timeout in request options does not help.


