[jboss-jira] [JBoss JIRA] (JGRP-2162) Failed to send broadcast when opening the connection

Thu May 11 04:08:00 EDT 2017

    [ https://issues.jboss.org/browse/JGRP-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404858#comment-13404858 ] 

Bela Ban edited comment on JGRP-2162 at 5/11/17 4:07 AM:
---------------------------------------------------------

3. I never recommend TCPPING, except when other (more dynamic) discovery protocols cannot be used. If TCPPING is to be used, I always recommend listing all members.
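
As a rough illustration of listing all members, assuming a three-node cluster (the host names {{hostA}}, {{hostB}} and {{hostC}} are hypothetical placeholders; the port and {{port_range}} are taken from the reproducer in the issue description):
{code:xml}
<!-- Sketch only: hostA/hostB/hostC are made-up names; list every cluster member here -->
<TCPPING initial_hosts="hostA[7800],hostB[7800],hostC[7800]" port_range="0"/>
{code}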

Re setting {{send_cache_on_join}} to true by default: hmm, this increases traffic on joins... The default value would have to be set in the superclass {{Discovery}}, and that's not something I want to do, as most discovery protocols are dynamic.

I also don't want to override this attribute in TCPPING as I don't like silently changing attributes.

Why don't you set either {{send_cache_on_join}} or {{return_entire_cache}} in the Infinispan configuration to true by default instead?
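
A minimal sketch of what that might look like in the JGroups stack file used by Infinispan. The surrounding attributes are copied from the reproducer in the issue description and are only an assumption about the actual Infinispan configuration:
{code:xml}
<!-- Sketch only: enables the attribute explicitly in the configuration instead of changing the protocol default -->
<TCPPING async_discovery="true"
         initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800]}"
         port_range="0"
         send_cache_on_join="true"/>
{code}
Setting {{return_entire_cache="true"}} on the same element would be the alternative.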

was (Author: belaban):
3. I never recommend TCPPING, except when other (more dynamic) discovery protocols cannot be used. If TCPPING is to be used, I always recommend listing all members.

Re setting {{send_cache_on_join}} to true by default: hmm, this increases traffic on joins... OK, I'll change this for TCPPING only for now...

> Failed to send broadcast when opening the connection
> ----------------------------------------------------
>
>                 Key: JGRP-2162
>                 URL: https://issues.jboss.org/browse/JGRP-2162
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Radim Vansa
>            Assignee: Bela Ban
>             Fix For: 4.0.4
>
>         Attachments: TcpNio2McastTest.java, infinispan_2.log.gz
>
>
> IRC discussion:
> {quote}
> <rvansa> bela_: Hi Bela, I have a weird failure in one test that seems to be rooted in JGroups. TCP_NIO2 is in charge, and there's a broadcast message to all nodes, but it seems it's not received on the other side.
> <bela_> rvansa: reproducible?
> <rvansa> bela_: it happens when the connection to a node is just being opened: I have added some trace logs and just a moment before writing to the NioConnection.send_buf it was in state "connection pending"
> <rvansa> bela_: sort of, after tens of runs of that test (on my machine) - and I've seen it first time in CI, so it could be
> <bela_> rvansa: NioConnection buffers writes up to a certain extent, then  discards anything over the buffer limit
> <bela_> rvansa: max_send_buffers (default: 10). But retransmission should fix this, unless you don’t wait long enough
> <rvansa> bela_: I don't think it should go over the limit
> <rvansa> bela_: the test is not doing anything else, just sending CommitCommand (that should be couple hundred bytes at most) and then waiting
> <rvansa> bela_: according to the traces I've added, Buffers.write returned false when writing the local address, and then true when writing the actual message
> {quote}
> I have been trying to write a reproducer, and found that it's related to the fact that the failing test uses a custom (fake) discovery protocol that doesn't open the connection during startup. In my ~reproducer I had to modify tcp-nio.xml to use TCPPING with only the first node in the hosts list (localhost[7800]):
> {code:xml}
> <TCPPING async_discovery="true" initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800]}" port_range="0"/>
> {code}
> As a result, discovery does not open the physical connection. However, the reproducer suffers from an (always reproducible) flaw: it does not send the message to the third node at all (and the test therefore fails).
> Note that increasing the timeout in request options does not help.
