[
https://issues.jboss.org/browse/JGRP-2162?page=com.atlassian.jira.plugin....
]
Radim Vansa updated JGRP-2162:
------------------------------
Description:
IRC discussion:
{quote}
<rvansa> bela_: Hi Bela, I have a weird failure in one test that seems to be rooted in JGroups.
TCP_NIO2 is in charge, and there's a broadcast message to all nodes, but it seems
it's not received on the other side.
<bela_> rvansa: reproducible?
<rvansa> bela_: it happens when the connection to a node is just being opened: I
have added some trace logs and just a moment before writing to the NioConnection.send_buf
it was in state "connection pending"
<rvansa> bela_: sort of, after tens of runs of that test (on my machine) - and
I've seen it first time in CI, so it could be
<bela_> rvansa: NioConnection buffers writes up to a certain extent, then discards
anything over the buffer limit
<bela_> rvansa: max_send_buffers (default: 10). But retransmission should fix this,
unless you don’t wait long enough
<rvansa> bela_: I don't think it should go over the limit
<rvansa> bela_: the test is not doing anything else, just sending CommitCommand
(that should be couple hundred bytes at most) and then waiting
<rvansa> bela_: according to the traces I've added, Buffers.write returned false
when writing the local address, and then true when writing the actual message
{quote}
I have been trying to write a reproducer, and found that the failure is related to the
fact that the failing test uses a custom (fake) discovery protocol that doesn't open the
connection during startup. In my ~reproducer I had to modify tcp-nio.xml to use TCPPING
with only the first node in the hosts list (localhost[7800]):
{code:xml}
<TCPPING async_discovery="true"
initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800]}"
port_range="0"/>
{code}
As a result, the physical connection is not opened during startup. However, the reproducer
suffers from an (always reproducible) flaw: it does not send the message to the third node at all.
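To illustrate the buffering behaviour described in the quoted discussion (writes are queued up to max_send_buffers while the connection is pending, and anything over the limit is discarded), here is a minimal sketch. BoundedSendBuffer and all of its methods are hypothetical names made up for illustration; this is not the JGroups NioConnection or Buffers code, and it only models the "queue up to a limit, then drop" idea under that assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for a bounded send buffer; NOT the JGroups implementation. */
public class BoundedSendBuffer {
    private final int maxBuffers;                       // plays the role of max_send_buffers
    private final Queue<byte[]> pending = new ArrayDeque<>();
    private int dropped;                                // number of writes discarded so far

    public BoundedSendBuffer(int maxBuffers) {
        if (maxBuffers <= 0)
            throw new IllegalArgumentException("maxBuffers must be positive");
        this.maxBuffers = maxBuffers;
    }

    /** Queues a write while the connection is pending; returns false if it was discarded. */
    public boolean write(byte[] data) {
        if (pending.size() >= maxBuffers) {
            dropped++;                                  // over the limit: the write is lost
            return false;
        }
        pending.add(data);
        return true;
    }

    /** Called once the connection is established: hands back queued writes in FIFO order. */
    public List<byte[]> flush() {
        List<byte[]> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }

    public int pendingCount() { return pending.size(); }

    public int droppedCount() { return dropped; }

    public static void main(String[] args) {
        BoundedSendBuffer buf = new BoundedSendBuffer(10);
        for (int i = 0; i < 12; i++)
            buf.write(new byte[] {(byte) i});
        // prints "pending=10 dropped=2"
        System.out.println("pending=" + buf.pendingCount() + " dropped=" + buf.droppedCount());
    }
}
```

In a model like this, a single small message sent while the connection is still pending stays well under the limit, which is why the buffer limit alone does not seem to explain the lost broadcast.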
was:
IRC disucssion:
{quote}
bela_: Hi Bela, I have a weird failure in one test that seem to be rooted in JGroups.
TCP_NIO2 is in charge, and there's a broadcast message to all nodes, but it seems
it's not received on the other side.
<bela_> rvansa: reproducible?
<rvansa> bela_: it happens when the connection to a node is just being opened: I
have added some trace logs and just a moment before writing to the NioConnection.send_buf
it was in state "connection pending"
<rvansa> bela_: sort of, after tens of runs of that test (on my machine) - and
I've seen it first time in CI, so it could be
<bela_> rvansa: NioConnection buffers writes up to a certain extent, then discards
anything over the buffer limit
<bela_> rvansa: max_send_buffers (default: 10). But retransmission should fix this,
unless you don’t wait long enough
<rvansa> bela_: I don't think it should go over the limit
<rvansa> bela_: the test is not doing anything else, just sending CommitCommand
(that should be couple hundred bytes at most) and then waiting
<rvansa> bela_: according to the traces I've added, Buffers.write returned false
when writing the local address, and then true when writing the actual message
{quote}
I was trying to write a reproducer, and found that it's related to the fact that the
failing test uses custom (fake) discovery protocol, that doesn't open the connection
during startup. In my ~reproducer I had to modify tcp-nio.xml to use TCP_PING with only
the first node in hosts list (localhost[7800]) - this causes that the physical connection
is not open. However, the reproducer suffers from (always reproducible) flaw, not sending
the message to third node at all.
Failed to send broadcast when opening the connection
----------------------------------------------------
Key: JGRP-2162
URL:
https://issues.jboss.org/browse/JGRP-2162
Project: JGroups
Issue Type: Bug
Reporter: Radim Vansa
Assignee: Bela Ban
Attachments: TcpNio2McastTest.java
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)