[jboss-jira] [JBoss JIRA] (JGRP-2162) Failed to send broadcast when opening the connection

Radim Vansa (JIRA) issues at jboss.org
Thu May 11 11:45:00 EDT 2017


    [ https://issues.jboss.org/browse/JGRP-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405295#comment-13405295 ] 

Radim Vansa commented on JGRP-2162:
-----------------------------------

{{Discovery}} could define {{getDefaultSendCacheOnJoin() { return false; } }}, so that TCPPING could override it cleanly. But I agree that overriding parent defaults isn't ideal.
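A minimal sketch of the overridable-default pattern suggested above, in plain Java. Only the method name `getDefaultSendCacheOnJoin()` comes from the comment; `DiscoveryBase`, `GenericDiscovery` and `TcpPingLike` are hypothetical stand-ins for {{Discovery}} and TCPPING, and the real JGroups classes may wire their defaults differently. An explicit user setting wins; otherwise the subclass hook supplies the default.

```java
// Hypothetical stand-ins for JGroups' Discovery and TCPPING; not the real classes.
abstract class DiscoveryBase {
    // Explicit user setting; null means "not configured".
    private Boolean sendCacheOnJoin;

    // Subclasses override this hook to supply their own default.
    protected boolean getDefaultSendCacheOnJoin() {
        return false;
    }

    public void setSendCacheOnJoin(boolean value) {
        this.sendCacheOnJoin = value;
    }

    // An explicit setting wins; otherwise fall back to the subclass default.
    public boolean isSendCacheOnJoin() {
        return sendCacheOnJoin != null ? sendCacheOnJoin : getDefaultSendCacheOnJoin();
    }
}

class GenericDiscovery extends DiscoveryBase {
    // Inherits the base default (false).
}

class TcpPingLike extends DiscoveryBase {
    // Assumption for illustration: a TCPPING-like protocol defaults to true
    // because its static host list may not name every member.
    @Override
    protected boolean getDefaultSendCacheOnJoin() {
        return true;
    }
}

public class DefaultOverrideDemo {
    public static void main(String[] args) {
        System.out.println(new GenericDiscovery().isSendCacheOnJoin()); // false
        System.out.println(new TcpPingLike().isSendCacheOnJoin());      // true
        TcpPingLike explicit = new TcpPingLike();
        explicit.setSendCacheOnJoin(false);
        System.out.println(explicit.isSendCacheOnJoin());               // false
    }
}
```

The subclass only overrides the hook, never the field, so a user-supplied value is still honored regardless of which protocol supplies the default.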

You're right that the recommendation is in the docs; maybe you could add it to the schema docs, too. You could also mention that {{send_cache_on_join}} should be enabled if not all hosts are listed.
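The recommendation above reduces to a simple rule, sketched here as a hypothetical validation helper. It assumes the full set of expected members is known up front, which in practice is often not the case; the host strings are illustrative and follow the `localhost[7800]` notation used in the TCPPING config quoted below.

```java
import java.util.Set;

public class SendCacheOnJoinCheck {
    // Hypothetical helper: true when initial_hosts does not cover every expected
    // member, i.e. when the comment recommends enabling send_cache_on_join.
    static boolean shouldEnableSendCacheOnJoin(Set<String> initialHosts, Set<String> expectedMembers) {
        return !initialHosts.containsAll(expectedMembers);
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("localhost[7800]", "localhost[7801]", "localhost[7802]");
        // Only the first node listed, as in the reproducer's TCPPING config.
        System.out.println(shouldEnableSendCacheOnJoin(Set.of("localhost[7800]"), all)); // true
        System.out.println(shouldEnableSendCacheOnJoin(all, all));                      // false
    }
}
```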

Still, ideally, when a node cannot find the mapping between a logical and a physical address, it should try to look it up. A cache is, by nature, expendable.
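The "look it up on a miss" idea can be sketched as a read-through cache: a miss triggers a resolver instead of failing, and a successful result is cached. `AddressCache` and the resolver function are hypothetical; in JGroups the fallback would presumably be some kind of targeted discovery request, which this sketch does not model.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical logical->physical address cache; not JGroups' implementation.
class AddressCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Fallback lookup used on a cache miss (assumption: e.g. a discovery request).
    private final Function<String, Optional<String>> resolver;

    AddressCache(Function<String, Optional<String>> resolver) {
        this.resolver = resolver;
    }

    void put(String logical, String physical) {
        cache.put(logical, physical);
    }

    // On a miss, try to find the mapping instead of giving up; cache any result.
    Optional<String> lookup(String logical) {
        String hit = cache.get(logical);
        if (hit != null) return Optional.of(hit);
        Optional<String> found = resolver.apply(logical);
        found.ifPresent(physical -> cache.put(logical, physical));
        return found;
    }
}

public class AddressCacheDemo {
    public static void main(String[] args) {
        // Stand-in "directory" that the resolver consults on a miss.
        Map<String, String> directory = Map.of("node-C", "127.0.0.1:7802");
        AddressCache c = new AddressCache(l -> Optional.ofNullable(directory.get(l)));
        System.out.println(c.lookup("node-C")); // Optional[127.0.0.1:7802]
        System.out.println(c.lookup("node-X")); // Optional.empty
    }
}
```

A miss that the resolver cannot answer is not cached, so a later lookup gets another chance once the member becomes known.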

In the meantime, I'll modify the Infinispan configs anyway to make the testsuite less flaky.

> Failed to send broadcast when opening the connection
> ----------------------------------------------------
>
>                 Key: JGRP-2162
>                 URL: https://issues.jboss.org/browse/JGRP-2162
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Radim Vansa
>            Assignee: Bela Ban
>             Fix For: 4.0.4
>
>         Attachments: TcpNio2McastTest.java, infinispan_2.log.gz
>
>
> IRC discussion:
> {quote}
> <rvansa> bela_: Hi Bela, I have a weird failure in one test that seems to be rooted in JGroups. TCP_NIO2 is in charge, and there's a broadcast message to all nodes, but it seems it's not received on the other side.
> <bela_> rvansa: reproducible?
> <rvansa> bela_: it happens when the connection to a node is just being opened: I added some trace logs, and just before writing to NioConnection.send_buf, the connection was in state "connection pending"
> <rvansa> bela_: sort of, it takes tens of runs of that test (on my machine), and I first saw it in CI, so it could be
> <bela_> rvansa: NioConnection buffers writes up to a certain extent, then discards anything over the buffer limit
> <bela_> rvansa: max_send_buffers (default: 10). But retransmission should fix this, unless you don’t wait long enough
> <rvansa> bela_: I don't think it should go over the limit
> <rvansa> bela_: the test is not doing anything else, just sending CommitCommand (that should be a couple hundred bytes at most) and then waiting
> <rvansa> bela_: according to the traces I've added, Buffers.write returned false when writing the local address, and then true when writing the actual message
> {quote}
> I have been trying to write a reproducer, and found that the failure is related to the fact that the failing test uses a custom (fake) discovery protocol that doesn't open the connection during startup. In my reproducer I had to modify tcp-nio.xml to use TCPPING with only the first node (localhost[7800]) in the hosts list:
> {code:xml}
> <TCPPING async_discovery="true" initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800]}" port_range="0"/>
> {code}
> As a result, the physical connection is not opened by discovery. However, the reproducer has an (always reproducible) flaw: it does not send the message to the third node at all, so the test fails.
> Note that increasing the timeout in request options does not help.
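To make the buffering behavior described in the IRC log concrete, here is a simplified model of a connection that queues writes while the connect is still pending and discards anything beyond a limit. `PendingConnection` is hypothetical; the default of 10 mirrors `max_send_buffers` as quoted by bela_, but this is not JGroups' `NioConnection` or `Buffers`, and its `send()` return value does not model the semantics of `Buffers.write`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a connection that buffers writes while the TCP connect
// is still pending; not JGroups' NioConnection or Buffers.
class PendingConnection {
    private final int maxSendBuffers;
    private final ArrayDeque<ByteBuffer> pending = new ArrayDeque<>();
    private final List<ByteBuffer> sent = new ArrayList<>();
    private boolean connected;
    private int dropped;

    PendingConnection(int maxSendBuffers) {
        this.maxSendBuffers = maxSendBuffers;
    }

    // Returns true if the buffer was sent or queued, false if it was discarded.
    boolean send(ByteBuffer buf) {
        if (connected) {
            sent.add(buf);
            return true;
        }
        if (pending.size() >= maxSendBuffers) {
            dropped++; // over the limit: discarded, only retransmission can recover it
            return false;
        }
        pending.add(buf);
        return true;
    }

    // Called when the connect completes: flush queued buffers in FIFO order.
    void connectionEstablished() {
        connected = true;
        while (!pending.isEmpty()) sent.add(pending.poll());
    }

    int sentCount() { return sent.size(); }
    int droppedCount() { return dropped; }
}

public class PendingConnectionDemo {
    public static void main(String[] args) {
        PendingConnection conn = new PendingConnection(10); // max_send_buffers default per the IRC log
        for (int i = 0; i < 12; i++) conn.send(ByteBuffer.allocate(8));
        conn.connectionEstablished();
        System.out.println(conn.sentCount() + " sent, " + conn.droppedCount() + " dropped"); // 10 sent, 2 dropped
    }
}
```

Under this model, the two writes seen in the traces (the local address, then the message) stay well below a limit of 10, which is consistent with rvansa's point that the buffer limit alone should not explain the lost broadcast.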



--
This message was sent by Atlassian JIRA
(v7.2.3#72005)


