[JBoss JIRA] (JGRP-2168) JChannel(Collection<Protocol>) constructor clears protocol properties with non-default converter whose associated system property is not defined
by Paul Ferraro (JIRA)
[ https://issues.jboss.org/browse/JGRP-2168?page=com.atlassian.jira.plugin.... ]
Paul Ferraro commented on JGRP-2168:
------------------------------------
No problem - it's not urgent. The primary issue is that the workaround in WF11 (which uses 3.6.x) relies on JChannel.setProtocolStack(...) which doesn't exist in JGroups 4.0.x - so this won't become urgent until WF12.
> JChannel(Collection<Protocol>) constructor clears protocol properties with non-default converter whose associated system property is not defined
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: JGRP-2168
> URL: https://issues.jboss.org/browse/JGRP-2168
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.13, 4.0.1
> Reporter: Paul Ferraro
> Assignee: Bela Ban
> Fix For: 4.0.3
>
>
> WildFly 11 recently started using the new JChannel(Protocol...) constructor for creating channels. This has resulted in the inability to configure certain protocol properties, most notably, initial_hosts for TCPPING.
> Because this constructor calls resolveAndAssignFields(...) with an empty map, if a property was explicitly set, and its associated system property does not exist, and that property uses a non-default converter, then it will have its value undefined (or, more specifically, set to whatever the converter does with a null value).
> Additionally, if the associated system property did exist, it would take precedence over an explicitly set value. I don't think that's a good idea.
> Consider the following:
> {code:java}
> TCP transport = new TCP();
> transport.setBindAddress(InetAddress.getLocalHost());
> transport.setBindPort(9600);
> TCPPING ping = new TCPPING();
> ping.setInitialHosts(Collections.singletonList(new IpAddress(transport.getBindAddress(), transport.getBindPort())));
> JChannel channel = new JChannel(transport, ping);
> assert !ping.getInitialHosts().isEmpty() : "No initial hosts!";
> {code}
> Side note: new JChannel(Collection<Protocol>) should really be new JChannel(List<Protocol>), since the collection should be ordered.
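The precedence problem described in the report can be modelled without JGroups at all. The sketch below is a hypothetical stand-alone model, not the actual resolveAndAssignFields(...) code: the class PropertyResolutionSketch, its method names and the property name demo.initial_hosts are invented for illustration. It contrasts a resolution step that always runs the converter (mirroring the reported behaviour) with one possible alternative that leaves an explicitly set value alone.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PropertyResolutionSketch {

    // Hypothetical non-default converter: "a,b" becomes [a, b]; null or "" becomes an empty list.
    static final Function<String, List<String>> HOSTS_CONVERTER = raw -> {
        if (raw == null || raw.isEmpty()) return Collections.emptyList();
        List<String> hosts = new ArrayList<>();
        for (String host : raw.split(",")) hosts.add(host.trim());
        return hosts;
    };

    // Models the reported behaviour: the explicit value is ignored on purpose and the
    // converter always runs on the configured value, which may be null.
    static List<String> resolveOverwriting(List<String> explicit, Map<String, String> props,
                                           String systemProperty) {
        String raw = props.get("initial_hosts");
        if (raw == null) raw = System.getProperty(systemProperty); // may still be null
        return HOSTS_CONVERTER.apply(raw);
    }

    // One possible alternative (an assumption, not the actual fix): an explicit value wins,
    // and configuration is only consulted when nothing was set programmatically.
    static List<String> resolvePreservingExplicit(List<String> explicit, Map<String, String> props,
                                                  String systemProperty) {
        if (explicit != null && !explicit.isEmpty()) return explicit;
        return resolveOverwriting(explicit, props, systemProperty);
    }

    public static void main(String[] args) {
        String systemProperty = "demo.initial_hosts"; // hypothetical name
        System.clearProperty(systemProperty);         // make sure it is undefined for the demo
        List<String> explicit = Collections.singletonList("localhost[9600]");
        Map<String, String> noProps = Collections.emptyMap();

        System.out.println("overwriting: " + resolveOverwriting(explicit, noProps, systemProperty));
        System.out.println("preserving:  " + resolvePreservingExplicit(explicit, noProps, systemProperty));
    }
}
```

With an empty property map and no system property, the first variant returns an empty list even though a host was set explicitly, while the second returns the explicit value unchanged.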
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (JGRP-2162) Failed to send broadcast when opening the connection
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2162?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2162:
--------------------------------
The second issue, {{Buffers.write()}} returning false, might be caused by the first one: the transport cannot send the message because no physical address is available for its destination.
[~rvansa] Is this issue still present when one of the suggested fixes is in place?
> Failed to send broadcast when opening the connection
> ----------------------------------------------------
>
> Key: JGRP-2162
> URL: https://issues.jboss.org/browse/JGRP-2162
> Project: JGroups
> Issue Type: Bug
> Reporter: Radim Vansa
> Assignee: Bela Ban
> Fix For: 4.0.3
>
> Attachments: TcpNio2McastTest.java, infinispan_2.log.gz
>
>
> IRC discussion:
> {quote}
> <rvansa> bela_: Hi Bela, I have a weird failure in one test that seems to be rooted in JGroups. TCP_NIO2 is in charge, and there's a broadcast message to all nodes, but it seems it's not received on the other side.
> <bela_> rvansa: reproducible?
> <rvansa> bela_: it happens when the connection to a node is just being opened: I have added some trace logs and just a moment before writing to the NioConnection.send_buf it was in state "connection pending"
> <rvansa> bela_: sort of, after tens of runs of that test (on my machine) - and I've seen it first time in CI, so it could be
> <bela_> rvansa: NioConnection buffers writes up to a certain extent, then discards anything over the buffer limit
> <bela_> rvansa: max_send_buffers (default: 10). But retransmission should fix this, unless you don’t wait long enough
> <rvansa> bela_: I don't think it should go over the limit
> <rvansa> bela_: the test is not doing anything else, just sending CommitCommand (that should be couple hundred bytes at most) and then waiting
> <rvansa> bela_: according to the traces I've added, Buffers.write returned false when writing the local address, and then true when writing the actual message
> {quote}
> I have been trying to write a reproducer, and found that the problem is related to the failing test using a custom (fake) discovery protocol that doesn't open the connection during startup. In my ~reproducer I had to modify tcp-nio.xml to use TCPPING with only the first node in the hosts list (localhost[7800]):
> {code:xml}
> <TCPPING async_discovery="true" initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800]}" port_range="0"/>
> {code}
> As a result, the physical connection is not opened by discovery. However, the reproducer has an (always reproducible) flaw: it does not send the message to the third node at all, so the test fails.
> Note that increasing the timeout in request options does not help.
[JBoss JIRA] (JGRP-2162) Failed to send broadcast when opening the connection
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2162?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2162:
--------------------------------
This looks like 2 separate issues. Let me address the first one first.
When initial_hosts contains only A, the address caches will look like this after B and C have joined:
* A: ABC
* B: AB
* C: AC
When sending a multicast, A would succeed as it has all addresses of the other members, but B would fail sending the message to C and C would fail sending the message to B.
There are 3 ways to resolve this:
1. Include all hosts (or as many as possible) in {{TCPPING.initial_hosts}}
2. Set {{TCPPING.send_cache_on_join}} to {{true}}
3. Use a dynamic discovery protocol
Note that this is not an issue in {{UDP}} as a (group) multicast results in an IP multicast, whereas we have to send the same message multiple times in {{TCP_NIO2}}.
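The cache states listed above can be checked with a small simulation. This is a simplified model, not JGroups code: the class DiscoveryCacheSketch and its methods are hypothetical, and it assumes that a joiner and each initial host it contacts learn each other's physical address, and that nothing else populates the caches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class DiscoveryCacheSketch {

    // member -> members whose physical address it knows (including itself)
    private final Map<String, Set<String>> caches = new TreeMap<>();

    // A joiner contacts every initial host that is already running; both sides
    // then record each other's address.
    void join(String member, List<String> initialHosts) {
        caches.computeIfAbsent(member, k -> new TreeSet<>()).add(member);
        for (String host : initialHosts) {
            if (!caches.containsKey(host)) continue; // host not started yet
            caches.get(member).add(host);
            caches.get(host).add(member);
        }
    }

    Set<String> cacheOf(String member) {
        return caches.get(member);
    }

    // Members a group multicast from 'sender' cannot reach, because the sender
    // has no physical address for them and must send one copy per member.
    List<String> unreachableFrom(String sender) {
        List<String> missing = new ArrayList<>();
        Set<String> known = caches.get(sender);
        for (String member : caches.keySet())
            if (!known.contains(member)) missing.add(member);
        return missing;
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("A", "B", "C");
        DiscoveryCacheSketch cluster = new DiscoveryCacheSketch();
        for (String member : members)
            cluster.join(member, Collections.singletonList("A")); // initial_hosts = A only
        for (String member : members)
            System.out.println(member + " knows " + cluster.cacheOf(member)
                    + ", cannot reach " + cluster.unreachableFrom(member));
    }
}
```

In this model B cannot reach C and C cannot reach B, matching the caches above. Passing all three members as initial hosts to every join call (resolution 1) leaves every cache complete.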
[JBoss JIRA] (JGRP-2168) JChannel(Collection<Protocol>) constructor clears protocol properties with non-default converter whose associated system property is not defined
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2168?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2168:
--------------------------------
[~pferraro] I'll try to release a 4.0.3 asap, but I'd like to resolve the other 2 issues first. As it seems you have a workaround, this is not super urgent, is it?