[JBoss JIRA] (JGRP-2162) Failed to send broadcast when opening the connection
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2162?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-2162:
---------------------------
Fix Version/s: 4.0.4
(was: 4.0.3)
> Failed to send broadcast when opening the connection
> ----------------------------------------------------
>
> Key: JGRP-2162
> URL: https://issues.jboss.org/browse/JGRP-2162
> Project: JGroups
> Issue Type: Bug
> Reporter: Radim Vansa
> Assignee: Bela Ban
> Fix For: 4.0.4
>
> Attachments: TcpNio2McastTest.java, infinispan_2.log.gz
>
>
> IRC discussion:
> {quote}
> bela_: Hi Bela, I have a weird failure in one test that seems to be rooted in JGroups. TCP_NIO2 is in charge, and there's a broadcast message to all nodes, but it seems it's not received on the other side.
> <bela_> rvansa: reproducible?
> <rvansa> bela_: it happens when the connection to a node is just being opened: I have added some trace logs and just a moment before writing to the NioConnection.send_buf it was in state "connection pending"
> <rvansa> bela_: sort of, after tens of runs of that test (on my machine) - and I've seen it first time in CI, so it could be
> <bela_> rvansa: NioConnection buffers writes up to a certain extent, then discards anything over the buffer limit
> <bela_> rvansa: max_send_buffers (default: 10). But retransmission should fix this, unless you don’t wait long enough
> <rvansa> bela_: I don't think it should go over the limit
> <rvansa> bela_: the test is not doing anything else, just sending CommitCommand (that should be couple hundred bytes at most) and then waiting
> <rvansa> bela_: according to the traces I've added, Buffers.write returned false when writing the local address, and then true when writing the actual message
> {quote}
> I have been trying to write a reproducer, and found that it's related to the fact that the failing test uses a custom (fake) discovery protocol that doesn't open the connection during startup. In my (approximate) reproducer I had to modify tcp-nio.xml to use TCPPING with only the first node in the hosts list (localhost[7800]):
> {code:xml}
> <TCPPING async_discovery="true" initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800]}" port_range="0"/>
> {code}
> As a result, the physical connection is not opened by discovery. However, the reproducer suffers from an (always reproducible) flaw: it does not send the message to the third node at all, and the test therefore fails.
> Note that increasing the timeout in request options does not help.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
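By contrast, a discovery configuration that lists every expected node in initial_hosts, so that the physical connections are opened during discovery, might look like this (a sketch; the ports 7801 and 7802 for the second and third nodes are illustrative assumptions):

```xml
<TCPPING async_discovery="true"
         initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800],localhost[7801],localhost[7802]}"
         port_range="0"/>
```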
[JBoss JIRA] (JGRP-2167) Highest seqno is not resent nor recorded on receivers
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2167?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-2167:
---------------------------
Fix Version/s: 4.0.4
(was: 4.0.3)
> Highest seqno is not resent nor recorded on receivers
> -----------------------------------------------------
>
> Key: JGRP-2167
> URL: https://issues.jboss.org/browse/JGRP-2167
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.1
> Reporter: Radim Vansa
> Assignee: Bela Ban
> Priority: Minor
> Fix For: 4.0.4
>
>
> I am investigating an issue in a stress test which leads me to a situation where in a TCP-based configuration a {{GMS[VIEW]}} is broadcast to all nodes, but it is not received by some of them. Soon after that there's a {{NAKACK2.HIGHEST_SEQNO}} that causes the node that is missing the last seqno to resend it, but the retransmit is not received either. There are no further retries, and generally no NAKACK2 activity until about 30 seconds later (when another node leaves after some timeout in the test).
> The receiver does not keep asking for retransmissions until it gets them, but it seems that {{NAKACK2.handleHighestSeqno}} doesn't update {{Table.hr}} (not sure if having highest received set to non-received msg would be legal, though).
> The sender uses default value {{NAKACK2.resend_last_seqno_max_times=1}}, and as there are no further mcast messages, the highest sent seqno does not change on sender.
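A toy model of the suspected behavior (plain Java, not JGroups code; {{SeqnoSketch}}, {{hd}}, {{hr}} and the {{updateHr}} flag are hypothetical stand-ins for NAKACK2/Table internals): if the HIGHEST_SEQNO announcement does not raise the receiver's highest-received mark, a lost retransmission leaves no visible gap, so no further retransmit request is ever issued.

```java
import java.util.*;

// Toy receiver: tracks a highest-delivered (hd) and highest-received (hr)
// mark. A retransmit request is only issued for seqnos in the gap (hd, hr].
public class SeqnoSketch {
    long hd;                 // highest delivered seqno
    long hr;                 // highest received seqno (may be ahead of hd)
    final Set<Long> pending = new TreeSet<>(); // received but not yet delivered

    // Seqnos the receiver would currently ask the sender to retransmit.
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        for (long s = hd + 1; s <= hr; s++)
            if (!pending.contains(s)) gaps.add(s);
        return gaps;
    }

    // HIGHEST_SEQNO announcement: if hr is NOT updated here (the suspected
    // bug), the lost seqno never shows up as a gap in missing().
    void handleHighestSeqno(long seqno, boolean updateHr) {
        if (updateHr) hr = Math.max(hr, seqno);
    }

    public static void main(String[] args) {
        SeqnoSketch r = new SeqnoSketch();
        r.hd = r.hr = 5;                    // delivered everything up to 5
        r.handleHighestSeqno(6, false);     // announcement, hr not recorded
        System.out.println(r.missing());    // [] -> no retransmit request
        r.handleHighestSeqno(6, true);      // if hr were recorded instead
        System.out.println(r.missing());    // [6] -> gap detected, request sent
    }
}
```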
[JBoss JIRA] (JGRP-2167) Highest seqno is not resent nor recorded on receivers
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2167?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2167:
--------------------------------
The problem is that setting {{resend_last_seqno_max_times}} to a high value optimizes for a case that almost never happens, and in most cases causes unnecessary traffic and thread activity.
The lost last view will eventually be delivered when either (1) the view sender sends another multicast or (2) STABLE kicks in. However, (1) might never happen, and (2) takes time, depending on STABLE's configuration.
There are ways to improve this, but I'm not sure I like any of them:
1. Have the last message sender task get acks for its highest seqno from all cluster members
2. Let the receiver continue asking the sender for retransmission until it gets that last seqno, or until higher seqnos from the sender are seen
#1 causes additional traffic that's a function of the cluster size and the frequency of sending. E.g. if a sender sends a multicast every 2 seconds, this most likely (depending on the xmit_interval config) causes another multicast to be sent (last-seqno), plus N unicast acks to be received.
This also duplicates part of the functionality of STABLE.
#2 won't help if the last-seqno message itself is lost. It also leads to unnecessary (unicast) traffic.
I think the best solution in such an edge case is to reduce the timeouts in STABLE itself and let it run its course.
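Reducing STABLE's timeouts as suggested might look like this (a sketch; the values are illustrative assumptions, with {{desired_avg_gossip}} and {{stability_delay}} in milliseconds):

```xml
<STABLE desired_avg_gossip="5000" stability_delay="500" max_bytes="400000"/>
```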
[JBoss JIRA] (JGRP-2162) Failed to send broadcast when opening the connection
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2162?page=com.atlassian.jira.plugin.... ]
Bela Ban edited comment on JGRP-2162 at 5/10/17 2:21 AM:
---------------------------------------------------------
This looks like 2 separate issues. Let me address the first one first.
When initial_hosts contains only A, the address caches after B and C join will be
* A: ABC
* B: AB
* C: AC
When sending a multicast, A would succeed as it has all addresses of the other members, but B would fail sending the message to C and C would fail sending the message to B.
Also, NAKACK2 won't retransmit, as the receivers (B or C) never receive C's or B's messages, so they won't ask the sender for retransmission.
There are 3 ways to resolve this:
1. Include all hosts (or as many as possible) in {{TCPPING.initial_hosts}}
2. Set {{TCPPING.send_cache_on_join}} to {{true}}
3. Use a dynamic discovery protocol
Note that this is not an issue in {{UDP}} as a (group) multicast results in an IP multicast, whereas we have to send the same message multiple times in {{TCP_NIO2}}.
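The asymmetric caches above can be sketched with plain Java collections (not JGroups classes; the member names and the {{unreachable}} helper are hypothetical): a TCP "multicast" is a loop over the addresses the sender knows, so members missing from its cache are silently skipped.

```java
import java.util.*;

// Simplified model of per-member address caches after discovery against
// initial_hosts={A}: each member only knows the addresses it has learned,
// so a send-to-all over TCP silently skips unknown members.
public class PartialCacheSketch {
    static final Map<String, Set<String>> caches = Map.of(
        "A", Set.of("A", "B", "C"),  // coordinator learned B and C when they joined
        "B", Set.of("A", "B"),       // B only discovered A
        "C", Set.of("A", "C"));      // C only discovered A

    // Returns the members of 'view' that 'sender' fails to reach.
    static Set<String> unreachable(String sender, Set<String> view) {
        Set<String> missing = new TreeSet<>(view);
        missing.removeAll(caches.get(sender));
        return missing;
    }

    public static void main(String[] args) {
        Set<String> view = Set.of("A", "B", "C");
        System.out.println(unreachable("A", view)); // []
        System.out.println(unreachable("B", view)); // [C]
        System.out.println(unreachable("C", view)); // [B]
    }
}
```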
[JBoss JIRA] (JGRP-2162) Failed to send broadcast when opening the connection
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2162?page=com.atlassian.jira.plugin.... ]
Bela Ban edited comment on JGRP-2162 at 5/10/17 2:18 AM:
---------------------------------------------------------
[~pruivo] How do I define that tcp-nio.xml is used in OrphanTransactionsCleanupTest? If you used a different discovery protocol (not TEST_PING), does the test pass?
Update: OK, I can see that tcp-nio.xml is used for TCP tests by default. I ran testJoinerTransactionSurvives() with invocationCount=100, and all 100 tests passed.
How do you reproduce the issue?
[JBoss JIRA] (JGRP-2162) Failed to send broadcast when opening the connection
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2162?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2162:
--------------------------------
[~pruivo] How do I define that tcp-nio.xml is used in OrphanTransactionsCleanupTest? If you used a different discovery protocol (not TEST_PING), does the test pass?
[JBoss JIRA] (WFLY-8739) Provide operation to invalidate application sessions
by Stuart Douglas (JIRA)
[ https://issues.jboss.org/browse/WFLY-8739?page=com.atlassian.jira.plugin.... ]
Stuart Douglas moved UNDERTOW-1033 to WFLY-8739:
------------------------------------------------
Project: WildFly (was: Undertow)
Key: WFLY-8739 (was: UNDERTOW-1033)
Component/s: Web (Undertow)
(was: Core)
Affects Version/s: (was: 1.4.11.Final)
> Provide operation to invalidate application sessions
> ----------------------------------------------------
>
> Key: WFLY-8739
> URL: https://issues.jboss.org/browse/WFLY-8739
> Project: WildFly
> Issue Type: Feature Request
> Components: Web (Undertow)
> Reporter: Aaron Ogburn
> Assignee: Stuart Douglas
>
> JBossWeb had MBean operations that could be used to invalidate sessions manually whenever desired. No such operations are exposed with Undertow, so it would be nice if operations were exposed via the CLI to expire individual sessions, or all sessions belonging to a web app.