[JBoss JIRA] (JGRP-2387) Message from a non-member causes FD_ALL to continually suspect it
by Bela Ban (Jira)
[ https://issues.jboss.org/browse/JGRP-2387?page=com.atlassian.jira.plugin.... ]
Bela Ban edited comment on JGRP-2387 at 10/3/19 9:13 AM:
---------------------------------------------------------
The technical detail:
FD_ALL keeps track of the time the last message from each member was seen in the "timestamps" map.
It periodically suspects any entries in this map whose timestamps are too old.
When a new view is installed, any members that left are removed from the map, and an entry is added for each member if it doesn't already exist.
When any FD_ALL message is received from a member its entry in "timestamps" is updated.
If msg_counts_as_heartbeat is on then the same is done for every message from that member. (this is off by default)
The problem: When it updates the timestamp, no membership check is done first.
So a message from a non-member adds an entry to the map. That entry is not removed until the next view is processed, and until then FD_ALL continually sends suspect events for it up the stack.
This triggers VERIFY_SUSPECT to try to ping the node, which fails because it doesn't have the node's address (and can cause a "no physical address" log in some cases).
VERIFY_SUSPECT will eventually send SUSPECT events up the stack, which are ignored by GMS because the node isn't part of the cluster.
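To illustrate the mechanism described above, here is a minimal, self-contained sketch of a timestamp map that does the membership check before recording a heartbeat. It is not the JGroups code and not necessarily the fix that was applied: the class HeartbeatTracker, its method names, and the use of String addresses are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the "timestamps" bookkeeping described above.
// Addresses are plain Strings here; the real protocol uses JGroups addresses.
class HeartbeatTracker {
    private final Map<String, Long> timestamps = new ConcurrentHashMap<>();
    private volatile Set<String> members = Collections.emptySet();
    private final long timeoutMs;

    HeartbeatTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // On a new view: drop entries of members that left, add missing members.
    void installView(Set<String> newMembers, long now) {
        members = new HashSet<>(newMembers);
        timestamps.keySet().retainAll(members);
        for (String member : members) {
            timestamps.putIfAbsent(member, now);
        }
    }

    // Record a heartbeat only if the sender is in the current view.
    // Without this check, a non-member would get an entry that nothing
    // refreshes, so it would be suspected on every pass until the next view.
    boolean heartbeatReceived(String sender, long now) {
        if (!members.contains(sender)) {
            return false;
        }
        timestamps.put(sender, now);
        return true;
    }

    // Periodic check: every entry older than the timeout is a suspect.
    List<String> findSuspects(long now) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, Long> entry : timestamps.entrySet()) {
            if (now - entry.getValue() > timeoutMs) {
                suspects.add(entry.getKey());
            }
        }
        return suspects;
    }
}
```

With the check in heartbeatReceived, a message from a node outside the view leaves the map untouched, so the periodic pass only ever reports current members.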
was (Author: dereed):
The technical detail:
FD_SOCK keeps track of the time the last message from each member was seen in the "timestamps" map.
It periodically suspects any entries in this map whose timestamps are too old.
When a new view is installed, any members that left are removed from the map, and an entry is added for each member if it doesn't already exist.
When any FD_SOCK message is received from a member its entry in "timestamps" is updated.
If msg_counts_as_heartbeat is on then the same is done for every message from that member. (this is off by default)
The problem: When it updates the timestamp, no membership check is done first.
So a message from a non-member triggers an entry added to the table, which is never removed until the next view is processed, and will continually send suspect events up the stack.
This triggers VERIFY_SUSPECT to try to ping it, which it can't because it doesn't have the address (but can cause a "no physical address" log in some cases).
VERIFY_SUSPECT will eventually send SUSPECT events up the stack, which are ignored by GMS because the node isn't part of the cluster.
> Message from a non-member causes FD_ALL to continually suspect it
> -----------------------------------------------------------------
>
> Key: JGRP-2387
> URL: https://issues.jboss.org/browse/JGRP-2387
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.1
> Reporter: Dennis Reed
> Assignee: Bela Ban
> Priority: Major
> Fix For: 4.1.6
>
> Attachments: Test.java
>
>
> If an FD_ALL control message from a non-member is seen by FD_ALL, it will start continually suspecting that node. If msg_counts_as_heartbeat=true then any message from a non-member triggers the issue. The issue is cleared on the next view change.
> This does not cause any functional issues in the cluster, but can cause repeated WARN logs in some cases.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (JGRP-2387) Message from a non-member causes FD_ALL to continually suspect it
by Bela Ban (Jira)
[ https://issues.jboss.org/browse/JGRP-2387?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-2387:
---------------------------
Description:
If an FD_ALL control message from a non-member is seen by FD_ALL, it will start continually suspecting that node. If msg_counts_as_heartbeat=true then any message from a non-member triggers the issue. The issue is cleared on the next view change.
This does not cause any functional issues in the cluster, but can cause repeated WARN logs in some cases.
was:
If a FD_SOCK control message from a non-member is seen by FD_SOCK, it will start continually suspecting that node. If msg_counts_as_heartbeat=true then any message from a non-member triggers the issue. The issue is cleared on the next view change.
This does not cause any functional issues in the cluster, but can cause repeated WARN logs in some cases.
> Message from a non-member causes FD_ALL to continually suspect it
> -----------------------------------------------------------------
>
> Key: JGRP-2387
> URL: https://issues.jboss.org/browse/JGRP-2387
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.1
> Reporter: Dennis Reed
> Assignee: Bela Ban
> Priority: Major
> Fix For: 4.1.6
>
> Attachments: Test.java
>
>
> If an FD_ALL control message from a non-member is seen by FD_ALL, it will start continually suspecting that node. If msg_counts_as_heartbeat=true then any message from a non-member triggers the issue. The issue is cleared on the next view change.
> This does not cause any functional issues in the cluster, but can cause repeated WARN logs in some cases.
[JBoss JIRA] (JGRP-2388) DNS_PING#destroy could yield NPE hiding the root cause
by Bela Ban (Jira)
[ https://issues.jboss.org/browse/JGRP-2388?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-2388:
---------------------------
Fix Version/s: 4.1.6
> DNS_PING#destroy could yield NPE hiding the root cause
> ------------------------------------------------------
>
> Key: JGRP-2388
> URL: https://issues.jboss.org/browse/JGRP-2388
> Project: JGroups
> Issue Type: Bug
> Reporter: Radoslav Husar
> Assignee: Radoslav Husar
> Priority: Minor
> Fix For: 4.1.6
>
>
> Caused by: java.lang.NullPointerException
> at org.jgroups.protocols.dns.DNS_PING.destroy(DNS_PING.java:70)
> at java.util.ArrayList.forEach(ArrayList.java:1257)
> at org.jgroups.stack.ProtocolStack.destroy(ProtocolStack.java:876)
> at org.jgroups.stack.ProtocolStack.initProtocolStack(ProtocolStack.java:867)
> at org.jgroups.stack.ProtocolStack.init(ProtocolStack.java:849)
> at org.jgroups.JChannel.<init>(JChannel.java:155)
> at org.jboss.as.clustering.jgroups.JChannelFactory.createChannel(JChannelFactory.java:116)
> at org.jboss.as.clustering.jgroups.subsystem.ChannelServiceConfigurator.get(ChannelServiceConfigurator.java:96)
[JBoss JIRA] (JGRP-2388) DNS_PING#destroy could yield NPE hiding the root cause
by Radoslav Husar (Jira)
[ https://issues.jboss.org/browse/JGRP-2388?page=com.atlassian.jira.plugin.... ]
Radoslav Husar commented on JGRP-2388:
--------------------------------------
e.g. when the DNS resolver classes fail to load; with the proposed fix, the above would result in
{code}
Caused by: java.lang.ClassNotFoundException: com.sun.jndi.dns.DnsContextFactory from [Module "org.wildfly.clustering.service" version 18.0.0.Final-SNAPSHOT from local module loader @6537cf78 (finder: local module finder @67b6d4ae (roots: /Users/rhusar/git/wildfly/build/target/wildfly-18.0.0.Final-SNAPSHOT/modules,/Users/rhusar/git/wildfly/build/target/wildfly-18.0.0.Final-SNAPSHOT/modules/system/layers/base))]
at org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:255)
at org.jboss.modules.ConcurrentClassLoader.performLoadClassUnchecked(ConcurrentClassLoader.java:410)
at org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:398)
at org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:116)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.jboss.as.naming.InitialContext.getDefaultInitCtx(InitialContext.java:115)
... 25 more
{code}
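The stack traces in this issue show DNS_PING.destroy being called from ProtocolStack.destroy during initProtocolStack, which suggests that destroy runs as cleanup after a failed init. The sketch below shows, under that assumption, how a null guard in destroy lets the original failure propagate instead of being replaced by an NPE. It is not the DNS_PING source: DiscoverySketch, Resolver, and the resolver field are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical protocol whose init() may fail before its resolver is set.
class DiscoverySketch {
    interface Resolver {
        void close();
    }

    private Resolver resolver; // stays null if init() throws

    void init(Supplier<Resolver> factory) {
        resolver = factory.get(); // may throw, e.g. when resolver classes are missing
    }

    // Null-safe cleanup: safe to call even when init() never completed.
    void destroy() {
        if (resolver != null) {
            resolver.close();
            resolver = null;
        }
    }

    // Mimics a stack that destroys the protocol when initialization fails.
    static void start(DiscoverySketch protocol, Supplier<Resolver> factory) {
        try {
            protocol.init(factory);
        } catch (RuntimeException rootCause) {
            protocol.destroy(); // without the null guard this would throw an NPE
            throw rootCause;    // the original failure reaches the caller
        }
    }
}
```

Without the null check, the exception thrown from destroy would replace the one from init, which is the masking effect described in the issue title.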
> DNS_PING#destroy could yield NPE hiding the root cause
> ------------------------------------------------------
>
> Key: JGRP-2388
> URL: https://issues.jboss.org/browse/JGRP-2388
> Project: JGroups
> Issue Type: Bug
> Reporter: Radoslav Husar
> Assignee: Radoslav Husar
> Priority: Minor
>
> Caused by: java.lang.NullPointerException
> at org.jgroups.protocols.dns.DNS_PING.destroy(DNS_PING.java:70)
> at java.util.ArrayList.forEach(ArrayList.java:1257)
> at org.jgroups.stack.ProtocolStack.destroy(ProtocolStack.java:876)
> at org.jgroups.stack.ProtocolStack.initProtocolStack(ProtocolStack.java:867)
> at org.jgroups.stack.ProtocolStack.init(ProtocolStack.java:849)
> at org.jgroups.JChannel.<init>(JChannel.java:155)
> at org.jboss.as.clustering.jgroups.JChannelFactory.createChannel(JChannelFactory.java:116)
> at org.jboss.as.clustering.jgroups.subsystem.ChannelServiceConfigurator.get(ChannelServiceConfigurator.java:96)
[JBoss JIRA] (JGRP-2388) DNS_PING#destroy could yield NPE hiding the root cause
by Radoslav Husar (Jira)
Radoslav Husar created JGRP-2388:
------------------------------------
Summary: DNS_PING#destroy could yield NPE hiding the root cause
Key: JGRP-2388
URL: https://issues.jboss.org/browse/JGRP-2388
Project: JGroups
Issue Type: Bug
Reporter: Radoslav Husar
Assignee: Bela Ban
Caused by: java.lang.NullPointerException
at org.jgroups.protocols.dns.DNS_PING.destroy(DNS_PING.java:70)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at org.jgroups.stack.ProtocolStack.destroy(ProtocolStack.java:876)
at org.jgroups.stack.ProtocolStack.initProtocolStack(ProtocolStack.java:867)
at org.jgroups.stack.ProtocolStack.init(ProtocolStack.java:849)
at org.jgroups.JChannel.<init>(JChannel.java:155)
at org.jboss.as.clustering.jgroups.JChannelFactory.createChannel(JChannelFactory.java:116)
at org.jboss.as.clustering.jgroups.subsystem.ChannelServiceConfigurator.get(ChannelServiceConfigurator.java:96)
[JBoss JIRA] (JGRP-2388) DNS_PING#destroy could yield NPE hiding the root cause
by Radoslav Husar (Jira)
[ https://issues.jboss.org/browse/JGRP-2388?page=com.atlassian.jira.plugin.... ]
Radoslav Husar reassigned JGRP-2388:
------------------------------------
Assignee: Radoslav Husar (was: Bela Ban)
> DNS_PING#destroy could yield NPE hiding the root cause
> ------------------------------------------------------
>
> Key: JGRP-2388
> URL: https://issues.jboss.org/browse/JGRP-2388
> Project: JGroups
> Issue Type: Bug
> Reporter: Radoslav Husar
> Assignee: Radoslav Husar
> Priority: Minor
>
> Caused by: java.lang.NullPointerException
> at org.jgroups.protocols.dns.DNS_PING.destroy(DNS_PING.java:70)
> at java.util.ArrayList.forEach(ArrayList.java:1257)
> at org.jgroups.stack.ProtocolStack.destroy(ProtocolStack.java:876)
> at org.jgroups.stack.ProtocolStack.initProtocolStack(ProtocolStack.java:867)
> at org.jgroups.stack.ProtocolStack.init(ProtocolStack.java:849)
> at org.jgroups.JChannel.<init>(JChannel.java:155)
> at org.jboss.as.clustering.jgroups.JChannelFactory.createChannel(JChannelFactory.java:116)
> at org.jboss.as.clustering.jgroups.subsystem.ChannelServiceConfigurator.get(ChannelServiceConfigurator.java:96)
[JBoss JIRA] (DROOLS-4563) Upgrade javax.validation from 1.0.0.GA to 2.0.1.Final
by Marek Novotny (Jira)
[ https://issues.jboss.org/browse/DROOLS-4563?page=com.atlassian.jira.plugi... ]
Marek Novotny reassigned DROOLS-4563:
-------------------------------------
Assignee: Marek Novotny (was: Michael Biarnes Kiefer)
> Upgrade javax.validation from 1.0.0.GA to 2.0.1.Final
> ------------------------------------------------------
>
> Key: DROOLS-4563
> URL: https://issues.jboss.org/browse/DROOLS-4563
> Project: Drools
> Issue Type: Feature Request
> Reporter: Michael Biarnes Kiefer
> Assignee: Marek Novotny
> Priority: Optional
>
> To upgrade javax.validation [(PR)|https://github.com/kiegroup/droolsjbpm-build-bootstrap/pull/1055], all GWT repos (uberfire, errai, wb, etc.) need to be checked, because the new javax.validation version is not supported in these repos.
> The version upgrade is needed for spring-boot and quarkus, so it should be upgraded in kie-parent but overridden with the old version in the poms of all GWT repos that don't support the new version. The idea is to add a dependency override to the root pom of those repos so that they keep using the original, older version.
> On the other hand, existing overrides like [this|https://github.com/kiegroup/droolsjbpm-integration/blob/81415ae5cdc4...] should then be removed.
> All repos should be examined to see whether they have an override or whether they need the old version.
[JBoss JIRA] (JGRP-2387) Message from a non-member causes FD_ALL to continually suspect it
by Bela Ban (Jira)
[ https://issues.jboss.org/browse/JGRP-2387?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2387:
--------------------------------
Note: need to look at FD_ALL2, too. The second solution suggested by [~dereed] LGTM.
> Message from a non-member causes FD_ALL to continually suspect it
> -----------------------------------------------------------------
>
> Key: JGRP-2387
> URL: https://issues.jboss.org/browse/JGRP-2387
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.1
> Reporter: Dennis Reed
> Assignee: Bela Ban
> Priority: Major
> Fix For: 4.1.6
>
> Attachments: Test.java
>
>
> If a FD_SOCK control message from a non-member is seen by FD_SOCK, it will start continually suspecting that node. If msg_counts_as_heartbeat=true then any message from a non-member triggers the issue. The issue is cleared on the next view change.
> This does not cause any functional issues in the cluster, but can cause repeated WARN logs in some cases.
[JBoss JIRA] (JGRP-2387) Message from a non-member causes FD_ALL to continually suspect it
by Bela Ban (Jira)
[ https://issues.jboss.org/browse/JGRP-2387?page=com.atlassian.jira.plugin.... ]
Bela Ban edited comment on JGRP-2387 at 10/3/19 8:11 AM:
---------------------------------------------------------
FD_ALL or FD_SOCK? I guess FD_ALL... Looking into this now...
was (Author: belaban):
FD_ALL or FD_SOCK? Looking into this now...
> Message from a non-member causes FD_ALL to continually suspect it
> -----------------------------------------------------------------
>
> Key: JGRP-2387
> URL: https://issues.jboss.org/browse/JGRP-2387
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.1
> Reporter: Dennis Reed
> Assignee: Bela Ban
> Priority: Major
> Fix For: 4.1.6
>
> Attachments: Test.java
>
>
> If a FD_SOCK control message from a non-member is seen by FD_SOCK, it will start continually suspecting that node. If msg_counts_as_heartbeat=true then any message from a non-member triggers the issue. The issue is cleared on the next view change.
> This does not cause any functional issues in the cluster, but can cause repeated WARN logs in some cases.
[JBoss JIRA] (JGRP-2387) Message from a non-member causes FD_ALL to continually suspect it
by Bela Ban (Jira)
[ https://issues.jboss.org/browse/JGRP-2387?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2387:
--------------------------------
FD_ALL or FD_SOCK? Looking into this now...
> Message from a non-member causes FD_ALL to continually suspect it
> -----------------------------------------------------------------
>
> Key: JGRP-2387
> URL: https://issues.jboss.org/browse/JGRP-2387
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.1
> Reporter: Dennis Reed
> Assignee: Bela Ban
> Priority: Major
> Fix For: 4.1.6
>
> Attachments: Test.java
>
>
> If a FD_SOCK control message from a non-member is seen by FD_SOCK, it will start continually suspecting that node. If msg_counts_as_heartbeat=true then any message from a non-member triggers the issue. The issue is cleared on the next view change.
> This does not cause any functional issues in the cluster, but can cause repeated WARN logs in some cases.