[JBoss JIRA] (WFLY-6749) Cluster failover doesn't work on windows when network is disabled on a node
by Preeta Kuruvilla (JIRA)
[ https://issues.jboss.org/browse/WFLY-6749?page=com.atlassian.jira.plugin.... ]
Preeta Kuruvilla commented on WFLY-6749:
----------------------------------------
More Explanation:
Failover works when a node is stopped from the admin console. It also works when you press Ctrl+C to stop the services on that node. It fails only when the network is disabled on the node, and disabling the network also disrupts application functionality on the cluster.
> Cluster failover doesn't work on windows when network is disabled on a node
> ---------------------------------------------------------------------------
>
> Key: WFLY-6749
> URL: https://issues.jboss.org/browse/WFLY-6749
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 8.2.0.Final
> Reporter: Preeta Kuruvilla
> Assignee: Paul Ferraro
> Priority: Critical
>
> This concerns a two-VM WildFly cluster in a Windows environment. To test failover, the team disabled the network on one node. However, failover does not happen, and application functionality on the cluster is disrupted as a result.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
9 years, 10 months
[JBoss JIRA] (JGRP-2082) Coordinator failover is taking longer because VERIFY_SUSPECT runs twice
by Takashi Nishigaya (JIRA)
[ https://issues.jboss.org/browse/JGRP-2082?page=com.atlassian.jira.plugin.... ]
Takashi Nishigaya edited comment on JGRP-2082 at 6/22/16 1:26 AM:
------------------------------------------------------------------
The following fix proposal works for me. All of the killed processes are verified in the first phase. Is this correct?
{code:css}
--- jgroups//src/org/jgroups/protocols/FD_SOCK.java.orig 2016-06-20 12:10:02.000000000 +0900
+++ jgroups//src/org/jgroups/protocols/FD_SOCK.java 2016-06-20 12:10:44.000000000 +0900
@@ -470,7 +470,7 @@
Address first=eligible_mbrs.get(0);
if(local_addr.equals(first)) {
log.debug("%s: suspecting %s", local_addr, suspected_mbrs);
- for(Address suspect: suspects) {
+ for(Address suspect: suspected_mbrs) {
up_prot.up(new Event(Event.SUSPECT, suspect));
down_prot.down(new Event(Event.SUSPECT, suspect));
}
{code}
was (Author: nishigaya):
The following fix proposal works for me. All of the killed processes are verified in the first phase. Is this correct?
~~~
--- jgroups//src/org/jgroups/protocols/FD_SOCK.java.orig 2016-06-20 12:10:02.000000000 +0900
+++ jgroups//src/org/jgroups/protocols/FD_SOCK.java 2016-06-20 12:10:44.000000000 +0900
@@ -470,7 +470,7 @@
Address first=eligible_mbrs.get(0);
if(local_addr.equals(first)) {
log.debug("%s: suspecting %s", local_addr, suspected_mbrs);
- for(Address suspect: suspects) {
+ for(Address suspect: suspected_mbrs) {
up_prot.up(new Event(Event.SUSPECT, suspect));
down_prot.down(new Event(Event.SUSPECT, suspect));
}
~~~
> Coordinator failover is taking longer because VERIFY_SUSPECT runs twice
> -----------------------------------------------------------------------
>
> Key: JGRP-2082
> URL: https://issues.jboss.org/browse/JGRP-2082
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.3
> Reporter: Osamu Nagano
> Assignee: Bela Ban
> Fix For: 3.6.10, 4.0
>
>
> There are 4 machines (m03, ..., m06) with 7 nodes (n001, ..., n007) on each. To test coordinator failover behaviour, all nodes on m03 are killed at the same time. The first time, they are suspected and verified in sequence (lines 9 to 18: only m03_n007 is verified as dead and the others are just queued); the second time, they are verified as a whole (lines 19 to 34). Because the coordinator is among the queued nodes the first time, no view change is triggered, which causes a delay.
> - m03_n001 is the original coordinator.
> - m05_n001 is the next coordinator and the owner of the following log messages.
> {code}
> 8 12:04:10,997 TRACE [org.jgroups.protocols.FD_SOCK] (FD_SOCK acceptor,m05_n001/clustered) m05_n001/clustered: accepted connection from /172.20.66.36:29702
> 9 12:04:10,999 TRACE [org.jgroups.protocols.FD_SOCK] (INT-28,shared=udp) m05_n001/clustered: received SUSPECT message from m06_n007/clustered: suspects=[m03_n001/clustered]
> 10 12:04:10,999 TRACE [org.jgroups.protocols.FD_SOCK] (INT-28,shared=udp) m05_n001/clustered: received SUSPECT message from m06_n007/clustered: suspects=[m03_n002/clustered]
> 11 12:04:10,999 TRACE [org.jgroups.protocols.FD_SOCK] (INT-28,shared=udp) m05_n001/clustered: received SUSPECT message from m06_n007/clustered: suspects=[m03_n003/clustered]
> 12 12:04:10,999 TRACE [org.jgroups.protocols.FD_SOCK] (INT-28,shared=udp) m05_n001/clustered: received SUSPECT message from m06_n007/clustered: suspects=[m03_n004/clustered]
> 13 12:04:10,999 TRACE [org.jgroups.protocols.FD_SOCK] (INT-28,shared=udp) m05_n001/clustered: received SUSPECT message from m06_n007/clustered: suspects=[m03_n005/clustered]
> 14 12:04:10,999 TRACE [org.jgroups.protocols.FD_SOCK] (INT-28,shared=udp) m05_n001/clustered: received SUSPECT message from m06_n007/clustered: suspects=[m03_n006/clustered]
> 15 12:04:10,999 TRACE [org.jgroups.protocols.FD_SOCK] (INT-28,shared=udp) m05_n001/clustered: received SUSPECT message from m06_n007/clustered: suspects=[m03_n007/clustered]
> 16 12:04:10,999 DEBUG [org.jgroups.protocols.FD_SOCK] (INT-28,shared=udp) m05_n001/clustered: suspecting [m03_n001/clustered, m03_n002/clustered, m03_n003/clustered, m03_n004/clustered, m03_n005/clustered, m03_n006/clustered, m03_n007/clustered]
> 17 12:04:11,000 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (INT-28,shared=udp) verifying that m03_n007/clustered is dead
> 18 12:04:12,000 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (VERIFY_SUSPECT.TimerThread,m05_n001/clustered) m03_n007/clustered is dead (passing up SUSPECT event)
> 19 12:04:16,000 TRACE [org.jgroups.protocols.FD_SOCK] (INT-28,shared=udp) m05_n001/clustered: received SUSPECT message from m06_n007/clustered: suspects=[m03_n003/clustered, m03_n004/clustered, m03_n005/clustered, m03_n002/clustered, m03_n001/clustered, m03_n007/clustered, m03_n006/clustered]
> 20 12:04:16,000 DEBUG [org.jgroups.protocols.FD_SOCK] (INT-28,shared=udp) m05_n001/clustered: suspecting [m03_n001/clustered, m03_n002/clustered, m03_n003/clustered, m03_n004/clustered, m03_n005/clustered, m03_n006/clustered, m03_n007/clustered]
> 21 12:04:16,000 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (INT-28,shared=udp) verifying that m03_n003/clustered is dead
> 22 12:04:16,000 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (INT-28,shared=udp) verifying that m03_n004/clustered is dead
> 23 12:04:16,001 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (INT-28,shared=udp) verifying that m03_n005/clustered is dead
> 24 12:04:16,001 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (INT-28,shared=udp) verifying that m03_n002/clustered is dead
> 25 12:04:16,001 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (INT-28,shared=udp) verifying that m03_n001/clustered is dead
> 26 12:04:16,001 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (INT-28,shared=udp) verifying that m03_n007/clustered is dead
> 27 12:04:16,001 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (INT-28,shared=udp) verifying that m03_n006/clustered is dead
> 28 12:04:17,001 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (VERIFY_SUSPECT.TimerThread,m05_n001/clustered) m03_n003/clustered is dead (passing up SUSPECT event)
> 29 12:04:17,001 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (VERIFY_SUSPECT.TimerThread,m05_n001/clustered) m03_n004/clustered is dead (passing up SUSPECT event)
> 30 12:04:17,002 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (VERIFY_SUSPECT.TimerThread,m05_n001/clustered) m03_n007/clustered is dead (passing up SUSPECT event)
> 31 12:04:17,002 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (VERIFY_SUSPECT.TimerThread,m05_n001/clustered) m03_n001/clustered is dead (passing up SUSPECT event)
> 32 12:04:17,002 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (VERIFY_SUSPECT.TimerThread,m05_n001/clustered) m03_n002/clustered is dead (passing up SUSPECT event)
> 33 12:04:17,003 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (VERIFY_SUSPECT.TimerThread,m05_n001/clustered) m03_n005/clustered is dead (passing up SUSPECT event)
> 34 12:04:17,003 TRACE [org.jgroups.protocols.VERIFY_SUSPECT] (VERIFY_SUSPECT.TimerThread,m05_n001/clustered) m03_n006/clustered is dead (passing up SUSPECT event)
> {code}
[JBoss JIRA] (JGRP-2082) Coordinator failover is taking longer because VERIFY_SUSPECT runs twice
by Takashi Nishigaya (JIRA)
[ https://issues.jboss.org/browse/JGRP-2082?page=com.atlassian.jira.plugin.... ]
Takashi Nishigaya commented on JGRP-2082:
-----------------------------------------
The following fix proposal works for me. All of the killed processes are verified in the first phase. Is this correct?
~~~
--- jgroups//src/org/jgroups/protocols/FD_SOCK.java.orig 2016-06-20 12:10:02.000000000 +0900
+++ jgroups//src/org/jgroups/protocols/FD_SOCK.java 2016-06-20 12:10:44.000000000 +0900
@@ -470,7 +470,7 @@
Address first=eligible_mbrs.get(0);
if(local_addr.equals(first)) {
log.debug("%s: suspecting %s", local_addr, suspected_mbrs);
- for(Address suspect: suspects) {
+ for(Address suspect: suspected_mbrs) {
up_prot.up(new Event(Event.SUSPECT, suspect));
down_prot.down(new Event(Event.SUSPECT, suspect));
}
~~~
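To illustrate why the loop variable matters, here is a minimal, self-contained simulation. It is not the JGroups code: the class SuspectSketch, the method emitted, the use of plain strings for addresses, and the view order (m03 nodes listed ahead of m05_n001) are all assumptions made for illustration. It only models the idea in the patch: a node emits SUSPECT events once it becomes the first non-suspected member, and the question is whether it reports just the latest batch or everything accumulated so far.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SuspectSketch {
    // Processes SUSPECT batches in arrival order and returns the members for
    // which the local node would emit SUSPECT events.
    // useAccumulated=false models the loop before the patch (latest batch only);
    // useAccumulated=true models the loop after the patch (all suspected members).
    static List<String> emitted(List<String> view, String local,
                                List<List<String>> batches, boolean useAccumulated) {
        Set<String> suspectedMbrs = new LinkedHashSet<>();
        List<String> out = new ArrayList<>();
        for (List<String> suspects : batches) {
            suspectedMbrs.addAll(suspects);
            // Eligible members are the view minus everything suspected so far.
            List<String> eligible = new ArrayList<>(view);
            eligible.removeAll(suspectedMbrs);
            // Only the first eligible member emits SUSPECT events.
            if (eligible.isEmpty() || !local.equals(eligible.get(0))) {
                continue;
            }
            Collection<String> toReport = useAccumulated ? suspectedMbrs : suspects;
            out.addAll(toReport);
        }
        return out;
    }
}
```

With a hypothetical view of [m03_n001, m03_n002, m03_n003, m05_n001, m06_n007] and one SUSPECT message per m03 node, the unpatched variant reports only the last node received, which matches the shape of the first phase in the log (only m03_n007 verified), while the patched variant reports all of them.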
[JBoss JIRA] (WFLY-6749) Cluster failover doesn't work on windows when network is disabled on a node
by Preeta Kuruvilla (JIRA)
[ https://issues.jboss.org/browse/WFLY-6749?page=com.atlassian.jira.plugin.... ]
Preeta Kuruvilla commented on WFLY-6749:
----------------------------------------
However, failover works when we bring down the services on a node with Ctrl+C, without disabling the network.
[JBoss JIRA] (WFLY-6750) Modify ApplicationSecurityDomainService#getAuthenticationMechanisms to filter out unavailable mechanisms
by Farah Juma (JIRA)
[ https://issues.jboss.org/browse/WFLY-6750?page=com.atlassian.jira.plugin.... ]
Farah Juma updated WFLY-6750:
-----------------------------
Labels: affects_elytron (was: )
> Modify ApplicationSecurityDomainService#getAuthenticationMechanisms to filter out unavailable mechanisms
> ---------------------------------------------------------------------------------------------------------
>
> Key: WFLY-6750
> URL: https://issues.jboss.org/browse/WFLY-6750
> Project: WildFly
> Issue Type: Feature Request
> Components: Web (Undertow)
> Reporter: Farah Juma
> Assignee: Farah Juma
> Labels: affects_elytron
> Fix For: 11.0.0.Alpha1
>
>
> When attempting to access an app as an anonymous user, the following {{IllegalStateException}} occurs:
> {code}
> java.lang.IllegalStateException: ELY01119: Unable to resolve MechanismConfiguration for mechanismType='HTTP', mechanismName='SPNEGO', hostName='localhost', protocol='http'.
> org.wildfly.security.auth.server.ServerAuthenticationContext$InactiveState.transition(ServerAuthenticationContext.java:1083)
> {code}
> The problem is that we're not filtering based on the mechanisms that are actually available.
[JBoss JIRA] (WFLY-6750) Modify ApplicationSecurityDomainService#getAuthenticationMechanisms to filter out unavailable mechanisms
by Farah Juma (JIRA)
Farah Juma created WFLY-6750:
--------------------------------
Summary: Modify ApplicationSecurityDomainService#getAuthenticationMechanisms to filter out unavailable mechanisms
Key: WFLY-6750
URL: https://issues.jboss.org/browse/WFLY-6750
Project: WildFly
Issue Type: Feature Request
Components: Web (Undertow)
Reporter: Farah Juma
Assignee: Farah Juma
Fix For: 11.0.0.Alpha1
When attempting to access an app as an anonymous user, the following {{IllegalStateException}} occurs:
{code}
java.lang.IllegalStateException: ELY01119: Unable to resolve MechanismConfiguration for mechanismType='HTTP', mechanismName='SPNEGO', hostName='localhost', protocol='http'.
org.wildfly.security.auth.server.ServerAuthenticationContext$InactiveState.transition(ServerAuthenticationContext.java:1083)
{code}
The problem is that we're not filtering based on the mechanisms that are actually available.
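The kind of filtering described here could look roughly like the sketch below. This is not the Elytron or WildFly API: the class MechanismFilterSketch, the method filterAvailable, and the representation of mechanisms as plain strings are hypothetical, chosen only to show the idea of keeping the configured mechanisms that are actually available.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class MechanismFilterSketch {
    // Returns the configured mechanism names that are also available,
    // preserving the configured order. Both arguments are hypothetical
    // stand-ins for whatever the real service exposes.
    static List<String> filterAvailable(List<String> configured, Collection<String> available) {
        List<String> result = new ArrayList<>();
        for (String name : configured) {
            if (available.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

For example, if SPNEGO is configured but not available for the current request, it would be dropped from the returned list rather than attempted, which is the situation that leads to the ELY01119 error above.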
[JBoss JIRA] (WFLY-6671) ajp connection hangs if a post HTTP request header contains 'Transfer-Encoding: chunked'
by Stuart Douglas (JIRA)
[ https://issues.jboss.org/browse/WFLY-6671?page=com.atlassian.jira.plugin.... ]
Stuart Douglas commented on WFLY-6671:
--------------------------------------
I still want to get to the bottom of it. When I have time I will investigate with the same version of Apache you mention (this will obviously take some time, as I need to install and set up Apache and mod_jk, which is why I have not got to it yet).
> ajp connection hangs if a post HTTP request header contains 'Transfer-Encoding: chunked'
> -----------------------------------------------------------------------------------------
>
> Key: WFLY-6671
> URL: https://issues.jboss.org/browse/WFLY-6671
> Project: WildFly
> Issue Type: Bug
> Components: Web (Undertow)
> Affects Versions: 10.0.0.Final
> Environment: Apache HTTP server 2.2.22 with mod_jk 1.2.37
> Reporter: river shen
> Assignee: Stuart Douglas
> Attachments: service-1.0-SNAPSHOT.war, src.zip, stacks.txt, standalone.xml, workers.properties
>
>
> When upgrading from JBoss 7 to WildFly 10, we observed the following behavior:
> If an HTTP POST contains 'Transfer-Encoding: chunked' and 'Content-Type: application/octet-stream' in its headers, a servlet that handles it will hang forever (until the client drops the connection) if it calls HttpServletRequest.getInputStream() and tries to read the whole content of the returned InputStream. The InputStream's read() method blocks forever at the end of the stream instead of returning -1.
> It only happens when the request is routed by the Apache web server through AJP; it does not happen if the client talks to WildFly directly through its HTTP port 8080.
> We have attached a minimal web application that reproduces this issue.
> Also attached are the standalone.xml and the Apache configuration file.
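For readers without the attachments, the read pattern described above is presumably something like the following sketch. The attached sources are not shown here, so the class ReadFullySketch and the method readFully are hypothetical; the sketch uses only java.io so that it runs without a servlet container. The point is that the loop ends only when read() returns -1, so a stream that never signals end-of-stream keeps the caller blocked.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFullySketch {
    // Reads the stream until read() reports end-of-stream (-1).
    // If the underlying stream never returns -1, this loop blocks
    // in read(), which is the hang described in the report.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

In a servlet, the argument would be the stream returned by HttpServletRequest.getInputStream().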