[JBoss JIRA] (WFLY-6781) Wildfly cluster's failover functionality doesn't work as expected
by Preeta Kuruvilla (JIRA)
[ https://issues.jboss.org/browse/WFLY-6781?page=com.atlassian.jira.plugin.... ]
Preeta Kuruvilla commented on WFLY-6781:
----------------------------------------
Please find my comments inline.
I'm confused, as you seem to be using the terms RC and SL to interchangeably refer to web applications (e.g. RC.war and SL.rar) and a mysterious component with a remoting connection (e.g. RC communicates remotely with SL). What kind of component is this? How is the remoting connection used?
{color:#205081}Preeta: RC.war and SL.war are two applications we deploy on WildFly; they work together in our product. RC is clustered and hence deployed on Node1 and Node2, while SL (non-clustered) exists only on Node1. RC on Node1 and Node2 communicates with SL on Node1 through JMS, which is configured on SL. The JMS JNDI URL to connect is http-remoting://ip-address:6080.{color}
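A minimal sketch of such a remote JMS JNDI lookup from a standalone WildFly 8.x client, assuming the default exported jms/RemoteConnectionFactory; the IP address and queue name below are placeholders, not taken from this report:
{code:java}
import java.util.Properties;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Queue;
import javax.naming.Context;
import javax.naming.InitialContext;

public class RemoteJmsLookup {
    public static void main(String[] args) throws Exception {
        Properties env = new Properties();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "org.jboss.naming.remote.client.InitialContextFactory");
        env.put(Context.PROVIDER_URL, "http-remoting://10.0.0.1:6080"); // placeholder IP

        Context ctx = new InitialContext(env);
        try {
            // jms/RemoteConnectionFactory is the factory WildFly exports
            // to remote clients by default.
            ConnectionFactory cf =
                    (ConnectionFactory) ctx.lookup("jms/RemoteConnectionFactory");
            Queue queue = (Queue) ctx.lookup("jms/queue/slRequests"); // placeholder queue name
            Connection connection = cf.createConnection();
            connection.start();
            // ... create a session and send/receive against 'queue' ...
            connection.close();
        } finally {
            ctx.close();
        }
    }
}
{code}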
server-host1-RC and server-host1-SL belong to different server groups - but I don't know how these server groups are configured.
You seem to be managing the server instance of each node as its own domain, is that right? Why is that? How do the configurations of these domains differ?
{color:#205081}Preeta: As I said, RC is clustered while SL is non-clustered, and they are different components; hence they belong to different server groups, although both are managed by one domain. There is only one domain controller, and it is on Node1.{color}
"By the way RC communicates remotely with JMS on SL using the below url :
http-remoting://<ip address of SL which is Node1>:6080/"
What is the context in which this url is used? Is this a JNDI provider URL? Details please.
{color:#205081}Preeta: Yes, it is the JMS JNDI provider URL. As I said above, JMS is configured on SL, and RC connects to it using this URL.{color}
Remoting connections are point-to-point connections and don't magically failover if the target location becomes unavailable. What exactly is your expectation here?
{color:#205081}Preeta: When the network on Node2 is disabled, Node1 should gracefully take over, and RC on Node1 should continue to connect to JMS/SL on Node1. What happens instead is that SL (on Node1) does not process the requests that come from RC on Node1.{color}
I still don't understand exactly what aspect of your system is "clustered". I also don't understand which connections you expect to failover. Which (and what kind of) components are clustered? Do your logs indicate proper cluster formation? What do your logs indicate when you test these failure scenarios?
{color:#205081}Preeta: Only RC is clustered, not SL; I have answered this above. Yes, our logs indicate proper cluster formation. On disabling the network on Node2, the service on Node2 does not come out of execution; it throws an error saying "connection got disconnected", which is correct. The issue is that Node1 does not work as expected under this condition, meaning SL does not process the requests that come from RC on Node1.{color}
{color:#205081}Note: Things work as expected when we press Ctrl+C on Node2, or when we stop the server on Node2 from the admin console.{color}
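That difference between a clean stop and a pulled network is consistent with how the JGroups failure detectors behave: Ctrl+C or a console stop closes the member's sockets, so FD_SOCK notices the failure immediately, whereas a disabled network leaves the sockets dangling and the member is only suspected once the heartbeat-based FD_ALL (or FD) timeout expires. A minimal sketch of tightening those timeouts in the udp stack quoted below, with purely illustrative values:
{code:xml}
<protocol type="FD_ALL">
    <!-- suspect a member after 15s without a heartbeat (illustrative value) -->
    <property name="timeout">15000</property>
    <!-- send a heartbeat every 3s (illustrative value) -->
    <property name="interval">3000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
    <!-- double-check a suspected member for 1.5s before excluding it -->
    <property name="timeout">1500</property>
</protocol>
{code}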
> Wildfly cluster's failover functionality doesn't work as expected
> -----------------------------------------------------------------
>
> Key: WFLY-6781
> URL: https://issues.jboss.org/browse/WFLY-6781
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 8.2.0.Final
> Reporter: Preeta Kuruvilla
> Assignee: Paul Ferraro
> Priority: Blocker
>
> Following are the testing scenarios we tried and their outcomes:
> 1. Disabling the network on a VM to test failover – Does not work on either Linux or Windows.
> 2. Powering off a VM via the VMware client to test failover – Works on Linux but not on Windows.
> 3. Ctrl+C to stop the services on a node to test failover – Works on both Linux and Windows.
> 4. Stopping the server running on a node/VM via the Admin Console to test failover – Works on both Linux and Windows.
> The JGroups subsystem configuration we have in domain.xml is:
> <subsystem xmlns="urn:jboss:domain:jgroups:2.0" default-stack="udp">
> <stack name="udp">
> <transport type="UDP" socket-binding="jgroups-udp"/>
> <protocol type="PING"/>
> <protocol type="MERGE3"/>
> <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
> <protocol type="FD_ALL"/>
> <protocol type="VERIFY_SUSPECT"/>
> <protocol type="pbcast.NAKACK2"/>
> <protocol type="UNICAST3"/>
> <protocol type="pbcast.STABLE"/>
> <protocol type="pbcast.GMS"/>
> <protocol type="UFC"/>
> <protocol type="MFC"/>
> <protocol type="FRAG2"/>
> <protocol type="RSVP"/>
> </stack>
> <stack name="tcp">
> <transport type="TCP" socket-binding="jgroups-tcp"/>
> <protocol type="MPING" socket-binding="jgroups-mping"/>
> <protocol type="MERGE2"/>
> <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
> <protocol type="FD"/>
> <protocol type="VERIFY_SUSPECT"/>
> <protocol type="pbcast.NAKACK2"/>
> <protocol type="UNICAST3"/>
> <protocol type="pbcast.STABLE"/>
> <protocol type="pbcast.GMS"/>
> <protocol type="MFC"/>
> <protocol type="FRAG2"/>
> <protocol type="RSVP"/>
> </stack>
> </subsystem>
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (WFLY-6682) Upgrade Hibernate to 5.2.0
by Scott Marlow (JIRA)
[ https://issues.jboss.org/browse/WFLY-6682?page=com.atlassian.jira.plugin.... ]
Scott Marlow edited comment on WFLY-6682 at 6/29/16 10:48 AM:
--------------------------------------------------------------
At *least* I'm down from 5 accounts to 3 :-) I just put in a request to remove the two extra accounts, thanks for the reminder!
was (Author: smarlow):
At I'm down from 5 accounts to 3 :-) I just put in a request to remove the two extra accounts, thanks for the reminder!
> Upgrade Hibernate to 5.2.0
> --------------------------
>
> Key: WFLY-6682
> URL: https://issues.jboss.org/browse/WFLY-6682
> Project: WildFly
> Issue Type: Component Upgrade
> Components: JPA / Hibernate
> Affects Versions: 10.0.0.Final
> Reporter: Frank Langelage
> Assignee: Scott Marlow
> Priority: Optional
> Fix For: 10.1.0.Final
>
> Attachments: screenshot-1.png
>
>
> Upgrade Hibernate to latest version 5.2.0.Final.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (WFLY-6791) Upgrade to Hibernate Search 5.5.4.Final
by Sanne Grinovero (JIRA)
Sanne Grinovero created WFLY-6791:
-------------------------------------
Summary: Upgrade to Hibernate Search 5.5.4.Final
Key: WFLY-6791
URL: https://issues.jboss.org/browse/WFLY-6791
Project: WildFly
Issue Type: Component Upgrade
Components: JPA / Hibernate
Reporter: Sanne Grinovero
Assignee: Sanne Grinovero
Priority: Minor
Fix For: 10.1.0.Final
We released a new micro version of Hibernate Search.
It is fully backwards compatible, yet brings some nice performance improvements and a bug fix.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (WFLY-6781) Wildfly cluster's failover functionality doesn't work as expected
by Paul Ferraro (JIRA)
[ https://issues.jboss.org/browse/WFLY-6781?page=com.atlassian.jira.plugin.... ]
Paul Ferraro commented on WFLY-6781:
------------------------------------
I'm confused, as you seem to be using the terms RC and SL to interchangeably refer to web applications (e.g. RC.war and SL.rar) and a mysterious component with a remoting connection (e.g. RC communicates remotely with SL). What kind of component is this? How is the remoting connection used?
server-host1-RC and server-host1-SL belong to different server groups - but I don't know how these server groups are configured.
You seem to be managing the server instance of each node as its own domain, is that right? Why is that? How do the configurations of these domains differ?
"By the way RC communicates remotely with SL using the below url :
http-remoting://<ip address of SL which is Node1>:6080/"
What is the context in which this url is used? Is this a JNDI provider URL? Details please.
Remoting connections are point-to-point connections and don't magically failover if the target location becomes unavailable. What exactly is your expectation here?
I still don't understand exactly what aspect of your system is "clustered". I also don't understand which connections you expect to failover. Which (and what kind of) components are clustered? Do your logs indicate proper cluster formation? What do your logs indicate when you test these failure scenarios?
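Since such a point-to-point connection will not re-establish itself, a client that must survive its peer disappearing typically registers a JMS ExceptionListener and rebuilds the connection when it drops. A minimal sketch, where rebuildConnection() is a hypothetical helper standing in for whatever JNDI lookup and connection setup the application actually uses:
{code:java}
import javax.jms.Connection;
import javax.jms.ExceptionListener;
import javax.jms.JMSException;

public final class ReconnectingClient {
    // Register a listener so the client rebuilds its connection when it drops.
    static void watch(Connection connection) throws JMSException {
        connection.setExceptionListener(new ExceptionListener() {
            public void onException(JMSException e) {
                while (true) {
                    try {
                        rebuildConnection(); // hypothetical helper, see below
                        return;
                    } catch (Exception retryFailed) {
                        try {
                            Thread.sleep(5000); // back off, then retry
                        } catch (InterruptedException ie) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                }
            }
        });
    }

    static void rebuildConnection() throws Exception {
        // Repeat the application's InitialContext lookup, createConnection()
        // and start() sequence here; omitted in this sketch.
    }
}
{code}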
> Wildfly cluster's failover functionality doesn't work as expected
> -----------------------------------------------------------------
>
> Key: WFLY-6781
> URL: https://issues.jboss.org/browse/WFLY-6781
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 8.2.0.Final
> Reporter: Preeta Kuruvilla
> Assignee: Paul Ferraro
> Priority: Blocker
>
> Following are the testing scenarios we tried and their outcomes:
> 1. Disabling the network on a VM to test failover – Does not work on either Linux or Windows.
> 2. Powering off a VM via the VMware client to test failover – Works on Linux but not on Windows.
> 3. Ctrl+C to stop the services on a node to test failover – Works on both Linux and Windows.
> 4. Stopping the server running on a node/VM via the Admin Console to test failover – Works on both Linux and Windows.
> The JGroups subsystem configuration we have in domain.xml is:
> <subsystem xmlns="urn:jboss:domain:jgroups:2.0" default-stack="udp">
> <stack name="udp">
> <transport type="UDP" socket-binding="jgroups-udp"/>
> <protocol type="PING"/>
> <protocol type="MERGE3"/>
> <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
> <protocol type="FD_ALL"/>
> <protocol type="VERIFY_SUSPECT"/>
> <protocol type="pbcast.NAKACK2"/>
> <protocol type="UNICAST3"/>
> <protocol type="pbcast.STABLE"/>
> <protocol type="pbcast.GMS"/>
> <protocol type="UFC"/>
> <protocol type="MFC"/>
> <protocol type="FRAG2"/>
> <protocol type="RSVP"/>
> </stack>
> <stack name="tcp">
> <transport type="TCP" socket-binding="jgroups-tcp"/>
> <protocol type="MPING" socket-binding="jgroups-mping"/>
> <protocol type="MERGE2"/>
> <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
> <protocol type="FD"/>
> <protocol type="VERIFY_SUSPECT"/>
> <protocol type="pbcast.NAKACK2"/>
> <protocol type="UNICAST3"/>
> <protocol type="pbcast.STABLE"/>
> <protocol type="pbcast.GMS"/>
> <protocol type="MFC"/>
> <protocol type="FRAG2"/>
> <protocol type="RSVP"/>
> </stack>
> </subsystem>
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (WFLY-6781) Wildfly cluster's failover functionality doesn't work as expected
by Preeta Kuruvilla (JIRA)
[ https://issues.jboss.org/browse/WFLY-6781?page=com.atlassian.jira.plugin.... ]
Preeta Kuruvilla edited comment on WFLY-6781 at 6/29/16 10:18 AM:
------------------------------------------------------------------
The cluster we have is a domain-managed cluster: domain.xml and host.xml are configured on Node1, and only host.xml is configured on Node2. JGroups is a subsystem in domain.xml for the "ha" profile.
Regarding our application, we have two components, RC.war and SL.war. JMS is configured on SL. Only RC is clustered; SL is not.
In domain.xml we have:
<server-groups>
<server-group name="main-server-group" profile="ha">
<socket-binding-group ref="ha-sockets"/>
<deployments>
<deployment name="RC.war" runtime-name="RC.war"/>
</deployments>
</server-group>
<server-group name="other-server-group" profile="ha">
<socket-binding-group ref="standard-sockets"/>
<paths>
<path name="jboss.server.log.dir.SL" path="jboss.server.log.dir" />
</paths>
<deployments>
<deployment name="SL.war" runtime-name="SL.war"/>
</deployments>
</server-group>
</server-groups>
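For a topology like this, the domain controller's CLI can confirm what each server group is running; a minimal sketch, assuming the host names registered in host.xml are host1 and host2 (the actual names may differ):
{code}
# jboss-cli.sh --connect --controller=<domain-controller-ip>:9990
:read-children-names(child-type=host)
/host=host1/server-config=server-host1-RC:read-attribute(name=status)
/host=host1/server-config=server-host1-SL:read-attribute(name=status)
/host=host2/server-config=server-host2-RC:read-attribute(name=status)
{code}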
***********************
In host.xml we have:
Node 1 has two server instances, RC and SL:
<servers>
<server name="server-host1-RC" group="main-server-group" auto-start="true">
<jvm name="default">
............
</jvm>
<socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
</server>
<server name="server-host1-SL" group="other-server-group" auto-start="true">
<jvm name="default">
........
</jvm>
<socket-bindings socket-binding-group="standard-sockets" port-offset="0"/>
</server>
</servers>
Node 2 has only one server instance, which runs RC:
<servers>
<server name="server-host2-RC" group="main-server-group" auto-start="true">
<jvm name="default">
.............
</jvm>
<socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
</server>
</servers>
Now, when I say it's not working as expected when we test failover, I mean that the communication between RC and SL is broken. After the communication breaks, our current workaround is to restart RC and SL on the cluster, which makes things work as before.
By the way, RC communicates with SL on port 6080 via HTTP.
RC communicates with JMS on SL (JMS JNDI URL: http-remoting://<ip address of SL which is Node1>:6080/)
Just a note: everything works properly in production as long as we don't run the failover test scenarios (disabling the network, powering off, etc.).
Let me know if you need any other info.
Thanks,
Preeta
was (Author: kpreeta12):
The cluster that we have is a domain managed cluster where we have domain.xml and host.xml configured on Node1 and only host.xml configured on Node2. The jgroups is a subsystem in the domain.xml for the profile "ha".
Regarding our application, we have 2 components - RC.war and SL.war. The JMS is configured on SL. Only component RC is clustered. SL is not clustered.
In domain.xml we have:-
<server-groups>
<server-group name="main-server-group" profile="ha">
<socket-binding-group ref="ha-sockets"/>
<deployments>
<deployment name="RC.war" runtime-name="RC.war"/>
</deployments>
</server-group>
<server-group name="other-server-group" profile="ha">
<socket-binding-group ref="standard-sockets"/>
<paths>
<path name="jboss.server.log.dir.SL" path="jboss.server.log.dir" />
</paths>
<deployments>
<deployment name="SL.war" runtime-name="SL.war"/>
</deployments>
</server-group>
</server-groups>
***********************
In host.xml we have:-
Node 1 has - 2 server instances- RC and SL.
<servers>
<server name="server-host1-RC" group="main-server-group" auto-start="true">
<jvm name="default">
............
</jvm-options>
</jvm>
<socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
</server>
<server name="server-host1-SL" group="other-server-group" auto-start="true">
<jvm name="default">
........
</jvm-options>
</jvm>
<socket-bindings socket-binding-group="standard-sockets" port-offset="0"/>
</server>
</servers>
Node 2 : has only one server instance and that has RC
<servers>
<server name="server-host2-RC" group="main-server-group" auto-start="true">
<jvm name="default">
.............
<jvm-options>
</jvm>
<socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
</server>
</servers>
Now when I say its not working as expected, when we test failover, I mean - the communication of RC and SL is broken. After the communication is broken, we have currently a workaround -restarting the RC and SL on cluster, which makes it work as before.
By the way RC communicates with SL on port 6080 via http.
RC communicates with JMS on SL (JMS JNDI url:http-remoting://<ip address of SL which is Node1>:6080/)
Just a note:- everything is working properly in production if we don't try the failover testing scenario by disabling network or powering off etc.
Let me know if you need any other info.
Thanks,
Preeta
> Wildfly cluster's failover functionality doesn't work as expected
> -----------------------------------------------------------------
>
> Key: WFLY-6781
> URL: https://issues.jboss.org/browse/WFLY-6781
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 8.2.0.Final
> Reporter: Preeta Kuruvilla
> Assignee: Paul Ferraro
> Priority: Blocker
>
> Following are the testing scenarios we tried and their outcomes:
> 1. Disabling the network on a VM to test failover – Does not work on either Linux or Windows.
> 2. Powering off a VM via the VMware client to test failover – Works on Linux but not on Windows.
> 3. Ctrl+C to stop the services on a node to test failover – Works on both Linux and Windows.
> 4. Stopping the server running on a node/VM via the Admin Console to test failover – Works on both Linux and Windows.
> The JGroups subsystem configuration we have in domain.xml is:
> <subsystem xmlns="urn:jboss:domain:jgroups:2.0" default-stack="udp">
> <stack name="udp">
> <transport type="UDP" socket-binding="jgroups-udp"/>
> <protocol type="PING"/>
> <protocol type="MERGE3"/>
> <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
> <protocol type="FD_ALL"/>
> <protocol type="VERIFY_SUSPECT"/>
> <protocol type="pbcast.NAKACK2"/>
> <protocol type="UNICAST3"/>
> <protocol type="pbcast.STABLE"/>
> <protocol type="pbcast.GMS"/>
> <protocol type="UFC"/>
> <protocol type="MFC"/>
> <protocol type="FRAG2"/>
> <protocol type="RSVP"/>
> </stack>
> <stack name="tcp">
> <transport type="TCP" socket-binding="jgroups-tcp"/>
> <protocol type="MPING" socket-binding="jgroups-mping"/>
> <protocol type="MERGE2"/>
> <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
> <protocol type="FD"/>
> <protocol type="VERIFY_SUSPECT"/>
> <protocol type="pbcast.NAKACK2"/>
> <protocol type="UNICAST3"/>
> <protocol type="pbcast.STABLE"/>
> <protocol type="pbcast.GMS"/>
> <protocol type="MFC"/>
> <protocol type="FRAG2"/>
> <protocol type="RSVP"/>
> </stack>
> </subsystem>
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (WFLY-6781) Wildfly cluster's failover functionality doesn't work as expected
by Preeta Kuruvilla (JIRA)
[ https://issues.jboss.org/browse/WFLY-6781?page=com.atlassian.jira.plugin.... ]
Preeta Kuruvilla edited comment on WFLY-6781 at 6/29/16 10:18 AM:
------------------------------------------------------------------
The cluster we have is a domain-managed cluster: domain.xml and host.xml are configured on Node1, and only host.xml is configured on Node2. JGroups is a subsystem in domain.xml for the "ha" profile.
Regarding our application, we have two components, RC.war and SL.war. JMS is configured on SL. Only RC is clustered; SL is not.
In domain.xml we have:
<server-groups>
<server-group name="main-server-group" profile="ha">
<socket-binding-group ref="ha-sockets"/>
<deployments>
<deployment name="RC.war" runtime-name="RC.war"/>
</deployments>
</server-group>
<server-group name="other-server-group" profile="ha">
<socket-binding-group ref="standard-sockets"/>
<paths>
<path name="jboss.server.log.dir.SL" path="jboss.server.log.dir" />
</paths>
<deployments>
<deployment name="SL.war" runtime-name="SL.war"/>
</deployments>
</server-group>
</server-groups>
***********************
In host.xml we have:
Node 1 has two server instances, RC and SL:
<servers>
<server name="server-host1-RC" group="main-server-group" auto-start="true">
<jvm name="default">
............
</jvm>
<socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
</server>
<server name="server-host1-SL" group="other-server-group" auto-start="true">
<jvm name="default">
........
</jvm>
<socket-bindings socket-binding-group="standard-sockets" port-offset="0"/>
</server>
</servers>
Node 2 has only one server instance, which runs RC:
<servers>
<server name="server-host2-RC" group="main-server-group" auto-start="true">
<jvm name="default">
.............
</jvm>
<socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
</server>
</servers>
Now, when I say it's not working as expected when we test failover, I mean that the communication between RC and SL is broken. After the communication breaks, our current workaround is to restart RC and SL on the cluster, which makes things work as before.
By the way, RC communicates with SL on port 6080 via HTTP.
RC communicates with JMS on SL (JMS JNDI URL: http-remoting://<ip address of SL which is Node1>:6080/)
Just a note: everything works properly in production as long as we don't run the failover test scenarios (disabling the network, powering off, etc.).
Let me know if you need any other info.
Thanks,
Preeta
was (Author: kpreeta12):
The cluster that we have is a domain managed cluster where we have domain.xml and host.xml configured on Node1 and only host.xml configured on Node2. The jgroups is a subsystem in the domain.xml for the profile "ha".
Regarding our application, we have 2 components - RC.war and SL.war. The JMS is configured on SL. Only component RC is clustered. SL is not clustered.
Node 1 has - 2 server instances- RC and SL.
<servers>
<server name="server-host1-RC" group="main-server-group" auto-start="true">
<jvm name="default">
<heap size="2048m" max-size="2048m"/>
<permgen size="512m" max-size="512m"/>
<jvm-options>
<!--<option value="-Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n"/>-->
<option value="-XX:CompileCommand=exclude,com/newscale/bfw/signon/filters,AuthenticationFilter"/>
<option value="-XX:CompileCommand=exclude,org/apache/xml/dtm/ref/sax2dtm/SAX2DTM,startElement"/>
<option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
<option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
<option value="-XX:CompileCommand=exclude,org/apache/xpath/compiler/XPathParser,UnionExpr"/>
</jvm-options>
</jvm>
<socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
</server>
<server name="server-host1-SL" group="other-server-group" auto-start="true">
<jvm name="default">
<heap size="2048m" max-size="2048m"/>
<permgen size="512m" max-size="512m"/>
<jvm-options>
<option value="-server"/>
</jvm-options>
</jvm>
<socket-bindings socket-binding-group="standard-sockets" port-offset="0"/>
</server>
</servers>
Node 2 : has only one server instance and that has RC
<servers>
<server name="server-host2-RC" group="main-server-group" auto-start="true">
<jvm name="default">
<heap size="2048m" max-size="2048m"/>
<permgen size="512m" max-size="512m"/>
<jvm-options>
<!--<option value="-Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n"/>-->
<option value="-XX:CompileCommand=exclude,com/newscale/bfw/signon/filters,AuthenticationFilter"/>
<option value="-XX:CompileCommand=exclude,org/apache/xml/dtm/ref/sax2dtm/SAX2DTM,startElement"/>
<option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
<option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
<option value="-XX:CompileCommand=exclude,org/apache/xpath/compiler/XPathParser,UnionExpr"/>
</jvm-options>
</jvm>
<socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
</server>
</servers>
Now when I say its not working as expected, when we test failover, I mean - the communication of RC and SL is broken. After the communication is broken, we have currently a workaround -restarting the RC and SL on cluster, which makes it work as before.
By the way RC communicates with SL on port 6080 via http.
RC communicates with JMS on SL (JMS JNDI url:http-remoting://<ip address of SL which is Node1>:6080/)
Just a note:- everything is working properly in production if we don't try the failover testing scenario by disabling network or powering off etc.
Let me know if you need any other info.
Thanks,
Preeta
> Wildfly cluster's failover functionality doesn't work as expected
> -----------------------------------------------------------------
>
> Key: WFLY-6781
> URL: https://issues.jboss.org/browse/WFLY-6781
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 8.2.0.Final
> Reporter: Preeta Kuruvilla
> Assignee: Paul Ferraro
> Priority: Blocker
>
> Following are the testing scenarios we tried and their outcomes:
> 1. Disabling the network on a VM to test failover – Does not work on either Linux or Windows.
> 2. Powering off a VM via the VMware client to test failover – Works on Linux but not on Windows.
> 3. Ctrl+C to stop the services on a node to test failover – Works on both Linux and Windows.
> 4. Stopping the server running on a node/VM via the Admin Console to test failover – Works on both Linux and Windows.
> The JGroups subsystem configuration we have in domain.xml is:
> <subsystem xmlns="urn:jboss:domain:jgroups:2.0" default-stack="udp">
> <stack name="udp">
> <transport type="UDP" socket-binding="jgroups-udp"/>
> <protocol type="PING"/>
> <protocol type="MERGE3"/>
> <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
> <protocol type="FD_ALL"/>
> <protocol type="VERIFY_SUSPECT"/>
> <protocol type="pbcast.NAKACK2"/>
> <protocol type="UNICAST3"/>
> <protocol type="pbcast.STABLE"/>
> <protocol type="pbcast.GMS"/>
> <protocol type="UFC"/>
> <protocol type="MFC"/>
> <protocol type="FRAG2"/>
> <protocol type="RSVP"/>
> </stack>
> <stack name="tcp">
> <transport type="TCP" socket-binding="jgroups-tcp"/>
> <protocol type="MPING" socket-binding="jgroups-mping"/>
> <protocol type="MERGE2"/>
> <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
> <protocol type="FD"/>
> <protocol type="VERIFY_SUSPECT"/>
> <protocol type="pbcast.NAKACK2"/>
> <protocol type="UNICAST3"/>
> <protocol type="pbcast.STABLE"/>
> <protocol type="pbcast.GMS"/>
> <protocol type="MFC"/>
> <protocol type="FRAG2"/>
> <protocol type="RSVP"/>
> </stack>
> </subsystem>
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (WFLY-6781) Wildfly cluster's failover functionality doesn't work as expected
by Preeta Kuruvilla (JIRA)
[ https://issues.jboss.org/browse/WFLY-6781?page=com.atlassian.jira.plugin.... ]
Preeta Kuruvilla edited comment on WFLY-6781 at 6/29/16 9:56 AM:
-----------------------------------------------------------------
The cluster we have is a domain-managed cluster: domain.xml and host.xml are configured on Node1, and only host.xml is configured on Node2. JGroups is a subsystem in domain.xml for the "ha" profile.
Regarding our application, we have two components, RC.war and SL.war. JMS is configured on SL. Only RC is clustered; SL is not.
Node 1 has two server instances, RC and SL:
<servers>
<server name="server-host1-RC" group="main-server-group" auto-start="true">
<jvm name="default">
<heap size="2048m" max-size="2048m"/>
<permgen size="512m" max-size="512m"/>
<jvm-options>
<!--<option value="-Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n"/>-->
<option value="-XX:CompileCommand=exclude,com/newscale/bfw/signon/filters,AuthenticationFilter"/>
<option value="-XX:CompileCommand=exclude,org/apache/xml/dtm/ref/sax2dtm/SAX2DTM,startElement"/>
<option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
<option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
<option value="-XX:CompileCommand=exclude,org/apache/xpath/compiler/XPathParser,UnionExpr"/>
</jvm-options>
</jvm>
<socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
</server>
<server name="server-host1-SL" group="other-server-group" auto-start="true">
<jvm name="default">
<heap size="2048m" max-size="2048m"/>
<permgen size="512m" max-size="512m"/>
<jvm-options>
<option value="-server"/>
</jvm-options>
</jvm>
<socket-bindings socket-binding-group="standard-sockets" port-offset="0"/>
</server>
</servers>
Node 2 has only one server instance, which runs RC:
<servers>
<server name="server-host2-RC" group="main-server-group" auto-start="true">
<jvm name="default">
<heap size="2048m" max-size="2048m"/>
<permgen size="512m" max-size="512m"/>
<jvm-options>
<!--<option value="-Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n"/>-->
<option value="-XX:CompileCommand=exclude,com/newscale/bfw/signon/filters,AuthenticationFilter"/>
<option value="-XX:CompileCommand=exclude,org/apache/xml/dtm/ref/sax2dtm/SAX2DTM,startElement"/>
<option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
<option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
<option value="-XX:CompileCommand=exclude,org/apache/xpath/compiler/XPathParser,UnionExpr"/>
</jvm-options>
</jvm>
<socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
</server>
</servers>
Now, when I say it's not working as expected when we test failover, I mean that the communication between RC and SL is broken. After the communication breaks, our current workaround is to restart RC and SL on the cluster, which makes things work as before.
By the way, RC communicates with SL on port 6080 via HTTP.
RC communicates with JMS on SL (JMS JNDI URL: http-remoting://<ip address of SL which is Node1>:6080/)
Just a note: everything works properly in production as long as we don't run the failover test scenarios (disabling the network, powering off, etc.).
Let me know if you need any other info.
Thanks,
Preeta
was (Author: kpreeta12):
The cluster that we have is a domain managed cluster where we have domain.xml and host.xml configured on Node1 and only host.xml configured on Node2. The jgroups is a subsystem in the domain.xml for the profile "ha".
Regarding our application, we have 2 components - RC.war and SL.war. The JMS is configured on SL. Only component RC is clustered. SL is not clustered.
Node 1 has - 2 server instances- RC and SL.
<servers>
<server name="server-host1-RC" group="main-server-group" auto-start="true">
<jvm name="default">
<heap size="2048m" max-size="2048m"/>
<permgen size="512m" max-size="512m"/>
<jvm-options>
<!--<option value="-Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n"/>-->
<option value="-XX:CompileCommand=exclude,com/newscale/bfw/signon/filters,AuthenticationFilter"/>
<option value="-XX:CompileCommand=exclude,org/apache/xml/dtm/ref/sax2dtm/SAX2DTM,startElement"/>
<option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
<option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
<option value="-XX:CompileCommand=exclude,org/apache/xpath/compiler/XPathParser,UnionExpr"/>
</jvm-options>
</jvm>
<socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
</server>
<server name="server-host1-SL" group="other-server-group" auto-start="true">
<jvm name="default">
<heap size="2048m" max-size="2048m"/>
<permgen size="512m" max-size="512m"/>
<jvm-options>
<option value="-server"/>
</jvm-options>
</jvm>
<socket-bindings socket-binding-group="standard-sockets" port-offset="0"/>
</server>
</servers>
Node 2 : has only one server instance and that has RC
<servers>
<server name="server-host2-RC" group="main-server-group" auto-start="true">
<jvm name="default">
<heap size="2048m" max-size="2048m"/>
<permgen size="512m" max-size="512m"/>
<jvm-options>
<!--<option value="-Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n"/>-->
<option value="-XX:CompileCommand=exclude,com/newscale/bfw/signon/filters,AuthenticationFilter"/>
<option value="-XX:CompileCommand=exclude,org/apache/xml/dtm/ref/sax2dtm/SAX2DTM,startElement"/>
<option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
<option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
<option value="-XX:CompileCommand=exclude,org/apache/xpath/compiler/XPathParser,UnionExpr"/>
</jvm-options>
</jvm>
<socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
</server>
</servers>
Now when I say its not working as expected, when we test failover, I mean - the communication of RC and SL is broken.
By the way RC communicates remotely with SL using the below url :
http-remoting://<ip address of SL which is Node1>:6080/
Just a note:- everything is working properly in production if we don't try disabling network or powering off etc.
Let me know if you need any other info.
Thanks,
Preeta
> Wildfly cluster's failover functionality doesn't work as expected
> -----------------------------------------------------------------
>
> Key: WFLY-6781
> URL: https://issues.jboss.org/browse/WFLY-6781
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 8.2.0.Final
> Reporter: Preeta Kuruvilla
> Assignee: Paul Ferraro
> Priority: Blocker
>
> Following are the testing scenarios we tried and their outcomes:
> 1. Disabling the network on a VM to test failover – Does not work on either Linux or Windows.
> 2. Powering off a VM via the VMware client to test failover – Works on Linux but not on Windows.
> 3. Ctrl+C to stop the services on a node to test failover – Works on both Linux and Windows.
> 4. Stopping the server running on a node/VM via the Admin Console to test failover – Works on both Linux and Windows.
> The JGroups subsystem configuration we have in domain.xml is:
> <subsystem xmlns="urn:jboss:domain:jgroups:2.0" default-stack="udp">
> <stack name="udp">
> <transport type="UDP" socket-binding="jgroups-udp"/>
> <protocol type="PING"/>
> <protocol type="MERGE3"/>
> <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
> <protocol type="FD_ALL"/>
> <protocol type="VERIFY_SUSPECT"/>
> <protocol type="pbcast.NAKACK2"/>
> <protocol type="UNICAST3"/>
> <protocol type="pbcast.STABLE"/>
> <protocol type="pbcast.GMS"/>
> <protocol type="UFC"/>
> <protocol type="MFC"/>
> <protocol type="FRAG2"/>
> <protocol type="RSVP"/>
> </stack>
> <stack name="tcp">
> <transport type="TCP" socket-binding="jgroups-tcp"/>
> <protocol type="MPING" socket-binding="jgroups-mping"/>
> <protocol type="MERGE2"/>
> <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
> <protocol type="FD"/>
> <protocol type="VERIFY_SUSPECT"/>
> <protocol type="pbcast.NAKACK2"/>
> <protocol type="UNICAST3"/>
> <protocol type="pbcast.STABLE"/>
> <protocol type="pbcast.GMS"/>
> <protocol type="MFC"/>
> <protocol type="FRAG2"/>
> <protocol type="RSVP"/>
> </stack>
> </subsystem>
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)