[jboss-jira] [JBoss JIRA] (WFLY-6781) Wildfly cluster's failover functionality doesn't work as expected
Preeta Kuruvilla (JIRA)
issues at jboss.org
Wed Jun 29 10:19:00 EDT 2016
[ https://issues.jboss.org/browse/WFLY-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258291#comment-13258291 ]
Preeta Kuruvilla edited comment on WFLY-6781 at 6/29/16 10:18 AM:
------------------------------------------------------------------
Our cluster is a domain-managed cluster: domain.xml and host.xml are configured on Node1, and only host.xml is configured on Node2. The jgroups subsystem is defined in domain.xml under the "ha" profile.
Regarding our application, we have two components, RC.war and SL.war. JMS is configured on SL. Only the RC component is clustered; SL is not clustered.
In domain.xml we have:
<server-groups>
    <server-group name="main-server-group" profile="ha">
        <socket-binding-group ref="ha-sockets"/>
        <deployments>
            <deployment name="RC.war" runtime-name="RC.war"/>
        </deployments>
    </server-group>
    <server-group name="other-server-group" profile="ha">
        <socket-binding-group ref="standard-sockets"/>
        <paths>
            <path name="jboss.server.log.dir.SL" path="jboss.server.log.dir"/>
        </paths>
        <deployments>
            <deployment name="SL.war" runtime-name="SL.war"/>
        </deployments>
    </server-group>
</server-groups>
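For reference, the same deployment-to-server-group assignment can also be made from the domain management CLI (a sketch only; the archive paths and controller host/port are illustrative, not our exact setup):
[domain@Node1:9990 /] deploy /path/to/RC.war --server-groups=main-server-group
[domain@Node1:9990 /] deploy /path/to/SL.war --server-groups=other-server-group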
***********************
In host.xml we have:
Node 1 has two server instances, RC and SL:
<servers>
    <server name="server-host1-RC" group="main-server-group" auto-start="true">
        <jvm name="default">
            <heap size="2048m" max-size="2048m"/>
            <permgen size="512m" max-size="512m"/>
            <jvm-options>
                <!--<option value="-Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n"/>-->
                <option value="-XX:CompileCommand=exclude,com/newscale/bfw/signon/filters,AuthenticationFilter"/>
                <option value="-XX:CompileCommand=exclude,org/apache/xml/dtm/ref/sax2dtm/SAX2DTM,startElement"/>
                <option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
                <option value="-XX:CompileCommand=exclude,org/apache/xpath/compiler/XPathParser,UnionExpr"/>
            </jvm-options>
        </jvm>
        <socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
    </server>
    <server name="server-host1-SL" group="other-server-group" auto-start="true">
        <jvm name="default">
            <heap size="2048m" max-size="2048m"/>
            <permgen size="512m" max-size="512m"/>
            <jvm-options>
                <option value="-server"/>
            </jvm-options>
        </jvm>
        <socket-bindings socket-binding-group="standard-sockets" port-offset="0"/>
    </server>
</servers>
Node 2 has only one server instance, RC:
<servers>
    <server name="server-host2-RC" group="main-server-group" auto-start="true">
        <jvm name="default">
            <heap size="2048m" max-size="2048m"/>
            <permgen size="512m" max-size="512m"/>
            <jvm-options>
                <!--<option value="-Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n"/>-->
                <option value="-XX:CompileCommand=exclude,com/newscale/bfw/signon/filters,AuthenticationFilter"/>
                <option value="-XX:CompileCommand=exclude,org/apache/xml/dtm/ref/sax2dtm/SAX2DTM,startElement"/>
                <option value="-XX:CompileCommand=exclude,org/exolab/castor/xml/Marshaller,marshal"/>
                <option value="-XX:CompileCommand=exclude,org/apache/xpath/compiler/XPathParser,UnionExpr"/>
            </jvm-options>
        </jvm>
        <socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
    </server>
</servers>
When I say failover is not working as expected, I mean that the communication between RC and SL breaks when we test failover. Our current workaround is to restart RC and SL on the cluster, after which everything works as before.
By the way, RC communicates with SL over HTTP on port 6080.
RC also communicates with JMS on SL (JMS JNDI URL: http-remoting://<ip address of SL which is Node1>:6080/).
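For context, here is a minimal sketch of the kind of remote JMS lookup RC performs against SL, assuming the standard WildFly http-remoting JNDI client and the default exported connection factory name jms/RemoteConnectionFactory (the host placeholder and names are illustrative, not our exact code):
import java.util.Properties;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.naming.Context;
import javax.naming.InitialContext;

public class SlJmsClient {
    public static void main(String[] args) throws Exception {
        Properties env = new Properties();
        // Same http-remoting JNDI URL shape as above (SL = Node1, port 6080)
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "org.jboss.naming.remote.client.InitialContextFactory");
        env.put(Context.PROVIDER_URL, "http-remoting://<ip address of SL>:6080");

        Context ctx = new InitialContext(env);
        // "jms/RemoteConnectionFactory" is WildFly's default exported name; assumed here
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/RemoteConnectionFactory");
        Connection connection = cf.createConnection();
        connection.start(); // this is the path that breaks once SL becomes unreachable
        connection.close();
        ctx.close();
    }
}
Since SL is not clustered, a connection like this has no second node to fail over to once Node1 is gone.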
Just a note: everything works properly in production as long as we don't run the failover test scenarios (disabling the network, powering off a VM, etc.).
Let me know if you need any other info.
Thanks,
Preeta
> Wildfly cluster's failover functionality doesn't work as expected
> -----------------------------------------------------------------
>
> Key: WFLY-6781
> URL: https://issues.jboss.org/browse/WFLY-6781
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 8.2.0.Final
> Reporter: Preeta Kuruvilla
> Assignee: Paul Ferraro
> Priority: Blocker
>
> Following are the testing scenarios we tried and their outcomes:
> 1. Disabling the network on a VM to test failover – does not work in either the Linux or the Windows environment.
> 2. Powering off a VM using the VMware client to test failover – works in the Linux environment but not in the Windows environment.
> 3. Stopping services on a node with Ctrl+C to test failover – works in both the Linux and Windows environments.
> 4. Stopping the server running on a node/VM using the Admin Console to test failover – works in both the Linux and Windows environments.
> The jgroups subsystem configuration we have in domain.xml is below:
> <subsystem xmlns="urn:jboss:domain:jgroups:2.0" default-stack="udp">
>     <stack name="udp">
>         <transport type="UDP" socket-binding="jgroups-udp"/>
>         <protocol type="PING"/>
>         <protocol type="MERGE3"/>
>         <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
>         <protocol type="FD_ALL"/>
>         <protocol type="VERIFY_SUSPECT"/>
>         <protocol type="pbcast.NAKACK2"/>
>         <protocol type="UNICAST3"/>
>         <protocol type="pbcast.STABLE"/>
>         <protocol type="pbcast.GMS"/>
>         <protocol type="UFC"/>
>         <protocol type="MFC"/>
>         <protocol type="FRAG2"/>
>         <protocol type="RSVP"/>
>     </stack>
>     <stack name="tcp">
>         <transport type="TCP" socket-binding="jgroups-tcp"/>
>         <protocol type="MPING" socket-binding="jgroups-mping"/>
>         <protocol type="MERGE2"/>
>         <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
>         <protocol type="FD"/>
>         <protocol type="VERIFY_SUSPECT"/>
>         <protocol type="pbcast.NAKACK2"/>
>         <protocol type="UNICAST3"/>
>         <protocol type="pbcast.STABLE"/>
>         <protocol type="pbcast.GMS"/>
>         <protocol type="MFC"/>
>         <protocol type="FRAG2"/>
>         <protocol type="RSVP"/>
>     </stack>
> </subsystem>
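A note on the udp stack quoted above: when the network is disabled (scenario 1), surviving members only suspect the dead node after FD_ALL's heartbeat timeout elapses, plus VERIFY_SUSPECT's extra delay, so failover is not immediate. A sketch of tightening those timeouts using the jgroups 2.0 schema's property elements (the values are illustrative, not recommendations):
<protocol type="FD_ALL">
    <property name="timeout">10000</property>
    <property name="interval">3000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
    <property name="timeout">1500</property>
</protocol>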
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)