[
https://issues.jboss.org/browse/WFLY-6781?page=com.atlassian.jira.plugin....
]
Preeta Kuruvilla edited comment on WFLY-6781 at 6/29/16 10:18 AM:
------------------------------------------------------------------
The cluster we have is a domain-managed cluster, with domain.xml and host.xml
configured on Node1 and only host.xml configured on Node2. JGroups is a subsystem in
domain.xml under the "ha" profile.
Regarding our application, we have two components: RC.war and SL.war. JMS is configured
on SL. Only the RC component is clustered; SL is not.
In domain.xml we have:
<server-groups>
    <server-group name="main-server-group" profile="ha">
        <socket-binding-group ref="ha-sockets"/>
        <deployments>
            <deployment name="RC.war" runtime-name="RC.war"/>
        </deployments>
    </server-group>
    <server-group name="other-server-group" profile="ha">
        <socket-binding-group ref="standard-sockets"/>
        <paths>
            <path name="jboss.server.log.dir.SL" path="jboss.server.log.dir"/>
        </paths>
        <deployments>
            <deployment name="SL.war" runtime-name="SL.war"/>
        </deployments>
    </server-group>
</server-groups>
***********************
In host.xml we have:
Node 1 has two server instances: RC and SL.
<servers>
    <server name="server-host1-RC" group="main-server-group" auto-start="true">
        <jvm name="default">
            <jvm-options>
                ............
            </jvm-options>
        </jvm>
        <socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
    </server>
    <server name="server-host1-SL" group="other-server-group" auto-start="true">
        <jvm name="default">
            <jvm-options>
                ........
            </jvm-options>
        </jvm>
        <socket-bindings socket-binding-group="standard-sockets" port-offset="0"/>
    </server>
</servers>
Node 2 has only one server instance, which runs RC:
<servers>
    <server name="server-host2-RC" group="main-server-group" auto-start="true">
        <jvm name="default">
            <jvm-options>
                .............
            </jvm-options>
        </jvm>
        <socket-bindings socket-binding-group="ha-sockets" port-offset="0"/>
    </server>
</servers>
When I say it's not working as expected during failover testing, I mean that the
communication between RC and SL breaks. Once communication is broken, our current
workaround is to restart RC and SL on the cluster, which makes everything work as before.
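For the record, the restart workaround we apply can also be driven from the domain controller with jboss-cli, using the standard `restart-servers` operation on each server group (this is only a sketch of how we do it; the group names match the domain.xml above):

```
# Restart every server in each group from jboss-cli connected to the domain controller
/server-group=main-server-group:restart-servers
/server-group=other-server-group:restart-servers
```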
By the way, RC communicates with SL over HTTP on port 6080.
RC communicates with JMS on SL (JMS JNDI URL: http-remoting://<ip address of SL, which is
Node1>:6080/).
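To make the RC-to-SL JMS path concrete, below is a minimal sketch of how such a remote lookup over http-remoting could look on the RC side. The class name, the SL address, and the `jms/RemoteConnectionFactory` lookup name are illustrative assumptions (the latter is WildFly's conventional remote connection factory name), not code taken from our application:

```java
import java.util.Properties;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class Rc2SlJndiSketch {

    // Builds the JNDI provider URL RC would use to reach JMS on SL (port 6080).
    static String providerUrl(String slHost) {
        return "http-remoting://" + slHost + ":6080";
    }

    public static void main(String[] args) {
        Properties env = new Properties();
        // WildFly 8.x remote naming client factory; requires the
        // jboss-client jar on the classpath at runtime.
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "org.jboss.naming.remote.client.InitialContextFactory");
        env.put(Context.PROVIDER_URL, providerUrl("10.0.0.1")); // assumed Node1/SL address
        try {
            Context ctx = new InitialContext(env);
            // Conventional WildFly remote connection factory lookup name.
            Object cf = ctx.lookup("jms/RemoteConnectionFactory");
            System.out.println("Looked up: " + cf);
            ctx.close();
        } catch (NamingException e) {
            // With SL unreachable (or the client jar missing), the lookup fails
            // here -- the same symptom we see after failover until RC/SL restart.
            System.out.println("JMS lookup failed: " + e.getMessage());
        }
    }
}
```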
Just a note: everything works properly in production as long as we don't exercise the
failover testing scenarios (disabling the network, powering off, etc.).
Let me know if you need any other information.
Thanks,
Preeta
WildFly cluster's failover functionality doesn't work as expected
-----------------------------------------------------------------
Key: WFLY-6781
URL:
https://issues.jboss.org/browse/WFLY-6781
Project: WildFly
Issue Type: Bug
Components: Clustering
Affects Versions: 8.2.0.Final
Reporter: Preeta Kuruvilla
Assignee: Paul Ferraro
Priority: Blocker
Following are the testing scenarios we ran and their outcomes:
1. Disabling the network on a VM to test failover – not working in either the Linux or
the Windows environment.
2. Powering off a VM via the VMware client to test failover – works in the Linux
environment but not in the Windows environment.
3. Ctrl+C to stop services on a node to test failover – works in both the Linux and
Windows environments.
4. Stopping the server running on a node/VM via the Admin Console to test failover –
works in both the Linux and Windows environments.
The JGroups subsystem configuration we have in domain.xml is below:
<subsystem xmlns="urn:jboss:domain:jgroups:2.0" default-stack="udp">
    <stack name="udp">
        <transport type="UDP" socket-binding="jgroups-udp"/>
        <protocol type="PING"/>
        <protocol type="MERGE3"/>
        <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
        <protocol type="FD_ALL"/>
        <protocol type="VERIFY_SUSPECT"/>
        <protocol type="pbcast.NAKACK2"/>
        <protocol type="UNICAST3"/>
        <protocol type="pbcast.STABLE"/>
        <protocol type="pbcast.GMS"/>
        <protocol type="UFC"/>
        <protocol type="MFC"/>
        <protocol type="FRAG2"/>
        <protocol type="RSVP"/>
    </stack>
    <stack name="tcp">
        <transport type="TCP" socket-binding="jgroups-tcp"/>
        <protocol type="MPING" socket-binding="jgroups-mping"/>
        <protocol type="MERGE2"/>
        <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
        <protocol type="FD"/>
        <protocol type="VERIFY_SUSPECT"/>
        <protocol type="pbcast.NAKACK2"/>
        <protocol type="UNICAST3"/>
        <protocol type="pbcast.STABLE"/>
        <protocol type="pbcast.GMS"/>
        <protocol type="MFC"/>
        <protocol type="FRAG2"/>
        <protocol type="RSVP"/>
    </stack>
</subsystem>
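Since the "disable network" scenario hinges on how fast JGroups declares a member dead, one relevant knob in the udp stack above is the FD_ALL heartbeat timing. A sketch of how those properties could be tuned in this subsystem schema is below; `timeout` and `interval` are standard JGroups FD_ALL properties, but the values shown are illustrative assumptions, not settings from our deployment:

```xml
<!-- Illustrative only: tighter FD_ALL heartbeat timing for the "udp" stack.
     Values are assumed examples, not the reporter's configuration. -->
<protocol type="FD_ALL">
    <property name="interval">3000</property>
    <property name="timeout">15000</property>
</protocol>
```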
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)