[
https://issues.jboss.org/browse/WFLY-6781?page=com.atlassian.jira.plugin....
]
Preeta Kuruvilla commented on WFLY-6781:
----------------------------------------
Please find my comments inline.
I'm confused, as you seem to be using the terms RC and SL to interchangeably refer to
web applications (e.g. RC.war and SL.rar) and a mysterious component with a remoting
connection (e.g. RC communicates remotely with SL). What kind of component is this? How is
the remoting connection used?
{color:#205081}Preeta: RC.war and SL.war are two applications we deploy on WildFly, and
they work together in our product. RC is clustered, so it is deployed on Node1 and Node2,
while SL (non-clustered) exists only on Node1. RC on Node1 and Node2 communicates with
SL on Node1 through JMS, which is configured on SL. The JMS JNDI URL to connect is
http-remoting://ip-address:6080.{color}
server-host1-RC and server-host1-SL belong to different server groups - but I don't
know how these server groups are configured.
You seem to be managing the server instance of each node as its own domain, is that right?
Why is that? How do the configurations of these domains differ?
{color:#205081}Preeta: As I said, RC is clustered while SL is non-clustered, and they are
different components, so they belong to different server groups, although both are managed
by one domain. There is only one domain controller, and it is on Node1.{color}
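For illustration only, a domain.xml layout matching this description might look like the sketch below. The group names rc-server-group and sl-server-group are hypothetical, and the profile and socket-binding-group choices are assumptions, not taken from the actual configuration in this report.

```xml
<!-- Hypothetical sketch: one domain, two server groups. Names and profiles are assumptions. -->
<server-groups>
    <!-- Clustered RC application, with servers on Node1 and Node2 -->
    <server-group name="rc-server-group" profile="ha">
        <socket-binding-group ref="ha-sockets"/>
    </server-group>
    <!-- Non-clustered SL application (hosts JMS), with a server on Node1 only -->
    <server-group name="sl-server-group" profile="full">
        <socket-binding-group ref="full-sockets"/>
    </server-group>
</server-groups>
```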
"By the way RC communicates remotely with JMS on SL using the below url :
http-remoting://<ip address of SL which is Node1>:6080/"
What is the context in which this url is used? Is this a JNDI provider URL? Details
please.
{color:#205081}Preeta: Yes, it is the JMS JNDI provider URL. As I said above, JMS is
configured on SL, and RC connects to JMS using this URL.{color}
Remoting connections are point-to-point connections and don't magically failover if
the target location becomes unavailable. What exactly is your expectation here?
{color:#205081}Preeta: When the network on Node2 is disabled, Node1 should gracefully
take over, and RC on Node1 should continue to connect to JMS/SL on Node1. What actually
happens is that SL (on Node1) does not process the requests that come from RC on
Node1.{color}
I still don't understand exactly what aspect of your system is "clustered".
I also don't understand which connections you expect to failover. Which (and what kind
of) components are clustered? Do your logs indicate proper cluster formation? What do your
logs indicate when you test these failure scenarios?
{color:#205081}Preeta: Only RC is clustered, not SL; I have answered this above.
Yes, our logs indicate proper cluster formation. On disabling the network on Node2, the
service on Node2 does not stop running. It throws an error saying "connection got
disconnected", which is correct. The issue is that Node1 does not work as expected under
this condition: SL does not process the requests that come from RC on Node1.{color}
{color:#205081}Note: Things work as expected when we press Ctrl+C on Node2 or when we
stop the server on Node2 from the admin console.{color}
Wildfly cluster's failover functionality doesn't work as expected
-----------------------------------------------------------------
Key: WFLY-6781
URL: https://issues.jboss.org/browse/WFLY-6781
Project: WildFly
Issue Type: Bug
Components: Clustering
Affects Versions: 8.2.0.Final
Reporter: Preeta Kuruvilla
Assignee: Paul Ferraro
Priority: Blocker
The following are the test scenarios we ran and their outcomes:
1. Disabling the network on a VM to test failover - does not work in either the Linux or
the Windows environment.
2. Powering off a VM using the VMware client to test failover - works in the Linux
environment but not in the Windows environment.
3. Using Ctrl+C to stop services on a node to test failover - works in both the Linux
and Windows environments.
4. Stopping the server running on a node/VM using the admin console to test failover -
works in both the Linux and Windows environments.
The JGroups subsystem configuration in our domain.xml is below:
<subsystem xmlns="urn:jboss:domain:jgroups:2.0" default-stack="udp">
    <stack name="udp">
        <transport type="UDP" socket-binding="jgroups-udp"/>
        <protocol type="PING"/>
        <protocol type="MERGE3"/>
        <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
        <protocol type="FD_ALL"/>
        <protocol type="VERIFY_SUSPECT"/>
        <protocol type="pbcast.NAKACK2"/>
        <protocol type="UNICAST3"/>
        <protocol type="pbcast.STABLE"/>
        <protocol type="pbcast.GMS"/>
        <protocol type="UFC"/>
        <protocol type="MFC"/>
        <protocol type="FRAG2"/>
        <protocol type="RSVP"/>
    </stack>
    <stack name="tcp">
        <transport type="TCP" socket-binding="jgroups-tcp"/>
        <protocol type="MPING" socket-binding="jgroups-mping"/>
        <protocol type="MERGE2"/>
        <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
        <protocol type="FD"/>
        <protocol type="VERIFY_SUSPECT"/>
        <protocol type="pbcast.NAKACK2"/>
        <protocol type="UNICAST3"/>
        <protocol type="pbcast.STABLE"/>
        <protocol type="pbcast.GMS"/>
        <protocol type="MFC"/>
        <protocol type="FRAG2"/>
        <protocol type="RSVP"/>
    </stack>
</subsystem>
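One difference between scenario 1 and scenarios 3 and 4 is that disabling the network closes no sockets cleanly. FD_SOCK detects failures through closed sockets, so in the default udp stack a silently unreachable node is only noticed through the heartbeat-based FD_ALL protocol. As a sketch only, the FD_ALL timing could be set explicitly as shown below. The values are illustrative assumptions, not a tested fix, and it is not established that failure detection is the cause of this issue.

```xml
<!-- Hypothetical replacement for <protocol type="FD_ALL"/> in the udp stack -->
<protocol type="FD_ALL">
    <!-- Milliseconds without a heartbeat before a member is suspected (assumed value) -->
    <property name="timeout">15000</property>
    <!-- Milliseconds between heartbeats sent by each member (assumed value) -->
    <property name="interval">3000</property>
</protocol>
```

A shorter timeout means an unreachable node is suspected sooner, at the cost of more false suspicions on a slow or congested network.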
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)