[jboss-jira] [JBoss JIRA] (WFLY-6781) Wildfly cluster's failover functionality doesn't work as expected

Preeta Kuruvilla (JIRA) issues at jboss.org
Wed Jun 29 11:35:00 EDT 2016


    [ https://issues.jboss.org/browse/WFLY-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258714#comment-13258714 ] 

Preeta Kuruvilla commented on WFLY-6781:
----------------------------------------

Please find my comments inline.

I'm confused, as you seem to be using the terms RC and SL to interchangeably refer to web applications (e.g. RC.war and SL.rar) and a mysterious component with a remoting connection (e.g. RC communicates remotely with SL). What kind of component is this? How is the remoting connection used?

{color:#205081}Preeta: RC.war and SL.war are the two applications we deploy on WildFly, and they work together in our product. RC is clustered, so it is deployed on Node1 and Node2, while SL (non-clustered) exists only on Node1. RC on Node1 and Node2 communicates with SL on Node1 through JMS, which is configured on SL. The JMS JNDI URL used to connect is http-remoting://ip-address:6080.{color}

server-host1-RC and server-host1-SL belong to different server groups - but I don't know how these server groups are configured.
You seem to be managing the server instance of each node as its own domain, is that right? Why is that? How do the configurations of these domains differ?

{color:#205081}Preeta: As I said, RC is clustered while SL is non-clustered, and they are different components, so they belong to different server groups, although both are managed by one domain. There is only one domain controller, and it runs on Node1.{color}

"By the way RC communicates remotely with JMS on SL using the below url :
http-remoting://<ip address of SL which is Node1>:6080/"
What is the context in which this url is used? Is this a JNDI provider URL? Details please.

{color:#205081}Preeta: Yes, it is the JMS JNDI provider URL. As I said above, JMS is configured on SL, and RC connects to JMS using this URL.{color}
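
For illustration only, here is a minimal sketch of how a client such as RC might build the JNDI environment for a provider URL of this form. It assumes the WildFly remote naming client factory (org.jboss.naming.remote.client.InitialContextFactory), which would have to be on the classpath for a real lookup. The class name JmsJndiEnvironment, the method buildEnvironment, and the address 192.0.2.10 are hypothetical placeholders, not taken from our product.

```java
import java.util.Properties;
import javax.naming.Context;

public class JmsJndiEnvironment {

    // Assumption: the WildFly remote naming client factory; it is referenced
    // here only by name, so this sketch compiles without the client jar.
    static final String REMOTE_NAMING_FACTORY =
            "org.jboss.naming.remote.client.InitialContextFactory";

    // Builds the JNDI environment pointing at the node that hosts JMS (SL on Node1).
    static Properties buildEnvironment(String slHost, int port) {
        Properties env = new Properties();
        env.put(Context.INITIAL_CONTEXT_FACTORY, REMOTE_NAMING_FACTORY);
        env.put(Context.PROVIDER_URL, "http-remoting://" + slHost + ":" + port);
        return env;
    }

    public static void main(String[] args) {
        // 192.0.2.10 is a documentation-only placeholder address for Node1.
        Properties env = buildEnvironment("192.0.2.10", 6080);
        System.out.println(env.getProperty(Context.PROVIDER_URL));
    }
}
```

An environment like this names a single host, so a context created from it points only at SL on Node1.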

Remoting connections are point-to-point connections and don't magically failover if the target location becomes unavailable. What exactly is your expectation here?

{color:#205081}Preeta: When the network on Node2 is disabled, Node1 should gracefully take over, and RC on Node1 should continue to connect to JMS/SL on Node1. What actually happens is that SL (on Node1) does not process the requests that come from RC on Node1.{color}


I still don't understand exactly what aspect of your system is "clustered". I also don't understand which connections you expect to failover. Which (and what kind of) components are clustered? Do your logs indicate proper cluster formation? What do your logs indicate when you test these failure scenarios?

{color:#205081}Preeta: Only RC is clustered, not SL, as I answered above. Yes, our logs indicate proper cluster formation. When the network on Node2 is disabled, the service on Node2 keeps running and throws an error saying "connection got disconnected", which is correct. The issue is that Node1 does not work as expected under this condition: SL does not process the requests that come from RC on Node1.{color}

{color:#205081}Note: Things work as expected when we press Ctrl+C on Node2 or stop the server on Node2 from the admin console.{color}

> Wildfly cluster's failover functionality doesn't work as expected
> -----------------------------------------------------------------
>
>                 Key: WFLY-6781
>                 URL: https://issues.jboss.org/browse/WFLY-6781
>             Project: WildFly
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 8.2.0.Final
>            Reporter: Preeta Kuruvilla
>            Assignee: Paul Ferraro
>            Priority: Blocker
>
> The following are the test scenarios we ran and their outcomes:
> 1. Disabling the network on a VM to test failover: does not work on either Linux or Windows.
> 2. Powering off a VM using the VMware client to test failover: works on Linux but not on Windows.
> 3. Pressing Ctrl+C to stop services on a node to test failover: works on both Linux and Windows.
> 4. Stopping the server running on a node/VM using the admin console to test failover: works on both Linux and Windows.
> The JGroups subsystem configuration in our domain.xml is below:
> <subsystem xmlns="urn:jboss:domain:jgroups:2.0" default-stack="udp">
>                 <stack name="udp">
>                     <transport type="UDP" socket-binding="jgroups-udp"/>
>                     <protocol type="PING"/>
>                     <protocol type="MERGE3"/>
>                     <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
>                     <protocol type="FD_ALL"/>
>                     <protocol type="VERIFY_SUSPECT"/>
>                     <protocol type="pbcast.NAKACK2"/>
>                     <protocol type="UNICAST3"/>
>                     <protocol type="pbcast.STABLE"/>
>                     <protocol type="pbcast.GMS"/>
>                     <protocol type="UFC"/>
>                     <protocol type="MFC"/>
>                     <protocol type="FRAG2"/>
>                     <protocol type="RSVP"/>
>                 </stack>
>                 <stack name="tcp">
>                     <transport type="TCP" socket-binding="jgroups-tcp"/>
>                     <protocol type="MPING" socket-binding="jgroups-mping"/>
>                     <protocol type="MERGE2"/>
>                     <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
>                     <protocol type="FD"/>
>                     <protocol type="VERIFY_SUSPECT"/>
>                     <protocol type="pbcast.NAKACK2"/>
>                     <protocol type="UNICAST3"/>
>                     <protocol type="pbcast.STABLE"/>
>                     <protocol type="pbcast.GMS"/>
>                     <protocol type="MFC"/>
>                     <protocol type="FRAG2"/>
>                     <protocol type="RSVP"/>
>                 </stack>
>             </subsystem>





