Richard Achmatowicz edited comment on WFLY-11682 at 2/11/19 11:55 AM:
----------------------------------------------------------------------
Managed to recreate the error with Jörg's reproducer.
It looks as though nodes which are kicked out of the cluster do not get a chance to send
updates to the client before being removed, so the last node to leave has no opportunity
to advise the client of its departure. This is why the client believes the last node to
leave is still alive. In the attached logs, node3 sees the removal of the client mappings
entries for node1 and node2, but never receives notification of the removal of its own
client mapping entries.
The server-side code has changed a lot due to the new EJBClient/Elytron/Remoting
implementation, and some EJB client-related features that appeared in the old
implementation (in VersionOneChannelProtocolHandler) were not ported over (to
AssociationImpl); specifically, the EJB client-related responses to suspend and resume.
These should be added back in.
This is one place where it would be possible to send a notification to a client when the
last node is going down: set a flag indicating that we are the last node and we are being
suspended; if the server is not resumed, use the flag to send a message before the
EJBServerChannel connections to the clients are shut down.
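The flag-based approach described above could look roughly like the sketch below. All class and method names here are hypothetical; in a real fix the hooks would live in the server-side suspend/resume handling and in AssociationImpl, and the "message" would be an actual topology update written to each connected client channel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the proposed last-node notification; none of
// these names exist in WildFly as-is.
public class LastNodeNotifier {

    // Set when this server is suspended while it is the only remaining
    // cluster member; cleared again if the server is resumed.
    private final AtomicBoolean lastNodeSuspended = new AtomicBoolean(false);

    // Stand-in for writing messages to client channels.
    private final List<String> sentMessages = new ArrayList<>();

    // Called from the suspend path with the current cluster size.
    public void onSuspend(int currentClusterSize) {
        if (currentClusterSize == 1) {
            lastNodeSuspended.set(true);
        }
    }

    // Called if the server is resumed before shutting down.
    public void onResume() {
        lastNodeSuspended.set(false);
    }

    // Called just before the EJBServerChannel connections are closed.
    // In the real fix this would send a topology/node-removal update to
    // each connected client instead of recording a string.
    public void beforeChannelShutdown(String nodeName) {
        if (lastNodeSuspended.get()) {
            sentMessages.add("CLUSTER_TOPOLOGY_NODE_REMOVAL(" + nodeName + ")");
        }
    }

    public List<String> sentMessages() {
        return sentMessages;
    }
}
```

The point of the flag is ordering: suspend happens while the channels are still open, so the decision "I am the last node" can be captured there and acted on later, just before the channels close.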
Clustered SLSB membership anomalies when all cluster members removed
--------------------------------------------------------------------
Key: WFLY-11682
URL: https://issues.jboss.org/browse/WFLY-11682
Project: WildFly
Issue Type: Bug
Components: Clustering, EJB
Affects Versions: 15.0.1.Final
Environment: WildFly running in an n-node cluster with an EJB client that keeps
sending requests even while the cluster is down.
Reporter: Jörg Bäsner
Assignee: Richard Achmatowicz
Priority: Major
Attachments: node1.txt, node12.txt, node2.txt, node3.txt, playground.zip
This description is based on a 3-node cluster. Cluster nodes 1 and 2 are configured
in the {{PROVIDER_URL}}; node 3 is not.
The client has a custom ClusterNodeSelector implementation that prints the
{{connectedNodes}} and the {{availableNodes}} and performs random balancing.
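A selector like the reporter's can be sketched as follows. The {{ClusterNodeSelector}} interface is reproduced locally here so the snippet is self-contained; it mirrors the shape of {{org.jboss.ejb.client.ClusterNodeSelector}} from the jboss-ejb-client API, but treat the exact signature as an assumption and check the client library you are on.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

// Local stand-in mirroring org.jboss.ejb.client.ClusterNodeSelector so the
// sketch compiles on its own; in a real client you implement the library's
// interface instead.
interface ClusterNodeSelector {
    String selectNode(String clusterName, String[] connectedNodes, String[] totalAvailableNodes);
}

// Sketch of a selector like the one in the reproducer: log both node sets,
// then pick one of the already-connected nodes at random.
public class CustomClusterNodeSelector implements ClusterNodeSelector {
    @Override
    public String selectNode(String clusterName, String[] connectedNodes, String[] totalAvailableNodes) {
        System.out.printf("connectedNodes(%d) '%s', availableNodes(%d) '%s'%n",
                connectedNodes.length, Arrays.toString(connectedNodes),
                totalAvailableNodes.length, Arrays.toString(totalAvailableNodes));
        // Random balancing over the connected nodes.
        return connectedNodes[ThreadLocalRandom.current().nextInt(connectedNodes.length)];
    }
}
```

With this in place, the client's log output matches the {{connectedNodes(…)}} / {{availableNodes(…)}} lines shown later in this description.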
As long as all nodes are up and running, the client calls EJBs in a balanced way.
When node1 is shut down, the client gets the notifications below:
{code}
...
DEBUG (XNIO-1 task-1) [org.jboss.ejb.client.invocation] Received MODULE_UNAVAILABLE(9)
message for module /playground
DEBUG (XNIO-1 task-4) [org.jboss.ejb.client.invocation] Received MODULE_UNAVAILABLE(9)
message for module /playground
DEBUG (XNIO-1 task-4) [org.jboss.ejb.client.invocation] Received MODULE_UNAVAILABLE(9)
message for module /playground
DEBUG (XNIO-1 task-1) [org.jboss.ejb.client.invocation] Received MODULE_UNAVAILABLE(9)
message for module /playground
DEBUG (XNIO-1 task-1) [org.jboss.ejb.client.invocation] Received
CLUSTER_TOPOLOGY_NODE_REMOVAL(18) message for (cluster, node) = (ejb, node1)
DEBUG (XNIO-1 task-1) [org.jboss.ejb.client.invocation] Received
CLUSTER_TOPOLOGY_NODE_REMOVAL(18) message for (cluster, node) = (ejb, node1)
...
{code}
Then node2 is shut down. Again, the client gets the notifications:
{code}
...
DEBUG (XNIO-1 task-1) [org.jboss.ejb.client.invocation] Received MODULE_UNAVAILABLE(9)
message for module /playground
DEBUG (XNIO-1 task-1) [org.jboss.ejb.client.invocation] Received MODULE_UNAVAILABLE(9)
message for module /playground
DEBUG (XNIO-1 task-1) [org.jboss.ejb.client.invocation] Received
CLUSTER_TOPOLOGY_NODE_REMOVAL(18) message for (cluster, node) = (ejb, node2)
...
{code}
Finally, node3 is shut down. Now the client gets only the following:
{code}
...
DEBUG (XNIO-1 task-1) [org.jboss.ejb.client.invocation] Received MODULE_UNAVAILABLE(9)
message for module /playground
DEBUG (XNIO-1 task-1) [org.jboss.ejb.client.invocation] Received MODULE_UNAVAILABLE(9)
message for module /playground
...
{code}
This means the client is never informed that _node3_, the last remaining node of the
cluster, has been stopped.
From this point on the client always gets {{Caused by: java.net.ConnectException:
Connection refused}}.
Now node1 is started again, resulting in the following output for {{connectedNodes}} and
{{availableNodes}}; note that the stale _node3_ entry still appears in {{availableNodes}}:
{code}
...
INFO (ThreadPoolTaskExecutor-1) [com.jboss.examples.ejb.CustomClusterNodeSelector]
connectedNodes(1) '[node1]', availableNodes(2) '[node3, node1]'
...
{code}
--
This message was sent by Atlassian Jira
(v7.12.1#712002)