[jboss-jira] [JBoss JIRA] (WFLY-4748) Singleton service fails to start after repetitive cluster split with "Failed to reach quorum of 1"

Wed Feb 8 22:47:00 EST 2017

    [ https://issues.jboss.org/browse/WFLY-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13360852#comment-13360852 ] 

Bui Van Nghiem edited comment on WFLY-4748 at 2/8/17 10:46 PM:
---------------------------------------------------------------

I still see the same error on WildFly 9.0.2 after a full GC on the master server:
2017-02-07 16:00:00,188 ERROR [stateTransferExecutor-thread--p18-t2]-[org.wildfly.clustering.server] WFLYCLSV0006: Failed to reach quorum of 1 for "my-service" service. No singleton master will be elected. 

was (Author: bvnghiem1012):
I still see the same error on WildFly 9.0.2 after a full GC on the master server:
2017-02-07 16:00:00,188 ERROR [stateTransferExecutor-thread--p18-t2]-[org.wildfly.clustering.server] WFLYCLSV0006: Failed to reach quorum of 1 for "axs-application-server-deployment-9.6.90-346143" service. No singleton master will be elected. 

> Singleton service fails to start after repetitive cluster split with "Failed to reach quorum of 1"
> --------------------------------------------------------------------------------------------------
>
>                 Key: WFLY-4748
>                 URL: https://issues.jboss.org/browse/WFLY-4748
>             Project: WildFly
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 9.0.0.CR1, 10.0.0.Alpha2
>            Reporter: Tomas Hofman
>            Assignee: Paul Ferraro
>             Fix For: 10.0.0.Alpha6
>
>
> When a cluster of two nodes with a deployed singleton service (e.g. the cluster-ha-singleton quickstart app) splits, merges, and splits again, one of the nodes fails to run the singleton service with the error message "WFLYCLSV0006: Failed to reach *quorum of 1* for jboss.quickstart.ha.singleton.default2 service. No singleton master will be elected." Note the "quorum of 1".
> This happens only after the second and subsequent splits. After the first split, both nodes run the service correctly.
> Analysis suggests that nodes are never added back to the service providers cache on cluster merge, because CacheServiceProviderRegistrationFactory#membershipChanged() is never called with the 'merged' parameter set to 'true'.
> I presume that call should come from ChannelCommandDispatcherFactory#viewAccepted():
> {code}
> public void viewAccepted(View view) {
>     // ...
>     for (Listener listener: this.listeners) {
>         listener.membershipChanged(oldNodes, newNodes, view instanceof MergeView);
>     }
> }
> {code}
> This method is called, but the 'listeners' list is empty, so no listener is notified.
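
The failure mode described in the report can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not WildFly's code: the classes `ProviderRegistry` and `Dispatcher`, the helper `setOf`, and the node names "A" and "B" are all hypothetical, and the step where node B's provider set is overwritten with A's state on merge is an assumption about how the shared cache might resolve a merge. The sketch runs split, merge, split from node B's point of view, once with the listener registered and once with an empty listener list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class MergeListenerSketch {
    /** Hypothetical stand-in for the listener callback named in the report. */
    interface Listener {
        void membershipChanged(Set<String> oldNodes, Set<String> newNodes, boolean merged);
    }

    /** Toy provider registry for one node; not WildFly's implementation. */
    static class ProviderRegistry implements Listener {
        final String localNode;
        final Set<String> providers = new TreeSet<>();

        ProviderRegistry(String localNode, Set<String> initialProviders) {
            this.localNode = localNode;
            this.providers.addAll(initialProviders);
        }

        @Override
        public void membershipChanged(Set<String> oldNodes, Set<String> newNodes, boolean merged) {
            // On a merge, the local node re-registers itself as a provider.
            if (merged) {
                providers.add(localNode);
            }
        }

        /** Drops providers that are no longer in the current view. */
        void pruneTo(Set<String> view) {
            providers.retainAll(view);
        }

        boolean hasQuorum(int quorum) {
            return providers.size() >= quorum;
        }
    }

    /** Toy dispatcher with the same shape as the viewAccepted() snippet quoted above. */
    static class Dispatcher {
        final List<Listener> listeners = new ArrayList<>();
        Set<String> view;

        Dispatcher(Set<String> initialView) {
            this.view = initialView;
        }

        void viewAccepted(Set<String> newView, boolean isMergeView) {
            Set<String> oldView = this.view;
            this.view = newView;
            // With an empty list this loop body never runs, so nobody learns of the merge.
            for (Listener listener : listeners) {
                listener.membershipChanged(oldView, newView, isMergeView);
            }
        }
    }

    static Set<String> setOf(String... nodes) {
        return new TreeSet<>(Arrays.asList(nodes));
    }

    /** Runs split, merge, split as seen from node B; returns whether B still reaches a quorum of 1. */
    static boolean quorumAfterSecondSplit(boolean registerListener) {
        Set<String> both = setOf("A", "B");
        Set<String> onlyB = setOf("B");
        ProviderRegistry registry = new ProviderRegistry("B", both);
        Dispatcher dispatcher = new Dispatcher(both);
        if (registerListener) {
            dispatcher.listeners.add(registry);
        }

        // First split: B is alone and prunes A from its provider set.
        dispatcher.viewAccepted(onlyB, false);
        registry.pruneTo(onlyB);

        // Merge. Assumption: the merged provider state comes from A's partition and lists only A.
        registry.providers.clear();
        registry.providers.add("A");
        dispatcher.viewAccepted(both, true);

        // Second split: B is alone again and prunes A.
        dispatcher.viewAccepted(onlyB, false);
        registry.pruneTo(onlyB);
        return registry.hasQuorum(1);
    }

    public static void main(String[] args) {
        System.out.println("listener registered: " + quorumAfterSecondSplit(true));
        System.out.println("listener missing:    " + quorumAfterSecondSplit(false));
    }
}
```

In this model the run with a registered listener keeps B in its own provider set and reports `true`, while the run with an empty listener list ends with no providers and reports `false`, which mirrors the "quorum of 1" symptom appearing only from the second split onward.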
