Singleton service fails to start after repetitive cluster split with
"Failed to reach quorum of 1"
--------------------------------------------------------------------------------------------------
Key: WFLY-4748
URL: https://issues.jboss.org/browse/WFLY-4748
Project: WildFly
Issue Type: Bug
Components: Clustering
Affects Versions: 9.0.0.CR1, 10.0.0.Alpha2
Reporter: Tomas Hofman
Assignee: Paul Ferraro
Fix For: 10.0.0.Alpha6
When a cluster of two nodes with a deployed singleton service (e.g. the cluster-ha-singleton
quickstart app) splits, merges, and splits again, one of the nodes fails to run the
singleton service with the error message "WFLYCLSV0006: Failed to reach *quorum of 1* for
jboss.quickstart.ha.singleton.default2 service. No singleton master will be elected."
Note the "quorum of 1".
This only happens after the second and subsequent splits. After the first split,
both nodes run the service correctly.
After analysis, it appears that nodes are never added back to the service providers
cache upon cluster merge, because
CacheServiceProviderRegistrationFactory#membershipChanged() is never called with
the 'merged' argument set to 'true'.
I presume that call should come from ChannelCommandDispatcherFactory#viewAccepted():
{code}
public void viewAccepted(View view) {
    // ...
    for (Listener listener : this.listeners) {
        listener.membershipChanged(oldNodes, newNodes, view instanceof MergeView);
    }
}
{code}
This method gets called, but the problem is that the 'listeners' list is empty,
so no listener is actually notified.
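
To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It is not the WildFly code: the Listener, Dispatcher and ProviderRegistry types, the node name "node-b" and the simulate() helper are all hypothetical stand-ins. It also assumes that the local node's registration was lost during the split and that the 'merged' callback is what would restore it. The sketch only shows that a merge notification has no effect when the 'listeners' list is empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class MergeListenerSketch {

    // Hypothetical stand-in for the listener interface seen in viewAccepted().
    interface Listener {
        void membershipChanged(List<String> oldNodes, List<String> newNodes, boolean merged);
    }

    // Hypothetical stand-in for the dispatcher factory: it only fans out view changes.
    static class Dispatcher {
        private final List<Listener> listeners = new ArrayList<>();

        void addListener(Listener listener) {
            this.listeners.add(listener);
        }

        void viewAccepted(List<String> oldNodes, List<String> newNodes, boolean merged) {
            // With an empty list this loop body never runs, so nobody hears about the merge.
            for (Listener listener : this.listeners) {
                listener.membershipChanged(oldNodes, newNodes, merged);
            }
        }
    }

    // Hypothetical stand-in for the service provider registry backed by the cache.
    static class ProviderRegistry implements Listener {
        final Set<String> providers = new TreeSet<>();
        private final String localNode;

        ProviderRegistry(String localNode) {
            this.localNode = localNode;
            this.providers.add(localNode);
        }

        @Override
        public void membershipChanged(List<String> oldNodes, List<String> newNodes, boolean merged) {
            // Assumption: on a merge, the local node re-registers itself as a provider.
            if (merged) {
                this.providers.add(this.localNode);
            }
        }
    }

    // Runs one merge and returns the providers known afterwards.
    static Set<String> simulate(boolean registerListener) {
        ProviderRegistry registry = new ProviderRegistry("node-b");
        Dispatcher dispatcher = new Dispatcher();
        if (registerListener) {
            dispatcher.addListener(registry);
        }
        // Assumption: the local registration was lost while the cluster was split.
        registry.providers.remove("node-b");
        dispatcher.viewAccepted(Arrays.asList("node-b"), Arrays.asList("node-a", "node-b"), true);
        return registry.providers;
    }

    public static void main(String[] args) {
        System.out.println("listener registered: " + simulate(true));
        System.out.println("no listener:         " + simulate(false));
    }
}
```

In this sketch, the provider set stays empty after the merge when no listener is registered. That would be consistent with a later failure to reach even a quorum of 1, although the real registry may behave differently.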