Paul Ferraro updated MODCLUSTER-66:
-----------------------------------
Fix Version/s: 1.1.0.CR1
HAModClusterService needs to handle cluster splits
--------------------------------------------------
Key: MODCLUSTER-66
URL: https://jira.jboss.org/jira/browse/MODCLUSTER-66
Project: mod_cluster
Issue Type: Task
Affects Versions: 1.0.0.Beta4
Reporter: Brian Stansberry
Assignee: Paul Ferraro
Fix For: 1.1.0.CR1
The case where the JGroups group splits but nodes are still able to contact
the httpd servers needs to be handled. There is a brief discussion of this on
https://www.jboss.org/community/docs/DOC-11431 under "Split-Brain Syndrome".
The problem is that a split-brain will result in nodes removing each other from httpd,
leaving no nodes active.
The wiki page describes a simple approach. A more complex approach would be to take a
"primary partition" approach, whereby, say, an initial cluster of size n == 6 {A, B,
C, D, E, F} splits into two partitions {A, B, C, D} and {E, F}. To continue to handle
requests, a partition would need to have at least Math.floor((float) n / 2 + 1) members,
i.e. 4 when n == 6, so only {A, B, C, D} would keep handling requests.
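As a rough illustration of the quorum rule above, here is a minimal Java sketch. The class and method names (PrimaryPartition, quorum, isPrimary) are hypothetical and not part of mod_cluster or JGroups. It also assumes the initial cluster size n is known to every node, which the issue does not specify how to obtain.

```java
// Sketch only: PrimaryPartition and its methods are hypothetical names,
// not part of mod_cluster or JGroups.
final class PrimaryPartition {
    private PrimaryPartition() {}

    /** Minimum number of members a partition needs to keep handling requests. */
    static int quorum(int initialSize) {
        // Integer form of Math.floor((float) n / 2 + 1) for non-negative n.
        return initialSize / 2 + 1;
    }

    /** True if a partition of the given size is the primary partition. */
    static boolean isPrimary(int partitionSize, int initialSize) {
        return partitionSize >= quorum(initialSize);
    }
}
```

With n == 6, isPrimary(4, 6) holds for {A, B, C, D} while isPrimary(2, 6) does not for {E, F}. Note that under this rule an even 3/3 split leaves no primary partition at all.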
Which approach is appropriate probably depends on the deployed webapps and
how they interact with the cluster. If there is no clustered state that can become
inconsistent across the cluster split, the simple approach described on the wiki can work
fine (an HAModClusterService master doesn't disable a node if httpd reports that it is
still available). If there is shared state that needs to remain consistent (e.g. a
clustered Hibernate Second Level Cache), then the primary partition approach works better.
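The simple approach could be sketched roughly as follows. This is only an illustration: SplitAwareCleanup, nodesToRemove and the use of plain node-name strings are assumptions made for the example, not the HAModClusterService API, and how the master actually learns which nodes httpd considers available is left out.

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch only: hypothetical helper, not part of mod_cluster.
final class SplitAwareCleanup {
    private SplitAwareCleanup() {}

    /**
     * Returns the nodes the master may remove from httpd: those that dropped
     * out of the JGroups view AND that httpd no longer reports as available.
     * Nodes that merely ended up in another partition are left alone.
     */
    static Set<String> nodesToRemove(Set<String> droppedFromView,
                                     Set<String> availablePerHttpd) {
        Set<String> result = new TreeSet<>(droppedFromView);
        result.removeAll(availablePerHttpd);
        return result;
    }
}
```

For example, if E and F drop out of the master's view but httpd still reports E as available, only F would be removed.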
Most likely this overall problem will be resolved in stages, e.g. the simple approach
from the wiki first.