[mod_cluster-issues] [JBoss JIRA] (MODCLUSTER-66) HAModClusterService needs to handle cluster splits

Michal Babacek (JIRA) issues at jboss.org
Fri Aug 8 06:29:18 EDT 2014


     [ https://issues.jboss.org/browse/MODCLUSTER-66?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michal Babacek closed MODCLUSTER-66.
------------------------------------



Closing. Clean-up.
At least one of the following applies:

  * the issue has been thoroughly tested as a part of one of the current releases
or
  * it hasn't occurred in ~2 years
or
  * it's utterly harmless

> HAModClusterService needs to handle cluster splits
> --------------------------------------------------
>
>                 Key: MODCLUSTER-66
>                 URL: https://issues.jboss.org/browse/MODCLUSTER-66
>             Project: mod_cluster
>          Issue Type: Task
>      Security Level: Public(Everyone can see) 
>    Affects Versions: 1.0.0.Beta4
>            Reporter: Brian Stansberry
>            Assignee: Paul Ferraro
>             Fix For: 1.1.0.Beta1
>
>
> The case where a split of the JGroups group occurs but nodes are still able to contact the httpd servers needs to be handled. There is a brief discussion of this on https://www.jboss.org/community/docs/DOC-11431 under "Split-Brain Syndrome". The problem is that split-brain will result in nodes removing each other from httpd, leaving no nodes active.
> The wiki page describes a simple approach. A more complex approach would be to take a "primary partition" approach, whereby, say, an initial cluster of size n == 6 {A, B, C, D, E, F} splits into two clusters {A, B, C, D} and {E, F}. To continue to handle requests, a partition would need to have at least Math.floor((float) n / 2 + 1) members.
> What kind of approach is appropriate would probably depend on the deployed webapps and how they interact with the cluster. If there is no clustered state that can become inconsistent across the cluster split, the simple approach described on the wiki can work fine (an HAModClusterService master doesn't disable a node if httpd reports it is still available). If there is shared state that needs to remain consistent (e.g. a clustered Hibernate Second Level Cache), then the primary partition approach works better.
> Most likely this overall problem will be resolved in stages, e.g. the simple approach from the wiki first.
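
For illustration only, the primary-partition threshold quoted above can be sketched in Java. This is a minimal sketch, not mod_cluster code: the class name PartitionQuorum and both method names are hypothetical, and it assumes the initial cluster size n is known to every partition. With n == 6 the formula gives 4, so {A, B, C, D} would keep handling requests and {E, F} would not.

```java
// Hypothetical helper for illustration; not part of mod_cluster or JGroups.
final class PartitionQuorum {

    private PartitionQuorum() {
    }

    // Minimum partition size from the issue text: Math.floor((float) n / 2 + 1).
    static int requiredMembers(int initialClusterSize) {
        if (initialClusterSize < 1) {
            throw new IllegalArgumentException("cluster size must be positive");
        }
        return (int) Math.floor((float) initialClusterSize / 2 + 1);
    }

    // True if a partition of the given size may continue to handle requests.
    static boolean isPrimaryPartition(int partitionSize, int initialClusterSize) {
        return partitionSize >= requiredMembers(initialClusterSize);
    }
}
```

Under this rule an even split (for example 2 and 2 out of n == 4) leaves no partition at or above the threshold, so neither half would continue to handle requests.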




