[infinispan-issues] [JBoss JIRA] (ISPN-6394) Coalesce server group view and Infinispan/JGroups view

Fri Apr 15 05:40:00 EDT 2016

     [ https://issues.jboss.org/browse/ISPN-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tristan Tarrant updated ISPN-6394:
----------------------------------
              Status: Pull Request Sent  (was: Open)
    Git Pull Request: https://github.com/infinispan/infinispan-management-console/pull/78


> Coalesce server group view and Infinispan/JGroups view
> ------------------------------------------------------
>
>                 Key: ISPN-6394
>                 URL: https://issues.jboss.org/browse/ISPN-6394
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Console
>    Affects Versions: 8.2.0.Final
>            Reporter: Vladimir Blagojevic
>            Assignee: Vladimir Blagojevic
>            Priority: Critical
>             Fix For: 9.0.0.Alpha2, 8.2.2.Final, 9.0.0.Final
>
>
> Currently the console relies on server-group knowledge (i.e. which hosts/servers belong to a specific group). While that is definitely the "ideal" situation, we also need to ensure that it corresponds to the "actual" cluster as known to Infinispan/JGroups. This information should then be used to present the user with appropriate warnings if necessary.
> For each cache container %c in each server %s (on host %h) in the server group, we need to extract the "members" property:
> /host=%h/server=%s/subsystem=datagrid-infinispan/cache-container=%c:read-attribute(name=members)
> This returns a list of server names (in the form %h:%s).
> The member lists should be interpreted as follows, in combination with the existing "cluster-availability" property reported by the coordinator:
> 1. If the server-group list matches the container members reported by every node, all is good: the cluster is healthy and all nodes are up and running.
> 2. If all nodes report the SAME subset of the server group as container members, and the missing members are in the STOPPED or STARTING state, everything could be normal: we should rely on the coordinator's "cluster-availability" to tell us whether the cluster is unhealthy.
> 3. If the container member lists differ from each other and from the server-group view, and all of these servers are RUNNING, we have a potential split brain or a cluster that has not formed correctly.
> The above deduction should determine not only the label / colour-coding we place in the view header (AVAILABLE, DEGRADED, etc.) but also some of the view content: in both the cluster nodes view and the cache nodes view we need to group / sort by membership, so that split clusters and stopped nodes are clearly visible.
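The per-server "members" query in the description can be sketched as follows. This is a minimal Python illustration, not the console's implementation (the fix for this issue went into infinispan-management-console): `members_command`, `parse_member` and `parse_members` are hypothetical helpers, and the host, server and container names in the usage line are illustrative. It builds the CLI address for one host/server/container triple and splits each returned `%h:%s` name into a (host, server) pair, assuming host names never contain a colon.

```python
from typing import List, Tuple


def members_command(host: str, server: str, container: str) -> str:
    """Build the read-attribute operation quoted in the issue (hypothetical helper)."""
    return (f"/host={host}/server={server}"
            f"/subsystem=datagrid-infinispan/cache-container={container}"
            f":read-attribute(name=members)")


def parse_member(name: str) -> Tuple[str, str]:
    """Split a member name of the form '%h:%s' into (host, server).

    Assumes the host part contains no ':'; everything after the first
    colon is treated as the server name.
    """
    host, sep, server = name.partition(":")
    if not sep:
        raise ValueError(f"expected 'host:server', got {name!r}")
    return host, server


def parse_members(names: List[str]) -> List[Tuple[str, str]]:
    """Parse the whole list returned by the 'members' attribute."""
    return [parse_member(n) for n in names]
```

For example, `members_command("master", "server-one", "clustered")` yields `/host=master/server=server-one/subsystem=datagrid-infinispan/cache-container=clustered:read-attribute(name=members)`.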

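The three numbered rules in the issue can be expressed as a small decision function. This is a hedged Python sketch under these assumptions: member names are `%h:%s` strings, `members_by_node` holds the "members" list returned by each server that answered the query, and `states` maps each server in the group to RUNNING, STOPPED or STARTING. `classify` and `Verdict` are hypothetical names. Combinations the issue does not cover are reported as UNDETERMINED rather than guessed, and the rule 2 outcome only signals that the coordinator's "cluster-availability" should decide.

```python
from enum import Enum
from typing import Mapping, Set


class Verdict(Enum):
    HEALTHY = "healthy"              # rule 1: every view equals the server group
    DEFER_TO_AVAILABILITY = "defer"  # rule 2: let "cluster-availability" decide
    SPLIT_BRAIN = "split-brain"      # rule 3: RUNNING servers disagree
    UNDETERMINED = "undetermined"    # not covered by the three rules


def classify(group: Set[str],
             members_by_node: Mapping[str, Set[str]],
             states: Mapping[str, str]) -> Verdict:
    if not members_by_node:
        return Verdict.UNDETERMINED  # no server answered the query
    # Distinct member lists ("views") reported by the responding servers.
    views = {frozenset(m) for m in members_by_node.values()}
    # Rule 1: every server of the group answered with exactly the group.
    if set(members_by_node) == group and views == {frozenset(group)}:
        return Verdict.HEALTHY
    # Rule 2: all answers agree on one subset of the group, and every
    # missing server is STOPPED or STARTING.
    if len(views) == 1:
        (view,) = views
        missing = group - view
        if view <= group and all(states.get(s) in ("STOPPED", "STARTING")
                                 for s in missing):
            return Verdict.DEFER_TO_AVAILABILITY
    # Rule 3: the views differ (from each other or from the group) while
    # every server in the group is RUNNING.
    if all(states.get(s) == "RUNNING" for s in group):
        return Verdict.SPLIT_BRAIN
    return Verdict.UNDETERMINED
```

Keeping UNDETERMINED separate from SPLIT_BRAIN avoids raising a split-brain warning when some servers are still STOPPED or STARTING and the views merely have not converged.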

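For the cluster nodes and cache nodes views, one possible way to "group / sort by membership" is to partition the responding servers by the exact member list they report. Each split cluster then becomes one group, and servers of the group that reported nothing (for example STOPPED ones) are listed separately. `group_by_membership` is a hypothetical Python helper for illustration only; putting the largest partition first is an arbitrary presentation choice, not something the issue prescribes.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Mapping, Set, Tuple


def group_by_membership(group: Set[str],
                        members_by_node: Mapping[str, Set[str]]
                        ) -> Tuple[List[List[str]], List[str]]:
    """Partition servers by the member list they report.

    Returns (partitions, not_reporting): each partition is a sorted list of
    servers that report the same view, largest partition first (ties broken
    by name); servers of the group that reported no view are returned
    separately, sorted by name.
    """
    by_view: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for node, members in members_by_node.items():
        by_view[frozenset(members)].append(node)
    partitions = sorted((sorted(nodes) for nodes in by_view.values()),
                        key=lambda nodes: (-len(nodes), nodes))
    not_reporting = sorted(group - set(members_by_node))
    return partitions, not_reporting
```

With a four-server group where `h1:s1` and `h1:s2` see each other, `h2:s3` sees only itself and `h2:s4` does not answer, the sketch returns two partitions (`["h1:s1", "h1:s2"]`, then `["h2:s3"]`) plus `["h2:s4"]` as not reporting.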
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)