Vladimir Blagojevic commented on ISPN-6394:
-------------------------------------------
[~NadirX] While implementing the fix I realized that we can use the view mismatch
information for the clusters view (cache containers) list, but not for the list of
clusters/server-groups: in the server-group list we do not know which cache containers
are deployed on each group.
Coalesce server group view and Infinispan/JGroups view
------------------------------------------------------
Key: ISPN-6394
URL: https://issues.jboss.org/browse/ISPN-6394
Project: Infinispan
Issue Type: Bug
Components: Console
Affects Versions: 8.2.0.Final
Reporter: Vladimir Blagojevic
Assignee: Vladimir Blagojevic
Priority: Critical
Fix For: 9.0.0.Alpha1, 9.0.0.Final
Currently the console relies on the server-group knowledge (i.e. which hosts/servers
belong to a specific group). While that is definitely the "ideal" situation, we also
need to ensure that it corresponds to the "actual" cluster as known to
Infinispan/JGroups. This information should then be used to present the user with
appropriate warnings if necessary.
For each container %c in each server %s in the server group we need to extract the
"members" property:
/host=%h/server=%s/subsystem=datagrid-infinispan/cache-container=%c:read-attribute(name=members)
This returns a list of server names (in the form %h:%s).
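As a rough illustration of the step above, the following Python sketch builds the operation string for a given host, server and container, and splits the returned names into host/server pairs. The function names are hypothetical, and it assumes the attribute value has already been decoded into a plain list of "%h:%s" strings and that host names contain no colon.

```python
def members_operation(host: str, server: str, container: str) -> str:
    """Build the management operation that reads a container's "members"."""
    return (
        f"/host={host}/server={server}/subsystem=datagrid-infinispan"
        f"/cache-container={container}:read-attribute(name=members)"
    )


def parse_members(raw_members):
    """Split each 'host:server' entry into a (host, server) tuple.

    Assumption: raw_members is an already-decoded list of strings and the
    first colon separates the host name from the server name.
    """
    parsed = []
    for entry in raw_members:
        host, _, server = entry.partition(":")
        parsed.append((host, server))
    return parsed
```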
This is how we should use the information (in combination with the existing
"cluster-availability" property information from the coordinator):
1. If the server-group list coincides with the container members of all nodes, all is
good: the cluster is healthy and all nodes are up and running.
2. If all of the container members contain the SAME subset of the server group, but the
missing members are in the STOPPED or STARTING state, everything could be normal: we
should depend on the coordinator's "cluster-availability" to tell us if the
cluster is unhealthy.
3. If the container members differ from each other and from the server group view, and
all these servers are in the RUNNING state, we have a potential split brain or a cluster
that has not formed correctly.
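The three rules above could be sketched in Python roughly as follows. This is only an illustration, not the console's actual code: the `Verdict` names, the `classify` function and the shape of its inputs (a set of "%h:%s" names, a dict of per-node member views, a dict of per-node states) are assumptions. Situations the rules do not cover fall through to an `UNDETERMINED` value, which is also an assumption.

```python
from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"
    DEFER_TO_COORDINATOR = "check cluster-availability on the coordinator"
    POSSIBLE_SPLIT = "possible split brain or badly formed cluster"
    UNDETERMINED = "not covered by the three rules"  # assumption


def classify(server_group, views, states):
    """Compare the server-group list with the member views of its nodes.

    server_group: iterable of 'host:server' names in the group
    views:        dict mapping a reporting node to the members it sees
    states:       dict mapping a node to its state, e.g. 'RUNNING'
    """
    group = set(server_group)
    distinct = {frozenset(view) for view in views.values()}

    if len(distinct) == 1:
        view = next(iter(distinct))
        # Rule 1: every node sees exactly the server group.
        if view == group:
            return Verdict.HEALTHY
        # Rule 2: same strict subset everywhere, missing nodes stopped/starting.
        missing = group - view
        if view < group and all(
            states.get(node) in ("STOPPED", "STARTING") for node in missing
        ):
            return Verdict.DEFER_TO_COORDINATOR

    # Rule 3: views disagree although every server is running.
    if len(distinct) > 1 and all(states.get(n) == "RUNNING" for n in group):
        return Verdict.POSSIBLE_SPLIT

    return Verdict.UNDETERMINED
```

In the second case the result only says that the member views alone are inconclusive; the coordinator's "cluster-availability" value would still have to be read separately.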
The above deduction should determine not only the label / colour-coding we place in the
view header (AVAILABLE, DEGRADED, etc.) but also some of the view content: in both the
cluster nodes view and the cache nodes view we need to group / sort by membership, so that
split clusters and stopped nodes are clearly visible.
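One possible way to group and sort by membership is sketched below in Python. The function name, the input shapes and the ordering (largest partition first) are assumptions made for illustration. Nodes of the server group that report no view are listed separately, on the assumption that these are the stopped or starting ones.

```python
from collections import defaultdict


def group_by_membership(server_group, views):
    """Group reporting nodes by the member view they see.

    server_group: iterable of 'host:server' names in the group
    views:        dict mapping a reporting node to the members it sees

    Returns (partitions, not_reporting), where partitions is a list of
    (view, nodes) pairs, largest partition first, and not_reporting lists
    the group members that supplied no view at all.
    """
    partitions = defaultdict(list)
    for node, members in views.items():
        partitions[frozenset(members)].append(node)

    # Largest partition first; ties broken by node names for stable display.
    ordered = sorted(
        partitions.items(), key=lambda kv: (-len(kv[1]), sorted(kv[1]))
    )
    result = [(sorted(view), sorted(nodes)) for view, nodes in ordered]
    not_reporting = sorted(set(server_group) - set(views))
    return result, not_reporting
```

More than one entry in the first return value would correspond to the split case, where each partition can be rendered as its own group in the nodes view.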