[jboss-dev-forums] [Design of Clustering on JBoss] - HASingleton election and communication with mod_cluster

Wednesday, 16 July 2008

See http://wiki.jboss.org/wiki/en/ModClusterDesign 
and http://wiki.jboss.org/wiki/Mod-Cluster_AS_integration for background.

HASingleton election refers to the process by which one of a set of nodes in a cluster is
chosen as the one that will act as the singleton master. Upon receiving events generated
by the AS's DistributedReplicantManager (DRM) service, each node in the cluster
independently decides who will be the master using an HASingletonElectionPolicy impl.  The
key requirement of the policy impl is that its decision must be deterministic, so every
node makes the same decision.

An HASingletonElection policy impl can use whatever data it wants to make its decision. 
The AS ships with a couple impls, the most basic of which uses the nodes' ordinal
position in the current JGroups view to pick the master.

For the ModClusterService use case, there is an additional important factor in the
election decision: a node's ability to communicate with the httpd servers that are
proxying requests. A ModClusterService instance that can only communicate with one of four
httpd servers is less desirable as the singleton master than one that can communicate with
all four.

An added twist on this is certain httpd servers are more valuable than others.  An htttpd
server that has received configuration messages from the JBoss side is more valuable than
one that hasn't. I call such a server "established".  Imagine a scenario
where there are 4 httpd servers, with only #1 and #2 established. Node A can communicate
with #1 and #2, while B can communicate with #2 as well as a newly discovered #3 and
#4> Node A should be the master even though B can communicate with more httpd servers. 
B has no ability to inform httpd #1 of future config changes or load balance factor
changes, potentially leaving it in an invalid state.

The solution I see for managing the election process is to include information about a
node's ability to communicate with the httpd servers inside the object it registers
with the DistributedReplicantManager service.  The object registered with DRM is available
to HASingletonElectionPolicy impls, so they can use this information in their election
decisions.  So that's fairly straightforward.

What's more tricky is ensuring that timing effects in the recognition of httpd server
topology changes don't lead to changes in the HASingleton master. For example:

a) httpd #5 comes on line and advertises itself.  JBoss node B notices this slightly
before master node A and takes over as singleton master, only to have A become the master
again shortly later.
b) communication to httpd #3 is disrupted. JBoss node A (the master) notices this slightly
before node B, and thus gets unelected as master, only to end up being re-elected shortly
thereafter as B notices the trouble.

A possible protocol for handling this:

1) All nodes can "discover" httpd servers.  "Discover" means learning
the address:port of the httpd server either by parsing a config at startup, detecting one
via an AdvertiseListener, or possibly by having a new address:port added by an
administrative tool..

2) All nodes delegate to the current HASingleton master to manage the process of including
new httpd servers in the available pool or removing leaving ones. So, if non-master node B
discovers a new httpd server, it sends a message to master node A advising of the new
server.  Node B does nothing further to initiate communication with the new httpd server.
It does keep a reference to the new/removed server address:port so it can resend the
message to a new master if the current master fails.

3) The master node updates its MCMPHandler with the address/port of all discovered
servers, whether discovered locally or via a message from another node. (The MCMPHandler
has to be able to deal with duplicate adds).

4) During the regular status processing:

a) the master performs a status check on its own MCMPHandler -- i.e. adds newly discovered
httpd servers, removes any it has been told no longer exist, tries to recover any in ERROR
state via sending (cluster-wide) configuration data.

b) the master can now check how many httpd servers it knows about, how many are healthy,
how many are established.

c) master sends an RPC to the cluster providing the address/port of all the currently
known servers, along with a flag as to whether they are established.  The other nodes use
this to add/remove servers from their MCMPHandler, and then have their MCPHandler perform
a status check. This status check *does not* involve sending configuration messages to
recover any in ERROR state; instead an innocuous message like INFO should be sent. Goal is
simply to test the ability to communicate with the httpd server. Only the master node
sends configuration requests to the httpd side.

d) each non-master node checks how many httpd servers it knows about, how many are
healthy, how many are established and responds to the RPC with this information.

e) the master now has all the information that would go into an HASingleton election. If
the state of any of the nodes has changed, that node will need to update its object stored
in the DistributedReplicantManager service. Any time the DRM is updated, a new election
occurs. The question is in what order nodes should make their update. If the master node
itself has changed state, it needs to decide whether to update the DRM before telling
other nodes to do so, or to let the other nodes update first, and then make its update. 
If one choice or the other will result in the current master being reelected, the master
should use that order for the next 2 steps:

f) master sends a message telling the other nodes to update the DRM with the state they
sent in step d).

g) master updates the DRM with its current state

h) upon receipt of message from f), any reference to a new/removed server the node may
have cached in step 2) above is cleared (since step d/e is completed, so all nodes in the
cluster know about the new httpd server)

Failure handling in the above:

1) Step 2 above. Problem of dropping discovery messages.  See 4.h.  If a new master is
elected before 4.h occurs, resend the discovery message to the new master.

2) Node doesn't respond to the 4.d RPC or responds with exception. In 4.g, the master
instructs any such node to report itself to DRM as being in an error state; the election
policy should treat such a node as being last in line for election as master.

3) Master fails after 4.e. The failure is detected so a new master is chosed, but the
election is based on outdated information. Whoever is elected master, the next cycle the
DRM will be updated with current information and a new election can occur, possibly
resulting in yet another change in master. Need to decide how to deal with that; i.e. how
should the "temporary" new master react?  Depends on what a new master does
anyway. If taking over as master doesn't involve any heavy work (i.e. refreshing all
the configs on all the httpd servers), then this situation is harmless. At the moment I
don't think taking over as master will involve heavy work.

4) Master fails after 4.f, doesn't complete 4.g. An election will occur after 4.f,
using outdated data for the master node. Election can have two results: i) the master is
re-elected, in which case the cluster just waits for the master to be suspected, forcing a
new election. This is fine.  ii) the master is not re-elected and a new node takes over as
master, which is fine.

5) Master chooses 4.g before 4.f but fails after 4.g.  Same as 3) above.

View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4164903#...

Reply to the post :
http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&a...

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

[jboss-dev-forums] [Design of Clustering on JBoss] - HASingleton election and communication with mod_cluster