On 4/11/16 3:43 PM, Ken Wills wrote:
> On Mon, Apr 11, 2016 at 11:57 AM, Brian Stansberry
> <brian.stansberry@redhat.com> wrote:
>> Just an FYI: I spent a couple days and worked up a POC[1] of creating a
>> JGroups-based reliable group communication mesh over the sockets our
>> Host Controllers use for intra-domain management communications.
> Nice! I've been thinking about the mechanics of this a bit recently, but
> I hadn't gotten to any sort of transport details; this looks interesting.
>> Currently those sockets are used to form a tree of connections; master
>> HC to slave HCs and then HCs to their servers. Slave HCs don't talk to
>> each other. That kind of topology works fine for our current use cases,
>> but not for other use cases, where a full communication mesh is more
>> appropriate.
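
(Aside for anyone who hasn't used JGroups: below is a rough sketch of what
joining such a group and watching the membership looks like with the plain
JGroups 3.x API. The tcp.xml stack, group name and class name are purely
illustrative; the POC instead runs over the HCs' existing management
connections.)

import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class MeshMemberSketch {

    public static void main(String[] args) throws Exception {
        // Illustration only: a stock TCP-based stack. The POC replaces this
        // transport with the sockets the HCs already use for management traffic.
        JChannel channel = new JChannel("tcp.xml");
        channel.setReceiver(new ReceiverAdapter() {
            @Override
            public void viewAccepted(View view) {
                // Every member of the mesh sees the same membership list.
                System.out.println("New view: " + view.getMembers());
            }

            @Override
            public void receive(Message msg) {
                System.out.println(msg.getSrc() + ": " + msg.getObject());
            }
        });
        channel.connect("intra-domain-mesh");   // join the group
        channel.send(new Message(null, "hello from " + channel.getAddress()));
        // ... channel.close() on shutdown
    }
}
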
>> Two use cases led me to explore this:
>> 1) A longstanding request to have automatic failover of the master HC to
>> a backup. There are different ways to do this, but group-communication-based
>> leader election is a possible solution. My preference, really.
> I'd come to the same conclusion that it should be an election. A
> deterministic election algorithm, perhaps allowing the configuration to
> supply some sort of weighted value to influence the election on each
> node, analogous to how the SMB master browser election works (version +
> weight + etc.).
Yep.
For sure the master must be running the latest version.
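
Roughly, I'd picture the election as nothing more than every member applying
the same ordering to the same view. A sketch of that idea (illustrative only;
the Candidate fields are just the version + weight + name notion from above,
not anything the POC implements):

import java.util.Comparator;
import java.util.List;

public class DcElectionSketch {

    /** Made-up metadata each candidate HC would advertise when it joins the group. */
    public static class Candidate {
        final String hostName;        // stable tiebreaker
        final int managementVersion;  // the elected DC must run the latest version
        final int weight;             // operator-configured preference

        Candidate(String hostName, int managementVersion, int weight) {
            this.hostName = hostName;
            this.managementVersion = managementVersion;
            this.weight = weight;
        }
    }

    // Deterministic ordering: newest version first, then weight, then name.
    static final Comparator<Candidate> ORDER =
            Comparator.comparingInt((Candidate c) -> c.managementVersion)
                    .thenComparingInt(c -> c.weight)
                    .thenComparing(c -> c.hostName);

    /** Every member evaluates this over the same membership and picks the same DC. */
    static Candidate elect(List<Candidate> members) {
        return members.stream().max(ORDER)
                .orElseThrow(IllegalStateException::new);
    }
}

E.g. if two hosts advertise the same version, the higher configured weight wins
on every member, and the host name breaks any remaining tie.
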
>> 2) https://issues.jboss.org/browse/WFLY-1066, which has led to various
>> design alternatives, one of which is a distributed cache of topology
>> information, available via each HC. See [2] for some of that discussion.
>> I don't know if this kind of communication is a good idea, or if it's
>> the right solution to either of these use cases. Lots of things need
>> careful thought!! But I figured it was worth some time to experiment.
>> And it worked in at least a basic POC way, hence this FYI.
> Not knowing a lot about JGroups... for very large domains, is the mesh
> NxN in size?
Yes.
> For thousands of nodes, would this become a problem,
It's one concern I have, yes. There are large JGroups clusters, but they
may be based on the UDP multicast transport JGroups offers.
> or would a mechanism to segment into local groups perhaps help, with
> only certain nodes participating in the mesh and being eligible for
> election?
For sure we'd have something in the host.xml that controls whether a
particular HC joins the group.
I don't think this is a big problem for the DC election use case, as you
don't need a large number of HCs in the group. You'd have a few
"potential" DCs that could join the group, and the remaining slaves
don't need to.
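
Sketching the gist of that (the flag name is invented here; it would just be
some new attribute in host.xml): an HC that isn't marked as a potential DC
simply never connects a channel, so the group stays small.

import org.jgroups.JChannel;

public class DcGroupJoinSketch {

    /**
     * Hypothetical policy: only HCs marked as DC-election candidates in host.xml
     * join the group; ordinary slave HCs stay out of the mesh entirely.
     */
    public static JChannel joinIfEligible(boolean potentialDomainController) throws Exception {
        if (!potentialDomainController) {
            return null; // not eligible: no mesh membership, no election participation
        }
        JChannel channel = new JChannel("tcp.xml"); // illustration; the POC uses the HC sockets
        channel.connect("dc-election");
        return channel;
    }
}
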
For use cases where you want slave HCs to be in the cluster though, it's
a concern. The distributed topology cache thing may or may not need
that. It needs a few HCs to provide HA, but those could be the same ones
that are "potential" DCs. But if only a few are in the group, the
servers need to be told how to reach those HCs. Chicken and egg, as the
point of the topology cache is to provide that kind of data to servers!
If a server's own HC is required to be a part of the group though, that
helps cut through the chicken/egg problem.
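
If we did go the group communication route for that cache, one rough
illustration of the idea is JGroups' ReplicatedHashMap building block over a
channel (sketch only; the key/value layout and names are made up, and I'm not
saying that's the class we'd actually use):

import org.jgroups.JChannel;
import org.jgroups.blocks.ReplicatedHashMap;

public class TopologyCacheSketch {

    public static void main(String[] args) throws Exception {
        // Only the few HCs that join the group hold replicas; each server would
        // ask its own HC for the data rather than joining the group itself.
        JChannel channel = new JChannel("tcp.xml");
        channel.connect("domain-topology");

        ReplicatedHashMap<String, String> topology = new ReplicatedHashMap<>(channel);
        topology.start(10000); // pull the current state from the group

        // Made-up entry: where a given server can be reached.
        topology.put("server-one", "192.168.100.10:8080");
        System.out.println("server-one -> " + topology.get("server-one"));

        topology.stop();
        channel.close();
    }
}
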
> Ken
--
Brian Stansberry
Senior Principal Software Engineer
JBoss by Red Hat