[wildfly-dev] Inter Host Controller group communication mesh

Brian Stansberry brian.stansberry at redhat.com
Mon Apr 11 17:20:02 EDT 2016


On 4/11/16 3:43 PM, Ken Wills wrote:
>
>
> On Mon, Apr 11, 2016 at 11:57 AM, Brian Stansberry
> <brian.stansberry at redhat.com> wrote:
>
>     Just an FYI: I spent a couple days and worked up a POC[1] of creating a
>     JGroups-based reliable group communication mesh over the sockets our
>     Host Controllers use for intra-domain management communications.
>
>
> Nice! I've been thinking about the mechanics of this a bit recently, but
> I hadn't gotten to any sort of transport details, this looks interesting.
>
>     Currently those sockets are used to form a tree of connections; master
>     HC to slave HCs and then HCs to their servers. Slave HCs don't talk to
>     each other. That kind of topology works fine for our current use cases,
>     but not for other use cases, where a full communication mesh is more
>     appropriate.
>
>     2 use cases led me to explore this:
>
>     1) A longstanding request to have automatic failover of the master HC to
>     a backup. There are different ways to do this, but group communication
>     based leader election is a possible solution. My preference, really.
>
>
> I'd come to the same conclusion of it being an election. A deterministic
> election algorithm, perhaps allowing the configuration to supply some
> sort of weighted value to influence the election on each node, perhaps
> analogous to how the SMB master browser election works (version + weight
> + etc).

Yep.

For sure the master must be running the latest version.
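
Roughly the kind of deterministic ordering I have in mind; just a
sketch, and the Candidate fields (version, weight, hostName) are
illustrative rather than any existing API:

    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    // Every HC in the group sorts the same candidate list with the same
    // rules, so each node independently computes the same winner.
    class Candidate {
        final int version;     // management version; latest always wins
        final int weight;      // operator-configured preference (e.g. from host.xml)
        final String hostName; // unique host name, used as a stable tie-breaker

        Candidate(int version, int weight, String hostName) {
            this.version = version;
            this.weight = weight;
            this.hostName = hostName;
        }
    }

    class DcElection {
        // Latest version first, then highest weight, then name for determinism.
        static final Comparator<Candidate> PREFERENCE =
                Comparator.comparingInt((Candidate c) -> c.version)
                          .thenComparingInt(c -> c.weight)
                          .thenComparing(c -> c.hostName);

        /** The candidate every node should agree on as the new master. */
        static Candidate elect(List<Candidate> candidates) {
            return Collections.max(candidates, PREFERENCE);
        }
    }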

>
>
>     2) https://issues.jboss.org/browse/WFLY-1066, which has led to various
>     design alternatives, one of which is a distributed cache of topology
>     information, available via each HC. See [2] for some of that discussion.
>
>     I don't know if this kind of communication is a good idea, or if it's
>     the right solution to either of these use cases. Lots of things need
>     careful thought!! But I figured it was worth some time to experiment.
>     And it worked in at least a basic POC way, hence this FYI.
>
>
> Not knowing a lot about JGroups... for very large domains, is the mesh
> NxN in size?

Yes.

> For thousands of nodes would this become a problem,

It's one concern I have, yes. There are large JGroups clusters, but they 
may be based on the UDP multicast transport JGroups offers.
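
For reference, joining a group over a TCP-based stack looks roughly
like the sketch below. It uses the stock tcp.xml configuration that
ships with JGroups purely for illustration; the POC instead layers the
transport over the existing HC management sockets, and the "hc-mesh"
group name is made up.

    import org.jgroups.JChannel;
    import org.jgroups.ReceiverAdapter;
    import org.jgroups.View;

    public class HcMeshMember {
        public static void main(String[] args) throws Exception {
            // Stock TCP protocol stack shipped with JGroups (illustrative only)
            JChannel channel = new JChannel("tcp.xml");
            channel.setReceiver(new ReceiverAdapter() {
                @Override
                public void viewAccepted(View view) {
                    // Every member sees the same ordered membership; this is
                    // the hook a leader election or topology cache would use.
                    System.out.println("Membership changed: " + view.getMembers());
                }
            });
            channel.connect("hc-mesh");
            // With a TCP stack each member ends up talking to every other
            // member, which is where the NxN growth concern comes from.
        }
    }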

> or would
> a mechanism to segment into local groups make sense, with only certain
> nodes participating in the mesh and being eligible for election?


For sure we'd have something in the host.xml that controls whether a 
particular HC joins the group.

I don't think this is a big problem for the DC election use case, as you 
don't need a large number of HCs in the group. You'd have a few 
"potential" DCs that could join the group, and the remaining slaves 
don't need to.

For use cases where you want slave HCs to be in the cluster though, it's 
a concern. The distributed topology cache may or may not need that. It 
needs a few HCs to provide HA, but those could be the same ones 
that are "potential" DCs. But if only a few HCs are in the group, the 
servers need to be told how to reach those HCs. Chicken and egg, as the 
point of the topology cache is to provide that kind of data to servers! 
If a server's own HC is required to be a part of the group though, that 
helps cut through the chicken/egg problem.


> Ken
>


-- 
Brian Stansberry
Senior Principal Software Engineer
JBoss by Red Hat
