Just an FYI: I spent a couple of days and worked up a POC[1] of creating
a JGroups-based reliable group communication mesh over the sockets our
Host Controllers use for intra-domain management communications.
Currently those sockets are used to form a tree of connections: master
HC to slave HCs, and then HCs to their servers. Slave HCs don't talk to
each other. That kind of topology works fine for our current use cases,
but not for others, where a full communication mesh is more appropriate.
Two use cases led me to explore this:
1) A longstanding request to have automatic failover of the master HC to
a backup. There are different ways to do this, but group communication
based leader election is a possible solution. My preference, really.
2) https://issues.jboss.org/browse/WFLY-1066, which has led to various
design alternatives, one of which is a distributed cache of topology
information, available via each HC. See [2] for some of that discussion.
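To make use case 1 a bit more concrete: group-communication-based leader
election commonly treats the first member of the current view as the
coordinator, and failover falls out of view changes. Here's a toy,
self-contained sketch of that rule; this is illustrative only, not code
from the POC, and all names are made up:

```java
import java.util.List;

// Toy sketch of view-based leader election (use case 1), assuming the
// common JGroups convention that the coordinator is the first member of
// the current view. Class and method names are hypothetical.
public class LeaderElection {

    // Deterministic election: the leader is the first member of the view.
    static String electLeader(List<String> view) {
        if (view.isEmpty()) {
            throw new IllegalStateException("no members in view");
        }
        return view.get(0);
    }

    public static void main(String[] args) {
        // Initial view: the master HC plus two slaves.
        System.out.println(electLeader(List.of("master", "slave1", "slave2")));
        // Master crashes; the group installs a new view and the next
        // member becomes coordinator, giving automatic failover.
        System.out.println(electLeader(List.of("slave1", "slave2")));
    }
}
```

Every member applies the same rule to the same view, so they all agree
on the leader without any extra coordination round.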
I don't know if this kind of communication is a good idea, or if it's
the right solution to either of these use cases. Lots of things need
careful thought!! But I figured it was worth some time to experiment.
And it worked in at least a basic POC way, hence this FYI.
If you're interested in details, here are some Q&A:
Q: Why JGroups?
A: Because 1) I know it well, 2) I trust it, and 3) it's already used
for this kind of group communication in full WildFly.
Q: Why the management sockets? Why not other sockets?
A: Slave HCs already need configuration for how to discover the master.
Using the same sockets lets us reuse that discovery configuration for
the JGroups communications as well. If we're going to use this kind of
communication in a serious way, the configuration needs to be as easy
as possible.
Q: How does it work?
A: JGroups is based on a stack of "protocols", each of which handles one
aspect of reliable group communications. The POC creates and uses a
standard protocol stack, except it replaces two standard protocols with
custom ones:
a) JGroups has various "Discovery" protocols which are used to find
possible peers. I implemented one that integrates with the HC's domain
controller discovery logic. It's basically a copy of the oft-used
TCPPING protocol with about 10-15 lines of code changed.
b) JGroups has various "Transport" protocols which are responsible for
actually sending/receiving over the network. I created a new one of
those that knows how to use the WF management comms stuff built on JBoss
Remoting. JGroups provides a number of base classes to use in this
transport area, so I was able to rely on a lot of existing functionality
and could just focus on the details specific to this case.
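To give a feel for the result, in JGroups' usual XML notation the stack
could be described roughly as below. The REMOTING_TP and DC_DISCOVERY
protocol names are hypothetical placeholders for the two custom
protocols (they're not the names used in the POC); the rest is a fairly
standard JGroups TCP-style stack:

```xml
<config xmlns="urn:org:jgroups">
    <!-- Hypothetical transport that sends/receives over the HCs'
         JBoss Remoting-based management connections -->
    <REMOTING_TP/>
    <!-- Hypothetical discovery that reuses the slave HC's existing
         domain controller discovery configuration -->
    <DC_DISCOVERY/>
    <!-- Remainder of a standard stack: merging, failure detection,
         reliable multicast/unicast, membership, fragmentation -->
    <MERGE3/>
    <FD_ALL/>
    <VERIFY_SUSPECT/>
    <pbcast.NAKACK2/>
    <UNICAST3/>
    <pbcast.STABLE/>
    <pbcast.GMS/>
    <FRAG2/>
</config>
```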
Q: What have you done using the POC?
A: I created a master HC and a slave on my laptop and saw them form a
cluster and exchange messages. Typical stuff like starting and stopping
the HCs worked. I see no reason why having multiple slaves wouldn't have
worked too; I just didn't do it.
Q: What's next?
A: Nothing really. We have a couple of concrete use cases we're looking to
solve. We need to figure out the best solution for those use cases. If
this kind of thing is useful in that, great. If not, it was a fun POC.
[1]
https://github.com/wildfly/wildfly-core/compare/master...bstansberry:jgroups-dc
See the commit message on the single commit to learn a bit more.
[2] https://developer.jboss.org/wiki/ADomainManagedServiceRegistry
--
Brian Stansberry
Senior Principal Software Engineer
JBoss by Red Hat
_______________________________________________
wildfly-dev mailing list
wildfly-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/wildfly-dev