[wildfly-dev] Inter Host Controller group communication mesh
Brian Stansberry
brian.stansberry at redhat.com
Tue Apr 12 09:52:01 EDT 2016
On 4/12/16 4:29 AM, Sebastian Laskawiec wrote:
> Adding Bela to the thread...
>
> The POC looks really nice to me. I could try to take it from here and
> finish the WFLY-1066 implementation to see how everything works together.
>
> The only thing that comes to my mind is whether we should (or should not)
> add capability and server group information to it. I think most of the
> subsystems would be interested in that.
>
We'd still need a design for exactly how a distributed cache of topology
info would work. Using JGroups opens up the possibility of using
Infinispan, but the structure of the data in the cache is still TBD. I
think capability and server group data will be part of that.
We also have to work out how the servers access the cache data. As Ken
pointed out, a large TCP mesh might be problematic, so do we want every
HC in the cluster, or only a subset, with some other mechanism for the
servers to access the cache?
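
Just to make that concrete, one possible shape would be a replicated
Infinispan cache keyed by host/server identity. This is purely a sketch
(the cache name, key format and value type are made up, and exact
Infinispan APIs vary a bit by version), not a design proposal:

    import org.infinispan.Cache;
    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.configuration.global.GlobalConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;

    public class TopologyCacheSketch {
        public static void main(String[] args) {
            // Clustered cache manager; in a real design the transport would
            // ride on the channel the HCs form, not Infinispan's default stack.
            DefaultCacheManager mgr = new DefaultCacheManager(
                    GlobalConfigurationBuilder.defaultClusteredBuilder().build(),
                    new ConfigurationBuilder()
                            .clustering().cacheMode(CacheMode.REPL_SYNC)
                            .build());

            // The String value is a placeholder; a real entry would carry server
            // group, capabilities, endpoints, etc. -- exactly the part that's TBD.
            Cache<String, String> topology = mgr.getCache("domain-topology");
            topology.put("host-one/server-one", "main-server-group");
        }
    }

Whether every HC holds a copy of that cache, or only some of them, and
how the servers then read it, is the open question above.
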
> On Mon, Apr 11, 2016 at 6:57 PM, Brian Stansberry
> <brian.stansberry at redhat.com> wrote:
>
> Just an FYI: I spent a couple days and worked up a POC[1] of creating a
> JGroups-based reliable group communication mesh over the sockets our
> Host Controllers use for intra-domain management communications.
>
> Currently those sockets are used to form a tree of connections; master
> HC to slave HCs and then HCs to their servers. Slave HCs don't talk to
> each other. That kind of topology works fine for our current use cases,
> but not for other use cases, where a full communication mesh is more
> appropriate.
>
> 2 use cases led me to explore this:
>
> 1) A longstanding request to have automatic failover of the master HC to
> a backup. There are different ways to do this, but group-communication-based
> leader election is a possible solution (sketched below). It's really my
> preference.
>
> 2) https://issues.jboss.org/browse/WFLY-1066, which has led to various
> design alternatives, one of which is a distributed cache of topology
> information, available via each HC. See [2] for some of that discussion.
>
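> To show what I mean by group-communication-based leader election, here's
> a minimal sketch of the usual JGroups idiom: every member sees the same
> ordered view, so treating the first member of the current view as the
> leader gives automatic failover whenever a new view is installed. (The
> class name is made up; this is not code from the POC.)
>
>     import org.jgroups.JChannel;
>     import org.jgroups.ReceiverAdapter;
>     import org.jgroups.View;
>
>     public class HcLeaderElection extends ReceiverAdapter {
>         private final JChannel channel;
>         private volatile boolean leader;
>
>         public HcLeaderElection(JChannel channel) {
>             this.channel = channel;
>             channel.setReceiver(this);
>         }
>
>         @Override
>         public void viewAccepted(View view) {
>             // The first member of the view is the JGroups coordinator, so
>             // "election" is just a comparison; when that member leaves, the
>             // next installed view makes someone else the leader.
>             leader = view.getMembers().get(0).equals(channel.getAddress());
>         }
>
>         public boolean isLeader() { return leader; }
>     }
>
> A real implementation would also have to handle the actual takeover of
> the domain controller role, merges after partitions, etc.; the point is
> just that the election itself comes nearly for free once you have views.
>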
> I don't know if this kind of communication is a good idea, or if it's
> the right solution to either of these use cases. Lots of things need
> careful thought!! But I figured it was worth some time to experiment.
> And it worked in at least a basic POC way, hence this FYI.
>
> If you're interested in details, here are some Q&A:
>
> Q: Why JGroups?
>
> A: Because 1) I know it well, 2) I trust it, and 3) it's already used for
> this kind of group communication in full WildFly.
>
> Q: Why the management sockets? Why not other sockets?
>
> A: Slave HCs already need configuration for how to discover the master.
> Using the same sockets lets us reuse that discovery configuration for
> the JGroups communications as well. If we're going to use this kind of
> communication in a serious way, the configuration needs to be as easy
> as possible.
>
> Q: How does it work?
>
> A: JGroups is based on a stack of "protocols" each of which handles one
> aspect of reliable group communications. The POC creates and uses a
> standard protocol stack, except it replaces two standard protocols with
> custom ones:
>
> a) JGroups has various "Discovery" protocols which are used to find
> possible peers. I implemented one that integrates with the HC's domain
> controller discovery logic. It's basically a copy of the oft-used
> TCPPING protocol with about 10-15 lines of code changed.
>
> b) JGroups has various "Transport" protocols which are responsible for
> actually sending/receiving over the network. I created a new one of
> those that knows how to use the WF management comms stuff built on JBoss
> Remoting. JGroups provides a number of base classes to use in this
> transport area, so I was able to rely on a lot of existing functionality
> and could just focus on the details specific to this case.
>
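> Putting (a) and (b) together: built programmatically, the stack assembly
> looks roughly like the sketch below. It uses the stock TCP and TCPPING
> classes so it stands on its own; in the POC those two slots are where
> the custom transport and discovery protocols plug in. (Protocol choices,
> the port and the cluster name are illustrative, not taken from the POC.)
>
>     import java.util.Collections;
>
>     import org.jgroups.JChannel;
>     import org.jgroups.protocols.FD_ALL;
>     import org.jgroups.protocols.FRAG2;
>     import org.jgroups.protocols.MERGE3;
>     import org.jgroups.protocols.TCP;
>     import org.jgroups.protocols.TCPPING;
>     import org.jgroups.protocols.UNICAST3;
>     import org.jgroups.protocols.VERIFY_SUSPECT;
>     import org.jgroups.protocols.pbcast.GMS;
>     import org.jgroups.protocols.pbcast.NAKACK2;
>     import org.jgroups.protocols.pbcast.STABLE;
>     import org.jgroups.stack.IpAddress;
>     import org.jgroups.stack.ProtocolStack;
>
>     public class StackSketch {
>         public static void main(String[] args) throws Exception {
>             JChannel channel = new JChannel(false);  // skip the default XML-driven stack
>             ProtocolStack stack = new ProtocolStack();
>             channel.setProtocolStack(stack);
>             stack.addProtocol(new TCP())             // (b) POC swaps in a transport over the management sockets
>                  .addProtocol(new TCPPING()          // (a) POC swaps in DC-discovery-based member discovery
>                      .setValue("initial_hosts",
>                          Collections.singletonList(new IpAddress("127.0.0.1", 7600))))
>                  .addProtocol(new MERGE3())
>                  .addProtocol(new FD_ALL())
>                  .addProtocol(new VERIFY_SUSPECT())
>                  .addProtocol(new NAKACK2())
>                  .addProtocol(new UNICAST3())
>                  .addProtocol(new STABLE())
>                  .addProtocol(new GMS())
>                  .addProtocol(new FRAG2());
>             stack.init();
>             channel.connect("hc-domain");
>         }
>     }
>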
> Q: What have you done using the POC?
>
> A: I created a master HC and a slave on my laptop and saw them form a
> cluster and exchange messages. Typical stuff like starting and stopping
> the HCs worked. I see no reason why having multiple slaves wouldn't have
> worked too; I just didn't do it.
>
> Q: What's next?
>
> A: Nothing really. We have a couple concrete use cases we're looking to
> solve. We need to figure out the best solution for those use cases. If
> this kind of thing is useful in that, great. If not, it was a fun POC.
>
> [1]
> https://github.com/wildfly/wildfly-core/compare/master...bstansberry:jgroups-dc
> . See the commit message on the single commit to learn a bit more.
>
> [2] https://developer.jboss.org/wiki/ADomainManagedServiceRegistry
>
> --
> Brian Stansberry
> Senior Principal Software Engineer
> JBoss by Red Hat
>
>
--
Brian Stansberry
Senior Principal Software Engineer
JBoss by Red Hat