[wildfly-dev] Inter Host Controller group communication mesh
Brian Stansberry
brian.stansberry at redhat.com
Tue Apr 12 09:52:01 EDT 2016
On 4/12/16 4:29 AM, Sebastian Laskawiec wrote:
> Adding Bela to the thread...
>
> The POC looks really nice to me. I could try to take it from here and
> finish the WFLY-1066 implementation to see how everything works together.
>
> The only thing that comes to my mind is whether we should (or should not)
> add capability and server group information to it. I think most of the
> subsystems would be interested in that.
>
We'd still need a design for exactly how a distributed cache of topology
info would work. Using JGroups opens up the possibility of using
Infinispan, but the structure of the data in the cache is still TBD. I
think capability and server group data will be part of that.
We also have to work out how the servers access the cache data. As Ken
pointed out, a large TCP mesh might be problematic, so do we want every
HC in the cluster, or only a subset, with some other mechanism for the
servers to access the cache?
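
Just to make that concrete, one possible shape would be a replicated
Infinispan cache keyed by host/server identity. This is purely a sketch
(the cache name, key format and value type are made up, and exact
Infinispan APIs vary a bit by version), not a design proposal:

    import org.infinispan.Cache;
    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.configuration.global.GlobalConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;

    public class TopologyCacheSketch {
        public static void main(String[] args) {
            // Clustered cache manager; in a real design the transport would
            // ride on the channel the HCs form, not Infinispan's default stack.
            DefaultCacheManager mgr = new DefaultCacheManager(
                    GlobalConfigurationBuilder.defaultClusteredBuilder().build(),
                    new ConfigurationBuilder()
                            .clustering().cacheMode(CacheMode.REPL_SYNC)
                            .build());

            // The String value is a placeholder; a real entry would carry server
            // group, capabilities, endpoints, etc. -- exactly the part that's TBD.
            Cache<String, String> topology = mgr.getCache("domain-topology");
            topology.put("host-one/server-one", "main-server-group");
        }
    }

Whether every HC holds a copy of that cache, or only some of them, and
how the servers then read it, is the open question above.
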
> On Mon, Apr 11, 2016 at 6:57 PM, Brian Stansberry
> <brian.stansberry at redhat.com> wrote:
>
> Just an FYI: I spent a couple days and worked up a POC[1] of creating a
> JGroups-based reliable group communication mesh over the sockets our
> Host Controllers use for intra-domain management communications.
>
> Currently those sockets are used to form a tree of connections; master
> HC to slave HCs and then HCs to their servers. Slave HCs don't talk to
> each other. That kind of topology works fine for our current use cases,
> but not for other use cases, where a full communication mesh is more
> appropriate.
>
> 2 use cases led me to explore this:
>
> 1) A longstanding request to have automatic failover of the master HC to
> a backup. There are different ways to do this, but group-communication-based
> leader election is a possible solution (sketched below). It's really my
> preference.
>
> 2) https://issues.jboss.org/browse/WFLY-1066, which has led to various
> design alternatives, one of which is a distributed cache of topology
> information, available via each HC. See [2] for some of that discussion.
>
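> To show what I mean by group-communication-based leader election, here's
> a minimal sketch of the usual JGroups idiom: every member sees the same
> ordered view, so treating the first member of the current view as the
> leader gives automatic failover whenever a new view is installed. (The
> class name is made up; this is not code from the POC.)
>
>     import org.jgroups.JChannel;
>     import org.jgroups.ReceiverAdapter;
>     import org.jgroups.View;
>
>     public class HcLeaderElection extends ReceiverAdapter {
>         private final JChannel channel;
>         private volatile boolean leader;
>
>         public HcLeaderElection(JChannel channel) {
>             this.channel = channel;
>             channel.setReceiver(this);
>         }
>
>         @Override
>         public void viewAccepted(View view) {
>             // The first member of the view is the JGroups coordinator, so
>             // "election" is just a comparison; when that member leaves, the
>             // next installed view makes someone else the leader.
>             leader = view.getMembers().get(0).equals(channel.getAddress());
>         }
>
>         public boolean isLeader() { return leader; }
>     }
>
> A real implementation would also have to handle the actual takeover of
> the domain controller role, merges after partitions, etc.; the point is
> just that the election itself comes nearly for free once you have views.
>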
> I don't know if this kind of communication is a good idea, or if it's
> the right solution to either of these use cases. Lots of things need
> careful thought!! But I figured it was worth some time to experiment.
> And it worked in at least a basic POC way, hence this FYI.
>
> If you're interested in details, here are some Q&A:
>
> Q: Why JGroups?
>
> A: Because 1) I know it well, 2) I trust it, and 3) it's already used for
> this kind of group communication in full WildFly.
>
> Q: Why the management sockets? Why not other sockets?
>
> A: Slave HCs already need configuration for how to discover the master.
> Using the same sockets lets us reuse that discovery configuration for
> the JGroups communications as well. If we're going to use this kind of
> communication in a serious way, the configuration needs to be as easy
> as possible.
>
> Q: How does it work?
>
> A: JGroups is based on a stack of "protocols" each of which handles one
> aspect of reliable group communications. The POC creates and uses a
> standard protocol stack, except it replaces two standard protocols with
> custom ones:
>
> a) JGroups has various "Discovery" protocols which are used to find
> possible peers. I implemented one that integrates with the HC's domain
> controller discovery logic. It's basically a copy of the oft-used
> TCPPING protocol with about 10-15 lines of code changed.
>
> b) JGroups has various "Transport" protocols which are responsible for
> actually sending/receiving over the network. I created a new one of
> those that knows how to use the WF management comms stuff built on JBoss
> Remoting. JGroups provides a number of base classes to use in this
> transport area, so I was able to rely on a lot of existing functionality
> and could just focus on the details specific to this case.
>
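> Putting (a) and (b) together: built programmatically, the stack assembly
> looks roughly like the sketch below. It uses the stock TCP and TCPPING
> classes so it stands on its own; in the POC those two slots are where
> the custom transport and discovery protocols plug in. (Protocol choices,
> the port and the cluster name are illustrative, not taken from the POC.)
>
>     import java.util.Collections;
>
>     import org.jgroups.JChannel;
>     import org.jgroups.protocols.FD_ALL;
>     import org.jgroups.protocols.FRAG2;
>     import org.jgroups.protocols.MERGE3;
>     import org.jgroups.protocols.TCP;
>     import org.jgroups.protocols.TCPPING;
>     import org.jgroups.protocols.UNICAST3;
>     import org.jgroups.protocols.VERIFY_SUSPECT;
>     import org.jgroups.protocols.pbcast.GMS;
>     import org.jgroups.protocols.pbcast.NAKACK2;
>     import org.jgroups.protocols.pbcast.STABLE;
>     import org.jgroups.stack.IpAddress;
>     import org.jgroups.stack.ProtocolStack;
>
>     public class StackSketch {
>         public static void main(String[] args) throws Exception {
>             JChannel channel = new JChannel(false);  // skip the default XML-driven stack
>             ProtocolStack stack = new ProtocolStack();
>             channel.setProtocolStack(stack);
>             stack.addProtocol(new TCP())             // (b) POC swaps in a transport over the management sockets
>                  .addProtocol(new TCPPING()          // (a) POC swaps in DC-discovery-based member discovery
>                      .setValue("initial_hosts",
>                          Collections.singletonList(new IpAddress("127.0.0.1", 7600))))
>                  .addProtocol(new MERGE3())
>                  .addProtocol(new FD_ALL())
>                  .addProtocol(new VERIFY_SUSPECT())
>                  .addProtocol(new NAKACK2())
>                  .addProtocol(new UNICAST3())
>                  .addProtocol(new STABLE())
>                  .addProtocol(new GMS())
>                  .addProtocol(new FRAG2());
>             stack.init();
>             channel.connect("hc-domain");
>         }
>     }
>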
> Q: What have you done using the POC?
>
> A: I created a master HC and a slave on my laptop and saw them form a
> cluster and exchange messages. Typical stuff like starting and stopping
> the HCs worked. I see no reason why having multiple slaves wouldn't have
> worked too; I just didn't do it.
>
> Q: What's next?
>
> A: Nothing really. We have a couple concrete use cases we're looking to
> solve. We need to figure out the best solution for those use cases. If
> this kind of thing is useful in that, great. If not, it was a fun POC.
>
> [1]
> https://github.com/wildfly/wildfly-core/compare/master...bstansberry:jgroups-dc
> . See the commit message on the single commit to learn a bit more.
>
> [2] https://developer.jboss.org/wiki/ADomainManagedServiceRegistry
>
> --
> Brian Stansberry
> Senior Principal Software Engineer
> JBoss by Red Hat
>
>
--
Brian Stansberry
Senior Principal Software Engineer
JBoss by Red Hat