On 4/12/16 4:43 AM, Ryan Emerson wrote:
Overall looks good to me; however, I have a question about the automatic
failover use case: how do you intend to handle split-brain scenarios?
My basic thought was to require a quorum and if no quorum is available,
provide a degraded level of service.
A degraded level of service probably means no master. A domain can
function with no master. I can brainstorm about possible slight
enhancements to service beyond that, but I question whether they are
worth the effort, at least initially.
Example scenarios: You have a network of {Master HC, Slave1, Slave2,
Slave3, Slave4, Slave5} and the network splits into two partitions of {Master HC, Slave1,
Slave2} and {Slave3, Slave4, Slave5}. Or even three distinct partitions
of two nodes each.
I think we need a 3rd conceptual type -- a Potential Master. Not just
any HC can become master. It has to:
1) Be the latest version.
2) Be configured such that it's keeping a complete set of the domain
config and any domain managed content.
3) Be configured to use the group communication service used for leader
election.
4) Most likely it would also have specific config saying it can be a
master. I doubt this is something users will want to leave to chance.
So, electing a leader requires a quorum of Potential Masters.
If no additional provisions were added, how detrimental would it be
if two Master HCs were elected in distinct partitions and the partitions
then merged back into one network (leaving two Master HCs in the same domain)?
Two masters means two potentially inconsistent domain configurations
(i.e. domain.xml and content repo) are possible. We don't want that,
hence the quorum requirement.
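To make the quorum rule concrete, here is a minimal sketch of how a
partition could decide whether it is allowed to elect a master. This is
only an illustration of the idea; nothing like it exists in the POC, and
the class and method names are made up.

    import java.util.List;
    import java.util.Set;

    // Hypothetical illustration of the quorum rule; names are made up.
    final class MasterQuorum {

        // visible          = host controllers reachable in this partition
        // potentialMasters = all HCs configured as Potential Masters in the domain
        static boolean canElectMaster(Set<String> visible, List<String> potentialMasters) {
            long present = potentialMasters.stream().filter(visible::contains).count();
            // Strict majority: at most one partition can satisfy this, so two
            // partitions can never both elect a master at the same time.
            return present > potentialMasters.size() / 2;
        }
    }

In the six-node example above, if (say) three of the HCs were configured
as Potential Masters, a partition that can see only one of them fails the
check and runs degraded (masterless) until the partitions heal.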
A question is what slave HCs should do in the absence of a master. They
are isolated from control by a master, but don't know if there is still
a functioning set of DC+slaves out there, meaning the slaves may be
missing relevant config changes. Should they shut down, or keep going?
We already have this issue though, and we've elected to have the slaves
keep going, updating their config if they can reconnect to a master. We
chose to keep the appservers running, and not to have them be vulnerable
to problems with master-slave connectivity. Having autopromotion of a
new master makes it slightly more valid to just shut down, since going
masterless is less likely, but I still think it's not a good idea.
----- Original Message -----
From: "Brian Stansberry" <brian.stansberry(a)redhat.com>
To: wildfly-dev(a)lists.jboss.org
Sent: Monday, 11 April, 2016 5:57:59 PM
Subject: [wildfly-dev] Inter Host Controller group communication mesh
Just an FYI: I spent a couple days and worked up a POC[1] of creating a
JGroups-based reliable group communication mesh over the sockets our
Host Controllers use for intra-domain management communications.
Currently those sockets are used to form a tree of connections; master
HC to slave HCs and then HCs to their servers. Slave HCs don't talk to
each other. That kind of topology works fine for our current use cases,
but not for other use cases, where a full communication mesh is more
appropriate.
2 use cases led me to explore this:
1) A longstanding request to have automatic failover of the master HC to
a backup. There are different ways to do this, but group-communication-based
leader election is a possible solution. My preference, really. (A rough
sketch of that idea follows after this list.)
2) https://issues.jboss.org/browse/WFLY-1066, which has led to various
design alternatives, one of which is a distributed cache of topology
information, available via each HC. See [2] for some of that discussion.
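To illustrate what group-communication-based leader election (use case 1)
could look like, here is a rough sketch using the simplest JGroups
convention: the oldest member of the current view acts as the master.
This is not part of the POC; the class name and the promotion hook are
made up, and a real implementation would also need quorum handling to
guard against split brain.

    import org.jgroups.JChannel;
    import org.jgroups.ReceiverAdapter;
    import org.jgroups.View;

    // Hypothetical sketch: the oldest member of the JGroups view acts as the
    // master HC. When the view changes (e.g. the current master dies), the
    // next-oldest member takes over.
    public class MasterElector extends ReceiverAdapter {

        private final JChannel channel;
        private volatile boolean master;

        public MasterElector(JChannel channel) {
            this.channel = channel;
            channel.setReceiver(this);
        }

        @Override
        public void viewAccepted(View view) {
            // JGroups orders view members by age; the first member is the coordinator.
            boolean nowMaster = view.getMembers().get(0).equals(channel.getAddress());
            if (nowMaster && !master) {
                promoteToMaster(); // made-up hook: take over domain controller duties
            }
            master = nowMaster;
        }

        private void promoteToMaster() {
            // Details omitted.
        }
    }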
I don't know if this kind of communication is a good idea, or if it's
the right solution to either of these use cases. Lots of things need
careful thought!! But I figured it was worth some time to experiment.
And it worked in at least a basic POC way, hence this FYI.
If you're interested in details, here are some Q&A:
Q: Why JGroups?
A: Because 1) I know it well 2) I trust it and 3) it's already used for
this kind of group communications in full WildFly.
Q: Why the management sockets? Why not other sockets?
A: Slave HCs already need configuration for how to discover the master.
Using the same sockets lets us reuse that discovery configuration for
the JGroups communications as well. If we're going to use this kind of
communication in a serious way, the configuration needs to be as easy
as possible.
Q: How does it work?
A: JGroups is based on a stack of "protocols" each of which handles one
aspect of reliable group communications. The POC creates and uses a
standard protocol stack, except it replaces two standard protocols with
custom ones:
a) JGroups has various "Discovery" protocols which are used to find
possible peers. I implemented one that integrates with the HC's domain
controller discovery logic. It's basically a copy of the oft-used
TCPPING protocol with about 10-15 lines of code changed.
b) JGroups has various "Transport" protocols which are responsible for
actually sending/receiving over the network. I created a new one of
those that knows how to use the WF management comms stuff built on JBoss
Remoting. JGroups provides a number of base classes to use in this
transport area, so I was able to rely on a lot of existing functionality
and could just focus on the details specific to this case.
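As a rough illustration of how such a stack might be assembled
programmatically, here is a sketch. It is not the POC's actual code;
DomainRemotingTransport and DomainDiscovery are stand-ins for the custom
transport and discovery protocols described above, and the rest is a
fairly standard JGroups protocol list.

    import org.jgroups.JChannel;
    import org.jgroups.protocols.FD_ALL;
    import org.jgroups.protocols.FD_SOCK;
    import org.jgroups.protocols.MERGE3;
    import org.jgroups.protocols.UNICAST3;
    import org.jgroups.protocols.VERIFY_SUSPECT;
    import org.jgroups.protocols.pbcast.GMS;
    import org.jgroups.protocols.pbcast.NAKACK2;
    import org.jgroups.protocols.pbcast.STABLE;

    public class DomainMeshChannel {
        public static void main(String[] args) throws Exception {
            // Protocols are listed bottom-up: transport first, membership last.
            JChannel channel = new JChannel(
                    new DomainRemotingTransport(), // b) sends/receives over the HC management sockets
                    new DomainDiscovery(),         // a) finds peers via the DC discovery config
                    new MERGE3(),                  // merges partitions back together
                    new FD_SOCK(),                 // failure detection
                    new FD_ALL(),
                    new VERIFY_SUSPECT(),
                    new NAKACK2(),                 // reliable, ordered multicast
                    new UNICAST3(),                // reliable unicast
                    new STABLE(),                  // message stability / garbage collection
                    new GMS());                    // group membership and view installation
            channel.connect("wildfly-domain");     // made-up cluster name
        }
    }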
Q: What have you done using the POC?
A: I created a master HC and a slave on my laptop and saw them form a
cluster and exchange messages. Typical stuff like starting and stopping
the HCs worked. I see no reason why having multiple slaves wouldn't have
worked too; I just didn't do it.
Q: What's next?
A: Nothing really. We have a couple concrete use cases we're looking to
solve. We need to figure out the best solution for those use cases. If
this kind of thing is useful in that, great. If not, it was a fun POC.
[1]
https://github.com/wildfly/wildfly-core/compare/master...bstansberry:jgro...
. See the commit message on the single commit to learn a bit more.
[2]
https://developer.jboss.org/wiki/ADomainManagedServiceRegistry
--
Brian Stansberry
Senior Principal Software Engineer
JBoss by Red Hat