[wildfly-dev] Inter Host Controller group communication mesh
Brian Stansberry
brian.stansberry at redhat.com
Tue Apr 12 12:49:20 EDT 2016
On 4/12/16 4:43 AM, Ryan Emerson wrote:
> Overall looks good to me; however, I have a question about the automatic failover use case: how do you intend to handle split-brain scenarios?
>
My basic thought was to require a quorum and, if no quorum is available,
provide a degraded level of service.
A degraded level of service probably means no master. A domain can
function with no master. I can brainstorm about possible slight
enhancements to service beyond that, but I question whether they are
worth the effort, at least initially.
> Example scenarios: You have a network of {Master HC, Slave1, Slave2, Slave3, Slave4, Slave5} and the network splits into two partitions of {Master HC, Slave1, Slave2} and {Slave3, Slave4, Slave5}. Or even three distinct partitions of two nodes each.
>
I think we need a 3rd conceptual type -- a Potential Master. Not just
any HC can become master. It has to:
1) Be the latest version.
2) Be configured such that it's keeping a complete set of the domain
config and any domain managed content.
3) Be configured to use the group communication service used for leader
election.
4) Most likely, also have specific config saying it can be a master. I
doubt this is something users will want to leave to chance.
So, electing a leader requires a quorum of Potential Masters.
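To make that a bit more concrete, here's a minimal sketch of what a
quorum-gated election could look like on top of a JGroups view. This is
illustrative only, not code from the POC; it assumes the channel contains
only the Potential Masters, and that the total number of configured
Potential Masters is known from the domain configuration.

import org.jgroups.Address;
import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

/**
 * Sketch of quorum-gated leader election among Potential Masters.
 * Assumes the channel's members are exactly the reachable Potential Masters
 * and that the configured total of Potential Masters is known up front.
 */
public class PotentialMasterElection extends ReceiverAdapter {

    private final JChannel channel;
    private final int totalPotentialMasters; // hypothetical: read from host config
    private volatile boolean master;

    public PotentialMasterElection(JChannel channel, int totalPotentialMasters) {
        this.channel = channel;
        this.totalPotentialMasters = totalPotentialMasters;
    }

    @Override
    public void viewAccepted(View view) {
        int quorum = totalPotentialMasters / 2 + 1;   // strict majority, e.g. 3 of 5
        boolean haveQuorum = view.getMembers().size() >= quorum;
        // JGroups lists view members oldest-first; the first member is the coordinator.
        Address coordinator = view.getMembers().get(0);
        master = haveQuorum && coordinator.equals(channel.getAddress());
        // Without quorum nobody claims mastership; the domain runs masterless,
        // i.e. the degraded level of service described above.
    }

    public boolean isMaster() {
        return master;
    }
}

A real implementation would also have to deal with MergeViews when
partitions heal (e.g. a master stepping down if it no longer has quorum),
which is exactly the kind of careful thought this needs.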
> If no additional provisions were added, how detrimental would it be if two Master HCs were elected in distinct partitions and the network partitions became one again (resulting in two Master HCs)?
>
Two masters means two potentially inconsistent domain configurations
(i.e. domain.xml and content repo) are possible. We don't want that,
hence the quorum requirement. With, say, five Potential Masters, the
quorum is three, so at most one partition can ever elect a master.
A question is what slave HCs should do in the absence of a master. They
are isolated from control by a master, but don't know if there is still
a functioning set of DC+slaves out there, meaning the slaves may be
missing relevant config changes. Should they shut down, or keep going?
We already have this issue, though, and we've elected to have the slaves
keep going, updating their config if they can reconnect to a master. We
chose to keep the appservers running rather than have them be vulnerable
to problems with master-slave connectivity. Having autopromotion of a
new master makes it slightly more valid to just shut down, since going
masterless is less likely, but I still think it's not a good idea.
> ----- Original Message -----
> From: "Brian Stansberry" <brian.stansberry at redhat.com>
> To: wildfly-dev at lists.jboss.org
> Sent: Monday, 11 April, 2016 5:57:59 PM
> Subject: [wildfly-dev] Inter Host Controller group communication mesh
>
> Just an FYI: I spent a couple days and worked up a POC[1] of creating a
> JGroups-based reliable group communication mesh over the sockets our
> Host Controllers use for intra-domain management communications.
>
> Currently those sockets are used to form a tree of connections: master
> HC to slave HCs, and then HCs to their servers. Slave HCs don't talk to
> each other. That kind of topology works fine for our current use cases,
> but not for others, where a full communication mesh is more
> appropriate.
>
> Two use cases led me to explore this:
>
> 1) A longstanding request to have automatic failover of the master HC to
> a backup. There are different ways to do this, but group communication
> based leader election is a possible solution. My preference, really.
>
> 2) https://issues.jboss.org/browse/WFLY-1066, which has led to various
> design alternatives, one of which is a distributed cache of topology
> information, available via each HC. See [2] for some of that discussion.
>
> I don't know if this kind of communication is a good idea, or if it's
> the right solution to either of these use cases. Lots of things need
> careful thought!! But I figured it was worth some time to experiment.
> And it worked in at least a basic POC way, hence this FYI.
>
> If you're interested in details, here are some Q&A:
>
> Q: Why JGroups?
>
> A: Because 1) I know it well, 2) I trust it, and 3) it's already used for
> this kind of group communication in full WildFly.
>
> Q: Why the management sockets? Why not other sockets?
>
> A: Slave HCs already need configuration for how to discover the master.
> Using the same sockets lets us reuse that discovery configuration for
> the JGroups communications as well. If we're going to use this kind of
> communication in a serious way, the configuration needs to be as easy
> as possible.
>
> Q: How does it work?
>
> A: JGroups is based on a stack of "protocols", each of which handles one
> aspect of reliable group communications. The POC creates and uses a
> standard protocol stack, except it replaces two standard protocols with
> custom ones:
>
> a) JGroups has various "Discovery" protocols which are used to find
> possible peers. I implemented one that integrates with the HC's domain
> controller discovery logic. It's basically a copy of the oft-used
> TCPPING protocol with about 10-15 lines of code changed.
>
> b) JGroups has various "Transport" protocols which are responsible for
> actually sending/receiving over the network. I created a new one of
> those that knows how to use the WF management comms stuff built on JBoss
> Remoting. JGroups provides a number of base classes to use in this
> transport area, so I was able to rely on a lot of existing functionality
> and could just focus on the details specific to this case.
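To make (a) and (b) a bit more concrete, here's roughly what assembling
such a stack programmatically looks like. It uses the stock TCP and
TCPPING protocols as placeholders for the two custom protocols described
above; the exact stack and class names in the POC differ, so see the
branch in [1] for the real thing. Property configuration (ports, initial
hosts, timeouts) is omitted here.

import org.jgroups.JChannel;
import org.jgroups.protocols.FD_ALL;
import org.jgroups.protocols.MERGE3;
import org.jgroups.protocols.MFC;
import org.jgroups.protocols.TCP;
import org.jgroups.protocols.TCPPING;
import org.jgroups.protocols.UNICAST3;
import org.jgroups.protocols.VERIFY_SUSPECT;
import org.jgroups.protocols.pbcast.GMS;
import org.jgroups.protocols.pbcast.NAKACK2;
import org.jgroups.protocols.pbcast.STABLE;

public class HostControllerStackSketch {

    public static void main(String[] args) throws Exception {
        // A fairly standard TCP-based stack, built programmatically,
        // transport at the bottom (first) and GMS etc. above it.
        // In the POC the first two protocols are replaced by custom ones:
        //  - the transport (here TCP) by one that sends over the HC's existing
        //    JBoss Remoting management connections, per (b) above, and
        //  - the discovery protocol (here TCPPING) by one that reuses the
        //    slave's domain controller discovery logic, per (a) above.
        JChannel channel = new JChannel(
                new TCP(),          // (b) swapped for the Remoting-based transport
                new TCPPING(),      // (a) swapped for the DC-discovery-based protocol
                new MERGE3(),
                new FD_ALL(),
                new VERIFY_SUSPECT(),
                new NAKACK2(),
                new UNICAST3(),
                new STABLE(),
                new GMS(),
                new MFC());
        channel.connect("host-controllers");
    }
}

The rest of the stack (merging, failure detection, reliable delivery,
group membership, flow control) is standard JGroups and needed no changes.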
>
> Q: What have you done using the POC?
>
> A: I created a master HC and a slave on my laptop and saw them form a
> cluster and exchange messages. Typical stuff like starting and stopping
> the HCs worked. I see no reason why having multiple slaves wouldn't have
> worked too; I just didn't do it.
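For anyone unfamiliar with JGroups, "form a cluster and exchange messages"
amounts to roughly the following at the API level. This is plain JGroups
over its default stack, not the management-socket transport from the POC;
it's just a smoke-test sketch.

import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class ClusterSmokeTest {

    public static void main(String[] args) throws Exception {
        // Default stack; the POC uses the custom stack sketched above instead.
        JChannel channel = new JChannel();
        channel.setReceiver(new ReceiverAdapter() {
            @Override
            public void viewAccepted(View view) {
                System.out.println("cluster members: " + view.getMembers());
            }

            @Override
            public void receive(Message msg) {
                System.out.println("received: " + msg.getObject());
            }
        });
        channel.connect("host-controllers");
        channel.send(new Message(null, "hello from " + channel.getAddress()));
    }
}

Running two instances should show a two-member view on each node and print
the other node's greeting.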
>
> Q: What's next?
>
> A: Nothing really. We have a couple of concrete use cases we're looking to
> solve. We need to figure out the best solution for those use cases. If
> this kind of thing is useful in that, great. If not, it was a fun POC.
>
> [1]
> https://github.com/wildfly/wildfly-core/compare/master...bstansberry:jgroups-dc
> . See the commit message on the single commit to learn a bit more.
>
> [2] https://developer.jboss.org/wiki/ADomainManagedServiceRegistry
>
--
Brian Stansberry
Senior Principal Software Engineer
JBoss by Red Hat