On 4/5/13 11:02 PM, Dennis Reed wrote:
Some other alternatives for detecting a split:
- a hard-coded number of nodes required for a quorum, configured by
the customer. This is what we recommend for HASingletons when the
customer wants to guarantee that something is running at most once in
the cluster.
Yes - this is what I described in my previous email (section 5.6.3 of
the JGroups manual). I think this is our best solution so far, and it
can be implemented quickly.
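
To make the static quorum idea concrete, here is a minimal sketch. The
class and method names are made up for illustration and are not JGroups
or Infinispan API; in a real setup the current size would come from
View.size() in the viewAccepted() callback.

```java
/** Hypothetical helper: a partition may act only if it holds the configured quorum. */
final class StaticQuorum {
    private final int requiredNodes; // hard-coded by the customer

    StaticQuorum(int requiredNodes) {
        if (requiredNodes < 1) {
            throw new IllegalArgumentException("requiredNodes must be >= 1");
        }
        this.requiredNodes = requiredNodes;
    }

    /** True if a view of the given size is large enough to keep running. */
    boolean hasQuorum(int currentViewSize) {
        return currentViewSize >= requiredNodes;
    }

    /** Smallest quorum that at most one partition of the cluster can reach. */
    static int majorityOf(int clusterSize) {
        return clusterSize / 2 + 1;
    }
}
```

With a 5-node cluster and a quorum of majorityOf(5), a 3-node partition
keeps running and a 2-node partition does not.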
- an alternative communication channel. JBoss Messaging added this a
year or two ago, where it uses a database in addition to JGroups to
keep track of cluster membership and detect discrepancies between the
two (I don't know all the details of the implementation).
Yes, a so-called arbiter increases the chances of detecting a
partition, but it isn't foolproof, e.g. if none of the nodes in a
partition can contact the arbiter database, or if multiple partitions
(with the same number of nodes) can talk to the DB.
- something in JGroups to notify Infinispan of the difference between
a normal node leave and being kicked (easily detectable inside
JGroups, but I don't think this is really exposed to apps?)
This could be done at the level of Infinispan (see my previous email).
It would detect graceful leaves, but not crashes or network splits.
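
One way this might look, purely as a sketch: nodes announce a graceful
leave before disconnecting, and on a view change any member that
vanished without an announcement is treated as suspect. LeaveTracker
and its methods are hypothetical names, node names are plain strings
here, and how the announcement is broadcast is left out.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker separating announced (graceful) leaves from other departures. */
final class LeaveTracker {
    private final Set<String> announcedLeaves = new HashSet<>();

    /** Record that a node announced it is about to leave gracefully. */
    void leaveAnnounced(String node) {
        announcedLeaves.add(node);
    }

    /**
     * Returns the members that disappeared between two views without
     * announcing a leave. These are crashes or a network split; this
     * check alone cannot tell which.
     */
    Set<String> unexplainedDepartures(Set<String> oldMembers, Set<String> newMembers) {
        Set<String> missing = new HashSet<>(oldMembers);
        missing.removeAll(newMembers);      // members that are gone
        missing.removeAll(announcedLeaves); // minus the graceful leavers
        // Forget announcements of nodes that have now actually left.
        announcedLeaves.retainAll(newMembers);
        return missing;
    }
}
```

An empty result means every departure was graceful; a non-empty one
still needs another mechanism to decide between crash and partition.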
#1 should probably be pluggable, as a particular strategy may make
sense for specific use cases.
-Dennis
On 04/05/2013 08:53 AM, Manik Surtani wrote:
> Guys,
>
> So this is what I have in mind for this, looking for opinions.
>
> 1. We write a SplitBrainListener which is registered when the
> channel connects. The aim of this listener is to identify when we
> have a partition. This can be identified when a view change is
> detected, and the new view is significantly smaller than the old
> view. Easier to detect for large clusters, but smaller clusters
> will be harder - trying to decide between a node leaving vs a
> partition. (Any better ideas here?)
>
> 2. The SBL flips a switch in an interceptor
> (SplitBrainHandlerInterceptor?) which switches the node to be
> read-only (reject invocations that change the state of the local
> node) if it is in the smaller partition (newView.size <
> oldView.size / 2). Only works reliably for odd-numbered cluster
> sizes, and the issues with small clusters seen in (1) will affect
> here as well.
>
> 3. The SBL can flip the switch in the interceptor back to normal
> operation once a MergeView is detected.
>
> It's nowhere near perfect but at least it means that we can
> recommend enabling this and setting up an odd number of nodes, with
> a cluster size of at least N if you want to reduce inconsistency in
> your grid during partitions.
>
> Is this even useful?
>
> Bela, is there a more reliable mechanism to detect a split in (1)?
>
> Cheers Manik
>
> -- Manik Surtani manik(a)jboss.org twitter.com/maniksurtani
>
> Platform Architect, JBoss Data Grid http://red.ht/data-grid
>
>
> _______________________________________________ infinispan-dev
> mailing list infinispan-dev(a)lists.jboss.org
>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
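
For reference, steps 1-3 of the quoted proposal could be sketched
roughly as below. SplitBrainSwitch and its methods are hypothetical
names, not existing Infinispan classes, and view sizes are passed as
plain ints rather than JGroups View objects. The sketch compares
2 * newSize < oldSize instead of newSize < oldSize / 2, an assumption
made here so that integer division does not round the threshold down
for odd cluster sizes.

```java
/** Hypothetical switch: goes read-only when the view shrinks to a minority. */
final class SplitBrainSwitch {
    private int lastViewSize;
    private boolean readOnly;

    SplitBrainSwitch(int initialViewSize) {
        this.lastViewSize = initialViewSize;
    }

    /** Step 1 and 2: on a view change, go read-only if we are the smaller side. */
    void onViewChange(int newViewSize) {
        if (2 * newViewSize < lastViewSize) {
            readOnly = true;
        }
        lastViewSize = newViewSize;
    }

    /** Step 3: a merge view restores normal operation. */
    void onMerge(int mergedViewSize) {
        readOnly = false;
        lastViewSize = mergedViewSize;
    }

    /** What an interceptor would call before a state-changing invocation. */
    void checkWritable() {
        if (readOnly) {
            throw new IllegalStateException("node is read-only: minority partition");
        }
    }

    boolean isReadOnly() {
        return readOnly;
    }
}
```

Going from 5 nodes to 2 flips the switch, going from 5 to 3 does not,
and a 4-node cluster splitting 2/2 leaves both halves writable, which
is the even-numbered case the proposal warns about.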
--
Bela Ban, JGroups lead (http://www.jgroups.org)