[infinispan-dev] ISPN-263 and handling partitions

Bela Ban bban at redhat.com
Sat Apr 6 03:35:32 EDT 2013



On 4/5/13 11:02 PM, Dennis Reed wrote:
> Some other alternatives for detecting a split:
>
> - a hard-coded number of nodes required for a quorum configured by
> the customer. This is what we recommend for HASingletons when the
> customer wants to guarantee that something is running at most once in
> the cluster.


Yes - this is what I described in my previous email (section 5.6.3 of 
the JGroups manual). I think this is our best solution so far, and it 
can be implemented quickly.
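
A minimal sketch of such a static quorum check, in plain Java with no
JGroups types. The class and method names are made up for illustration,
and the quorum value is assumed to come from user configuration; this is
not the code from the manual.

```java
import java.util.List;

public class QuorumCheck {

    /**
     * Hypothetical helper: returns true if the current view contains at
     * least `quorum` members. `quorum` is assumed to be a fixed number
     * configured by the user.
     */
    public static boolean hasQuorum(List<String> viewMembers, int quorum) {
        if (quorum < 1) {
            throw new IllegalArgumentException("quorum must be >= 1");
        }
        return viewMembers.size() >= quorum;
    }

    /** Smallest quorum that at most one partition of a fixed-size cluster can reach. */
    public static int majorityOf(int clusterSize) {
        return clusterSize / 2 + 1;
    }
}
```

With a configured cluster size of 5, majorityOf(5) is 3, so at most one
partition can ever satisfy hasQuorum(), regardless of how the cluster
splits.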


> - an alternative communication channel. JBoss Messaging added this a
> year or two ago, where it uses a database in addition to JGroups to
> keep track of cluster membership and detect discrepancies between the
> two (I don't know all the details of the implementation).

Yes, a so-called arbiter increases the chances of detecting a
partition, but it isn't foolproof: for example, none of the nodes in a
partition may be able to reach the arbiter database, or multiple
partitions (with the same number of nodes) may all be able to talk to
the DB.


> - something in JGroups to notify Infinispan of the difference between
> a normal node leave and being kicked (easily detectable inside
> JGroups, but I don't think this is really exposed to apps?)


This could be done at the level of Infinispan (see my previous email).
It would detect graceful leaves, but not crashes or network splits.
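
One way this might look, as a rough sketch only: it assumes a node
broadcasts some "I am leaving" message before it disconnects. That
message, and the LeaveClassifier class below, are hypothetical and not
an existing Infinispan or JGroups API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LeaveClassifier {

    // Nodes that announced a graceful leave but may still be in the view.
    private final Set<String> announcedLeavers = new HashSet<>();

    /** Called when a node's (hypothetical) leave announcement is received. */
    public void onLeaveAnnounced(String node) {
        announcedLeavers.add(node);
    }

    /**
     * Returns the members that are in oldView but not in newView and that
     * never announced their departure.
     */
    public Set<String> unexpectedlyMissing(List<String> oldView, List<String> newView) {
        Set<String> missing = new HashSet<>(oldView);
        missing.removeAll(newView);
        missing.removeAll(announcedLeavers);
        // Forget announcements from nodes that have now actually left.
        announcedLeavers.retainAll(newView);
        return missing;
    }
}
```

An empty result means every departure was graceful. A non-empty result
only says that something went wrong: a crash and a network split look
the same from here.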


> #1 should probably be pluggable, as a particular strategy may make
> sense for specific use cases.
>
> -Dennis
>
> On 04/05/2013 08:53 AM, Manik Surtani wrote:
>> Guys,
>>
>> So this is what I have in mind for this, looking for opinions.
>>
>> 1.  We write a SplitBrainListener which is registered when the
>> channel connects.  The aim of this listener is to identify when we
>> have a partition.  This can be identified when a view change is
>> detected, and the new view is significantly smaller than the old
>> view.  Easier to detect for large clusters, but smaller clusters
>> will be harder - trying to decide between a node leaving vs a
>> partition.  (Any better ideas here?)
>>
>> 2.  The SBL flips a switch in an interceptor
>> (SplitBrainHandlerInterceptor?) which switches the node to be
>> read-only (reject invocations that change the state of the local
>> node) if it is in the smaller partition (newView.size <
>> oldView.size / 2).  Only works reliably for odd-numbered cluster
>> sizes, and the issues with small clusters seen in (1) will affect
>> here as well.
>>
>> 3.  The SBL can flip the switch in the interceptor back to normal
>> operation once a MergeView is detected.
>>
>> It's nowhere near perfect, but at least it means that we can
>> recommend enabling this and setting up an odd number of nodes, with
>> a cluster size of at least N if you want to reduce inconsistency in
>> your grid during partitions.
>>
>> Is this even useful?
>>
>> Bela, is there a more reliable mechanism to detect a split in (1)?
>>
>> Cheers Manik
>>
>> -- Manik Surtani manik at jboss.org twitter.com/maniksurtani
>>
>> Platform Architect, JBoss Data Grid http://red.ht/data-grid
>>
>>
>> _______________________________________________ infinispan-dev
>> mailing list infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
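
The size comparison in (2) can be sketched as below. Note that this is a
variant, not the condition exactly as quoted: with Java integer
division, oldView.size / 2 is 2 for a 5-node cluster, so a 2-node
partition would not satisfy newView.size < oldView.size / 2 and would
stay writable. The sketch uses a strict-majority test instead;
PartitionPolicy and keepsWriteAccess are names invented for this
example.

```java
public class PartitionPolicy {

    /**
     * Returns true if a node should keep accepting writes after a view
     * change, i.e. if the new view holds a strict majority of the old one.
     */
    public static boolean keepsWriteAccess(int oldViewSize, int newViewSize) {
        if (oldViewSize <= 0 || newViewSize < 0) {
            throw new IllegalArgumentException("view sizes must be positive");
        }
        // Multiply instead of dividing to avoid integer truncation.
        return 2 * newViewSize > oldViewSize;
    }
}
```

Under this rule, an even split of a 4-node cluster (2/2) makes both
halves read-only, whereas the quoted condition would leave both halves
writable. Either way, an odd cluster size avoids the tie.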

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

