[infinispan-dev] ISPN-263 and handling partitions

Bela Ban bban at redhat.com
Wed Apr 17 06:24:37 EDT 2013


If we go with a primary partition approach, then only the primary 
partition will be allowed to make progress (a.k.a. accept changes), so 
we won't have any conflicts.

The approach must guarantee that there can be at most one primary 
partition, and that minority partitions shut down or become read-only. 
When merging, minority partitions need to get the state from the primary 
partition, so state transfer on a merge always needs to flow from the 
primary partition to the minority partition(s).

I don't know how this could be done, but perhaps one approach would be, 
on a merge, to treat members of minority partitions as if they were 
fresh joiners?
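A minimal sketch of the fresh-joiner idea, assuming each merging subgroup is known as a set of member names and the primary subgroup has already been identified. `MergePlanner` and `freshJoiners` are hypothetical names, not Infinispan or JGroups APIs. Every member outside the primary subgroup is marked to discard its local state and rejoin, so state can only flow from the primary partition.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: decide who rejoins as a fresh joiner after a merge. */
final class MergePlanner {

    private MergePlanner() {}

    /**
     * @param subgroups the disjoint partitions that are merging
     * @param primary   the subgroup that remained primary (kept accepting writes)
     * @return members that should drop local state and fetch it from the primary
     */
    static Set<String> freshJoiners(List<Set<String>> subgroups, Set<String> primary) {
        if (!subgroups.contains(primary)) {
            throw new IllegalArgumentException("primary must be one of the merging subgroups");
        }
        Set<String> joiners = new TreeSet<>();
        for (Set<String> group : subgroups) {
            if (!group.equals(primary)) {
                // Minority members: treated like brand-new nodes joining the cluster.
                joiners.addAll(group);
            }
        }
        return joiners;
    }
}
```

Deciding which subgroup counts as primary is the hard part and is not addressed here; the majority and minimum-size rules discussed in the quoted messages below are candidates.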

On 4/17/13 10:31 AM, Adrian Nistor wrote:
> In the case of a MergeView, the cluster topology manager running on
> (the new) coordinator will request the current cache topology from all
> members and will compute a new topology as the union of all of them.
> The new topology id is computed as the maximum of the existing topology
> ids plus 2. Any rebalance currently pending in any subpartition is
> ended, and a new rebalance is triggered for the new cluster. No data
> version conflict resolution is performed => chaos :)
>
> On 04/16/2013 10:05 PM, Manik Surtani wrote:
>> Guys - I've started documenting this here [1] and will put together a prototype this week.
>>
>> One question though, perhaps one for Dan/Adrian - is there any special handling for state transfer if a MergeView is detected?
>>
>> - M
>>
>> [1] https://community.jboss.org/wiki/DesignDealingWithNetworkPartitions
>>
>> On 6 Apr 2013, at 04:26, Bela Ban <bban at redhat.com> wrote:
>>
>>>
>>> On 4/5/13 3:53 PM, Manik Surtani wrote:
>>>> Guys,
>>>>
>>>> So this is what I have in mind for this, looking for opinions.
>>>>
>>>> 1.  We write a SplitBrainListener which is registered when the
>>>> channel connects.  The aim of this listener is to identify when we
>>>> have a partition.  This can be identified when a view change is
>>>> detected and the new view is significantly smaller than the old
>>>> view.  This is easier to detect in large clusters; in smaller
>>>> clusters it is harder to decide between a node leaving and a
>>>> partition.  (Any better ideas here?)
>>>>
>>>> 2.  The SBL flips a switch in an interceptor
>>>> (SplitBrainHandlerInterceptor?) which makes the node read-only
>>>> (rejects invocations that change the state of the local node) if it
>>>> is in the smaller partition (newView.size < oldView.size / 2).  This
>>>> only works reliably for odd-numbered cluster sizes, and the issues
>>>> with small clusters seen in (1) will affect it as well.
>>>>
>>>> 3.  The SBL can flip the switch in the interceptor back to normal
>>>> operation once a MergeView is detected.
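Steps 1-3 could be wired together roughly as follows. This is a sketch under assumptions: a view change is reduced to the old and new view sizes plus a flag for whether it was a merge (in JGroups a merge is signalled by a MergeView), the data store is a plain map, and `SplitBrainListener` / `SplitBrainHandlerInterceptor` are the hypothetical names proposed above, not existing Infinispan classes. `onViewChange` is likewise a made-up method, not a JGroups callback.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical interceptor: rejects state-changing calls while read-only. */
final class SplitBrainHandlerInterceptor {
    private final AtomicBoolean readOnly = new AtomicBoolean(false);
    private final Map<String, String> store = new ConcurrentHashMap<>();

    void setReadOnly(boolean value) {
        readOnly.set(value);
    }

    boolean isReadOnly() {
        return readOnly.get();
    }

    /** Reads are always allowed, even in a minority partition. */
    String get(String key) {
        return store.get(key);
    }

    /** Writes are rejected while the switch is flipped. */
    void put(String key, String value) {
        if (readOnly.get()) {
            throw new IllegalStateException("minority partition: writes are rejected");
        }
        store.put(key, value);
    }
}

/** Hypothetical listener implementing steps 1-3 on view sizes only. */
final class SplitBrainListener {
    private final SplitBrainHandlerInterceptor interceptor;

    SplitBrainListener(SplitBrainHandlerInterceptor interceptor) {
        this.interceptor = interceptor;
    }

    /** Called for every new view; isMergeView stands in for receiving a MergeView. */
    void onViewChange(int oldViewSize, int newViewSize, boolean isMergeView) {
        if (isMergeView) {
            interceptor.setReadOnly(false);          // step 3: back to normal
        } else if (newViewSize * 2 < oldViewSize) {  // steps 1-2: newView.size < oldView.size / 2
            interceptor.setReadOnly(true);
        }
        // Otherwise the current mode is kept.
    }
}
```

The condition is written as `newViewSize * 2 < oldViewSize` rather than `newViewSize < oldViewSize / 2` on purpose: with Java integer division, a 5-node cluster splitting into 2 + 3 would evaluate `2 < 2` on the smaller side and leave it writable.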
>>>>
>>>> It's nowhere near perfect, but at least it means that we can
>>>> recommend enabling this and setting up an odd number of nodes, with
>>>> a cluster size of at least N, if you want to reduce inconsistency in
>>>> your grid during partitions.
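The odd-size recommendation can be checked mechanically. The hypothetical `SplitAnalysis` below applies the same rule as step 2 (`newView.size < oldView.size / 2`, written without integer division) to every split of an N-node cluster into two non-empty sides, and reports whether some split leaves both sides writable, which is the case that produces conflicting updates.

```java
/** Hypothetical check of step 2's rule against every two-way split. */
final class SplitAnalysis {

    private SplitAnalysis() {}

    /** Step 2's rule: a side goes read-only if it has fewer than half the nodes. */
    static boolean goesReadOnly(int sideSize, int clusterSize) {
        return sideSize * 2 < clusterSize;
    }

    /** True if some split into two non-empty sides leaves both sides writable. */
    static boolean hasConflictingSplit(int clusterSize) {
        for (int left = 1; left < clusterSize; left++) {
            int right = clusterSize - left;
            if (!goesReadOnly(left, clusterSize) && !goesReadOnly(right, clusterSize)) {
                return true;
            }
        }
        return false;
    }
}
```

For even N the split N/2 + N/2 leaves both sides writable, while for odd N one side is always strictly smaller than half. Odd sizes do not protect availability, though: 5 nodes splitting 2 + 2 + 1 make every side read-only.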
>>>>
>>>> Is this even useful?
>>>
>>> So I assume this is to shut down (or make read-only) non-primary
>>> partitions. I'd go with an approach similar to [1] section 5.6.2, which
>>> makes a partition read-only once it drops below a certain number of nodes N.
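A fixed-threshold variant might look like the sketch below. `MinimumSizePolicy` is a hypothetical name, and the constructor check is a choice made for this sketch rather than something taken from the JGroups manual: if N is more than half of the expected cluster size, two disjoint partitions can never both have N members, so at most one stays writable.

```java
/** Hypothetical fixed-threshold policy: read-only below minNodes members. */
final class MinimumSizePolicy {
    private final int minNodes;

    MinimumSizePolicy(int minNodes, int expectedClusterSize) {
        // Sketch-specific safety check: minNodes > expectedClusterSize / 2
        // means two disjoint partitions cannot both reach minNodes members.
        if (minNodes * 2 <= expectedClusterSize) {
            throw new IllegalArgumentException(
                "minNodes must exceed half of the expected cluster size");
        }
        this.minNodes = minNodes;
    }

    /** A partition is read-only while its current view has fewer than minNodes members. */
    boolean isReadOnly(int currentViewSize) {
        return currentViewSize < minNodes;
    }
}
```

Because N is fixed instead of being compared with the previous view, a partition that shrinks step by step (say 5 → 3 → 2) still turns read-only once it drops below N = 3. A rule that only compares each view with its predecessor, as in step 2, would let 3 → 2 pass, since 2 is not less than 3 / 2.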
>>>
>>>
>>>> Bela, is there a more reliable mechanism to detect a split in (1)?
>>> I'm afraid not. We never know whether a large number of members being
>>> removed from the view means that they left, or that we have a partition,
>>> e.g. because a switch crashed.
>>>
>>> One thing you could do, though, is have members that are about to leave
>>> gracefully broadcast a LEAVE message, so that when the view is received,
>>> the SBL knows about those members and might be better able to determine
>>> whether or not we have a partition.
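The LEAVE suggestion could be sketched like this, assuming each LEAVE announcement is delivered before the view that removes that member (a lost or late announcement would make a graceful leave look like part of a partition). `LeaveAwarePartitionDetector` and its methods are hypothetical, and the LEAVE message is assumed to be an application-level broadcast.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical detector: members that announced a graceful LEAVE before
 * disappearing are not counted as lost to a partition.
 */
final class LeaveAwarePartitionDetector {
    private final Set<String> announcedLeavers = new HashSet<>();

    /** Called when a member's LEAVE broadcast is received. */
    synchronized void onLeaveAnnounced(String member) {
        announcedLeavers.add(member);
    }

    /**
     * Returns true if, ignoring members that announced a LEAVE, this side
     * sees fewer than half of the members it expected to keep.
     */
    synchronized boolean inMinorityPartition(Set<String> oldView, Set<String> newView) {
        Set<String> expected = new HashSet<>(oldView);
        expected.removeAll(announcedLeavers);   // graceful leavers don't count
        Set<String> survivors = new HashSet<>(expected);
        survivors.retainAll(newView);
        // Announcements for members that are now gone have been consumed.
        announcedLeavers.retainAll(newView);
        return survivors.size() * 2 < expected.size();
    }
}
```

With a view going from {A, B, C, D, E} to {A, E}, the plain size rule would flag a partition, but if B, C and D announced their LEAVE first, the remaining two members are all that was expected and no partition is reported.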
>>>
>>> [1] http://www.jgroups.org/manual-3.x/html/user-advanced.html, section 5.6.2
>>>
>>> --
>>> Bela Ban, JGroups lead (http://www.jgroups.org)
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> --
>> Manik Surtani
>> manik at jboss.org
>> twitter.com/maniksurtani
>>
>> Platform Architect, JBoss Data Grid
>> http://red.ht/data-grid
>>

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

