[infinispan-dev] ISPN-263 and handling partitions

Manik Surtani msurtani at redhat.com
Wed Apr 17 10:47:52 EDT 2013


On 17 Apr 2013, at 07:24, Bela Ban <bban at redhat.com> wrote:

> If we go with a primary partition approach, then only the primary 
> partition will be allowed to make progress (a.k.a. accept changes), we 
> therefore won't have any conflicts.
> 
> The partition approach must be chosen so there can only be 1 primary 
> partition max, and the minority partitions shut down or turn read-only. 
> When merging, minority partitions need to get the state from the primary 
> partition, so state transfer on a merge always needs to flow from the 
> primary partition to the minority partition(s).
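The majority rule Bela describes above can be sketched roughly as follows. This is illustration only, in Python with invented names, not Infinispan or JGroups code: a partition is primary only if it holds a strict majority of the last stable membership, so at most one side of a split can qualify, and on merge state flows one way, from primary to minority.

```python
# Hypothetical sketch (not Infinispan code): pick at most one primary
# partition by strict majority, so two sides of a split can never both win.
def is_primary(partition_size, last_stable_cluster_size):
    """A partition is primary only if it holds a strict majority of the
    last known stable membership; at most one partition can satisfy this."""
    return partition_size > last_stable_cluster_size // 2

def merge_state(primary_state, minority_state):
    """On merge, state flows one way: the minority side discards its
    (read-only, possibly stale) state and adopts the primary's state."""
    return dict(primary_state)  # minority_state is intentionally dropped
```

Note that with an even split (e.g. 2 of 4 nodes) neither side is primary, which is why odd cluster sizes come up later in the thread.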

Correct.  This is the 'special behaviour' I was asking about: whether this state transfer from the primary partition to the secondary partitions happens during a merge, or whether the minority partition nodes are simply wiped and treated as fresh joiners.

> 
> I don't know how this could be done, but perhaps an approach would be to 
> treat members of minority partitions on a merge as if they were fresh 
> joiners?
> 
> On 4/17/13 10:31 AM, Adrian Nistor wrote:
>> In case of MergeView the cluster topology manager running on (the new)
>> coordinator will request the current cache topology from all members and
>> will compute a new topology as the union of all. The new topology id is
>> computed as the max + 2 of the existing topology ids. Any currently
>> pending rebalance in any subpartition is ended now and a new rebalance
>> is triggered for the new cluster. No data version conflict resolution is
>> performed => chaos :)
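A simplified sketch of the merge computation Adrian describes (invented names and types, not the actual cluster topology manager code): union the members of all subpartitions and take the maximum existing topology id plus 2, with no data version conflict resolution.

```python
# Sketch of the merge-topology computation described above (simplified).
def merge_topologies(topologies):
    """topologies: list of (topology_id, set_of_members), one per
    subpartition.  Returns the post-merge (topology_id, members).
    No conflict resolution is performed -- data divergence between the
    subpartitions is not repaired, hence the 'chaos'."""
    new_id = max(t_id for t_id, _ in topologies) + 2
    members = set()
    for _, m in topologies:
        members |= m
    return new_id, members
```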
>> 
>> On 04/16/2013 10:05 PM, Manik Surtani wrote:
>>> Guys - I've started documenting this here [1] and will put together a prototype this week.
>>> 
>>> One question though, perhaps one for Dan/Adrian - is there any special handling for state transfer if a MergeView is detected?
>>> 
>>> - M
>>> 
>>> [1] https://community.jboss.org/wiki/DesignDealingWithNetworkPartitions
>>> 
>>> On 6 Apr 2013, at 04:26, Bela Ban <bban at redhat.com> wrote:
>>> 
>>>> 
>>>> On 4/5/13 3:53 PM, Manik Surtani wrote:
>>>>> Guys,
>>>>> 
>>>>> So this is what I have in mind for this, looking for opinions.
>>>>> 
>>>>> 1.  We write a SplitBrainListener which is registered when the
>>>>> channel connects.  The aim of this listener is to identify when we
>>>>> have a partition.  This can be identified when a view change is
>>>>> detected, and the new view is significantly smaller than the old
>>>>> view.  This is easier to detect for large clusters; for smaller
>>>>> clusters it is harder to tell a node leaving apart from a partition.
>>>>> (Any better ideas here?)
>>>>> 
>>>>> 2.  The SBL flips a switch in an interceptor
>>>>> (SplitBrainHandlerInterceptor?) which switches the node to be
>>>>> read-only (reject invocations that change the state of the local
>>>>> node) if it is in the smaller partition (newView.size < oldView.size
>>>>> / 2).  Only works reliably for odd-numbered cluster sizes, and the
>>>>> issues with small clusters seen in (1) will affect here as well.
>>>>> 
>>>>> 3.  The SBL can flip the switch in the interceptor back to normal
>>>>> operation once a MergeView is detected.
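Steps 1-3 above could be sketched, under the stated newView.size < oldView.size / 2 heuristic, like this. All names are invented for illustration; this is not the actual SplitBrainListener or interceptor API:

```python
# Illustrative sketch of steps 1-3: flip to read-only when the new view
# has lost a majority of the old view, flip back when a merge is seen.
class SplitBrainHandler:
    def __init__(self):
        self.read_only = False
        self.last_view_size = 0

    def on_view_change(self, new_view_size):
        # Step 2: suspected minority partition -> reject writes.
        if self.last_view_size and new_view_size < self.last_view_size / 2:
            self.read_only = True
        self.last_view_size = new_view_size

    def on_merge_view(self, merged_view_size):
        # Step 3: partitions healed -> resume normal operation.
        self.read_only = False
        self.last_view_size = merged_view_size
```

With a cluster of 5 splitting 3/2, the 2-node side goes read-only while the 3-node side keeps accepting writes, which is the behaviour the odd-cluster-size recommendation relies on.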
>>>>> 
>>>>> It's nowhere near perfect but at least it means that we can recommend
>>>>> enabling this and setting up an odd number of nodes, with a cluster
>>>>> size of at least N if you want to reduce inconsistency in your grid
>>>>> during partitions.
>>>>> 
>>>>> Is this even useful?
>>>> 
>>>> So I assume this is to shut down (or make read-only) non-primary
>>>> partitions. I'd go with an approach similar to [1] section 5.6.2, which
>>>> makes a partition read-only once it drops below a certain number of nodes N.
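The policy from [1] section 5.6.2 reduces to a fixed floor rather than a ratio, sketched here with invented names: the partition stays writable only while it holds at least N nodes.

```python
# Sketch of a fixed-floor availability policy (in the spirit of the
# JGroups manual's section 5.6.2): read-only below a configured minimum.
def writable(view_size, min_nodes):
    """True while the partition still has at least min_nodes members."""
    return view_size >= min_nodes
```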
>>>> 
>>>> 
>>>>> Bela, is there a more reliable mechanism to detect a split in (1)?
>>>> I'm afraid not. We never know whether a large number of members being
>>>> removed from the view means that they left, or that we have a partition,
>>>> e.g. because a switch crashed.
>>>> 
>>>> One thing you could do, though, is have members that are about to leave
>>>> gracefully broadcast a LEAVE message, so that when the view is
>>>> received, the SBL knows which members left voluntarily and can better
>>>> determine whether we have a partition or not.
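Bela's LEAVE-broadcast idea might be sketched like this (all names invented; not a JGroups API): members that vanish from the view without having announced a LEAVE are treated as evidence of a partition.

```python
# Hypothetical sketch: track announced leaves so a view change can
# distinguish graceful departures from a suspected partition.
class LeaveTracker:
    def __init__(self):
        self.announced_leaves = set()

    def on_leave_message(self, member):
        # A departing member broadcast LEAVE before going away.
        self.announced_leaves.add(member)

    def classify_view_change(self, old_view, new_view):
        gone = set(old_view) - set(new_view)
        unannounced = gone - self.announced_leaves
        self.announced_leaves -= gone
        # Members that vanished without announcing LEAVE are suspect.
        return "suspected-partition" if unannounced else "graceful-leave"
```

This only narrows the ambiguity: a crashed switch still looks like many members disappearing at once with no LEAVE messages, which is exactly the case the heuristic flags.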
>>>> 
>>>> [1] http://www.jgroups.org/manual-3.x/html/user-advanced.html, section 5.6.2
>>>> 
>>>> --
>>>> Bela Ban, JGroups lead (http://www.jgroups.org)
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> --
>>> Manik Surtani
>>> manik at jboss.org
>>> twitter.com/maniksurtani
>>> 
>>> Platform Architect, JBoss Data Grid
>>> http://red.ht/data-grid
>>> 
>>> 
>> 
>> 
> 
> -- 
> Bela Ban, JGroups lead (http://www.jgroups.org)

--
Manik Surtani
manik at jboss.org
twitter.com/maniksurtani

Platform Architect, JBoss Data Grid
http://red.ht/data-grid



