[infinispan-dev] ISPN-263 and handling partitions
Manik Surtani
msurtani at redhat.com
Wed Apr 17 10:47:52 EDT 2013
On 17 Apr 2013, at 07:24, Bela Ban <bban at redhat.com> wrote:
> If we go with a primary partition approach, then only the primary
> partition will be allowed to make progress (a.k.a. accept changes), we
> therefore won't have any conflicts.
>
> The partition approach must be chosen so there can only be 1 primary
> partition max, and the minority partitions shut down or turn read-only.
> When merging, minority partitions need to get the state from the primary
> partition, so state transfer on a merge always needs to flow from the
> primary partition to the minority partition(s).
Correct. This was the 'special behaviour' I was asking about: whether state transfer from the primary partition to the minority partitions happens during a merge, or whether the minority partition nodes are simply wiped and treated as fresh joiners.
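To make the "wipe and rejoin" option concrete, here is a minimal sketch in plain Java. It is not Infinispan or JGroups code: `MergePlanner`, `Partition` and the `acceptedWrites` flag are hypothetical names, and it assumes each side of the split can report at merge time whether it was the partition that kept accepting writes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: decide which members are wiped and rejoin empty after a merge. */
final class MergePlanner {

    /** One side of the split as seen at merge time (illustrative type, not a real API). */
    static final class Partition {
        final List<String> members;
        final boolean acceptedWrites; // assumed to be true only for the primary partition

        Partition(List<String> members, boolean acceptedWrites) {
            this.members = members;
            this.acceptedWrites = acceptedWrites;
        }
    }

    /** Returns the single partition that accepted writes; fails unless there is exactly one. */
    static Partition findPrimary(List<Partition> partitions) {
        Partition primary = null;
        for (Partition p : partitions) {
            if (p.acceptedWrites) {
                if (primary != null) {
                    throw new IllegalStateException("more than one primary partition");
                }
                primary = p;
            }
        }
        if (primary == null) {
            throw new IllegalStateException("no primary partition");
        }
        return primary;
    }

    /** Members that discard their local state and receive it from the primary partition. */
    static List<String> membersToWipe(List<Partition> partitions) {
        Partition primary = findPrimary(partitions);
        List<String> toWipe = new ArrayList<>();
        for (Partition p : partitions) {
            if (p != primary) {
                toWipe.addAll(p.members);
            }
        }
        return toWipe;
    }

    public static void main(String[] args) {
        List<Partition> partitions = Arrays.asList(
            new Partition(Arrays.asList("A", "B", "C"), true),
            new Partition(Arrays.asList("D", "E"), false));
        System.out.println(membersToWipe(partitions)); // prints [D, E]
    }
}
```

The sketch deliberately fails when it finds zero or several writable partitions, since the whole approach rests on there being at most one primary.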
>
> I don't know how this could be done, but perhaps an approach would be to
> treat members of minority partitions on a merge as if they were fresh
> joiners ?
>
> On 4/17/13 10:31 AM, Adrian Nistor wrote:
>> In case of MergeView the cluster topology manager running on (the new)
>> coordinator will request the current cache topology from all members and
>> will compute a new topology as the union of all. The new topology id is
>> computed as the max + 2 of the existing topology ids. Any currently
>> pending rebalance in any subpartition is ended now and a new rebalance
>> is triggered for the new cluster. No data version conflict resolution is
>> performed => chaos :)
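The merge step described above can be sketched roughly as follows. This is a simplification in plain Java, not the actual cluster topology manager: `Topology` is a hypothetical stand-in that holds only an id and member names, whereas the real topology also carries the consistent hash. It does show why nothing here resolves conflicting data: only ids and membership are combined.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of merging per-partition topologies after a MergeView. */
final class TopologyMergeSketch {

    /** Illustrative stand-in for a cache topology: an id plus member names. */
    static final class Topology {
        final int id;
        final Set<String> members;

        Topology(int id, Set<String> members) {
            this.id = id;
            this.members = members;
        }
    }

    /** New topology = union of all reported members, id = max reported id + 2. */
    static Topology merge(List<Topology> reported) {
        if (reported.isEmpty()) {
            throw new IllegalArgumentException("no topologies reported");
        }
        int maxId = Integer.MIN_VALUE;
        Set<String> union = new LinkedHashSet<>();
        for (Topology t : reported) {
            maxId = Math.max(maxId, t.id);
            union.addAll(t.members);
        }
        return new Topology(maxId + 2, union);
    }

    public static void main(String[] args) {
        Topology left = new Topology(7, new LinkedHashSet<>(Arrays.asList("A", "B", "C")));
        Topology right = new Topology(9, new LinkedHashSet<>(Arrays.asList("D", "E")));
        Topology merged = merge(Arrays.asList(left, right));
        System.out.println(merged.id + " " + merged.members); // prints 11 [A, B, C, D, E]
    }
}
```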
>>
>> On 04/16/2013 10:05 PM, Manik Surtani wrote:
>>> Guys - I've started documenting this here [1] and will put together a prototype this week.
>>>
>>> One question though, perhaps one for Dan/Adrian - is there any special handling for state transfer if a MergeView is detected?
>>>
>>> - M
>>>
>>> [1] https://community.jboss.org/wiki/DesignDealingWithNetworkPartitions
>>>
>>> On 6 Apr 2013, at 04:26, Bela Ban <bban at redhat.com> wrote:
>>>
>>>>
>>>> On 4/5/13 3:53 PM, Manik Surtani wrote:
>>>>> Guys,
>>>>>
>>>>> So this is what I have in mind for this, looking for opinions.
>>>>>
>>>>> 1. We write a SplitBrainListener which is registered when the
>>>>> channel connects. The aim of this listener is to identify when we
>>>>> have a partition. This can be identified when a view change is
>>>>> detected, and the new view is significantly smaller than the old
>>>>> view. Easier to detect for large clusters, but smaller clusters will
>>>>> be harder - trying to decide between a node leaving vs a partition.
>>>>> (Any better ideas here?)
>>>>>
>>>>> 2. The SBL flips a switch in an interceptor
>>>>> (SplitBrainHandlerInterceptor?) which switches the node to be
>>>>> read-only (reject invocations that change the state of the local
>>>>> node) if it is in the smaller partition (newView.size < oldView.size
>>>>> / 2). Only works reliably for odd-numbered cluster sizes, and the
>>>>> issues with small clusters seen in (1) will affect here as well.
>>>>>
>>>>> 3. The SBL can flip the switch in the interceptor back to normal
>>>>> operation once a MergeView is detected.
>>>>>
>>>>> It's nowhere near perfect but at least it means that we can recommend
>>>>> enabling this and setting up an odd number of nodes, with a cluster
>>>>> size of at least N if you want to reduce inconsistency in your grid
>>>>> during partitions.
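A rough sketch of the switch from steps 2 and 3, in plain Java with no JGroups or Infinispan types. `SplitBrainSwitch` and its method names are hypothetical. One assumption worth flagging: the comparison is written as `2 * newSize < oldSize` rather than `newSize < oldSize / 2`, because with integer division a partition of 2 out of 5 would otherwise not count as the smaller side.

```java
/** Hypothetical sketch of the read-only switch flipped by a split-brain listener. */
final class SplitBrainSwitch {

    private volatile boolean readOnly;

    /** Called on a view change; goes read-only if this side holds less than half the old view. */
    void onViewChange(int oldSize, int newSize) {
        // Multiply instead of dividing so that odd sizes are handled (2 of 5: 4 < 5).
        if (2 * newSize < oldSize) {
            readOnly = true;
        }
    }

    /** Called when a MergeView is detected; returns to normal operation. */
    void onMerge() {
        readOnly = false;
    }

    boolean isReadOnly() {
        return readOnly;
    }

    /** What an interceptor would call before any state-changing invocation. */
    void checkWrite() {
        if (readOnly) {
            throw new IllegalStateException("node is in a minority partition, writes rejected");
        }
    }

    public static void main(String[] args) {
        SplitBrainSwitch minority = new SplitBrainSwitch();
        minority.onViewChange(5, 2);
        System.out.println(minority.isReadOnly()); // prints true

        SplitBrainSwitch evenSplit = new SplitBrainSwitch();
        evenSplit.onViewChange(4, 2);
        System.out.println(evenSplit.isReadOnly()); // prints false
    }
}
```

The second case in `main` is the even-sized problem from step 2: a 4-node cluster split 2/2 leaves both halves writable under this rule.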
>>>>>
>>>>> Is this even useful?
>>>>
>>>> So I assume this is to shut down (or make read-only) non primary
>>>> partitions. I'd go with an approach similar to [1] section 5.6.2, which
>>>> makes a partition read-only once it drops below a certain number of nodes N.
>>>>
>>>>
>>>>> Bela, is there a more reliable mechanism to detect a split in (1)?
>>>> I'm afraid not. We never know whether a large number of members being
>>>> removed from the view means that they left, or that we have a partition,
>>>> e.g. because a switch crashed.
>>>>
>>>> One thing you could do though is have members that are about to leave
>>>> in an orderly fashion broadcast a LEAVE message, so that when the view
>>>> is received, the SBL knows about those members and might be better able
>>>> to determine whether we have a partition or not.
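The LEAVE idea could be sketched like this, again in plain Java. `LeaveTracker` is a hypothetical helper and members are represented as strings. How the LEAVE message is actually broadcast and received is left out.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: tell announced departures apart from members that just vanished. */
final class LeaveTracker {

    private final Set<String> announcedLeavers = new HashSet<>();

    /** Called when a member broadcasts that it is about to leave. */
    void onLeaveMessage(String member) {
        announcedLeavers.add(member);
    }

    /** Members missing from the new view that never announced a leave. */
    Set<String> unexplainedDepartures(Collection<String> oldView, Collection<String> newView) {
        Set<String> missing = new TreeSet<>(oldView);
        missing.removeAll(newView);
        missing.removeAll(announcedLeavers);
        // Forget announcements from members that are now gone.
        announcedLeavers.retainAll(newView);
        return missing;
    }

    public static void main(String[] args) {
        LeaveTracker tracker = new LeaveTracker();
        tracker.onLeaveMessage("C");
        Set<String> unexplained = tracker.unexplainedDepartures(
            Arrays.asList("A", "B", "C", "D", "E"),
            Arrays.asList("A", "B"));
        System.out.println(unexplained); // prints [D, E]
    }
}
```

A non-empty result does not prove a partition, since crashed nodes also disappear without a LEAVE, but an empty one means the shrink is fully explained by orderly departures.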
>>>>
>>>> [1] http://www.jgroups.org/manual-3.x/html/user-advanced.html, section 5.6.2
>>>>
>>>> --
>>>> Bela Ban, JGroups lead (http://www.jgroups.org)
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> --
>>> Manik Surtani
>>> manik at jboss.org
>>> twitter.com/maniksurtani
>>>
>>> Platform Architect, JBoss Data Grid
>>> http://red.ht/data-grid
>>>
>>>
>>
>>
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
--
Manik Surtani
manik at jboss.org
twitter.com/maniksurtani
Platform Architect, JBoss Data Grid
http://red.ht/data-grid