[infinispan-dev] ISPN-263 and handling partitions
Manik Surtani
msurtani at redhat.com
Mon Apr 22 12:43:41 EDT 2013
On 22 Apr 2013, at 14:47, Bela Ban <bban at redhat.com> wrote:
>
>
> On 4/19/13 11:51 AM, Sanne Grinovero wrote:
>> TBH I'm not sure which problem this thread is about :)
>>
>> Surely network partitions are a problem, but there are many forms of
>> "partition", and many different opinions on what "acceptable"
>> behaviour the grid should implement, which largely depend on the
>> assumptions the client application is making.
>
>
> This thread is not about providing different application data merge
> policies, which is also something that needs to be done, though more
> likely in the context of eventual consistency in Infinispan. This
> thread is about a *primary partition* approach, which says that only
> members in the primary partition are allowed to make progress, while
> others aren't.
>
> 'No progress' needs to be defined, but so far I think we've agreed on a
> notion of read-only members, which do provide access to data but don't
> allow modifications. One could also think about not even allowing
> reads, as the data might be stale. Perhaps this should be configurable.
+1
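
As a straw man, the policy could be as simple as an enum plus a guard in
the invocation path. A minimal sketch (all names below are hypothetical;
nothing like this exists in Infinispan today):

// Hypothetical sketch of a configurable "no progress" policy for members
// stuck in a minority partition.
public class MinorityPartitionGuard {

    public enum Policy {
        READ_ONLY,  // serve reads, reject writes (the notion agreed so far)
        DENY_ALL,   // reject reads too, since the data might be stale
        ALLOW_ALL   // keep making progress, accepting possible inconsistency
    }

    private final Policy policy;
    private volatile boolean inMinority; // flipped by a cluster view listener

    public MinorityPartitionGuard(Policy policy) {
        this.policy = policy;
    }

    public void setInMinority(boolean inMinority) {
        this.inMinority = inMinority;
    }

    // Called before every cache operation.
    public void check(boolean isWrite) {
        if (!inMinority || policy == Policy.ALLOW_ALL)
            return;
        if (isWrite || policy == Policy.DENY_ALL)
            throw new IllegalStateException(
                  "Node is in a minority partition; operation rejected");
    }
}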
> The idea is that if only a primary partition can make progress, and only
> *one* primary partition exists in the system at any time, then we can
> simply overwrite the data of minority partitions with data from the
> primary partition on a merge.
>
> So a primary partition approach is not about how to merge (possibly
> conflicting) data after a cluster split heals, but about how to
> *prevent* conflicting data in the first place.
>
> If you think about this in CAP terminology, we sacrifice availability in
> favor of consistency.
Precisely. This would still follow the strongly consistent model.
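
To make the mechanics concrete, here is a rough sketch of the majority
rule on top of the JGroups view callbacks: a partition is primary iff it
holds a strict majority of the last stable view, which is what guarantees
at most one primary partition at any time. The two private hooks are
hypothetical placeholders for whatever Infinispan would actually do:

import org.jgroups.MergeView;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class PrimaryPartitionListener extends ReceiverAdapter {

    private volatile int lastStableSize;
    private volatile boolean degraded;

    @Override
    public void viewAccepted(View view) {
        if (view instanceof MergeView) {
            // The split has healed: former minority members throw away
            // their state and re-fetch it from the primary partition.
            if (degraded)
                wipeAndResyncState();
            degraded = false;
            lastStableSize = view.size();
        } else if (view.size() * 2 > lastStableSize) {
            // Strict majority of the last stable view: we are (part of)
            // the unique primary partition and may keep making progress.
            lastStableSize = view.size();
        } else {
            // Minority, or exactly half (e.g. a 5/5 split of 10 nodes, in
            // which case *neither* side is primary): stop making progress.
            degraded = true;
            stopProgress();
        }
    }

    private void wipeAndResyncState() { /* hypothetical hook */ }
    private void stopProgress()       { /* hypothetical hook */ }
}

A real implementation would also have to deal with members joining a
degraded minority, overlapping merges and so on; the sketch only
illustrates the invariant.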
>
>
>> Since we seem to be discussing a case in which the minority group is
>> expected to flip into read-only mode, could we step back and describe:
>> - why this is an acceptable solution for some classes of applications?
>> - what kinds of network failure we want to take compensating
>> actions for?
>
>
> There *are* no compensating actions, as we avoid creating divergent
> branches of the same key in the first place, contrary to eventual
> consistency, which is all about merging conflicting branches of a key.
>
>
>> I'm not an expert on how people physically wire up single nodes, racks
>> and rooms to allow for our virtual connections, but let's assume that
>> all nodes are connected with a single "cable" between each other. Or,
>> if multiple cables are actually used, could we rely on system
>> configuration to guarantee that packets can find alternative routes if
>> one wire is eaten by mice?
>
>
> In my experience, dropped packets due to physical link failure almost
> never happen. Most partitions (in Infinispan/JGroups) occur for the
> following reasons:
> - A maxed-out thread pool at one or more members, which leads to missed
> heartbeat acks and false suspicions
> - A misconfigured switch / firewall, especially if members are in
> different subnets
> - Buggy firmware in the switch, e.g. dropping multicasts every now and
> then (IGMP snooping)
> - Small packet queues, causing packets to be discarded
> - GC pauses inhibiting heartbeats for some time
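
FWIW, the GC and thread-pool items mostly manifest as false suspicions,
which can be mitigated by giving failure detection more slack. A rough
sketch against the JGroups 3.x API; the FD_ALL values are illustrative
only, and the trade-off is slower detection of genuinely failed members:

import org.jgroups.JChannel;
import org.jgroups.protocols.FD_ALL;

public class TunedChannel {
    public static void main(String[] args) throws Exception {
        JChannel ch = new JChannel("udp.xml");
        // Relax failure detection so a long GC pause or a briefly
        // maxed-out thread pool doesn't trigger a false suspicion
        // and a spurious partition.
        FD_ALL fd = (FD_ALL) ch.getProtocolStack().findProtocol(FD_ALL.class);
        if (fd != null) {
            fd.setValue("timeout", 30000L);  // suspect only after 30s of silence
            fd.setValue("interval", 5000L);  // send a heartbeat every 5s
        }
        ch.connect("my-cluster");
    }
}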
>
>
>> It seems important to me to define what level of network failure we
>> want to address; for example, are we assuming we don't deal with cases
>> in which nodes can talk to one group but not vice versa?
>
> We cannot guarantee this won't happen. As a matter of fact, I've seen
> this in practice, and MERGE{2,3} contain code that deals with asymmetric
> partitions.
>
>
>> If the effect of a network failure is a completely isolated group, can
>> we assume Hot Rod clients can't reach them either?
>
>
> Partitions may include some clients and some cluster nodes; we cannot
> assume a split cleanly separates clients from server nodes. Unfortunately... :-)
>
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
--
Manik Surtani
manik at jboss.org
twitter.com/maniksurtani
Platform Architect, JBoss Data Grid
http://red.ht/data-grid