[infinispan-dev] ISPN-263 and handling partitions

Manik Surtani msurtani at redhat.com
Mon Apr 22 12:43:41 EDT 2013


On 22 Apr 2013, at 14:47, Bela Ban <bban at redhat.com> wrote:

> 
> 
> On 4/19/13 11:51 AM, Sanne Grinovero wrote:
>> TBH I'm not understanding which problem this thread is about :)
>> 
>> Surely network partitions are a problem, but there are many forms of
>> "partition", and many different opinions on what "acceptable"
>> behaviour the grid should implement, which largely depend on the
>> assumptions the client application is making.
> 
> 
> This thread is not about providing different application data merge 
> policies, which is also something that needs to be done, but more likely 
> in the context of eventual consistency in Infinispan. This thread is 
> about a *primary partition* approach which says that only members in a 
> primary partition are allowed to make progress, and others aren't.
> 
> 'No progress' needs to be defined, but so far I think we've agreed on a 
> notion of read-only members, which do provide access to data but don't 
> allow modifications. One could also think about not even allowing reads, 
> as the data might be stale. Perhaps this should be configurable.

+1
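
Something along these lines is what I had in mind. A minimal sketch,
assuming a per-cache policy; all the names below are hypothetical and
nothing like this exists in Infinispan today:

    // Sketch of a configurable guard for members of a minority partition.
    public class PartitionGuard {

       // Hypothetical policy: what may a minority member still do?
       public enum MinorityPolicy {
          READ_ONLY,        // serve (possibly stale) reads, reject writes
          DENY_READ_WRITES  // reject reads too, since data might be stale
       }

       private final MinorityPolicy policy;
       private volatile boolean inPrimaryPartition = true;

       public PartitionGuard(MinorityPolicy policy) {
          this.policy = policy;
       }

       // called from the view/merge listener when primary status changes
       public void onPartitionStatusChange(boolean primary) {
          inPrimaryPartition = primary;
       }

       public void checkRead() {
          if (!inPrimaryPartition && policy == MinorityPolicy.DENY_READ_WRITES)
             throw new IllegalStateException("Reads denied: minority partition");
       }

       public void checkWrite() {
          if (!inPrimaryPartition)
             throw new IllegalStateException("Writes denied: minority partition");
       }
    }

Every cache operation would then call checkRead() / checkWrite() before
touching any data.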

> The idea is that if only a primary partition can make progress, and only 
> *one* primary partition exists in the system at any time, then we can 
> simply overwrite the data of minority partitions with data from the 
> primary partition on a merge.
> 
> So a primary partition approach is not about how to merge (possibly 
> conflicting) data after a cluster split heals, but about how to 
> *prevent* conflicting data in the first place.
> 
> If you think about this in CAP terminology, we sacrifice availability in 
> favor of consistency.

Precisely.  This would still follow the strongly consistent model.
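
To make "only *one* primary partition" concrete: a strict majority over
the last stable view works, because two disjoint partitions can never
both contain a majority of the same view. A rough sketch against the
JGroups API (the class itself is hypothetical):

    import org.jgroups.MergeView;
    import org.jgroups.ReceiverAdapter;
    import org.jgroups.View;

    public class PrimaryPartitionDetector extends ReceiverAdapter {
       private volatile int lastStableSize = 1;
       private volatile boolean primary = true;

       @Override
       public void viewAccepted(View view) {
          if (view instanceof MergeView) {
             // the split has healed; minority members now overwrite
             // their data with the primary's (see the merge sketch below)
          }
          int size = view.getMembers().size();
          // strict majority: on an even split, *neither* side is primary,
          // which errs on the side of consistency
          primary = size > lastStableSize / 2;
          if (primary)
             lastStableSize = size;  // only the primary moves the baseline
       }

       public boolean isPrimary() {
          return primary;
       }
    }

This would be registered as the receiver on the JChannel, and the guard
above would be driven off isPrimary().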

> 
> 
>> Since we seem to be discussing a case in which the minority group is
>> expected to flip into read-only mode, could we step back and describe:
>>  - why this is an acceptable solution for some class of applications?
>>  - what kind of potential network failure we want to take compensating
>> actions for?
> 
> 
> There *are* no compensating actions, as we prevent divergent branches of 
> the same key from ever being created, contrary to eventual consistency, 
> which is all about merging conflicting branches of a key.
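
Right, and that is exactly what makes the merge step trivial: since the
minority made no progress, its data can simply be thrown away. A sketch
of that step, as a method one might add to the detector above, called
from viewAccepted() *before* the primary flag is recomputed for the
merged view (wasPrimaryCoordinator() and fetchStateFrom() are
hypothetical hooks):

    // Members of minority subgroups discard their local state and
    // re-fetch everything from the primary subgroup's coordinator.
    void onMerge(MergeView merge) {
       if (primary)
          return;  // the primary's data is authoritative; keep it as-is
       for (View subgroup : merge.getSubgroups()) {
          Address coord = subgroup.getMembers().get(0);
          if (wasPrimaryCoordinator(coord)) {  // hypothetical lookup
             fetchStateFrom(coord);            // hypothetical state transfer
             return;
          }
       }
    }
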
> 
> 
>> I'm not an expert on how people physically wire up single nodes, racks
>> and rooms to provide our virtual connections, but let's assume that
>> all nodes are connected with a single "cable" between each other; or,
>> if multiple cables are actually used in practice, could we rely on
>> system configuration to guarantee that packets can find alternative
>> routes if one wire is eaten by mice?
> 
> 
> In my experience, dropped packets due to this almost never happen. Most 
> partitions (in Infinispan/JGroups) occur for one of the following reasons:
> - A maxed-out thread pool at one or more members, which leads to missed 
> heartbeat acks and false suspicions
> - A misconfigured switch / firewall, especially if members are in 
> different subnets
> - Buggy firmware in the switch, e.g. dropping multicasts every now and 
> then (IGMP snooping)
> - Small packet queues, causing packets to be discarded
> - GC pauses inhibiting heartbeats for some time
> 
> 
>> It seems important to me to define what level of network failure we
>> want to address. For example, are we assuming we don't deal with cases
>> in which one group of nodes can talk to another but not vice versa?
> 
> We cannot guarantee this won't happen. As a matter of fact, I've seen 
> this in practice, and MERGE{2,3} contain code that deals with asymmetric 
> partitions.
> 
> 
>> If the effect of a network failure is a completely isolated group, can
>> we assume Hot Rod clients can't reach it either?
> 
> 
> Partitions may include some clients and some cluster nodes; we cannot 
> assume they cleanly separate clients from server nodes. Unfortunately... :-)
> 
> 
> -- 
> Bela Ban, JGroups lead (http://www.jgroups.org)
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Manik Surtani
manik at jboss.org
twitter.com/maniksurtani

Platform Architect, JBoss Data Grid
http://red.ht/data-grid



