[infinispan-dev] ISPN-263 and handling partitions

Bela Ban bban at redhat.com
Mon Apr 22 09:47:29 EDT 2013



On 4/19/13 11:51 AM, Sanne Grinovero wrote:
> TBH I'm not understanding which problem this thread is about :)
>
> Surely network partitions are a problem, but there are many forms of
> "partition", and many different opinions of what an "acceptable"
> behaviour is that the grid should implement, which largely depend on
> assumptions the client application is making.


This thread is not about providing different application-level data 
merge policies, which is also something that needs to be done, but more 
likely in the context of eventual consistency in Infinispan. This 
thread is about a *primary partition* approach, which says that only 
members of a primary partition are allowed to make progress, while all 
others aren't.

'No progress' needs to be defined, but so far I think we've agreed on a 
notion of read-only members, which still provide access to data but 
don't allow modifications. One could also think about not even allowing 
reads, as the data might be stale. Perhaps this should be configurable.
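
To make this concrete, here's a minimal sketch of what such a guard 
could look like. PrimaryPartitionGuard and its Status enum are made-up 
illustrative names, not existing Infinispan API:

// Hypothetical sketch only, not existing Infinispan API.
public class PrimaryPartitionGuard {
    public enum Status { PRIMARY, MINORITY }

    private volatile Status status = Status.PRIMARY;

    // Would be called by whatever component detects view changes/merges
    public void statusChanged(Status newStatus) {
        this.status = newStatus;
    }

    // Writes are always rejected in a minority partition
    public void checkWrite() {
        if (status == Status.MINORITY)
            throw new IllegalStateException(
                "minority partition: cache is read-only");
    }

    // Whether stale reads are tolerated could be a configuration option
    public void checkRead(boolean allowStaleReads) {
        if (status == Status.MINORITY && !allowStaleReads)
            throw new IllegalStateException(
                "minority partition: reads disabled");
    }
}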

The idea is that if only a primary partition can make progress, and only 
*one* primary partition exists in the system at any time, then we can 
simply overwrite the data of minority partitions with data from the 
primary partition on a merge.
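
One way a node could decide whether its partition is primary is a 
simple majority rule over a known expected cluster size. The sketch 
below uses the JGroups View API, but the rule itself and the 
expectedClusterSize parameter are assumptions on my part, not an agreed 
design:

import org.jgroups.View;

// Illustrative majority rule: a partition is primary iff it holds more
// than half of the expected members. Other rules (e.g. weighted quorum
// or a designated node) would work as well.
public class MajorityRule {
    private final int expectedClusterSize; // assumed known up front

    public MajorityRule(int expectedClusterSize) {
        this.expectedClusterSize = expectedClusterSize;
    }

    public boolean isPrimary(View view) {
        return view.size() > expectedClusterSize / 2;
    }
}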

So a primary partition approach is not about how to merge (possibly 
conflicting) data after a cluster split heals, but about how to 
*prevent* conflicting data in the first place.

If you think about this in CAP terminology, we sacrifice availability in 
favor of consistency.


> Since we seem to be discussing a case in which the minority group is
> expected to flip into read-only mode, could we step back and describe:
>   - why this is an acceptable solution for some class of applications?
>   - what kind of potential network failure we want to take compensating
> actions for?


There *are* no compensating actions, as we prevent divergent branches 
of the same key from ever being created, contrary to eventual 
consistency, which is all about merging conflicting branches of a key.


> I'm not an expert on how people physically wire up single nodes, racks
> and rooms to allow for our virtual connections, but let's assume that
> all nodes are connected with a single "cable" between each other, or
> if concrete multiple cables are actually used, could we rely on system
> configuration to guarantee packets can find alternative routes if one
> wire is eaten by mice?


In my experience, dropped packets due to physical link failures almost 
never happen. Most partitions (in Infinispan/JGroups) occur because of 
the following things:
- A maxed-out thread pool at one or more members, which leads to missed 
heartbeat acks and false suspicions (see the toy sketch after this list)
- A misconfigured switch / firewall, especially if members are in 
different subnets
- Buggy firmware in the switch, e.g. dropping multicasts every now and 
then (IGMP snooping)
- Packet queues that are too small, so packets are discarded
- GC pauses inhibiting heartbeats for some time
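
The first cause can be illustrated with plain JDK code: if the pool 
that handles incoming messages is saturated with application work, a 
heartbeat ack is processed late, and a peer whose failure detector 
times out in the meantime falsely suspects the node. This is a toy 
model, not JGroups code, and the timeout value is made up:

import java.util.concurrent.*;

// Toy model: the only handler thread is busy, so the heartbeat ack is
// delayed past the (hypothetical) suspicion timeout of a peer.
public class SaturatedPoolDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        long detectorTimeoutMs = 1000; // hypothetical suspicion timeout

        // Long-running application task occupies the only thread
        pool.submit(() -> sleep(3000));

        final long submitted = System.currentTimeMillis();
        Future<?> ack = pool.submit(() ->
            System.out.println("heartbeat ack sent after " +
                (System.currentTimeMillis() - submitted) + " ms"));

        ack.get();
        if (System.currentTimeMillis() - submitted > detectorTimeoutMs)
            System.out.println("-> a peer would have falsely suspected us");
        pool.shutdown();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) {}
    }
}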


> It seems important to me to define what level of network failure we
> want to address, for example are we assuming we don't deal with cases
> in which nodes can talk to one group but not vice-versa?

We cannot guarantee this won't happen. As a matter of fact, I've seen 
it in practice, and the MERGE2 and MERGE3 protocols contain code that 
deals with such asymmetric partitions.
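
For reference, this is roughly how an application sees a healed 
partition through the JGroups API: after the merge, JGroups installs a 
MergeView whose subgroups are the partitions that just merged. The 
policy comment below is illustrative; it is not existing Infinispan 
code:

import org.jgroups.MergeView;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

// Sketch: detecting that a partition has healed. A MergeView is a View
// subclass whose subgroups are the partitions that just merged; this is
// the point where a primary partition policy would decide whose data
// wins.
public class MergeAwareReceiver extends ReceiverAdapter {
    @Override
    public void viewAccepted(View view) {
        if (view instanceof MergeView) {
            MergeView merge = (MergeView) view;
            System.out.println("merged subgroups: " + merge.getSubgroups());
            // e.g. overwrite the state of minority subgroups with the
            // primary partition's state (policy not shown)
        }
    }
}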


> If the effect of a nework failure is a completely isolated group, can
> we assume Hot Rod clients can't reach them either?


Partitions may include some clients and some cluster nodes; we cannot 
assume they cleanly separate clients from server nodes. 
Unfortunately... :-)


-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

