On 10/30/13 8:28 PM, William Burns wrote:
Since it seems I can't comment on the wiki itself, I am just
replying here.
I wonder if the third option 'Primary partition' is desirable. I
think availability in some cases would be harmed more than we would
like.
Let's say you have a 5 node cluster where 3 of the nodes are behind the
same router and the remaining 2 are behind a different one. If the
router for the 3 fails (crash, power loss, etc.) and they are no longer
addressable, you have your 2 partitions (possibly 1 or even 4). When
this occurs the other 2 nodes would go into read-only mode since they
failed the quorum check.
Yes, this is intended. Actually, the minority partition {D,E} might even
become totally inaccessible, i.e. rejecting *all* requests (including
reads). This is in line with the Primary Partition approach, where a
majority partition is allowed to make progress and all minority
partitions shut down. In terms of CAP, we're sacrificing availability
here in favor of consistency.
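
A minimal sketch of the majority check behind this, applied to the
5-node example ({A,B,C} vs. {D,E}). The class and method names are
hypothetical and not part of JGroups or Infinispan, and the strict
majority rule is an assumption about how "majority partition" would be
decided:

```java
// Hypothetical helper for illustration; not a JGroups or Infinispan API.
final class MajorityQuorum {
    private MajorityQuorum() {}

    /**
     * Returns true if a partition with partitionSize members holds a
     * strict majority of a cluster with clusterSize members.
     * Assumption: "majority" means more than half of the full cluster.
     */
    static boolean isPrimary(int partitionSize, int clusterSize) {
        if (clusterSize <= 0 || partitionSize < 0 || partitionSize > clusterSize) {
            throw new IllegalArgumentException("invalid sizes");
        }
        return 2 * partitionSize > clusterSize;
    }

    public static void main(String[] args) {
        int clusterSize = 5;
        // {A,B,C}: 3 of 5 members, may make progress
        System.out.println("{A,B,C} primary: " + isPrimary(3, clusterSize));
        // {D,E}: 2 of 5 members, shuts down or goes read-only
        System.out.println("{D,E} primary: " + isPrimary(2, clusterSize));
    }
}
```

With a strict majority, an even split (e.g. 2 and 2 out of 4) leaves no
primary partition at all, which again trades availability for
consistency.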
But the 3 nodes that are "writable" can't be
accessed any longer and thus no writes can be performed on the cluster.
You mean some clients cannot access {A,B,C}? Sure, then so be it, but
at least we don't have any inconsistent state. Again, PP is *one* tool
we give to the user to handle partitions.
It seems we would still want to allow writes to provide as
high availability as possible.
PP is *not* about availability, it is about consistency. Good for some
apps, bad for others. If you pick PP, you lose availability.
Also if we did have read only, what criteria would cause those nodes
to be writeable again?
Once you become the primary partition again, e.g. when a view is
received where view.size() >= N, with N being a predefined threshold.
The criterion can be different, as long as it is deterministic.
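
A rough sketch of such a deterministic criterion, assuming the
threshold N is fixed by configuration. PartitionGate and its methods
are hypothetical names; in a JGroups application, onView() would
presumably be called with view.size() from the viewAccepted(View)
callback:

```java
// Hypothetical sketch for illustration; not a JGroups or Infinispan API.
final class PartitionGate {
    enum Mode { AVAILABLE, READ_ONLY }

    private final int threshold;        // N, assumed to be fixed by configuration
    private Mode mode = Mode.READ_ONLY; // conservative until the first view arrives

    PartitionGate(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    /**
     * Called on every view change. The decision depends only on the view
     * size, so every member that sees the same view reaches the same result.
     */
    Mode onView(int viewSize) {
        mode = (viewSize >= threshold) ? Mode.AVAILABLE : Mode.READ_ONLY;
        return mode;
    }

    boolean writesAllowed() {
        return mode == Mode.AVAILABLE;
    }
}
```

With N = 3 in the 5-node example, {D,E} stays read-only after the split
and becomes writable again as soon as a view with at least 3 members is
installed, whether through a merge or through newly started nodes.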
There is no guarantee when the other nodes
will ever come back up or if there will ever be additional ones anytime soon.
If a system picks the Primary Partition approach, then it can become
completely inaccessible (or at best read-only). In this case, I envisage
that a sysadmin will be notified, who can then start additional nodes so
that the system regains the primary partition and becomes accessible
again.
--
Bela Ban, JGroups lead (http://www.jgroups.org)