Re: [infinispan-dev] design for cluster events (wiki page)

Friday, 1 November 2013

On 10/31/13 11:20 PM, Sanne Grinovero wrote:
...
 On 31 October 2013 20:07, Mircea Markus <mmarkus(a)redhat.com> wrote:
>
> On Oct 31, 2013, at 3:45 PM, Dennis Reed <dereed(a)redhat.com> wrote:
>
>> On 10/31/2013 02:18 AM, Bela Ban wrote:
>>>
>>>> Also if we did have read only, what criteria would cause those nodes
>>>> to be writeable again?
>>> Once you become the primary partition, e.g. when a view is received
>>> where view.size() >= N where N is a predefined threshold. Can be
>>> different, as long as it is deterministic.
>>>
>>>> There is no guarantee when the other nodes
>>>> will ever come back up or if there will ever be additional ones anytime
>>>> soon.
>>> If a system picks the Primary Partition approach, then it can become
>>> completely inaccessible (read-only). In this case, I envisage that a
>>> sysadmin will be notified, who can then start additional nodes for the
>>> system to acquire primary partition and become accessible again.
>>
>> There should be a way to manually modify the primary partition status.
>> So if the admin knows the nodes will never return, they can manually
>> enable the partition.
>
> The status will be exposed through JMX at any point, disregarding if there's a
> split brain going on or not.
>
>>
>> Also, the PartitionContext should know whether the nodes left normally
>> or not.
>> If you have 5 nodes in a cluster, and you shut down 3 of them, you'll
>> want the remaining two to remain available.
>> But if there was a network partition, you wouldn't.  So it needs to know
>> the difference.
>
> very good point again.
> Thank you Dennis!

 Let's clarify. If 3 nodes out of 5 are killed without a
 reconfiguration, you do NOT want the remaining two to remain available
 unless explicitly told so by an admin. It is not possible to
 automatically make a distinction between 3 nodes being shut down vs. 3
 crashed nodes. 

We could determine that a node left *gracefully* by sending an RPC 
before leaving. But for all other cases, we don't know whether a node 
got partitioned away, or whether it crashed.

For the graceful-leave case, we could say that we can go below the 
read-only threshold to remain available. This would increase overall 
availability a bit.
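
A rough sketch of that rule, purely for illustration. The class and method names (GracefulLeaveTracker, onGracefulLeave, isWritable) are made up here and are not Infinispan or JGroups API; it also assumes members are identified by plain strings and that the threshold is a fixed N, as in the view.size() >= N check quoted above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Infinispan or JGroups API.
public class GracefulLeaveTracker {
    private final int threshold;  // N: minimum member count for a writable partition
    private final Set<String> gracefulLeavers = new HashSet<>();

    public GracefulLeaveTracker(int threshold) {
        this.threshold = threshold;
    }

    // Invoked when a node sends the "I am leaving" RPC before shutting down.
    public void onGracefulLeave(String node) {
        gracefulLeavers.add(node);
    }

    // Invoked on every new view; returns true if this partition may accept writes.
    public boolean isWritable(Set<String> view) {
        // A node that shows up in the view again is no longer "gone".
        gracefulLeavers.removeAll(view);
        // Gracefully departed nodes still count toward the threshold; nodes that
        // vanished without the RPC (crash or partition) do not.
        return view.size() + gracefulLeavers.size() >= threshold;
    }
}
```

With N = 3 and five nodes, two survivors stay writable if the other three announced their departure, but go read-only if the three simply disappeared.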

...
 In our face to face meeting we did point out that an admin needs hooks
 to be able to:
   - specify how many nodes are expected in the full system (and adapt
 dynamically)
   - some admin command to "clean shutdown" a node (which was also
 discussed as a strong requirement in scope of CacheStores so I'm
 assuming the operation is defined already)

 The design Wiki has captured the API we discussed around the
 PartitionHandlingStrategy but is missing the details about these
 operations, that should probably be added to the PartitionContext as
 well.

 Also in the scope of CacheStore consistency we had discussed the need
 to store the expected nodes to be in the View: for example when the
 grid is started and all nodes are finding each other, the Cache shall
 not be considered started until all required nodes have joined. 
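
The startup condition described above could look roughly like the following. This is only a sketch under assumptions: ExpectedMembersGate and canStart are hypothetical names, not part of the PartitionContext or any existing API, and the expected members are assumed to be stored as a set of node names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Infinispan or JGroups API.
public class ExpectedMembersGate {
    private final Set<String> expected;  // nodes that must be present before the cache starts

    public ExpectedMembersGate(Set<String> expected) {
        this.expected = new HashSet<>(expected);
    }

    // True once every expected node appears in the current view.
    public boolean canStart(Set<String> view) {
        return view.containsAll(expected);
    }
}
```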

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)
