[infinispan-dev] design for cluster events (wiki page)
Mircea Markus
mmarkus at redhat.com
Fri Nov 1 12:56:05 EDT 2013
We're on the exact same page, thanks for the clarifications.
> On 1 Nov 2013, at 16:05, Sanne Grinovero <sanne at infinispan.org> wrote:
>
>> On 1 November 2013 11:56, Mircea Markus <mmarkus at redhat.com> wrote:
>>
>>> On Oct 31, 2013, at 10:20 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
>>>
>>>> On 31 October 2013 20:07, Mircea Markus <mmarkus at redhat.com> wrote:
>>>>
>>>>> On Oct 31, 2013, at 3:45 PM, Dennis Reed <dereed at redhat.com> wrote:
>>>>>
>>>>>> On 10/31/2013 02:18 AM, Bela Ban wrote:
>>>>>>
>>>>>>> Also if we did have read only, what criteria would cause those nodes
>>>>>>> to be writeable again?
>>>>>> Once you become the primary partition, e.g. when a view is received
>>>>>> where view.size() >= N where N is a predefined threshold. Can be
>>>>>> different, as long as it is deterministic.
>>>>>>
>>>>>>> There is no guarantee when the other nodes
>>>>>>> will ever come back up or if there will ever be additional ones anytime soon.
>>>>>> If a system picks the Primary Partition approach, then it can become
>>>>>> completely inaccessible (read-only). In this case, I envisage that a
>>>>>> sysadmin will be notified, who can then start additional nodes for the
>>>>>> system to acquire primary partition and become accessible again.
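
Just to make the criterion concrete, a minimal sketch of the
view.size() >= N rule on top of a plain JGroups receiver; the class and
the hard-coded threshold below are made up for illustration, they are
not part of the design:

import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

// Illustrative only: the node considers itself part of the primary
// partition (writable) when the view reaches a predefined threshold N.
public class QuorumListener extends ReceiverAdapter {

    private final int quorum;            // N, the predefined threshold
    private volatile boolean available;  // read-write vs. read-only

    public QuorumListener(int quorum) {
        this.quorum = quorum;
    }

    @Override
    public void viewAccepted(View view) {
        // Deterministic rule: primary partition iff we see at least N members.
        available = view.size() >= quorum;
        System.out.println("view=" + view + " -> available=" + available);
    }

    public boolean isAvailable() {
        return available;
    }

    public static void main(String[] args) throws Exception {
        JChannel ch = new JChannel();
        ch.setReceiver(new QuorumListener(3));  // e.g. majority of a 5-node grid
        ch.connect("quorum-demo");
    }
}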
>>>>>
>>>>> There should be a way to manually modify the primary partition status.
>>>>> So if the admin knows the nodes will never return, they can manually
>>>>> enable the partition.
>>>>
>>>> The status will be exposed through JMX at any point, regardless of whether there's a split brain going on or not.
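
As a rough illustration of what that JMX exposure could look like: a flag
the admin can read and flip back manually. The MBean name, attributes and
ObjectName below are invented, not the actual Infinispan ones.

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical MBean: exposes the partition status and lets an admin
// force the node back to available manually.
public interface PartitionStatusMBean {
    boolean isAvailable();
    void setAvailable(boolean available);   // manual admin override
}

class PartitionStatus implements PartitionStatusMBean {
    private volatile boolean available = true;
    public boolean isAvailable() { return available; }
    public void setAvailable(boolean available) { this.available = available; }
}

class JmxDemo {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // The ObjectName is invented for the example, not the real Infinispan one.
        server.registerMBean(new PartitionStatus(),
                new ObjectName("org.infinispan.example:type=PartitionStatus"));
        // Any JMX client (jconsole etc.) can now read the flag and flip it,
        // whether or not a split brain is in progress.
        Thread.sleep(Long.MAX_VALUE);
    }
}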
>>>>
>>>>>
>>>>> Also, the PartitionContext should know whether the nodes left normally
>>>>> or not.
>>>>> If you have 5 nodes in a cluster, and you shut down 3 of them, you'll
>>>>> want the remaining two to remain available.
>>>>> But if there was a network partition, you wouldn't. So it needs to know
>>>>> the difference.
>>>>
>>>> Very good point again.
>>>> Thank you, Dennis!
>>>
>>> Let's clarify: if 3 nodes out of 5 are killed without a
>>> reconfiguration, you do NOT want the remaining two to remain available
>>> unless explicitly told to do so by an admin. It is not possible to
>>> automatically make a distinction between 3 nodes being shut down and 3
>>> crashed nodes.
>>
>> I'm not sure you can make this generalization: it's really up to the implementor of the PartitionHandlingStrategy to decide that. Knowing whether it was a clean shutdown or not might be relevant to that decision. I think this functionality should focus on how exactly to react to partitions happening, but provide the hooks for the user to make that decision and act on the system's availability.
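
To make the hook idea concrete, a rough sketch; the
PartitionHandlingStrategy/PartitionContext names come from the wiki, but
the methods below are invented for this example:

// Sketch only: the strategy/context names come from the design wiki, the
// methods on PartitionContext are invented for illustration.
public interface PartitionHandlingStrategy {
    void onPartition(PartitionContext ctx);
}

interface PartitionContext {
    int numMembersBefore();      // members in the last stable view
    int numMembersNow();         // members still reachable
    boolean leftGracefully();    // clean shutdown vs. suspected crash
    void markUnavailable();      // switch this partition to read-only
}

// A user-supplied strategy: stay available only if we kept a majority or
// the missing members are known to have been shut down cleanly.
class MajorityOrCleanShutdownStrategy implements PartitionHandlingStrategy {
    @Override
    public void onPartition(PartitionContext ctx) {
        boolean majority = ctx.numMembersNow() * 2 > ctx.numMembersBefore();
        if (!majority && !ctx.leftGracefully()) {
            ctx.markUnavailable();
        }
    }
}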
>
> We're on the same page on that; I'm just stressing that there is no
> automatic way to make a distinction between a crash and an
> intentional shutdown if we don't have the "clean shutdown" method,
> as Bela also reminded us.
>
>>
>>>
>>> In our face to face meeting we did point out that an admin needs hooks
>>> to be able to:
>>> - specify how many nodes are expected in the full system (and adapt
>>> dynamically)
>>
>> Yes, that's a custom implementation of PartitionHandlingStrategy, one we might provide out of the box.
>
> Right, it could be part of the default PartitionHandlingStrategy, but I
> think all strategies might be interested in this, and that it's
> Infinispan (core)'s responsibility to also provide ways to administer
> the expected view at runtime.
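
Something along these lines is what I mean by administering the expected
view at runtime, building on the interfaces sketched above (again just an
illustration; the setter would in practice be exposed over JMX or the CLI):

// Sketch, building on the interfaces above: compare the current view
// against an expected cluster size the admin can adjust at runtime.
class ExpectedSizeStrategy implements PartitionHandlingStrategy {

    private volatile int expectedSize;

    ExpectedSizeStrategy(int expectedSize) {
        this.expectedSize = expectedSize;
    }

    // Would be exposed over JMX or the CLI so the admin can shrink or grow
    // the expectation when nodes are decommissioned or added.
    void setExpectedSize(int expectedSize) {
        this.expectedSize = expectedSize;
    }

    @Override
    public void onPartition(PartitionContext ctx) {
        if (ctx.numMembersNow() * 2 <= expectedSize) {
            ctx.markUnavailable();   // lost half or more of the expected grid
        }
    }
}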
>
>
>>> - some admin command to "clean shutdown" a node (which was also
>>> discussed as a strong requirement in scope of CacheStores so I'm
>>> assuming the operation is defined already)
>>>
>>> The design Wiki has captured the API we discussed around the
>>> PartitionHandlingStrategy but is missing the details about these
>>> operations, that should probably be added to the PartitionContext as
>>> well.
>>
>> The PartitionContext allows a partition to be marked as unavailable; I think that should do.
>
> You also need the "clean shutdown", very likely with the RPC suggested by Bela.
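
Roughly along the lines Bela suggested, a sketch of such a clean shutdown:
the node broadcasts an application-level "leaving cleanly" notification
before disconnecting, so the survivors can tell this view change apart from
a crash. The message format below is invented for the example.

import org.jgroups.JChannel;
import org.jgroups.Message;

// Sketch: notify the cluster of an intentional shutdown before leaving,
// so the remaining members don't treat the resulting view change as a crash.
public class CleanShutdown {
    public static void shutdown(JChannel channel) throws Exception {
        channel.send(new Message(null, "CLEAN_SHUTDOWN:" + channel.getAddress()));
        channel.disconnect();
        channel.close();
    }
}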
>
>
>>> Also in the scope of CacheStore consistency we had discussed the need
>>> to store the expected nodes to be in the View: for example when the
>>> grid is started and all nodes are finding each other, the Cache shall
>>> not be considered started until all required nodes have joined.
>>
>> the discussion is here: https://community.jboss.org/wiki/ControlledClusterShutdownWithDataRestoreFromPersistentStorage
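
For the startup side ("the Cache shall not be considered started until all
required nodes have joined"), a purely illustrative sketch on top of
JGroups; the real mechanism would of course live inside Infinispan's
startup:

import java.util.concurrent.CountDownLatch;
import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

// Sketch: block "start" until the view contains the expected number of nodes.
public class StartupBarrier extends ReceiverAdapter {

    private final int expectedNodes;
    private final CountDownLatch ready = new CountDownLatch(1);

    public StartupBarrier(int expectedNodes) {
        this.expectedNodes = expectedNodes;
    }

    @Override
    public void viewAccepted(View view) {
        if (view.size() >= expectedNodes) {
            ready.countDown();       // all required nodes have joined
        }
    }

    public void awaitFullView() throws InterruptedException {
        ready.await();               // cache start would block here
    }

    public static void main(String[] args) throws Exception {
        StartupBarrier barrier = new StartupBarrier(5);
        JChannel ch = new JChannel();
        ch.setReceiver(barrier);
        ch.connect("startup-demo");
        barrier.awaitFullView();
        System.out.println("All expected nodes joined, the cache can start");
    }
}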
>>
>>>
>>> Cheers,
>>> Sanne
>>
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)