Sounds good, although I think ISPN-3140 should remain in its current scope and just
address point 1 below. We can create a separate JIRA for point 2, since I think even
point 1 on its own is useful for some use cases (as you say, where data loss isn't a
concern).
On 31 May 2013, at 17:40, Adrian Nistor <anistor(a)redhat.com> wrote:
Yes, ISPN-1394 has a broader scope but the proposed solution for
ISPN-3140 solves quite a lot of ISPN-1394 and it's not complex. We might not even need
ISPN-1394 soon unless somebody really wants to control data ownership down to segment
granularity. If we only want to batch joins/leaves and manually kick out nodes, with or
without losing their data, then this proposal should be enough. This solution should not
prevent implementation of ISPN-1394 in the future and will not need to be removed/undone.
Here are the details:
1. Add a JMX writable attribute (or operation?) to ClusterTopologyManager (name it
suppressRehashing?) that is false by default but should also be configurable via API or
XML. While this attribute is true, the ClusterTopologyManager queues all
join/leave/exclude (see below) requests and does not execute them on the spot as it
normally would. The value of this attribute is ignored on all nodes but the coordinator.
When it is set back to false, all queued operations (except the ones that cancel each
other out) are executed. The setter should be synchronous, so that setting it back to
false does not return until the queue is empty and all rehashing has been processed.
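To make the queueing semantics of point 1 concrete, here is a minimal sketch. All names (TopologyQueueModel, onTopologyRequest) are hypothetical stand-ins, not real Infinispan API; the real work would live in ClusterTopologyManager.

```java
// Minimal sketch of the proposed suppressRehashing semantics on the
// coordinator. All names here are hypothetical, not real Infinispan API.
import java.util.ArrayDeque;
import java.util.Queue;

class TopologyQueueModel {
    private boolean suppressRehashing = false;   // the proposed JMX attribute
    private final Queue<String> pending = new ArrayDeque<>();
    int rebalancesExecuted = 0;

    // While suppression is on, join/leave/exclude requests are only queued;
    // otherwise they trigger a rebalance on the spot, as happens today.
    void onTopologyRequest(String request) {
        if (suppressRehashing) {
            pending.add(request);
        } else {
            rebalance();
        }
    }

    // The setter is synchronous: turning suppression off drains the queue
    // (requests that cancel each other out would be dropped here) and does
    // not return until the batched rehashing has been processed.
    void setSuppressRehashing(boolean value) {
        this.suppressRehashing = value;
        if (!value && !pending.isEmpty()) {
            pending.clear();   // all queued changes collapse into one rebalance
            rebalance();
        }
    }

    private void rebalance() {
        rebalancesExecuted++;
    }
}
```

The point of the model: N queued topology changes produce a single rebalance when suppression is switched off, instead of N successive ones.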
2. We add a JMX operation excludeNodes(list of addresses) to ClusterTopologyManager.
Calling this method on any node but the coordinator is a no-op. This operation removes the
node from the topology (almost as if it left) and forces a rebalance. The node is still
present in the current CH but not in the pending CH. It is basically disowned of all
its data, which is then transferred to other (non-excluded) nodes. At the end of the
rebalance the node is removed from the topology for good and can be shut down without
losing data. Note that if suppressRehashing==true, excludeNodes(..) just queues the
exclusions for later processing. We can batch multiple such exclusions and then re-enable
rehashing.
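A rough model of the excludeNodes semantics from point 2, with hypothetical names (ExcludeModel, installTopology); the real CH types and state transfer are of course much richer than a set of addresses.

```java
// Hypothetical model of the proposed excludeNodes(...) semantics: the node
// stays in the current CH while its data is rebalanced away, and only leaves
// the topology once the pending CH (without it) is installed.
import java.util.HashSet;
import java.util.Set;

class ExcludeModel {
    final Set<String> currentCh = new HashSet<>();  // consistent hash members

    ExcludeModel(Set<String> members) {
        currentCh.addAll(members);
    }

    // Remove the given nodes from the pending CH and force a rebalance; the
    // excluded nodes keep serving their data until it has been transferred
    // to the remaining (non-excluded) owners.
    void excludeNodes(Set<String> nodes) {
        Set<String> pendingCh = new HashSet<>(currentCh);
        pendingCh.removeAll(nodes);
        // ... state transfer moves the excluded nodes' segments here ...
        installTopology(pendingCh);
    }

    // At the end of the rebalance the pending CH becomes the current one and
    // the excluded nodes can be shut down without losing data.
    private void installTopology(Set<String> pendingCh) {
        currentCh.retainAll(pendingCh);
    }
}
```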
The parts that need to be implemented are written in italic above. Everything else is
already there.
excludeNodes is a way of achieving a soft shutdown and should be used only if we care
about preserving data in the extreme case where the nodes are the last/single owners. We
can just kill the node directly if we do not care about its data.
suppressRehashing is a way of achieving some kind of batching of topology changes. This
should speed up state transfer a lot because it avoids a lot of pointless reshuffling of
data segments when we have many successive joiners/leavers.
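From the admin's point of view, batching could look like the snippet below: flip the attribute over JMX, make the topology changes, flip it back. The MBean interface and the object name "org.infinispan:component=ClusterTopologyManager" are assumptions for illustration, registered here against the in-process platform MBean server.

```java
// Hypothetical illustration of flipping the proposed suppressRehashing
// attribute over JMX. The MBean interface and object name are assumptions,
// not real Infinispan API; only the coordinator would honor the attribute.
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class SuppressRehashingJmx {

    // Stand-in for the coordinator's ClusterTopologyManager MBean.
    public interface TopologyManagerMBean {
        boolean isSuppressRehashing();
        void setSuppressRehashing(boolean value);
    }

    public static class TopologyManager implements TopologyManagerMBean {
        private volatile boolean suppressRehashing;
        public boolean isSuppressRehashing() { return suppressRehashing; }
        public void setSuppressRehashing(boolean value) { suppressRehashing = value; }
    }

    public static boolean demo() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name =
                new ObjectName("org.infinispan:component=ClusterTopologyManager");
            server.registerMBean(new TopologyManager(), name);

            // Batch topology changes: suppress, change membership, re-enable.
            server.setAttribute(name, new Attribute("SuppressRehashing", true));
            // ... nodes join/leave here; their requests are queued, not executed ...
            server.setAttribute(name, new Attribute("SuppressRehashing", false));

            return (Boolean) server.getAttribute(name, "SuppressRehashing");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```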
So what happens if the current coordinator dies for whatever reason? The new one will
take control and will not have knowledge of the existing rehash queue or the previous
status of suppressRehashing attribute so it will just get the current cache membership
status from all members of the current view and proceed with the rehashing as usual. If the
user does not want this, he can set a default value of true for suppressRehashing. The
admin now has to interact via JMX with the new coordinator. But that's not as bad as
the alternative where all the nodes are involved in this JMX scheme :) I think having only
the coordinator involved in this is a plus.
Manik, how does this fit the full and partial shutdown use cases?
Cheers
Adi
On 05/31/2013 04:20 PM, Manik Surtani wrote:
>
> On 31 May 2013, at 13:52, Dan Berindei <dan.berindei(a)gmail.com> wrote:
>
>> If we only want to deal with full cluster shutdown, then I think stopping all
>> application requests, calling Cache.clear() on one node, and then shutting down all the
>> nodes should be simpler. On start, assuming no cache store, the caches will start empty,
>> so starting all the nodes at once and only allowing application requests when they've
>> all joined should also work without extra work.
>>
>> If we only want to stop a part of the cluster, suppressing rebalancing would be
>> better, because we wouldn't lose all the data. But we'd still lose the keys whose
>> owners are all among the nodes we want to stop. I've discussed this with Adrian, and
>> we think if we want to stop a part of the cluster without losing data we need a JMX
>> operation on the coordinator that will "atomically" remove a set of nodes from
>> the CH. After the operation completes, the user will know it's safe to stop those
>> nodes without losing data.
>
> I think the no-data-loss option is bigger scope, perhaps part of ISPN-1394. And
> that's not what I am asking about.
>
>> When it comes to starting a part of the cluster, a "pause rebalancing"
>> option would probably be better - but again, on the coordinator, not on each joining node.
>> And clearly, if more than numOwner nodes leave while rebalancing is suspended, data will
>> be lost.
>
> Yup. This sort of option would only be used where data loss isn't an issue (such
> as a distributed cache). Where data loss is an issue, we'd need more control -
> ISPN-1394.
>
>>
>> Cheers
>> Dan
>>
>>
>>
>> On Fri, May 31, 2013 at 12:17 PM, Manik Surtani <msurtani(a)redhat.com> wrote:
>> Guys
>>
>> We've discussed ISPN-3140 elsewhere before; I'm bringing it to this forum now.
>>
>> https://issues.jboss.org/browse/ISPN-3140
>>
>> Any thoughts/concerns? Particularly looking to hear from Dan or Adrian about
>> viability, complexity, ease of implementation.
>>
>> Thanks
>> Manik
>> --
>> Manik Surtani
>> manik(a)jboss.org
>> twitter.com/maniksurtani
>>
>> Platform Architect, JBoss Data Grid
>> http://red.ht/data-grid
>>
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev(a)lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>
>
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev