[infinispan-dev] Suppressing state transfer via JMX

Adrian Nistor anistor at redhat.com
Fri May 31 13:26:03 EDT 2013


A small correction to my previous email is needed. When I said "Note that 
if suppressRehashing==false operation excludeNodes(..) just queues ... " 
I actually meant to say suppressRehashing==true.

On 05/31/2013 07:40 PM, Adrian Nistor wrote:
> Yes, ISPN-1394 has a broader scope, but the proposed solution for 
> ISPN-3140 solves quite a lot of ISPN-1394 and it's not complex. We 
> might not even need ISPN-1394 soon unless somebody really wants to 
> control data ownership down to segment granularity. If we only want 
> to batch joins/leaves and manually kick out nodes with or without 
> losing their data, then this proposal should be enough. This solution 
> should not prevent implementation of ISPN-1394 in the future and will 
> not need to be removed/undone.
>
> Here are the details:
>
> 1. /Add a JMX writable attribute (or operation?) to 
> ClusterTopologyManager (name it suppressRehashing?) that is false by 
> default but should also be configurable via API or XML. While this 
> attribute is true, the ClusterTopologyManager queues all 
> join/leave/exclude (see below) requests and does not execute them on 
> the spot as would normally happen. The value of this attribute is 
> ignored on all nodes but the coordinator. When it is set back to 
> false, all queued operations (except the ones that cancel each other 
> out) are executed. The setter should be synchronous, so when setting 
> it back to false it does not return until the queue is empty and all 
> rehashing has been processed. /
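>
> To make the queueing semantics concrete, here is a minimal sketch of 
> the intended behaviour. The class name and the TopologyRequest type 
> are illustrative only, not the real ClusterTopologyManager API:
>
>     import java.util.ArrayDeque;
>     import java.util.Queue;
>
>     /** Illustrative sketch of the proposed suppressRehashing semantics. */
>     public class RehashSuppressor {
>        /** Hypothetical stand-in for a queued join/leave/exclude request. */
>        interface TopologyRequest { void execute(); }
>
>        private final Queue<TopologyRequest> pending = new ArrayDeque<>();
>        private boolean suppressRehashing = false;   // proposed default
>
>        public synchronized void setSuppressRehashing(boolean suppress) {
>           suppressRehashing = suppress;
>           if (!suppress) {
>              // The setter is synchronous: drain the queue before returning,
>              // so the caller knows all deferred rebalancing has completed.
>              // (The real version would also drop requests that cancel each
>              // other out before executing the rest.)
>              TopologyRequest request;
>              while ((request = pending.poll()) != null) {
>                 request.execute();
>              }
>           }
>        }
>
>        public synchronized void handleTopologyRequest(TopologyRequest request) {
>           if (suppressRehashing) {
>              pending.add(request);   // defer: no rebalance while suppressed
>           } else {
>              request.execute();      // normal path: rebalance immediately
>           }
>        }
>     }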
>
> 2. /We add a JMX operation excludeNodes(list of addresses) to 
> ClusterTopologyManager. Calling this method on any node but the 
> coordinator is a no-op. This operation removes the node from the 
> topology (almost as if it left) and forces a rebalance./ The node is 
> still present in the current CH but not in the pending CH. It 
> basically gives up ownership of all its data, which is then 
> transferred to other (non-excluded) nodes. At the end of the 
> rebalance the node is removed from the topology for good and can be 
> shut down without losing data. Note that if suppressRehashing==false 
> operation excludeNodes(..) just queues the exclusions for later 
> execution. We can batch multiple such exclusions and then re-activate 
> the rehashing.
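>
> From the admin's side, invoking the proposed operation could look 
> roughly like the sketch below. The ObjectName, the operation 
> signature and the host/port are assumptions based on this proposal, 
> not an existing API:
>
>     import javax.management.MBeanServerConnection;
>     import javax.management.ObjectName;
>     import javax.management.remote.JMXConnector;
>     import javax.management.remote.JMXConnectorFactory;
>     import javax.management.remote.JMXServiceURL;
>
>     public class ExcludeNodesSketch {
>        public static void main(String[] args) throws Exception {
>           // Connect to the JMX server of the current coordinator (placeholder URL).
>           JMXServiceURL url = new JMXServiceURL(
>                 "service:jmx:rmi:///jndi/rmi://coordinator-host:9999/jmxrmi");
>           JMXConnector connector = JMXConnectorFactory.connect(url);
>           try {
>              MBeanServerConnection server = connector.getMBeanServerConnection();
>              // Hypothetical ObjectName for the ClusterTopologyManager component.
>              ObjectName topologyManager = new ObjectName(
>                    "org.infinispan:type=CacheManager,name=\"DefaultCacheManager\","
>                          + "component=ClusterTopologyManager");
>              // Rebalance the data of node-3 and node-4 away, then remove them
>              // from the topology; returns once it is safe to shut them down.
>              server.invoke(topologyManager, "excludeNodes",
>                    new Object[] { new String[] { "node-3", "node-4" } },
>                    new String[] { String[].class.getName() });
>           } finally {
>              connector.close();
>           }
>        }
>     }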
>
> The parts that need to be implemented are written in italic above. 
> Everything else is already there.
>
> excludeNodes is a way of achieving a soft shutdown and should be used 
> only if we care about preserving data in the extreme case where the 
> nodes are the last/single owners. We can just kill the node directly 
> if we do not care about its data.
>
> suppressRehashing is a way of achieving some kind of batching of 
> topology changes. This should speed up state transfer considerably 
> because it avoids pointless reshuffling of data segments when we have 
> many successive joiners/leavers.
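>
> For example, an administrator restarting several nodes in a row could 
> batch the resulting topology changes roughly like this (again using 
> the hypothetical ObjectName and attribute name from this proposal):
>
>     import javax.management.Attribute;
>     import javax.management.MBeanServerConnection;
>     import javax.management.ObjectName;
>
>     public class BatchedTopologyChangesSketch {
>        // 'server' is a connection to the coordinator's MBean server,
>        // obtained as in the previous sketch.
>        static void rollingRestart(MBeanServerConnection server) throws Exception {
>           ObjectName topologyManager = new ObjectName(
>                 "org.infinispan:type=CacheManager,name=\"DefaultCacheManager\","
>                       + "component=ClusterTopologyManager");
>
>           // 1. Defer rebalancing while the topology is churning.
>           server.setAttribute(topologyManager,
>                 new Attribute("suppressRehashing", true));
>
>           // 2. Restart / add / remove as many nodes as needed here; their
>           //    join/leave requests are only queued by the coordinator.
>
>           // 3. Re-enable rebalancing; this call is synchronous and returns
>           //    only after the single, batched rebalance has finished.
>           server.setAttribute(topologyManager,
>                 new Attribute("suppressRehashing", false));
>        }
>     }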
>
> So what happens if the current coordinator dies for whatever reason? 
> The new one will take control but will have no knowledge of the 
> existing rehash queue or the previous value of the suppressRehashing 
> attribute, so it will just get the current cache membership status 
> from all members of the current view and proceed with the rehashing 
> as usual. If the user does not want this, they can set a default 
> value of true for suppressRehashing. The admin now has to interact 
> via JMX with the new coordinator. But that's not as bad as the 
> alternative where all the nodes are involved in this JMX scheme :) I 
> think having only the coordinator involved in this is a plus.
>
> Manik, how does this fit the full and partial shutdown use cases?
>
> Cheers
> Adi
>
>
> On 05/31/2013 04:20 PM, Manik Surtani wrote:
>>
>> On 31 May 2013, at 13:52, Dan Berindei <dan.berindei at gmail.com> wrote:
>>
>>> If we only want to deal with full cluster shutdown, then I think 
>>> stopping all application requests, calling Cache.clear() on one 
>>> node, and then shutting down all the nodes should be simpler. On 
>>> start, assuming no cache store, the caches will start empty, so 
>>> starting all the nodes at once and only allowing application 
>>> requests once they've all joined should also work without any 
>>> extra effort.
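>>>
>>> As a rough sketch of that shutdown sequence with the embedded API 
>>> (the wrapper class and method are just for illustration):
>>>
>>>     import org.infinispan.Cache;
>>>     import org.infinispan.manager.EmbeddedCacheManager;
>>>
>>>     class FullShutdownSketch {
>>>        // Called after application traffic to the cluster has been stopped.
>>>        static void shutDown(EmbeddedCacheManager cacheManager, Cache<?, ?> cache) {
>>>           cache.clear();        // clear() propagates cluster-wide, so one node is enough
>>>           cacheManager.stop();  // then stop the cache manager on every node
>>>        }
>>>     }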
>>>
>>> If we only want to stop a part of the cluster, suppressing 
>>> rebalancing would be better, because we wouldn't lose all the data. 
>>> But we'd still lose the keys whose owners are all among the nodes we 
>>> want to stop. I've discussed this with Adrian, and we think if we 
>>> want to stop a part of the cluster without losing data we need a JMX 
>>> operation on the coordinator that will "atomically" remove a set of 
>>> nodes from the CH. After the operation completes, the user will know 
>>> it's safe to stop those nodes without losing data.
>>
>> I think the no-data-loss option is bigger in scope, perhaps part of 
>> ISPN-1394. And that's not what I am asking about.
>>
>>> When it comes to starting a part of the cluster, a "pause 
>>> rebalancing" option would probably be better - but again, on the 
>>> coordinator, not on each joining node. And clearly, if more than 
>>> numOwners nodes leave while rebalancing is suspended, data will be 
>>> lost.
>>
>> Yup.  This sort of option would only be used where data loss isn't an 
>> issue (such as a distributed cache).  Where data loss is an issue, 
>> we'd need more control - ISPN-1394.
>>
>>>
>>> Cheers
>>> Dan
>>>
>>>
>>>
>>> On Fri, May 31, 2013 at 12:17 PM, Manik Surtani 
>>> <msurtani at redhat.com> wrote:
>>>
>>>     Guys
>>>
>>>     We've discussed ISPN-3140 elsewhere before; I'm bringing it to
>>>     this forum now.
>>>
>>>     https://issues.jboss.org/browse/ISPN-3140
>>>
>>>     Any thoughts/concerns?  Particularly looking to hear from Dan or
>>>     Adrian about viability, complexity, ease of implementation.
>>>
>>>     Thanks
>>>     Manik
>>>     --
>>>     Manik Surtani
>>>     manik at jboss.org
>>>     twitter.com/maniksurtani
>>>
>>>     Platform Architect, JBoss Data Grid
>>>     http://red.ht/data-grid
>>>
>>>
>>
>> --
>> Manik Surtani
>> manik at jboss.org
>> twitter.com/maniksurtani
>>
>> Platform Architect, JBoss Data Grid
>> http://red.ht/data-grid
>>
>>
>>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
