[infinispan-dev] Proposal: ISPN-1394 Manual rehashing in 5.2

Manik Surtani manik at jboss.org
Sat Feb 4 09:49:54 EST 2012


On 1 Feb 2012, at 12:23, Dan Berindei wrote:

> Bela, you're right, this is essentially what we talked about in Lisbon:
> https://community.jboss.org/wiki/AsymmetricCachesAndManualRehashingDesign
> 
> For joins I actually started working on a policy of coalescing joins
> that happen one after the other within a short time interval. The current
> implementation is very primitive, as I shifted focus to stability, but
> it does coalesce joins that arrive within 1 second of another join
> starting (or while that join is still running).
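Just so we're on the same page, I picture the coalescing window roughly
like this (hypothetical names, not the actual code):

   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.ScheduledFuture;
   import java.util.concurrent.TimeUnit;

   // Every join (re)arms a short timer; the rehash only starts once no
   // new joiner has arrived within the window, so a burst of joins is
   // folded into a single rehash.
   public class JoinCoalescer {
      private final long windowMillis;
      private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
      private final List<String> pendingJoiners = new ArrayList<String>();
      private ScheduledFuture<?> pendingRehash;

      public JoinCoalescer(long windowMillis) {
         this.windowMillis = windowMillis;
      }

      public synchronized void onJoin(String joiner) {
         pendingJoiners.add(joiner);
         if (pendingRehash != null) pendingRehash.cancel(false);
         pendingRehash = timer.schedule(new Runnable() {
            public void run() {
               startRehash();
            }
         }, windowMillis, TimeUnit.MILLISECONDS);
      }

      private synchronized void startRehash() {
         List<String> joiners = new ArrayList<String>(pendingJoiners);
         pendingJoiners.clear();
         // Install a single cache view containing all coalesced joiners.
         System.out.println("Rehashing once for joiners: " + joiners);
      }
   }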
> 
> I don't quite agree with Sanne's assessment that it's fine for
> getCache() to block for 5 minutes until the administrator allows the
> new node to join. We should modify startCaches() instead to signal to
> the coordinator that we are ready to receive data for one or all of
> the defined caches, and wait with a customizable time limit until the
> caches have properly joined the cluster.
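The handshake Dan describes would presumably look roughly like this
(names are made up, and the readiness signal would really be an RPC to
the coordinator):

   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.TimeUnit;

   // Sketch: the node signals it can receive state, then waits with a
   // configurable bound until the coordinator admits it to the view.
   public class JoinHandshake {
      private final CountDownLatch joinAllowed = new CountDownLatch(1);

      // Called when the coordinator finally includes us in a cache view.
      public void onViewInstalled() {
         joinAllowed.countDown();
      }

      public void startCaches(long timeout, TimeUnit unit)
            throws InterruptedException {
         signalCoordinatorReady(); // "ready to receive data for these caches"
         if (!joinAllowed.await(timeout, unit))
            throw new IllegalStateException(
                  "Not admitted to the cache view within the time limit");
      }

      private void signalCoordinatorReady() { /* RPC to the coordinator */ }
   }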
> 
> The getCache() timeout should not be increased at all. Instead I would
> propose that getCache() returns a functional cache immediately, even
> if the cache hasn't received any data yet; it would work solely as an
> L1 cache until the administrator allows it to join. I'd even make it
> possible to designate a cache as an L1-only cache, so it's never an
> owner for any key.

I presume this would be encoded in the Address?  That would make sense for a node permanently designated as an L1 node.  But then how would this work for a node temporarily acting as L1 only, until it has been allowed to join?  Change the Address instance on the fly?  A delegating Address?  :/
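To make the concern concrete, a delegating Address would have to look
something like this (simplified Address interface, hypothetical wrapper):

   // The real org.infinispan.remoting.transport.Address has more to it;
   // this marker interface is just enough to show the shape of the problem.
   interface Address {
   }

   final class L1OnlyAddress implements Address {
      private final Address delegate;

      L1OnlyAddress(Address delegate) {
         this.delegate = delegate;
      }

      Address unwrap() {
         return delegate;
      }

      // The wrapper must compare equal to the wrapped address, otherwise
      // every node would have to swap instances at exactly the same moment
      // when the node is finally allowed to join.
      @Override
      public boolean equals(Object o) {
         if (o instanceof L1OnlyAddress)
            return delegate.equals(((L1OnlyAddress) o).delegate);
         return delegate.equals(o);
      }

      @Override
      public int hashCode() {
         return delegate.hashCode();
      }
   }

Note the asymmetry: delegate.equals(wrapper) is still false, so every
component that compares addresses would need to know about the wrapper.
That's the part I'm not comfortable with.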

> For leaves, the main problem is that every node has to compute the
> same primary owner for a key, at all times. So we need a 2PC cache
> view installation immediately after any leave to ensure that every
> node determines the primary owner in the same way - we can't coalesce
> or postpone leaves.

Yes, manual rehashing would probably just be for joins.  Controlled shutdown in itself is manual, and crashes, well, need to be dealt with immediately IMO.
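For reference, the two-phase installation Dan describes would look
roughly like this (hypothetical names; a real implementation also needs
a rollback path for nodes that fail between the two phases):

   import java.util.List;

   class ViewInstaller {
      interface Node {
         void prepareView(int viewId, List<String> members);
         void commitView(int viewId);
      }

      void installView(int viewId, List<String> newMembers, List<Node> nodes) {
         // Phase 1: every node stores the new view as "pending" but keeps
         // computing primary owners from the old view.
         for (Node n : nodes) n.prepareView(viewId, newMembers);
         // Phase 2: only after all nodes have acknowledged the prepare does
         // everyone switch, so no two nodes ever disagree on the owner.
         for (Node n : nodes) n.commitView(viewId);
      }
   }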

> 
> For 5.2 I will try to decouple the cache view installation from the
> state transfer, so in theory we will be able to coalesce/postpone the
> state transfer for leaves as well
> (https://issues.jboss.org/browse/ISPN-1827). I kind of need it for
> non-blocking state transfer, because with the current implementation a
> leave forces us to cancel any state transfer in progress and restart
> with the updated cache view - a state transfer rollback will be very
> expensive with NBST.
> 
> 
> Erik does raise a valid point - with TACH, if we bring up a node with
> a different siteId, then it will be an owner for all the keys in the
> cache. That node probably isn't provisioned to hold all the keys, so
> it would very likely run out of memory or evict much of the data. I
> guess that makes it a 5.2 issue?

Yes.

> Shutting down a site should be possible even with what we have now -
> just insert a DISCARD protocol in the JGroups stack of all the nodes
> that are shutting down, and when FD finally times out on the nodes in
> the surviving datacenter they won't have any state transfer to do
> (although it may cause a few failed state transfer attempts). We could
> make it simpler though.
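For anyone wanting to try that today, inserting DISCARD looks roughly
like the following (the exact ProtocolStack API differs between JGroups
versions, so treat this as a sketch rather than copy-paste code):

   import org.jgroups.JChannel;
   import org.jgroups.protocols.DISCARD;
   import org.jgroups.protocols.TP;
   import org.jgroups.stack.ProtocolStack;

   public class SiteShutdown {
      public static void muteNode(JChannel channel) throws Exception {
         DISCARD discard = new DISCARD();
         discard.setDiscardAll(true); // drop every message, in and out
         ProtocolStack stack = channel.getProtocolStack();
         // Sit just above the transport so nothing gets past it.
         stack.insertProtocol(discard, ProtocolStack.ABOVE, TP.class);
      }
   }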
> 
> 
> Cheers
> Dan
> 
> 
> On Tue, Jan 31, 2012 at 6:21 PM, Erik Salter <an1310 at hotmail.com> wrote:
>> ...such as bringing up a backup data center.
>> 
>> -----Original Message-----
>> From: infinispan-dev-bounces at lists.jboss.org
>> [mailto:infinispan-dev-bounces at lists.jboss.org] On Behalf Of Bela Ban
>> Sent: Tuesday, January 31, 2012 11:18 AM
>> To: infinispan-dev at lists.jboss.org
>> Subject: Re: [infinispan-dev] Proposal: ISPN-1394 Manual rehashing in 5.2
>> 
>> I cannot volunteer either, but I find it important that this be done in 5.2!
>> 
>> Unless rehashing works flawlessly with a large number of nodes joining
>> at the same time, I think manual rehashing is crucial...
>> 
>> 
>> 
>> On 1/31/12 5:13 PM, Sanne Grinovero wrote:
>>> On 31 January 2012 16:06, Bela Ban <bban at redhat.com> wrote:
>>>> This is essentially what I suggested at the Lisbon meeting, right ?
>>> 
>>> Yes!
>>> 
>>>> I think Dan had a design wiki on this somewhere...
>>> 
>>> Just raising it here, as it was moved to 6.0, while I think it deserves
>>> a dedicated thread to think it through properly. If it's not hard, I
>>> think it should be done sooner.
>>> But while I started the thread to wake up the brilliant minds, I can't
>>> volunteer to make it happen.
>>> 
>>> Sanne
>>> 
>>>> 
>>>> 
>>>> On 1/31/12 4:53 PM, Sanne Grinovero wrote:
>>>>> I think this is an important feature to have soon.
>>>>> 
>>>>> My understanding of it:
>>>>> 
>>>>> The feature is off by default, and newly discovered nodes are
>>>>> added/removed as usual. With a JMX-operated switch, one can disable
>>>>> this automatic behaviour:
>>>>> 
>>>>> If a remote node joins the JGroups view while rehash is off, it
>>>>> will be added to a to-be-installed view, but that view won't be
>>>>> installed until rehash is enabled again. This leaves time to batch
>>>>> up more changes before starting the rehash, and would help a lot
>>>>> when starting larger clusters.
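A minimal sketch of what that switch could look like, written as a plain
standard MBean with made-up names (the real operation would presumably
live on an existing Infinispan component):

   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;

   interface RehashControlMBean {
      boolean isRehashEnabled();
      void enableRehash();   // install the queued view and rehash once
      void disableRehash();  // queue joiners instead of rehashing
   }

   class RehashControl implements RehashControlMBean {
      private boolean rehashEnabled = true;
      private final List<String> queuedJoiners = new ArrayList<String>();

      public synchronized void onJoin(String joiner) {
         if (rehashEnabled)
            rehashNow(Collections.singletonList(joiner));
         else
            queuedJoiners.add(joiner); // part of the to-be-installed view
      }

      public synchronized boolean isRehashEnabled() { return rehashEnabled; }

      public synchronized void disableRehash() { rehashEnabled = false; }

      public synchronized void enableRehash() {
         rehashEnabled = true;
         if (!queuedJoiners.isEmpty()) {
            rehashNow(new ArrayList<String>(queuedJoiners)); // one rehash
            queuedJoiners.clear();
         }
      }

      private void rehashNow(List<String> joiners) {
         // install the new cache view and start state transfer here
      }
   }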
>>>>> 
>>>>> If the [self] node is booting and joining a cluster with manual
>>>>> rehash off, the start process and any getCache() invocation should
>>>>> block and wait for it to be enabled. This would of course need to
>>>>> override the usually low timeouts.
>>>>> 
>>>>> When a node is suspected it's a bit of a different story, as we
>>>>> need to make sure no data is lost. The principle is the same, but
>>>>> maybe we should have two flags: one which is a "soft request" to
>>>>> avoid rehashes of fewer than N members (and refuse N >= numOwners?),
>>>>> and one which just disables it outright and doesn't care: the data
>>>>> might be in a cache store, or might not be important. Which reminds
>>>>> me, we should also consider a JMX command to flush the container to
>>>>> the CacheLoader.
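That flush command could be as simple as the following, against the
5.x loader SPI (treat the exact names as approximate):

   import org.infinispan.container.DataContainer;
   import org.infinispan.container.entries.InternalCacheEntry;
   import org.infinispan.loaders.CacheStore;

   public class FlushToStore {
      // Push every live in-memory entry down to the store, so that a
      // rehash-less shutdown cannot lose data.
      public static void flush(DataContainer container, CacheStore store)
            throws Exception {
         for (InternalCacheEntry entry : container) {
            if (!entry.isExpired())
               store.store(entry);
         }
      }
   }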
>>>>> 
>>>>> --Sanne
>>>> 
>>>> --
>>>> Bela Ban
>>>> Lead JGroups (http://www.jgroups.org)
>>>> JBoss / Red Hat
>> 
>> --
>> Bela Ban
>> Lead JGroups (http://www.jgroups.org)
>> JBoss / Red Hat
>> 
> 

--
Manik Surtani
manik at jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org
