[infinispan-dev] Infinispan Large Scale support

Erik Salter esalter at bnivideo.com
Wed Apr 6 09:23:51 EDT 2011


Hi Bela,

I'm interested in your changes as well, since concurrent startup vexes my cache usage too. Is there a wiki page or JIRA issue I could look at to understand the fundamental differences?

Thanks,

Erik

-----Original Message-----
From: infinispan-dev-bounces at lists.jboss.org [mailto:infinispan-dev-bounces at lists.jboss.org] On Behalf Of Bela Ban
Sent: Wednesday, April 06, 2011 2:46 AM
To: infinispan-dev at lists.jboss.org
Subject: Re: [infinispan-dev] Infinispan Large Scale support

Hi David,

Dan and I talked about integrating my changes to the distribution code into 5.x. As I mentioned below, the current code is quite brittle with respect to concurrent startup, so my changes will fix this. I hope we can backport them to the 4.2.x branch as well; in fact, I made my changes on a branch off of 4.2.x.
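
To make the failure mode concrete, here is a minimal sketch of the kind of concurrent startup that exposes the problem. It is illustrative only: the config file name (dist.xml) and cache name (data) are placeholders, and in practice each node is a separate JVM rather than a thread.

    import org.infinispan.Cache;
    import org.infinispan.manager.DefaultCacheManager;

    public class ConcurrentStartupSketch {
        public static void main(String[] args) {
            // Simulate several nodes joining at once. Each "node" would normally be
            // its own JVM; threads are used here only to overlap the joins in time.
            for (int i = 0; i < 5; i++) {
                new Thread(new Runnable() {
                    public void run() {
                        try {
                            DefaultCacheManager cm = new DefaultCacheManager("dist.xml");
                            // getCache() blocks until the join/rehash completes; with the
                            // current code, overlapping joins can leave this waiting forever.
                            Cache<String, String> cache = cm.getCache("data");
                            cache.put(Thread.currentThread().getName(), "joined");
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }).start();
            }
        }
    }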

On 4/5/11 2:51 PM, david marion wrote:
>
> Bela,
>
>    Yes, it is a replicated cache; I used your udp-largecluster.xml file and just modified it slightly. It does appear that the distributed cache is in a deadlock (or there is a race condition): the coordinator comes up, but the other caches do not; they just sit there and wait. I was able to get a distributed cache up and running on 100+ nodes, but now I cannot get even 5 of them running.
>
>> Date: Tue, 5 Apr 2011 11:09:54 +0200
>> From: bban at redhat.com
>> To: infinispan-dev at lists.jboss.org
>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>
>>
>>
>> On 4/4/11 5:45 PM, david marion wrote:
>>>
>>>
>>> Good news! I was able to use the system property from ISPN-83 and remove FLUSH from the JGroups config with 4.2.1.FINAL, and start-up times are much, much better. We have a replicated cache of 420+ nodes up in under 2 minutes.
>>
>>
>> Great! Just to confirm: this is 420+ Infinispan instances with replication enabled, correct?
>>
>> Did you use a specific JGroups config (e.g. udp-largecluster.xml)?
>>
>>
>>> I am seeing an issue with the distributed cache, though, with as few as 5 nodes.
>>>
>>> In the coordinator log I see
>>>
>>> org.infinispan.distribution.DistributionManagerImpl: Detected a view change. Member list changed ...
>>> org.infinispan.distribution.DistributionManagerImpl: This is a JOIN event! Wait for notification from new joiner <name>
>>>
>>> In the log from the joining node I see:
>>>
>>> org.infinispan.distribution.JoinTask: Commencing rehash on node:<name>. Before start, distributionManager.joinComplete=false
>>> org.infinispan.distribution.JoinTask: Requesting old consistent hash from coordinator
>>>
>>> I took a jstack dump of the joiner. The DefaultCacheManager.getCache() method is waiting in org.infinispan.distribution.DistributionManagerImpl.waitForJoinToComplete(), and the Rehasher thread is waiting at:
>>>
>>> at org.infinispan.util.concurrent.ReclosableLatch.await(ReclosableLatch.java:75)
>>> at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.blockUntilNoJoinsInProgress(JGroupsDistSync.java:113)
>>>
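>>> To keep the hang from blocking startup silently while experimenting, one option is to bound the getCache() call with a timeout so a stuck join surfaces as a TimeoutException instead. This is a rough sketch using only JDK concurrency utilities, nothing Infinispan-specific; the config file and cache name are placeholders:
>>>
>>> import java.util.concurrent.*;
>>> import org.infinispan.Cache;
>>> import org.infinispan.manager.DefaultCacheManager;
>>>
>>> public class BoundedJoin {
>>>     public static void main(String[] args) throws Exception {
>>>         final DefaultCacheManager cm = new DefaultCacheManager("dist.xml");
>>>         ExecutorService exec = Executors.newSingleThreadExecutor();
>>>         Future<Cache<Object, Object>> future = exec.submit(new Callable<Cache<Object, Object>>() {
>>>             public Cache<Object, Object> call() {
>>>                 // blocks in DistributionManagerImpl.waitForJoinToComplete() while the join hangs
>>>                 return cm.getCache("data");
>>>             }
>>>         });
>>>         try {
>>>             Cache<Object, Object> cache = future.get(2, TimeUnit.MINUTES);
>>>             System.out.println("join completed, cache size=" + cache.size());
>>>         } catch (TimeoutException e) {
>>>             System.err.println("join still not complete; take a thread dump (jstack) now");
>>>         }
>>>         exec.shutdown();
>>>     }
>>> }
>>>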
>>> Any thoughts?
>>
>>
>> I recently took a look at the distribution code, and this part is
>> very brittle with respect to parallel startup and merging. Plus, I
>> believe the (blocking) RPC to fetch the old CH from the coordinator
>> might deadlock in certain cases...
>>
>> I've got a pull request pending for push-based rebalancing instead of
>> the current pull-based rebalancing. It'll likely make it into 5.x; in
>> fact, I have a chat about this scheduled for this afternoon.
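>>
>> To illustrate the difference with simplified stand-in types (not Infinispan's real interfaces): in the pull model the joiner issues a blocking request to the coordinator, whereas in the push model the coordinator/owners drive the transfer and just notify the joiner, so the joiner never blocks on an RPC of its own.
>>
>> import java.util.Map;
>>
>> // Simplified stand-ins only -- not the actual Infinispan APIs.
>> interface PullRebalancing {
>>     // Joiner -> coordinator, blocking: the call that can deadlock if the
>>     // coordinator is itself waiting on the joiner.
>>     Map<Integer, String> fetchOldConsistentHash(String coordinatorAddress);
>> }
>>
>> interface PushRebalancing {
>>     // Coordinator/owners -> joiner, asynchronous: state is pushed and the
>>     // joiner is notified when rebalancing is done.
>>     void pushState(String joinerAddress, Map<Object, Object> segmentData);
>>     void rebalanceCompleted(String joinerAddress);
>> }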
>>
>>
>>
>>
>>>> Date: Wed, 23 Mar 2011 15:58:19 +0100
>>>> From: bban at redhat.com
>>>> To: infinispan-dev at lists.jboss.org
>>>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>>>
>>>>
>>>>
>>>> On 3/23/11 2:39 PM, david marion wrote:
>>>>>
>>>>> Bela,
>>>>>
>>>>> Is there a way to start up the JGroups stack on every node without using Infinispan?
>>>>
>>>>
>>>> You could use ViewDemo [1] or Draw, or write your own small test
>>>> program; if you take a look at ViewDemo's source, you'll see that it's
>>>> only a page of code.
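>>>>
>>>> A minimal program along those lines might look like this (JGroups 2.x API; the config file and cluster name below are just examples):
>>>>
>>>> import org.jgroups.JChannel;
>>>> import org.jgroups.ReceiverAdapter;
>>>> import org.jgroups.View;
>>>>
>>>> public class ViewTest {
>>>>     public static void main(String[] args) throws Exception {
>>>>         // The same protocol stack Infinispan would use, but without Infinispan on top
>>>>         JChannel ch = new JChannel("udp-largecluster.xml");
>>>>         ch.setReceiver(new ReceiverAdapter() {
>>>>             public void viewAccepted(View view) {
>>>>                 System.out.println("new view (" + view.size() + " members): " + view);
>>>>             }
>>>>         });
>>>>         ch.connect("view-test");
>>>>         System.out.println("connected to cluster view-test");
>>>>         Thread.sleep(Long.MAX_VALUE); // keep the node up; stop it with Ctrl-C
>>>>     }
>>>> }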
>>>>
>>>>
>>>>> Is there some functional test that I can run or something? I know
>>>>> I can't remove the FLUSH from Infinispan until 5.0.0 and I don't know if I can upgrade the underlying JGroups jar.
>>>>
>>>>
>>>> I suggest testing with the latest JGroups (2.12.0), both with FLUSH
>>>> (+FLUSH) and without it (-FLUSH). The +FLUSH config should be less
>>>> painful now, with the introduction of view bundling: we need to run
>>>> flush fewer times than before.
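>>>>
>>>> On a standalone test channel (not the channel Infinispan manages), one way to compare the two is to strip the FLUSH protocol programmatically before connecting; removing FLUSH from the XML achieves the same thing. A rough sketch against the 2.12 API, with the config file and cluster name as placeholders:
>>>>
>>>> import org.jgroups.JChannel;
>>>>
>>>> public class FlushTimingTest {
>>>>     public static void main(String[] args) throws Exception {
>>>>         // +FLUSH: connect with the stack exactly as configured
>>>>         JChannel withFlush = new JChannel("udp-largecluster.xml");
>>>>         long t = System.currentTimeMillis();
>>>>         withFlush.connect("flush-test");
>>>>         System.out.println("+FLUSH connect: " + (System.currentTimeMillis() - t) + " ms");
>>>>         withFlush.close();
>>>>
>>>>         // -FLUSH: same stack, with the FLUSH protocol removed before connecting
>>>>         JChannel withoutFlush = new JChannel("udp-largecluster.xml");
>>>>         withoutFlush.getProtocolStack().removeProtocol("FLUSH");
>>>>         t = System.currentTimeMillis();
>>>>         withoutFlush.connect("flush-test");
>>>>         System.out.println("-FLUSH connect: " + (System.currentTimeMillis() - t) + " ms");
>>>>         withoutFlush.close();
>>>>     }
>>>> }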
>>>>
>>>>
>>>> [1] http://community.jboss.org/wiki/TestingJBoss
>>>>
>>>> --
>>>> Bela Ban
>>>> Lead JGroups / Clustering Team
>>>> JBoss
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>> --
>> Bela Ban
>> Lead JGroups / Clustering Team
>> JBoss
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Bela Ban
Lead JGroups / Clustering Team
JBoss
_______________________________________________
infinispan-dev mailing list
infinispan-dev at lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev



