Hi David,
Dan and I had a talk about integrating my changes to the distribution
code into 5.x. As I mentioned below, the current code is quite brittle wrt
concurrent startup, so this will get fixed with my changes. I hope we
can backport this to the 4.2.x branch as well. As a matter of fact, I
actually made my changes on a branch off of 4.2.x.
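
As an illustration only (a generic sketch, not the Infinispan code): a
joiner that blocks on a latch without a timeout, as in the stack trace
quoted below, waits forever if the latch is never opened. A timed wait
at least turns the hang into a detectable failure. JoinLatchSketch and
awaitJoin are made-up names, and java.util.concurrent.CountDownLatch
merely stands in for ReclosableLatch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names exist in Infinispan.
class JoinLatchSketch {

    // Waits up to timeoutMillis for the "join complete" latch to open.
    // Returns true if it opened, false on timeout or interruption.
    static boolean awaitJoin(CountDownLatch joinComplete, long timeoutMillis) {
        try {
            return joinComplete.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Nobody ever opens this latch: an untimed await() would hang here.
        CountDownLatch neverOpened = new CountDownLatch(1);
        System.out.println("join completed: " + awaitJoin(neverOpened, 200));

        // The normal case: the latch is opened before the joiner waits.
        CountDownLatch opened = new CountDownLatch(1);
        opened.countDown();
        System.out.println("join completed: " + awaitJoin(opened, 200));
    }
}
```

This only makes a stuck join visible; it does not remove the cause of
the hang itself.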
On 4/5/11 2:51 PM, david marion wrote:
Bela,
Yes, it is a replicated cache, and I used your udp-largecluster.xml file and just
modified it slightly. It does appear that the distributed cache is in a deadlock (or there
is a race condition): the coordinator comes up, but the other caches do not; they sit
there and wait. I was able to get a distributed cache up and running on 100+ nodes; now I
cannot get 5 of them running.
> Date: Tue, 5 Apr 2011 11:09:54 +0200
> From: bban(a)redhat.com
> To: infinispan-dev(a)lists.jboss.org
> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>
>
>
> On 4/4/11 5:45 PM, david marion wrote:
>>
>>
>> Good news! I was able to use the system property from ISPN-83 and remove the
>> FLUSH from the jgroups config with 4.2.1.FINAL, and start-up times are much much better.
>> We have a replicated cache on about 420+ nodes up in under 2 minutes.
>
>
> Great ! Just to confirm: this is 420+ Infinispan instances, with
> replication enabled, correct ?
>
> Did you use a specific JGroups config (e.g. udp-largecluster.xml) ?
>
>
>> I am seeing an issue with the distributed cache, though, with as few as 5 nodes.
>>
>> In the coordinator log I see
>>
>> org.infinispan.distribution.DistributionManagerImpl: Detected a view change. Member list changed.......
>> org.infinispan.distribution.DistributionManagerImpl: This is a JOIN event! Wait for notification from new joiner<name>
>>
>> In the log from the joining node I see:
>>
>> org.infinispan.distribution.JoinTask: Commencing rehash on node:<name>. Before start, distributionManager.joinComplete=false
>> org.infinispan.distribution.JoinTask: Requesting old consistent hash from coordinator
>>
>> I jstack'd the joiner: the DefaultCacheManager.getCache() method is waiting
>> on org.infinispan.distribution.DistributionManagerImpl.waitForJoinToComplete(),
>> and the Rehasher thread is waiting on:
>>
>> at org.infinispan.util.concurrent.ReclosableLatch.await(ReclosableLatch.java:75)
>> at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.blockUntilNoJoinsInProgress(JGroupsDistSync.java:113)
>>
>> Any thoughts?
>
>
> I recently took a look at the distribution code, and this part is very
> brittle with respect to parallel startup and merging. Plus, I believe
> the (blocking) RPC to fetch the old CH from the coordinator might
> deadlock in certain cases...
>
> I've got a pull request pending for push-based rebalancing instead of
> pull-based rebalancing. It'll likely make it into 5.x; as a matter of fact,
> I've got a chat about this this afternoon.
>
>
>
>
>>> Date: Wed, 23 Mar 2011 15:58:19 +0100
>>> From: bban(a)redhat.com
>>> To: infinispan-dev(a)lists.jboss.org
>>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>>
>>>
>>>
>>> On 3/23/11 2:39 PM, david marion wrote:
>>>>
>>>> Bela,
>>>>
>>>> Is there a way to start up the JGroups stack on every node without using Infinispan?
>>>
>>>
>>> You could use ViewDemo [1] or Draw. Or write your own small test
>>> program; if you take a look at ViewDemo's src, you'll see that it's only
>>> a page of code.
>>>
>>>
>>>> Is there some functional test that I can run or something? I know I
>>>> can't remove the FLUSH from Infinispan until 5.0.0 and I don't know if I can
>>>> upgrade the underlying JGroups jar.
>>>
>>>
>>> I suggest testing with the latest JGroups (2.12.0) and +FLUSH and -FLUSH.
>>> The +FLUSH config should be less painful now, with the introduction of
>>> view bundling: we need to run flush fewer times than before.
>>>
>>>
>>> [1] http://community.jboss.org/wiki/TestingJBoss
>>>
>>> --
>>> Bela Ban
>>> Lead JGroups / Clustering Team
>>> JBoss
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev(a)lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>>
>>
>>
>
> --
> Bela Ban
> Lead JGroups / Clustering Team
> JBoss