Tristan, looks like Mircea's looking into this. I did some investigation
yesterday. Mircea, please ping me today and we can have a chat about this.
On 23 Nov 2010, at 16:16, Manik Surtani wrote:
Not as yet - I'll try and find some time this week.
On 23 Nov 2010, at 14:50, Tristan Tarrant wrote:
> Manik, did you have an opportunity to read the trace?
>
> Tristan
>
> On Mon, Nov 22, 2010 at 18:49, Tristan Tarrant <tristan.tarrant(a)gmail.com> wrote:
> I have attached the trace to https://jira.jboss.org/browse/ISPN-765
>
> If you need more info, just ask :)
>
> Tristan
>
> On Mon, Nov 22, 2010 at 17:53, Manik Surtani <manik(a)jboss.org> wrote:
> I believe the rehasher actually does do this. Do you have a test that consistently reproduces this?
>
> On 20 Nov 2010, at 15:36, Tristan Tarrant wrote:
>
>> I can reproduce something similar every time. With the latest 4.2.x git, if I wait for each node to initialize before starting the next, everything is fine: with an empty cache, each node starts in a couple of seconds. If, instead, I start two or more nodes at the same time, Infinispan seizes up for much longer; in the worst case I have seen Infinispan time out after 20 minutes. I can provide exact traces (I am not on my work computer at the moment), but I think that the rehasher should queue up other joining nodes while it is busy rehashing. This is quite a serious issue in my opinion and we should try to fix it before CR.
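Tristan's suggestion that the rehasher queue up other joiners while it is busy can be illustrated with a minimal, stdlib-only Java sketch. It assumes nothing about Infinispan internals: `SerializedJoinQueue`, `requestJoin` and the node names are hypothetical, and the sleep merely stands in for the real rehash work (retrieving the old consistent hash and transferring state). A single-threaded executor accepts any number of join requests at once but runs them strictly one after the other, in arrival order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Infinispan code): join-triggered rehashes are
 * serialized so a second joiner waits its turn instead of racing the first.
 */
public class SerializedJoinQueue {
    // One worker thread: at most one rehash runs at a time, and further
    // requests wait in the executor's FIFO queue.
    private final ExecutorService rehasher = Executors.newSingleThreadExecutor();
    private final List<String> completed = Collections.synchronizedList(new ArrayList<>());
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger maxRunning = new AtomicInteger();

    /** Queues a simulated rehash for the given joiner and returns immediately. */
    public Future<?> requestJoin(String joiner) {
        return rehasher.submit(() -> {
            // Track how many rehashes overlap; with one worker this stays at 1.
            maxRunning.accumulateAndGet(running.incrementAndGet(), Math::max);
            try {
                Thread.sleep(50); // stand-in for fetching the old hash and moving state
                completed.add(joiner);
            } finally {
                running.decrementAndGet();
            }
            return null;
        });
    }

    /** Joiners in the order their rehashes finished. */
    public List<String> completedJoins() {
        return new ArrayList<>(completed);
    }

    public int maxConcurrentRehashes() {
        return maxRunning.get();
    }

    public void shutdown() throws InterruptedException {
        rehasher.shutdown();
        rehasher.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        SerializedJoinQueue queue = new SerializedJoinQueue();
        // Two nodes join "at the same time": both requests are accepted at once...
        Future<?> b = queue.requestJoin("nodeB");
        Future<?> c = queue.requestJoin("nodeC");
        b.get();
        c.get();
        queue.shutdown();
        // ...but their rehashes ran one after the other.
        System.out.println(queue.completedJoins() + " max concurrent = " + queue.maxConcurrentRehashes());
    }
}
```

Running `main` prints `[nodeB, nodeC] max concurrent = 1`. The idea is to accept joins concurrently while keeping the expensive part serialized; whether the real 4.2 `JoinTask` could be restructured this way is an assumption, not something established here.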
>>
>> Tristan
>>
>> On Mon, Nov 15, 2010 at 11:58, Galder Zamarreño <galder(a)redhat.com> wrote:
>> Sorry, I meant to say that "I'm pretty sure this is NOT a network issue".
>>
>> It might be mitigated that way, yeah, but it'd be helpful if you could enable TRACE logging for org.infinispan and attach the startup log files of all parties involved, if you can replicate the issue easily.
>>
>> On Nov 12, 2010, at 4:31 PM, Tristan Tarrant wrote:
>>
>> > "Network issue" is a bit vague. Would the problem be mitigated by manually starting each cache one at a time?
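The workaround Tristan asks about, starting one node only after the previous one has finished initializing, can be sketched in plain Java as below. Each `Callable` is a hypothetical stand-in for whatever actually starts a node and its cache; `StaggeredStartup` and `startSequentially` are illustrative names, not Infinispan API. Each start is awaited with a timeout, so a node that hangs during its join fails fast instead of blocking the rollout indefinitely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of a staggered startup: each node is started and
 * awaited before the next one begins, so at most one join is in flight.
 */
public class StaggeredStartup {

    /** Starts nodes one at a time, waiting up to timeoutSeconds for each. */
    public static List<String> startSequentially(List<Callable<String>> nodeStarters,
                                                 long timeoutSeconds) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        List<String> started = new ArrayList<>();
        try {
            for (Callable<String> starter : nodeStarters) {
                Future<String> f = pool.submit(starter);
                // Block until this node reports it has finished starting;
                // throws TimeoutException if it takes longer than allowed.
                started.add(f.get(timeoutSeconds, TimeUnit.SECONDS));
            }
        } finally {
            // Interrupts a starter that is still running after a timeout.
            pool.shutdownNow();
        }
        return started;
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> starters = new ArrayList<>();
        for (String name : new String[] {"nodeA", "nodeB", "nodeC"}) {
            // Simulated start-up work: a short sleep, then report the node name.
            starters.add(() -> { Thread.sleep(20); return name; });
        }
        System.out.println(startSequentially(starters, 5));
    }
}
```

With the simulated starters, `main` prints `[nodeA, nodeB, nodeC]`.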
>> >
>> > Tristan
>> >
>> > On Fri, Nov 12, 2010 at 16:09, Galder Zamarreño <galder(a)redhat.com> wrote:
>> > I'm pretty sure this is a network issue. I saw it myself when testing concurrent Hot Rod startups, but it only happened once out of all the times I tried.
>> >
>> > On Nov 12, 2010, at 4:07 PM, Vladimir Blagojevic wrote:
>> >
>> > > This looks more like a network-related issue than anything else. Can we move it to the JBoss forums?
>> > >
>> > > On 2010-11-12, at 11:52 AM, Galder Zamarreño wrote:
>> > >
>> > >> Could this have the same root cause as ISPN-762 (https://jira.jboss.org/browse/ISPN-762)?
>> > >>
>> > >> Mircea/Vladimir, did you guys change the rehashing code for beta1? I believe you did. If so, could you look at this?
>> > >>
>> > >> On Nov 11, 2010, at 3:25 PM, Tristan Tarrant wrote:
>> > >>
>> > >>> Dear all,
>> > >>> we are using 4.2.0.BETA1 and we are getting the following message when a second node is started.
>> > >>>
>> > >>> 2010-11-11 15:04:38,397 INFO [STDOUT] (main) 20101111 150438 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (main) Cache local address is asrab02-38676, physical addresses are [10.100.0.6:55448]
>> > >>> 2010-11-11 15:06:38,656 INFO [STDOUT] (Rehasher-asrab02-38676) 20101111 150638 ERROR [org.infinispan.distribution.JoinTask] (Rehasher-asrab02-38676) Caught exception!
>> > >>> org.infinispan.CacheException: org.infinispan.util.concurrent.TimeoutException: Timed out after 120 seconds waiting for a response from asrab01-58616
>> > >>> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:122)
>> > >>> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:403)
>> > >>> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:101)
>> > >>> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:125)
>> > >>> at org.infinispan.distribution.JoinTask.retrieveOldCH(JoinTask.java:187)
>> > >>> at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:87)
>> > >>> at org.infinispan.distribution.RehashTask.call(RehashTask.java:53)
>> > >>> at org.infinispan.distribution.RehashTask.call(RehashTask.java:33)
>> > >>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> > >>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> > >>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> > >>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> > >>> at java.lang.Thread.run(Thread.java:619)
>> > >>> Caused by: org.infinispan.util.concurrent.TimeoutException: Timed out after 120 seconds waiting for a response from asrab01-58616
>> > >>> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:304)
>> > >>> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:120)
>> > >>> ... 12 more
>> > >>> 2010-11-11 15:06:38,657 INFO [STDOUT] (Rehasher-asrab02-38676) 20101111 150638 INFO [org.infinispan.distribution.TransactionLoggerImpl] (Rehasher-asrab02-38676) Stopping transaction logging
>> > >>> 2010-11-11 15:06:38,657 INFO [STDOUT] (Rehasher-asrab02-38676) 20101111 150638 INFO [org.infinispan.distribution.JoinTask] (Rehasher-asrab02-38676) asrab02-38676 completed join rehash!
>> > >>>
>> > >>> The first node prints out the following:
>> > >>>
>> > >>> 2010-11-11 15:01:55,725 INFO [STDOUT] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) 20101111 150155 INFO [org.infinispan.distribution.DistributionManagerImpl] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) Detected a view change. Member list changed from [asrab01-58616] to [asrab01-58616, asrab02-38676]
>> > >>> .....
>> > >>> 2010-11-11 15:01:55,731 INFO [STDOUT] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) 20101111 150155 INFO [org.infinispan.distribution.DistributionManagerImpl] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) This is a JOIN event! Wait for notification from new joiner asrab02-38676
>> > >>>
>> > >>>
>> > >>> What does it mean?
>> > >>>
>> > >>> Thanks in advance
>> > >>>
>> > >>> Tristan
>> > >>> _______________________________________________
>> > >>> infinispan-dev mailing list
>> > >>> infinispan-dev(a)lists.jboss.org
>> > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> > >>
>> > >> --
>> > >> Galder Zamarreño
>> > >> Sr. Software Engineer
>> > >> Infinispan, JBoss Cache
>> > >>
>> > >
>> >
>>
>
> --
> Manik Surtani
> manik(a)jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>