[infinispan-dev] Exception during rehash

Tristan Tarrant tristan.tarrant at gmail.com
Mon Nov 22 12:49:16 EST 2010


I have attached the trace to https://jira.jboss.org/browse/ISPN-765

If you need more info, just ask :)

Tristan
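
For illustration of the queueing suggestion quoted below (the rehasher making other joiners wait while it is busy), here is a minimal Java sketch. It is not Infinispan code and does not reflect how JoinTask or RehashTask are implemented: the class JoinQueueSketch, the method submitJoin, the node names, and the 50 ms sleep standing in for state transfer are all hypothetical. It only shows the general technique of serializing join requests through a single worker thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch, not Infinispan code: joins are queued and rehashed one at a time. */
final class JoinQueueSketch {

    // A single worker thread: a second join waits in the executor's queue
    // until the rehash for the first join has finished.
    private final ExecutorService rehasher = Executors.newSingleThreadExecutor();

    // Records the order of events so the serialization is observable.
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    /** Queues a join request; the returned future completes when its rehash is done. */
    Future<?> submitJoin(String node) {
        return rehasher.submit(() -> {
            events.add("start " + node);
            simulateRehash();
            events.add("end " + node);
        });
    }

    private static void simulateRehash() {
        try {
            Thread.sleep(50); // stand-in for the real state transfer (assumption)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns a snapshot of the recorded events. */
    List<String> events() {
        synchronized (events) {
            return new ArrayList<>(events);
        }
    }

    void shutdown() throws InterruptedException {
        rehasher.shutdown();
        rehasher.awaitTermination(5, TimeUnit.SECONDS);
    }

    /** Two nodes "join" at the same time; their rehashes still run back to back. */
    static List<String> demo() throws Exception {
        JoinQueueSketch queue = new JoinQueueSketch();
        Future<?> first = queue.submitJoin("nodeB");
        Future<?> second = queue.submitJoin("nodeC");
        first.get();
        second.get();
        queue.shutdown();
        return queue.events();
    }

    public static void main(String[] args) throws Exception {
        // prints [start nodeB, end nodeB, start nodeC, end nodeC]
        System.out.println(demo());
    }
}
```

In a real cluster the queue would have to be agreed on across nodes rather than held in one JVM, so this only captures the ordering idea, not the distributed coordination.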

On Mon, Nov 22, 2010 at 17:53, Manik Surtani <manik at jboss.org> wrote:

> I believe the rehasher actually does do this.  Do you have a test that
> consistently reproduces this?
>
> On 20 Nov 2010, at 15:36, Tristan Tarrant wrote:
>
> I can reproduce something similar every time. With the latest 4.2.x git, if
> I wait for each node to initialize before starting the next everything is
> fine: with an empty cache the time to start each node is in the order of a
> couple of seconds. If instead, I start two or more nodes at the same time,
> Infinispan seizes up for a much longer time. In the worst case I have seen
> Infinispan time out after 20 minutes. I can provide exact traces (I am not
> on my work computer at the moment), but I think that the rehasher should
> queue up other nodes while it is busy rehashing. This is quite a serious
> issue in my opinion and we should try and fix it before CR.
>
> Tristan
>
> On Mon, Nov 15, 2010 at 11:58, Galder Zamarreño <galder at redhat.com> wrote:
>
>> Sorry, I meant to say that "I'm pretty sure this is NOT a network issue".
>>
>> It might be mitigated that way, yes, but it would help us if you could
>> enable TRACE logging for org.infinispan and attach the startup log files
>> of all nodes involved, if you can replicate the issue easily.
>>
>> On Nov 12, 2010, at 4:31 PM, Tristan Tarrant wrote:
>>
>> > Network issue is a bit vague. Would the problem be mitigated by manually starting each cache one at a time?
>> >
>> > Tristan
>> >
>> > On Fri, Nov 12, 2010 at 16:09, Galder Zamarreño <galder at redhat.com> wrote:
>> > I'm pretty sure this is a network issue. I saw it myself when testing concurrent Hot Rod startups, but it only happened once in all the times I tried.
>> >
>> > On Nov 12, 2010, at 4:07 PM, Vladimir Blagojevic wrote:
>> >
>> > > This looks more like a network-related issue than anything else. Can we move it to the JBoss forums?
>> > >
>> > > On 2010-11-12, at 11:52 AM, Galder Zamarreño wrote:
>> > >
>> > >> Could be the same root cause behind https://jira.jboss.org/browse/ISPN-762 ?
>> > >>
>> > >> Mircea/Vladimir, did you guys change the rehashing code for beta1? I believe you did. If so, could you look at this?
>> > >>
>> > >> On Nov 11, 2010, at 3:25 PM, Tristan Tarrant wrote:
>> > >>
>> > >>> Dear all,
>> > >>> we are using 4.2.0.BETA1 and we are getting the following message when a second node is started.
>> > >>>
>> > >>> 2010-11-11 15:04:38,397 INFO  [STDOUT] (main) 20101111 150438 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (main) Cache local address is asrab02-38676, physical addresses are [10.100.0.6:55448]
>> > >>> 2010-11-11 15:06:38,656 INFO  [STDOUT] (Rehasher-asrab02-38676) 20101111 150638 ERROR [org.infinispan.distribution.JoinTask] (Rehasher-asrab02-38676) Caught exception!
>> > >>> org.infinispan.CacheException: org.infinispan.util.concurrent.TimeoutException: Timed out after 120 seconds waiting for a response from asrab01-58616
>> > >>>   at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:122)
>> > >>>   at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:403)
>> > >>>   at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:101)
>> > >>>   at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:125)
>> > >>>   at org.infinispan.distribution.JoinTask.retrieveOldCH(JoinTask.java:187)
>> > >>>   at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:87)
>> > >>>   at org.infinispan.distribution.RehashTask.call(RehashTask.java:53)
>> > >>>   at org.infinispan.distribution.RehashTask.call(RehashTask.java:33)
>> > >>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> > >>>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> > >>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> > >>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> > >>>   at java.lang.Thread.run(Thread.java:619)
>> > >>> Caused by: org.infinispan.util.concurrent.TimeoutException: Timed out after 120 seconds waiting for a response from asrab01-58616
>> > >>>   at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:304)
>> > >>>   at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:120)
>> > >>>   ... 12 more
>> > >>> 2010-11-11 15:06:38,657 INFO  [STDOUT] (Rehasher-asrab02-38676) 20101111 150638 INFO  [org.infinispan.distribution.TransactionLoggerImpl] (Rehasher-asrab02-38676) Stopping transaction logging
>> > >>> 2010-11-11 15:06:38,657 INFO  [STDOUT] (Rehasher-asrab02-38676) 20101111 150638 INFO  [org.infinispan.distribution.JoinTask] (Rehasher-asrab02-38676) asrab02-38676 completed join rehash!
>> > >>>
>> > >>> The first node prints out the following:
>> > >>>
>> > >>> 2010-11-11 15:01:55,725 INFO  [STDOUT] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) 20101111 150155 INFO  [org.infinispan.distribution.DistributionManagerImpl] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) Detected a view change.  Member list changed from [asrab01-58616] to [asrab01-58616, asrab02-38676]
>> > >>> .....
>> > >>> 2010-11-11 15:01:55,731 INFO  [STDOUT] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) 20101111 150155 INFO  [org.infinispan.distribution.DistributionManagerImpl] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) This is a JOIN event!  Wait for notification from new joiner asrab02-38676
>> > >>>
>> > >>>
>> > >>> What does it mean ?
>> > >>>
>> > >>> Thanks in advance
>> > >>>
>> > >>> Tristan
>> > >>> _______________________________________________
>> > >>> infinispan-dev mailing list
>> > >>> infinispan-dev at lists.jboss.org
>> > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> > >>
>> > >> --
>> > >> Galder Zamarreño
>> > >> Sr. Software Engineer
>> > >> Infinispan, JBoss Cache
>> > >>
>
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org