[infinispan-dev] Exception during rehash

Tristan Tarrant tristan.tarrant at gmail.com
Sat Nov 20 10:36:32 EST 2010


I can reproduce something similar every time. With the latest 4.2.x git, if
I wait for each node to initialize before starting the next, everything is
fine: with an empty cache, each node takes on the order of a couple of
seconds to start. If, instead, I start two or more nodes at the same time,
Infinispan seizes up for much longer. In the worst case I have seen
Infinispan time out after 20 minutes. I can provide exact traces (I am not
on my work computer at the moment), but I think that the rehasher should
queue up other nodes while it is busy rehashing. This is quite a serious
issue in my opinion, and we should try to fix it before CR.
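
To make the queueing idea concrete, here is a rough sketch of what I have in
mind. It is not how Infinispan's rehasher is implemented: QueuedRehasher,
requestJoin, completedJoins and drainAndStop are hypothetical names, and the
Runnable only stands in for whatever a join rehash really does. The point is
just that join requests arriving at the same time are run one after another
instead of concurrently.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: serializes join rehashes so only one runs at a time.
 * None of these names exist in Infinispan.
 */
final class QueuedRehasher {
    // A single worker thread runs submitted tasks strictly in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Names of nodes whose rehash finished, in completion order.
    private final List<String> completed = new CopyOnWriteArrayList<>();

    /** Queues a join rehash for the given node and returns immediately. */
    Future<?> requestJoin(String nodeName, Runnable rehash) {
        return worker.submit(() -> {
            rehash.run();            // placeholder for the real rehash work
            completed.add(nodeName); // recorded only if the rehash did not throw
        });
    }

    List<String> completedJoins() {
        return completed;
    }

    /** Stops accepting joins and waits for the queued ones to finish. */
    boolean drainAndStop(long timeoutSeconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With something like this, a node that joins while another rehash is in
progress would simply wait its turn rather than compete with it.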

Tristan

On Mon, Nov 15, 2010 at 11:58, Galder Zamarreño <galder at redhat.com> wrote:

> Sorry, I meant to say that "I'm pretty sure this is NOT a network issue".
>
> It might be mitigated that way, yeah, but it would help us if you could
> enable TRACE logging for org.infinispan and attach the startup log files
> of all parties involved, if you can replicate the issue easily.
>
> On Nov 12, 2010, at 4:31 PM, Tristan Tarrant wrote:
>
> > "Network issue" is a bit vague. Would the problem be mitigated by manually
> starting each cache one at a time?
> >
> > Tristan
> >
> > On Fri, Nov 12, 2010 at 16:09, Galder Zamarreño <galder at redhat.com>
> wrote:
> > I'm pretty sure this is a network issue. I saw it myself when testing
> concurrent Hot Rod startups, but it only happened once out of all the
> times I tried.
> >
> > On Nov 12, 2010, at 4:07 PM, Vladimir Blagojevic wrote:
> >
> > > This looks more like a network-related issue than anything else. Can we
> move it to the JBoss forums?
> > >
> > > On 2010-11-12, at 11:52 AM, Galder Zamarreño wrote:
> > >
> > >> Could be the same root cause behind
> https://jira.jboss.org/browse/ISPN-762 ?
> > >>
> > >> Mircea/Vladimir, did you guys change the rehashing code for beta1? I
> believe you did. If so, could you look at this?
> > >>
> > >> On Nov 11, 2010, at 3:25 PM, Tristan Tarrant wrote:
> > >>
> > >>> Dear all,
> > >>> we are using 4.2.0.BETA1 and we are getting the following message
> when a second node is started.
> > >>>
> > >>> 2010-11-11 15:04:38,397 INFO  [STDOUT] (main) 20101111 150438 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (main) Cache
> local address is asrab02-38676, physical addresses are [10.100.0.6:55448]
> > >>> 2010-11-11 15:06:38,656 INFO  [STDOUT] (Rehasher-asrab02-38676)
> 20101111 150638 ERROR [org.infinispan.distribution.JoinTask]
> (Rehasher-asrab02-38676) Caught exception!
> > >>> org.infinispan.CacheException:
> org.infinispan.util.concurrent.TimeoutException: Timed out after 120 seconds
> waiting for a response from asrab01-58616
> > >>>   at
> org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:122)
> > >>>   at
> org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:403)
> > >>>   at
> org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:101)
> > >>>   at
> org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:125)
> > >>>   at
> org.infinispan.distribution.JoinTask.retrieveOldCH(JoinTask.java:187)
> > >>>   at
> org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:87)
> > >>>   at org.infinispan.distribution.RehashTask.call(RehashTask.java:53)
> > >>>   at org.infinispan.distribution.RehashTask.call(RehashTask.java:33)
> > >>>   at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > >>>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > >>>   at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > >>>   at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > >>>   at java.lang.Thread.run(Thread.java:619)
> > >>> Caused by: org.infinispan.util.concurrent.TimeoutException: Timed out
> after 120 seconds waiting for a response from asrab01-58616
> > >>>   at
> org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:304)
> > >>>   at
> org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:120)
> > >>>   ... 12 more
> > >>> 2010-11-11 15:06:38,657 INFO  [STDOUT] (Rehasher-asrab02-38676)
> 20101111 150638 INFO  [org.infinispan.distribution.TransactionLoggerImpl]
> (Rehasher-asrab02-38676) Stopping transaction logging
> > >>> 2010-11-11 15:06:38,657 INFO  [STDOUT] (Rehasher-asrab02-38676)
> 20101111 150638 INFO  [org.infinispan.distribution.JoinTask]
> (Rehasher-asrab02-38676) asrab02-38676 completed join rehash!
> > >>>
> > >>> The first node prints out the following:
> > >>>
> > >>> 2010-11-11 15:01:55,725 INFO  [STDOUT]
> (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) 20101111 150155
> INFO  [org.infinispan.distribution.DistributionManagerImpl]
> (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) Detected a view
> change.  Member list changed from [asrab01-58616] to [asrab01-58616,
> asrab02-38676]
> > >>> .....
> > >>> 2010-11-11 15:01:55,731 INFO  [STDOUT]
> (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) 20101111 150155
> INFO  [org.infinispan.distribution.DistributionManagerImpl]
> (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) This is a JOIN
> event!  Wait for notification from new joiner asrab02-38676
> > >>>
> > >>>
> > >>> What does it mean?
> > >>>
> > >>> Thanks in advance
> > >>>
> > >>> Tristan
> > >>> _______________________________________________
> > >>> infinispan-dev mailing list
> > >>> infinispan-dev at lists.jboss.org
> > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> > >>
> > >> --
> > >> Galder Zamarreño
> > >> Sr. Software Engineer
> > >> Infinispan, JBoss Cache
> > >>
> > >>
> > >
> > >
> >
> > --
> > Galder Zamarreño
> > Sr. Software Engineer
> > Infinispan, JBoss Cache
> >
> >
>
> --
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache
>
>
>