[infinispan-dev] Exception during rehash

Tristan Tarrant tristan.tarrant at gmail.com
Tue Nov 23 09:50:20 EST 2010


Manik, have you had a chance to read the trace?

Tristan

On Mon, Nov 22, 2010 at 18:49, Tristan Tarrant <tristan.tarrant at gmail.com> wrote:

> I have attached the trace to https://jira.jboss.org/browse/ISPN-765
>
> If you need more info, just ask :)
>
>
> <https://jira.jboss.org/browse/ISPN-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564464#comment-12564464>
> Tristan
>
> On Mon, Nov 22, 2010 at 17:53, Manik Surtani <manik at jboss.org> wrote:
>
>> I believe the rehasher actually does do this.  Do you have a test that
>> consistently reproduces this?
>>
>> On 20 Nov 2010, at 15:36, Tristan Tarrant wrote:
>>
>> I can reproduce something similar every time. With the latest 4.2.x git,
>> if I wait for each node to initialize before starting the next, everything
>> is fine: with an empty cache, each node takes on the order of a couple of
>> seconds to start. If I instead start two or more nodes at the same time,
>> Infinispan seizes up for much longer; in the worst case I have seen it
>> time out after 20 minutes. I can provide exact traces (I am not on my work
>> computer at the moment), but I think that the rehasher should queue up
>> other nodes while it is busy rehashing. This is quite a serious issue in
>> my opinion and we should try to fix it before CR.
>>
>> Tristan
>>
>> On Mon, Nov 15, 2010 at 11:58, Galder Zamarreño <galder at redhat.com> wrote:
>>
>>> Sorry, I meant to say that "I'm pretty sure this is NOT a network issue".
>>>
>>> It might be mitigated that way, yes, but it would help us if you could
>>> enable TRACE logging for org.infinispan and attach the startup log files
>>> of all parties involved, if you can replicate the issue easily.
>>>
>>> On Nov 12, 2010, at 4:31 PM, Tristan Tarrant wrote:
>>>
>>> > "Network issue" is a bit vague. Would the problem be mitigated by
>>> > manually starting each cache one at a time?
>>> >
>>> > Tristan
>>> >
>>> > On Fri, Nov 12, 2010 at 16:09, Galder Zamarreño <galder at redhat.com>
>>> wrote:
>>> > I'm pretty sure this is a network issue. I saw it myself when testing
>>> > concurrent Hot Rod startups, but it only happened once out of all the
>>> > times I tried.
>>> >
>>> > On Nov 12, 2010, at 4:07 PM, Vladimir Blagojevic wrote:
>>> >
>>> > > This looks more like a network-related issue than anything else. Can
>>> > > we move it to the JBoss forums?
>>> > >
>>> > > On 2010-11-12, at 11:52 AM, Galder Zamarreño wrote:
>>> > >
>>> > >> Could this share a root cause with ISPN-762
>>> > >> (https://jira.jboss.org/browse/ISPN-762)?
>>> > >>
>>> > >> Mircea/Vladimir, did you guys change the rehashing code for beta1? I
>>> > >> believe you did. If so, could you look at this?
>>> > >>
>>> > >> On Nov 11, 2010, at 3:25 PM, Tristan Tarrant wrote:
>>> > >>
>>> > >>> Dear all,
>>> > >>> we are using 4.2.0.BETA1 and we are getting the following message
>>> > >>> when a second node is started.
>>> > >>>
>>> > >>> 2010-11-11 15:04:38,397 INFO  [STDOUT] (main) 20101111 150438 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (main) Cache local address is asrab02-38676, physical addresses are [10.100.0.6:55448]
>>> > >>> 2010-11-11 15:06:38,656 INFO  [STDOUT] (Rehasher-asrab02-38676) 20101111 150638 ERROR [org.infinispan.distribution.JoinTask] (Rehasher-asrab02-38676) Caught exception!
>>> > >>> org.infinispan.CacheException: org.infinispan.util.concurrent.TimeoutException: Timed out after 120 seconds waiting for a response from asrab01-58616
>>> > >>>   at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:122)
>>> > >>>   at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:403)
>>> > >>>   at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:101)
>>> > >>>   at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:125)
>>> > >>>   at org.infinispan.distribution.JoinTask.retrieveOldCH(JoinTask.java:187)
>>> > >>>   at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:87)
>>> > >>>   at org.infinispan.distribution.RehashTask.call(RehashTask.java:53)
>>> > >>>   at org.infinispan.distribution.RehashTask.call(RehashTask.java:33)
>>> > >>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>> > >>>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>> > >>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> > >>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> > >>>   at java.lang.Thread.run(Thread.java:619)
>>> > >>> Caused by: org.infinispan.util.concurrent.TimeoutException: Timed out after 120 seconds waiting for a response from asrab01-58616
>>> > >>>   at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:304)
>>> > >>>   at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:120)
>>> > >>>   ... 12 more
>>> > >>> 2010-11-11 15:06:38,657 INFO  [STDOUT] (Rehasher-asrab02-38676) 20101111 150638 INFO  [org.infinispan.distribution.TransactionLoggerImpl] (Rehasher-asrab02-38676) Stopping transaction logging
>>> > >>> 2010-11-11 15:06:38,657 INFO  [STDOUT] (Rehasher-asrab02-38676) 20101111 150638 INFO  [org.infinispan.distribution.JoinTask] (Rehasher-asrab02-38676) asrab02-38676 completed join rehash!
>>> > >>>
>>> > >>> The first node prints out the following:
>>> > >>>
>>> > >>> 2010-11-11 15:01:55,725 INFO  [STDOUT] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) 20101111 150155 INFO  [org.infinispan.distribution.DistributionManagerImpl] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) Detected a view change.  Member list changed from [asrab01-58616] to [asrab01-58616, asrab02-38676]
>>> > >>> .....
>>> > >>> 2010-11-11 15:01:55,731 INFO  [STDOUT] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) 20101111 150155 INFO  [org.infinispan.distribution.DistributionManagerImpl] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) This is a JOIN event!  Wait for notification from new joiner asrab02-38676
>>> > >>>
>>> > >>>
>>> > >>> What does it mean?
>>> > >>>
>>> > >>> Thanks in advance
>>> > >>>
>>> > >>> Tristan
>>> > >>> _______________________________________________
>>> > >>> infinispan-dev mailing list
>>> > >>> infinispan-dev at lists.jboss.org
>>> > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> > >>
>>> > >> --
>>> > >> Galder Zamarreño
>>> > >> Sr. Software Engineer
>>> > >> Infinispan, JBoss Cache
>>> > >
>>> >
>>> > --
>>> > Galder Zamarreño
>>> > Sr. Software Engineer
>>> > Infinispan, JBoss Cache
>>>
>>> --
>>> Galder Zamarreño
>>> Sr. Software Engineer
>>> Infinispan, JBoss Cache
>>>
>>
>>  --
>> Manik Surtani
>> manik at jboss.org
>> Lead, Infinispan
>> Lead, JBoss Cache
>> http://www.infinispan.org
>> http://www.jbosscache.org
>>
>
>
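
The suggestion earlier in the thread, that the rehasher should queue up joining nodes while it is busy rehashing, can be illustrated with a minimal Java sketch. This is not Infinispan code and does not reflect how JoinTask or RehashTask actually work: JoinQueue, submitJoin and the simulated rehash durations are hypothetical names invented for the example, and the node names other than asrab02-38676 are made up. It only shows the general idea of handling one join at a time through a single-threaded executor, so that joiners arriving together are processed in arrival order rather than concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SerializedJoinSketch {

    /** Hypothetical join queue; illustrative only, not part of Infinispan. */
    static final class JoinQueue {
        // A single worker thread means at most one "rehash" runs at a time;
        // later joiners wait in the executor's FIFO queue.
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private final List<String> completed = new ArrayList<String>();

        Future<?> submitJoin(final String joiner, final long simulatedRehashMillis) {
            return worker.submit(new Runnable() {
                public void run() {
                    try {
                        // Stand-in for the real state transfer work.
                        Thread.sleep(simulatedRehashMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    synchronized (completed) {
                        completed.add(joiner);
                    }
                }
            });
        }

        List<String> completedJoins() {
            synchronized (completed) {
                return new ArrayList<String>(completed);
            }
        }

        void shutdown() {
            worker.shutdown();
        }
    }

    /** Submits three joins at once and returns the order in which they finished. */
    static List<String> runDemo() {
        JoinQueue queue = new JoinQueue();
        try {
            List<Future<?>> pending = new ArrayList<Future<?>>();
            // The first joiner is the slowest; it still finishes first.
            pending.add(queue.submitJoin("asrab02-38676", 30));
            pending.add(queue.submitJoin("hypothetical-node-3", 10));
            pending.add(queue.submitJoin("hypothetical-node-4", 1));
            for (Future<?> f : pending) {
                f.get(5, TimeUnit.SECONDS);
            }
            return queue.completedJoins();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            queue.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because the executor has one thread and a FIFO queue, the joins complete in submission order even though the first one takes the longest. A real fix would also have to deal with view changes and timeouts on the joiner side, which this sketch leaves out.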