[infinispan-issues] [JBoss JIRA] Commented: (ISPN-765) Concurrent startup leading to rehashing issues
Tristan Tarrant (JIRA)
jira-events at lists.jboss.org
Mon Nov 15 01:47:43 EST 2010
[ https://jira.jboss.org/browse/ISPN-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563310#comment-12563310 ]
Tristan Tarrant commented on ISPN-765:
--------------------------------------
We got this too:
2010-11-11 15:04:38,397 INFO [STDOUT] (main) 20101111 150438 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (main) Cache local address is asrab02-38676, physical addresses are [10.100.0.6:55448]
2010-11-11 15:06:38,656 INFO [STDOUT] (Rehasher-asrab02-38676) 20101111 150638 ERROR [org.infinispan.distribution.JoinTask] (Rehasher-asrab02-38676) Caught exception!
org.infinispan.CacheException: org.infinispan.util.concurrent.TimeoutException: Timed out after 120 seconds waiting for a response from asrab01-58616
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:122)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:403)
at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:101)
at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:125)
at org.infinispan.distribution.JoinTask.retrieveOldCH(JoinTask.java:187)
at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:87)
at org.infinispan.distribution.RehashTask.call(RehashTask.java:53)
at org.infinispan.distribution.RehashTask.call(RehashTask.java:33)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.infinispan.util.concurrent.TimeoutException: Timed out after 120 seconds waiting for a response from asrab01-58616
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:304)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:120)
... 12 more
2010-11-11 15:06:38,657 INFO [STDOUT] (Rehasher-asrab02-38676) 20101111 150638 INFO [org.infinispan.distribution.TransactionLoggerImpl] (Rehasher-asrab02-38676) Stopping transaction logging
2010-11-11 15:06:38,657 INFO [STDOUT] (Rehasher-asrab02-38676) 20101111 150638 INFO [org.infinispan.distribution.JoinTask] (Rehasher-asrab02-38676) asrab02-38676 completed join rehash!
The first node prints out the following:
2010-11-11 15:01:55,725 INFO [STDOUT] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) 20101111 150155 INFO [org.infinispan.distribution.DistributionManagerImpl] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) Detected a view change. Member list changed from [asrab01-58616] to [asrab01-58616, asrab02-38676]
.....
2010-11-11 15:01:55,731 INFO [STDOUT] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) 20101111 150155 INFO [org.infinispan.distribution.DistributionManagerImpl] (Incoming-1,ngs-infinispan-preprod_cluster,asrab01-58616) This is a JOIN event! Wait for notification from new joiner asrab02-38676
> Concurrent startup leading to rehashing issues
> ----------------------------------------------
>
> Key: ISPN-765
> URL: https://jira.jboss.org/browse/ISPN-765
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 4.2.0.BETA1
> Reporter: Galder Zamarreño
> Assignee: Mircea Markus
> Fix For: 4.2.0.CR1
>
>
> From ISPN-762, there seems to be a rehashing problem when several distributed caches are started concurrently:
> Logs can be found in: https://jira.jboss.org/secure/attachment/12338527/topologycache.diff
> "Even though all the caches are defined in the right place, there is still a problem during starting 3 nodes at a time. Look at the errors coming from the log files:
> * wp-60814 log:
> 2010-11-08 14:29:26,687 INFO [org.infinispan.distribution.DistributionManagerImpl] (Incoming-2,wp-60814) This is a JOIN event! Wait for notification from new joiner wp-59660
> 2010-11-08 14:31:26,668 ERROR [org.infinispan.distribution.JoinTask] (Rehasher-wp-60814) Caught exception!
> org.infinispan.CacheException: org.infinispan.util.concurrent.TimeoutException: Timed out after 120 seconds waiting for a response from wp-34493
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:122)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:403)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:101)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:125)
> at org.infinispan.distribution.JoinTask.retrieveOldCH(JoinTask.java:192)
> at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:87)
> at org.infinispan.distribution.RehashTask.call(RehashTask.java:53)
> at org.infinispan.distribution.RehashTask.call(RehashTask.java:33)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: org.infinispan.util.concurrent.TimeoutException: Timed out after 120 seconds waiting for a response from wp-34493
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:304)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:120)
> ... 12 more
> * wp-34493 log:
> 2010-11-08 14:29:26,654 INFO [org.infinispan.distribution.DistributionManagerImpl] (Incoming-2,wp-34493) This is a JOIN event! Wait for notification from
> new joiner wp-59660
> 2010-11-08 14:29:36,759 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (InfinispanServer-Main) Execution error:
> org.infinispan.util.concurrent.TimeoutException: Timed out waiting for valid responses!
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:421)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:101)
> ...
> at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:273)
> at org.infinispan.CacheDelegate.putIfAbsent(CacheDelegate.java:453)
> at org.infinispan.CacheSupport.putIfAbsent(CacheSupport.java:40)
> at org.infinispan.server.hotrod.HotRodServer$$anonfun$1.apply(HotRodServer.scala:85)
> at org.infinispan.server.hotrod.HotRodServer$$anonfun$1.apply(HotRodServer.scala:75)
> at org.infinispan.server.hotrod.HotRodServer.isViewUpdated(HotRodServer.scala:102)
> at org.infinispan.server.hotrod.HotRodServer.org$infinispan$server$hotrod$HotRodServer$$updateTopologyView(HotRodServer.scala:97)
> at org.infinispan.server.hotrod.HotRodServer.addSelfToTopologyView(HotRodServer.scala:75)
> at org.infinispan.server.hotrod.HotRodServer.startTransport(HotRodServer.scala:63)
> at org.infinispan.server.core.AbstractProtocolServer.start(AbstractProtocolServer.scala:70)
> at org.infinispan.server.hotrod.HotRodServer.start(HotRodServer.scala:44)
> ...
> * wp-59660 log:
> 2010-11-08 14:29:26,699 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (InfinispanServer-Main) Received new cluster view: [wp-34493|2] [wp-34493, wp-60814, wp-59660]
> 2010-11-08 14:29:26,756 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (InfinispanServer-Main) Cache local address is wp-59660, physical addresses are [10.0.36.136:34302]
> 2010-11-08 14:29:26,851 ERROR [org.infinispan.remoting.InboundInvocationHandlerImpl] (OOB-2,wp-59660) Defined caches: [___hotRodTopologyCache]
> 2010-11-08 14:29:26,851 INFO [org.infinispan.remoting.InboundInvocationHandlerImpl] (OOB-2,wp-59660) Will try and wait for the cache to start
> 2010-11-08 14:29:56,864 INFO [org.infinispan.remoting.InboundInvocationHandlerImpl] (OOB-2,wp-59660) Cache named ___hotRodTopologyCache does not exist on this cache manager!
> 2010-11-08 14:49:26,897 ERROR [org.infinispan.distribution.JoinTask] (Rehasher-wp-59660) Caught exception!
> org.infinispan.CacheException: Unable to retrieve old consistent hash from coordinator even after several attempts at sleeping and retrying!
> at org.infinispan.distribution.JoinTask.retrieveOldCH(JoinTask.java:218)
> at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:87)
> at org.infinispan.distribution.RehashTask.call(RehashTask.java:53)
> at org.infinispan.distribution.RehashTask.call(RehashTask.java:33)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> The first node reaches rehashRpcTimeout=120000 while waiting on the old hash. The second one reaches 10s timeout during adding himself to the topology view. And the third one fails on retrieving old hash, but after much longer time = 20min.
> It looks as though we had a kind of deadlock."
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
More information about the infinispan-issues
mailing list