[infinispan-issues] [JBoss JIRA] (ISPN-2572) "CacheException: Initial state transfer timed out for cache" reliably on AS7 testsuite
Dan Berindei (JIRA)
jira-events at lists.jboss.org
Tue Dec 4 16:24:21 EST 2012
[ https://issues.jboss.org/browse/ISPN-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739489#comment-12739489 ]
Dan Berindei commented on ISPN-2572:
------------------------------------
Rado, the problem was not with the JGroups view that added the new node-1, but with JGroups view 10, which removed the old node-1. The logs would be a little clearer if the node names in AS7 had a random component, as they do in standalone Infinispan.
See this line in your log:
{noformat}
15:46:57,474 TRACE [org.infinispan.topology.DefaultRebalancePolicy] (OOB-13,null) Cache repl status changed: joiners=[node-1/ejb], topology=CacheTopology{id=7, currentCH=ReplicatedConsistentHash{members=[node-0/ejb, node-1/ejb]}, pendingCH=ReplicatedConsistentHash{members=[node-0/ejb, node-1/ejb]}}
{noformat}
node-0 is still waiting for a rebalance confirmation from the old node-1, but that node-1 is no longer alive. It left a few seconds earlier; in fact, we can see that it sent an explicit LEAVE message:
{noformat}
15:46:53,582 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-15,null) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=repl, type=LEAVE, sender=node-1/ejb, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=9} [sender=node-1/ejb]
{noformat}
However, the cache topology was not updated after the leave (just as it wasn't updated after JGroups view 10 was installed):
{noformat}
15:46:53,582 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (OOB-15,null) Cache repl members list was updated, but the cache topology doesn't need to change: CacheTopology{id=7, currentCH=ReplicatedConsistentHash{members=[node-0/ejb, node-1/ejb]}, pendingCH=ReplicatedConsistentHash{members=[node-0/ejb, node-1/ejb]}}
15:46:53,939 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-17,null) ISPN000094: Received new cluster view: [node-0/ejb|10] [node-0/ejb]
{noformat}
It looks like the members list in CacheStatus and the CacheTopology got out of sync somehow, and because of this the old node-1 didn't get removed from the CacheTopology after leaving. Since the coordinator expects a rebalance confirmation from all the cache topology members, that rebalance never finished.
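To make the failure mode concrete, here is a minimal sketch of the bookkeeping involved. It is not the actual Infinispan code: the class name RebalanceConfirmationTracker and all of its methods are hypothetical, and it assumes the coordinator simply keeps a set of members it is still waiting for. If a leaver is never removed from that set, the rebalance can never complete.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; this is NOT Infinispan's ClusterTopologyManagerImpl.
final class RebalanceConfirmationTracker {
    // Cache topology members that still owe a rebalance confirmation.
    private final Set<String> pending = new HashSet<>();

    RebalanceConfirmationTracker(Collection<String> topologyMembers) {
        pending.addAll(topologyMembers);
    }

    // A member reported that it finished applying the rebalance.
    synchronized void confirm(String member) {
        pending.remove(member);
    }

    // A member left (explicit LEAVE or dropped from the view): stop waiting for it.
    // Skipping this step is what leaves the rebalance stuck in this sketch.
    synchronized void memberLeft(String member) {
        pending.remove(member);
    }

    // The rebalance is finished once nobody is left to confirm.
    synchronized boolean isComplete() {
        return pending.isEmpty();
    }

    // Defensive copy of the members we are still waiting for, e.g. for logging.
    synchronized Set<String> waitingFor() {
        return new HashSet<>(pending);
    }
}
```

With members node-0/ejb and node-1/ejb, a confirmation from node-0/ejb alone leaves isComplete() false until memberLeft("node-1/ejb") is called. In the logs above, the equivalent removal never happened for the old node-1.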
I'll make some changes to log the cache status members as well as the cache topology, and I'll get back to you to run the tests again.
> "CacheException: Initial state transfer timed out for cache" reliably on AS7 testsuite
> --------------------------------------------------------------------------------------
>
> Key: ISPN-2572
> URL: https://issues.jboss.org/browse/ISPN-2572
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.0.Beta4
> Reporter: Radoslav Husar
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.Beta6
>
>
> While running the AS7 testsuite with the speedups implemented in my branch (https://github.com/jbossas/jboss-as/pull/3381), we are constantly seeing this failure (log below) on Windows 2008.
> Run:
> http://teamcity.cafe-babe.org/viewLog.html?buildId=1689&tab=buildResultsDiv&buildTypeId=bt2
> {code}
> 16:34:46,092 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 13) MSC00001: Failed to start service jboss.infinispan.ejb.remote-connector-client-mappings: org.jboss.msc.service.StartException in service jboss.infinispan.ejb.remote-connector-client-mappings: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
> at org.jboss.as.clustering.msc.AsynchronousService$1.run(AsynchronousService.java:87)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_32]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_32]
> at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_32]
> at org.jboss.threads.JBossThread.run(JBossThread.java:122) [jboss-threads-2.0.0.GA.jar:2.0.0.GA]
> Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
> at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
> at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:883)
> at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:654)
> at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:643)
> at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:546)
> at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:199)
> at org.infinispan.CacheImpl.start(CacheImpl.java:520)
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:690)
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:653)
> at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:549)
> at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:563)
> at org.jboss.as.clustering.infinispan.DefaultEmbeddedCacheManager.getCache(DefaultEmbeddedCacheManager.java:107)
> at org.jboss.as.clustering.infinispan.DefaultEmbeddedCacheManager.getCache(DefaultEmbeddedCacheManager.java:98)
> at org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:78)
> at org.jboss.as.clustering.msc.AsynchronousService$1.run(AsynchronousService.java:82)
> ... 4 more
> Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache remote-connector-client-mappings on node-1/ejb
> at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:209)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.6.0_32]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [rt.jar:1.6.0_32]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_32]
> at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_32]
> at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
> ... 18 more
> {code}
> Affected version -- current master (say 7dc531002539b078e429418d8ef204e401beafd1).
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira