[infinispan-issues] [JBoss JIRA] (ISPN-2415) Initial state transfer timed out - Fail to start 2 nodes after they were killed inside 8-node cluster

Wednesday, 24 October 2012

[https://issues.jboss.org/browse/ISPN-2415?page=com.atlassian.jira.plugin....]

Adrian Nistor resolved ISPN-2415.
---------------------------------

      Assignee: Adrian Nistor  (was: Dan Berindei)
    Resolution: Duplicate Issue

...
 Initial state transfer timed out - Fail to start 2 nodes after they were killed inside 8-node cluster

-----------------------------------------------------------------------------------------------------

                 Key: ISPN-2415
                 URL: https://issues.jboss.org/browse/ISPN-2415
             Project: Infinispan
          Issue Type: Bug
          Components: State transfer
    Affects Versions: 5.2.0.Beta2
            Reporter: Martin Gencur
            Assignee: Adrian Nistor
            Priority: Critical

 We start 8 nodes and keep them under load, then kill 2 nodes and later start them again.
When we try to start them, the following exception is thrown and the test fails:
 {code}
10:47:52,830 ERROR [org.radargun.stages.helpers.StartHelper] (pool-1-thread-1) Issues while instantiating/starting cache wrapper
org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
	at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
	at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:879)
	at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:650)
	at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:639)
	at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:542)
	at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:198)
	at org.infinispan.CacheImpl.start(CacheImpl.java:517)
	at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:689)
	at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:652)
	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:548)
	at org.radargun.cachewrappers.InfinispanWrapper.setUpCache(InfinispanWrapper.java:125)
	at org.radargun.cachewrappers.InfinispanWrapper.setUp(InfinispanWrapper.java:74)
	at org.radargun.stages.helpers.StartHelper.start(StartHelper.java:63)
	at org.radargun.stages.StartClusterStage.executeOnSlave(StartClusterStage.java:47)
	at org.radargun.Slave$2.run(Slave.java:103)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache testCache on edg-perf02-25863
	at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:202)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
	... 20 more
 {code}
 The problem occurs on nodes edg-perf02 and edg-perf03 in this Jenkins run:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/ispn-52-radargun-res...
 Debug logs are available on those machines.
 A few more hints:
 - individual exceptions/errors extracted from the log are available in the "Build artifacts"
 - this job has passed only once; it fails otherwise
 - the state transfer timeout is the default one (4 min?)
 - Infinispan version: 5.2.0-SNAPSHOT, HEAD=d4581e570 (ISPN-2387 ClusteredGetCommand should not be a VisitableCommand)
 Infinispan configuration:
 {code:xml}
 <infinispan
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="urn:infinispan:config:5.2 http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
       xmlns="urn:infinispan:config:5.2">
    <global>
       <globalJmxStatistics
             enabled="true"
             jmxDomain="jboss.infinispan" 
             cacheManagerName="default"/>
       <transport clusterName="default" distributedSyncTimeout="600000">
          <properties>
             <property name="configurationFile" value="jgroups-udp-custom.xml" />
          </properties>
       </transport>
    </global>
    <default>
       <transaction
             transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
             transactionMode="TRANSACTIONAL" />
       <jmxStatistics enabled="true"/>
       <clustering mode="distribution">
          <l1 enabled="false" />
          <hash numOwners="3" numSegments="512" />
          <sync replTimeout="60000"/>
       </clustering>
       <locking lockAcquisitionTimeout="3000" concurrencyLevel="1000" />
    </default>

    <namedCache name="testCache" />
    <namedCache name="memcachedCache" />
 </infinispan>
 {code}
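 The configuration above contains no explicit state-transfer element, which matches the hint that the timeout was left at its default. A minimal sketch of pinning it explicitly inside the existing <clustering> element, assuming the 5.2 schema's <stateTransfer> element and its timeout attribute (in milliseconds); 240000 ms is only the 4 minutes guessed in the hints, not a confirmed default:
{code:xml}
<clustering mode="distribution">
   <l1 enabled="false" />
   <hash numOwners="3" numSegments="512" />
   <sync replTimeout="60000"/>
   <!-- assumption: <stateTransfer timeout> from the 5.2 schema, value in ms; 240000 ms = 4 min -->
   <stateTransfer timeout="240000" />
</clustering>
{code}
 Setting the value explicitly makes the timeout visible in the test configuration instead of relying on whatever the schema default is.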
 Test scenario (description of RadarGun's job):
 {code:xml}
 <bench-config>
    <master bindAddress="${127.0.0.1:master.address}" port="${2103:master.port}" />
    <benchmark initSize="${8:slaves}" maxSize="${8:slaves}" increment="1">
       <DestroyWrapper runOnAllSlaves="true" />
       <StartCluster
          staggerSlaveStartup="true"
          delayAfterFirstSlaveStarts="5000"
          delayBetweenStartingSlaves="500" />
       <ClusterValidation
          partialReplication="false" />
       <StartBackgroundStats
          numThreads="10"
          numEntries="${1000:numEntries}"
          entrySize="1024"
          puts="1"
          gets="2"
          statsIterationDuration="${1000:statsIterationDuration}"
          delayBetweenRequests="100"
          transactionSize="${30:transactionSize}"
          startStressors="true" />
       <!-- Synchronously start stat threads -->
       <StartBackgroundStats
          startStats="true" />
       <Sleep
          time="120000" />
       <Kill
          slaves="1,2" />
       <Sleep
          time="120000" />
       <StartCluster
          slaves="1,2"
          staggerSlaveStartup="false" />
       <Sleep
          time="120000" />
       <StopBackgroundStats />
       <ReportBackgroundStats />
    </benchmark>
    <products>
       <infinispan52>
           <config name="distributed-udp-numowners-3.xml" cache="testCache"/>
       </infinispan52>
    </products>
    <reports />
 </bench-config>
 {code} 
 If any further information is needed, let me know. 