]
Dan Berindei commented on ISPN-4831:
------------------------------------
Marta, could you try with 6.0.2 (or even better, 7.0.0.CR2)?
5.2.6 is quite old, and we support only the latest released version (soon to be
7.0.0.Final).
Node cannot join cluster correctly after restart
------------------------------------------------
Key: ISPN-4831
URL:
https://issues.jboss.org/browse/ISPN-4831
Project: Infinispan
Issue Type: Bug
Components: State Transfer
Affects Versions: 5.2.6.Final
Environment: 2 node cluster with following confiiguration:
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:infinispan:config:5.0
http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"
xmlns="urn:infinispan:config:5.0">
<global>
<transport clusterName="cluster">
<properties>
<property name="configurationFile"
value="cluster-configuration.xml"/>
</properties>
</transport>
<globalJmxStatistics enabled="true"
jmxDomain="distCache"/>
</global>
<default>
<locking isolationLevel="REPEATABLE_READ"
lockAcquisitionTimeout="30000"
writeSkewCheck="false"
concurrencyLevel="512"
useLockStriping="false"/>
<clustering mode="distribution">
<sync replTimeout="120000"/>
<l1 enabled="true"/>
<hash numOwners="2"/>
</clustering>
<jmxStatistics enabled="true"/>
<invocationBatching enabled="true"/>
</default>
<namedCache
name="eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_distnamespace">
</namedCache>
<namedCache
name="eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index">
</namedCache>
</infinispan>
Reporter: Marta Sedlakova
Attachments: configuration.txt, node1-configuration.xml, node1WithRestart.txt,
node2-configuration.xml, node2WithoutRestart.log
Node is not able to join the cluster again after restart.
Node 1 was restarted, later it seems that the cluster was disconnected for a while and
node1 is failed to become a coordinator.
Then the node1 was restarted again, but some state transfer timeouts and the cluster was
not connected any more, until both nodes restart:
Node 1 (the one with restart at 3:05 and 3:07):
Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770,
LOUAPPWPS983-56765]
Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
startJGroupsChannelIfNeeded
INFO: ISPN000079: Cache local address is LOUAPPWPS983-56765, physical addresses are
[133.27.18.204:7800]
Sep 29, 2014 3:05:21 AM org.infinispan.factories.GlobalComponentRegistry start
INFO: ISPN000128: Infinispan version: Infinispan 'Delirium' 5.2.6.Final
Sep 29, 2014 3:05:22 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 3:05:22 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 3:05:23 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 3:05:23 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 3:05:23 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 3:05:23 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 3:06:18 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS983-56765|16]
[LOUAPPWPS983-56765]
Sep 29, 2014 3:06:38 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000093: Received new, MERGED cluster view: MergeView::[LOUAPPWPS983-56765|17]
[LOUAPPWPS983-56765, LOUAPPWPS984-56770], subgroups=[LOUAPPWPS984-56770|15]
[LOUAPPWPS984-56770], [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
INFO: ISPN000080: Disconnecting and closing JGroups Channel
Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
INFO: ISPN000082: Stopping the RpcDispatcher
Sep 29, 2014 3:08:45 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport start
INFO: ISPN000078: Starting JGroups Channel
Sep 29, 2014 3:08:46 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|19] [LOUAPPWPS984-56770,
LOUAPPWPS983-54866]
Sep 29, 2014 3:08:47 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
startJGroupsChannelIfNeeded
INFO: ISPN000079: Cache local address is LOUAPPWPS983-54866, physical addresses are
[133.27.18.204:7800]
Sep 29, 2014 3:08:47 AM org.infinispan.factories.GlobalComponentRegistry start
INFO: ISPN000128: Infinispan version: Infinispan 'Delirium' 5.2.6.Final
Sep 29, 2014 3:08:47 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 3:08:47 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 3:12:47 AM org.apache.catalina.core.ApplicationContext log
SEVERE: StandardWrapper.Throwable
org.infinispan.CacheException: Unable to invoke method public void
org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete()
throws java.lang.InterruptedException on object of type StateTransferManagerImpl
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
at
org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:886)
at
org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:657)
at
org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:646)
at
org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:549)
at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:217)
at org.infinispan.CacheImpl.start(CacheImpl.java:582)
at
org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:686)
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:649)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:545)
at org.infinispan.rest.StartupListener$$anonfun$init$1.apply(StartupListener.scala:66)
at org.infinispan.rest.StartupListener$$anonfun$init$1.apply(StartupListener.scala:65)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1156)
at org.infinispan.rest.StartupListener.init(StartupListener.scala:65)
at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1280)
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1193)
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1088)
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:5176)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5460)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:633)
at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1120)
at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:1678)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache
eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index on LOUAPPWPS983-54866
at
org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:216)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
... 30 more
Node 2 (without restart):
Sep 29, 2014 3:03:54 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|14]
[LOUAPPWPS984-56770]
Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770,
LOUAPPWPS983-56765]
Sep 29, 2014 3:06:38 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000093: Received new, MERGED cluster view: MergeView::[LOUAPPWPS983-56765|17]
[LOUAPPWPS983-56765, LOUAPPWPS984-56770], subgroups=[LOUAPPWPS984-56770|15]
[LOUAPPWPS984-56770], [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|18]
[LOUAPPWPS984-56770]
Sep 29, 2014 3:07:33 AM org.infinispan.topology.ClusterTopologyManagerImpl handleNewView
ERROR: ISPN000196: Failed to recover cluster state after the current node became the
coordinator
java.lang.IllegalStateException: Trying to set a topology with invalid members for cache
eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index: members = [LOUAPPWPS984-56770],
topology = CacheTopology{id=37, currentCH=DefaultConsistentHash{numSegments=60,
numOwners=1, members=[LOUAPPWPS984-56770, LOUAPPWPS983-56765]}, pendingCH=null}
at
org.infinispan.topology.ClusterCacheStatus.updateCacheTopology(ClusterCacheStatus.java:165)
at
org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheStatusAfterMerge(ClusterTopologyManagerImpl.java:315)
at
org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:236)
at
org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.handleViewChange(ClusterTopologyManagerImpl.java:579)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at
org.infinispan.notifications.AbstractListenerImpl$ListenerInvocation$1.run(AbstractListenerImpl.java:212)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Sep 29, 2014 3:08:46 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|19] [LOUAPPWPS984-56770,
LOUAPPWPS983-54866]
Sep 29, 2014 3:12:48 AM org.infinispan.statetransfer.StateProviderImpl getCacheTopology
WARN: ISPN000211: Transactions were requested by node LOUAPPWPS983-54866 with topology
30, older than the local topology (35)
Sep 29, 2014 3:12:48 AM org.infinispan.statetransfer.StateProviderImpl getCacheTopology
WARN: ISPN000212: Segments were requested by node LOUAPPWPS983-54866 with topology 30,
older than the local topology (35)
Sep 29, 2014 12:42:27 PM org.infinispan.transaction.TransactionTable shutDownGracefully
WARN: ISPN000100: Stopping, but there are 0 local transactions and 16 remote transactions
that did not finish in time.
Sep 29, 2014 12:42:27 PM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
INFO: ISPN000080: Disconnecting and closing JGroups Channel
Sep 29, 2014 12:42:29 PM org.jgroups.logging.JDKLogImpl warn
WARNING: LOUAPPWPS984-56770: failed to collect all ACKs (expected=1) for view
[LOUAPPWPS983-54866|20] after 2000ms, missing ACKs from [LOUAPPWPS983-54866]
Sep 29, 2014 12:42:30 PM org.jgroups.logging.JDKLogImpl error