]
Dennis Reed resolved ISPN-4831.
-------------------------------
Resolution: Done
Wrong member list was passed to cacheStatus.updateClusterMembers in
ClusterTopologyManagerImpl#updateCacheStatusAfterMerge
Fixed in ISPN-2935.
Node cannot join cluster correctly after restart
------------------------------------------------
Key: ISPN-4831
URL:
https://issues.jboss.org/browse/ISPN-4831
Project: Infinispan
Issue Type: Bug
Components: State Transfer
Affects Versions: 5.2.6.Final
Environment: 2 node cluster with following confiiguration:
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:infinispan:config:5.0
http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"
xmlns="urn:infinispan:config:5.0">
<global>
<transport clusterName="cluster">
<properties>
<property name="configurationFile"
value="cluster-configuration.xml"/>
</properties>
</transport>
<globalJmxStatistics enabled="true"
jmxDomain="distCache"/>
</global>
<default>
<locking isolationLevel="REPEATABLE_READ"
lockAcquisitionTimeout="30000"
writeSkewCheck="false"
concurrencyLevel="512"
useLockStriping="false"/>
<clustering mode="distribution">
<sync replTimeout="120000"/>
<l1 enabled="true"/>
<hash numOwners="2"/>
</clustering>
<jmxStatistics enabled="true"/>
<invocationBatching enabled="true"/>
</default>
<namedCache
name="eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_distnamespace">
</namedCache>
<namedCache
name="eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index">
</namedCache>
</infinispan>
Reporter: Marta Sedlakova
Attachments: configuration.txt, node1-configuration.xml, node1WithRestart.txt,
node2-configuration.xml, node2WithoutRestart.log
Node is not able to join the cluster again after restart.
Node 1 was restarted, later it seems that the cluster was disconnected for a while and
node1 is failed to become a coordinator.
Then the node1 was restarted again, but some state transfer timeouts and the cluster was
not connected any more, until both nodes restart:
Node 1 (the one with restart at 3:05 and 3:07):
Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770,
LOUAPPWPS983-56765]
Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
startJGroupsChannelIfNeeded
INFO: ISPN000079: Cache local address is LOUAPPWPS983-56765, physical addresses are
[133.27.18.204:7800]
Sep 29, 2014 3:05:21 AM org.infinispan.factories.GlobalComponentRegistry start
INFO: ISPN000128: Infinispan version: Infinispan 'Delirium' 5.2.6.Final
Sep 29, 2014 3:05:22 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 3:05:22 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 3:05:23 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 3:05:23 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 3:05:23 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 3:05:23 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 3:06:18 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS983-56765|16]
[LOUAPPWPS983-56765]
Sep 29, 2014 3:06:38 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000093: Received new, MERGED cluster view: MergeView::[LOUAPPWPS983-56765|17]
[LOUAPPWPS983-56765, LOUAPPWPS984-56770], subgroups=[LOUAPPWPS984-56770|15]
[LOUAPPWPS984-56770], [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
INFO: ISPN000080: Disconnecting and closing JGroups Channel
Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
INFO: ISPN000082: Stopping the RpcDispatcher
Sep 29, 2014 3:08:45 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport start
INFO: ISPN000078: Starting JGroups Channel
Sep 29, 2014 3:08:46 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|19] [LOUAPPWPS984-56770,
LOUAPPWPS983-54866]
Sep 29, 2014 3:08:47 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
startJGroupsChannelIfNeeded
INFO: ISPN000079: Cache local address is LOUAPPWPS983-54866, physical addresses are
[133.27.18.204:7800]
Sep 29, 2014 3:08:47 AM org.infinispan.factories.GlobalComponentRegistry start
INFO: ISPN000128: Infinispan version: Infinispan 'Delirium' 5.2.6.Final
Sep 29, 2014 3:08:47 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 3:08:47 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 3:12:47 AM org.apache.catalina.core.ApplicationContext log
SEVERE: StandardWrapper.Throwable
org.infinispan.CacheException: Unable to invoke method public void
org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete()
throws java.lang.InterruptedException on object of type StateTransferManagerImpl
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
at
org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:886)
at
org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:657)
at
org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:646)
at
org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:549)
at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:217)
at org.infinispan.CacheImpl.start(CacheImpl.java:582)
at
org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:686)
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:649)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:545)
at org.infinispan.rest.StartupListener$$anonfun$init$1.apply(StartupListener.scala:66)
at org.infinispan.rest.StartupListener$$anonfun$init$1.apply(StartupListener.scala:65)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1156)
at org.infinispan.rest.StartupListener.init(StartupListener.scala:65)
at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1280)
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1193)
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1088)
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:5176)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5460)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:633)
at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1120)
at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:1678)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache
eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index on LOUAPPWPS983-54866
at
org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:216)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
... 30 more
Node 2 (without restart):
Sep 29, 2014 3:03:54 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|14]
[LOUAPPWPS984-56770]
Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770,
LOUAPPWPS983-56765]
Sep 29, 2014 3:06:38 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000093: Received new, MERGED cluster view: MergeView::[LOUAPPWPS983-56765|17]
[LOUAPPWPS983-56765, LOUAPPWPS984-56770], subgroups=[LOUAPPWPS984-56770|15]
[LOUAPPWPS984-56770], [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|18]
[LOUAPPWPS984-56770]
Sep 29, 2014 3:07:33 AM org.infinispan.topology.ClusterTopologyManagerImpl handleNewView
ERROR: ISPN000196: Failed to recover cluster state after the current node became the
coordinator
java.lang.IllegalStateException: Trying to set a topology with invalid members for cache
eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index: members = [LOUAPPWPS984-56770],
topology = CacheTopology{id=37, currentCH=DefaultConsistentHash{numSegments=60,
numOwners=1, members=[LOUAPPWPS984-56770, LOUAPPWPS983-56765]}, pendingCH=null}
at
org.infinispan.topology.ClusterCacheStatus.updateCacheTopology(ClusterCacheStatus.java:165)
at
org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheStatusAfterMerge(ClusterTopologyManagerImpl.java:315)
at
org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:236)
at
org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.handleViewChange(ClusterTopologyManagerImpl.java:579)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at
org.infinispan.notifications.AbstractListenerImpl$ListenerInvocation$1.run(AbstractListenerImpl.java:212)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Sep 29, 2014 3:08:46 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport
viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|19] [LOUAPPWPS984-56770,
LOUAPPWPS983-54866]
Sep 29, 2014 3:12:48 AM org.infinispan.statetransfer.StateProviderImpl getCacheTopology
WARN: ISPN000211: Transactions were requested by node LOUAPPWPS983-54866 with topology
30, older than the local topology (35)
Sep 29, 2014 3:12:48 AM org.infinispan.statetransfer.StateProviderImpl getCacheTopology
WARN: ISPN000212: Segments were requested by node LOUAPPWPS983-54866 with topology 30,
older than the local topology (35)
Sep 29, 2014 12:42:27 PM org.infinispan.transaction.TransactionTable shutDownGracefully
WARN: ISPN000100: Stopping, but there are 0 local transactions and 16 remote transactions
that did not finish in time.
Sep 29, 2014 12:42:27 PM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
INFO: ISPN000080: Disconnecting and closing JGroups Channel
Sep 29, 2014 12:42:29 PM org.jgroups.logging.JDKLogImpl warn
WARNING: LOUAPPWPS984-56770: failed to collect all ACKs (expected=1) for view
[LOUAPPWPS983-54866|20] after 2000ms, missing ACKs from [LOUAPPWPS983-54866]
Sep 29, 2014 12:42:30 PM org.jgroups.logging.JDKLogImpl error