[infinispan-issues] [JBoss JIRA] (ISPN-4831) Node cannot join cluster correctly after restart

Dennis Reed (JIRA) issues at jboss.org
Wed Feb 4 19:10:49 EST 2015


     [ https://issues.jboss.org/browse/ISPN-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Reed resolved ISPN-4831.
-------------------------------
    Resolution: Done


Wrong member list was passed to cacheStatus.updateClusterMembers in ClusterTopologyManagerImpl#updateCacheStatusAfterMerge

Fixed in ISPN-2935.

> Node cannot join cluster correctly after restart
> ------------------------------------------------
>
>                 Key: ISPN-4831
>                 URL: https://issues.jboss.org/browse/ISPN-4831
>             Project: Infinispan
>          Issue Type: Bug
>          Components: State Transfer
>    Affects Versions: 5.2.6.Final
>         Environment: 2 node cluster with following confiiguration:
> <?xml version="1.0" encoding="UTF-8"?>
> <infinispan
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xsi:schemaLocation="urn:infinispan:config:5.0 http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"
>         xmlns="urn:infinispan:config:5.0">
>     <global>
>         <transport clusterName="cluster">
>             <properties>
>           <property name="configurationFile" value="cluster-configuration.xml"/>
>             </properties>
>         </transport>
>         <globalJmxStatistics enabled="true" jmxDomain="distCache"/>
>     </global>
>     <default>
>         <locking isolationLevel="REPEATABLE_READ"
>                  lockAcquisitionTimeout="30000"
>                  writeSkewCheck="false"
>                  concurrencyLevel="512"
>                  useLockStriping="false"/>
>         <clustering mode="distribution">
>             <sync replTimeout="120000"/>
>             <l1 enabled="true"/>
>             <hash numOwners="2"/>
>         </clustering>
>         <jmxStatistics enabled="true"/>
>         <invocationBatching enabled="true"/>
>     </default>
>     <namedCache name="eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_distnamespace">
>     </namedCache>
>     
>     <namedCache name="eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index">
>     </namedCache>
>     
> </infinispan>
>            Reporter: Marta Sedlakova
>         Attachments: configuration.txt, node1-configuration.xml, node1WithRestart.txt, node2-configuration.xml, node2WithoutRestart.log
>
>
> Node is not able to join the cluster again after restart.
> Node 1 was restarted, later it seems that the cluster was disconnected for a while and node1 is failed to become a coordinator.
> Then the node1 was restarted again, but some state transfer timeouts and the cluster was not connected any more, until both nodes restart:
> Node 1 (the one with restart at 3:05 and 3:07):
> Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
> INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770, LOUAPPWPS983-56765]
> Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport startJGroupsChannelIfNeeded
> INFO: ISPN000079: Cache local address is LOUAPPWPS983-56765, physical addresses are [133.27.18.204:7800]
> Sep 29, 2014 3:05:21 AM org.infinispan.factories.GlobalComponentRegistry start
> INFO: ISPN000128: Infinispan version: Infinispan 'Delirium' 5.2.6.Final
> Sep 29, 2014 3:05:22 AM org.infinispan.factories.TransactionManagerFactory construct
> INFO: ISPN000161: Using a batchMode transaction manager
> Sep 29, 2014 3:05:22 AM org.infinispan.jmx.CacheJmxRegistration start
> INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
> Sep 29, 2014 3:05:23 AM org.infinispan.factories.TransactionManagerFactory construct
> INFO: ISPN000161: Using a batchMode transaction manager
> Sep 29, 2014 3:05:23 AM org.infinispan.jmx.CacheJmxRegistration start
> INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
> Sep 29, 2014 3:05:23 AM org.infinispan.factories.TransactionManagerFactory construct
> INFO: ISPN000161: Using a batchMode transaction manager
> Sep 29, 2014 3:05:23 AM org.infinispan.jmx.CacheJmxRegistration start
> INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
> Sep 29, 2014 3:06:18 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
> INFO: ISPN000094: Received new cluster view: [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
> Sep 29, 2014 3:06:38 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
> INFO: ISPN000093: Received new, MERGED cluster view: MergeView::[LOUAPPWPS983-56765|17] [LOUAPPWPS983-56765, LOUAPPWPS984-56770], subgroups=[LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770], [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
> Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
> INFO: ISPN000080: Disconnecting and closing JGroups Channel
> Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
> INFO: ISPN000082: Stopping the RpcDispatcher
> Sep 29, 2014 3:08:45 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport start
> INFO: ISPN000078: Starting JGroups Channel
> Sep 29, 2014 3:08:46 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
> INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|19] [LOUAPPWPS984-56770, LOUAPPWPS983-54866]
> Sep 29, 2014 3:08:47 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport startJGroupsChannelIfNeeded
> INFO: ISPN000079: Cache local address is LOUAPPWPS983-54866, physical addresses are [133.27.18.204:7800]
> Sep 29, 2014 3:08:47 AM org.infinispan.factories.GlobalComponentRegistry start
> INFO: ISPN000128: Infinispan version: Infinispan 'Delirium' 5.2.6.Final
> Sep 29, 2014 3:08:47 AM org.infinispan.factories.TransactionManagerFactory construct
> INFO: ISPN000161: Using a batchMode transaction manager
> Sep 29, 2014 3:08:47 AM org.infinispan.jmx.CacheJmxRegistration start
> INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
> Sep 29, 2014 3:12:47 AM org.apache.catalina.core.ApplicationContext log
> SEVERE: StandardWrapper.Throwable
> org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
> 	at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
> 	at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:886)
> 	at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:657)
> 	at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:646)
> 	at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:549)
> 	at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:217)
> 	at org.infinispan.CacheImpl.start(CacheImpl.java:582)
> 	at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:686)
> 	at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:649)
> 	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:545)
> 	at org.infinispan.rest.StartupListener$$anonfun$init$1.apply(StartupListener.scala:66)
> 	at org.infinispan.rest.StartupListener$$anonfun$init$1.apply(StartupListener.scala:65)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1156)
> 	at org.infinispan.rest.StartupListener.init(StartupListener.scala:65)
> 	at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1280)
> 	at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1193)
> 	at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1088)
> 	at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:5176)
> 	at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5460)
> 	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
> 	at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
> 	at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
> 	at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:633)
> 	at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1120)
> 	at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:1678)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> 	at java.util.concurrent.FutureTask.run(Unknown Source)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> 	at java.lang.Thread.run(Unknown Source)
> Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index on LOUAPPWPS983-54866
> 	at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:216)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> 	at java.lang.reflect.Method.invoke(Unknown Source)
> 	at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
> 	... 30 more
> Node 2 (without restart):
> Sep 29, 2014 3:03:54 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
> INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|14] [LOUAPPWPS984-56770]
> Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
> INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770, LOUAPPWPS983-56765]
> Sep 29, 2014 3:06:38 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
> INFO: ISPN000093: Received new, MERGED cluster view: MergeView::[LOUAPPWPS983-56765|17] [LOUAPPWPS983-56765, LOUAPPWPS984-56770], subgroups=[LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770], [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
> Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
> INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|18] [LOUAPPWPS984-56770]
> Sep 29, 2014 3:07:33 AM org.infinispan.topology.ClusterTopologyManagerImpl handleNewView
> ERROR: ISPN000196: Failed to recover cluster state after the current node became the coordinator
> java.lang.IllegalStateException: Trying to set a topology with invalid members for cache eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index: members = [LOUAPPWPS984-56770], topology = CacheTopology{id=37, currentCH=DefaultConsistentHash{numSegments=60, numOwners=1, members=[LOUAPPWPS984-56770, LOUAPPWPS983-56765]}, pendingCH=null}
> 	at org.infinispan.topology.ClusterCacheStatus.updateCacheTopology(ClusterCacheStatus.java:165)
> 	at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheStatusAfterMerge(ClusterTopologyManagerImpl.java:315)
> 	at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:236)
> 	at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.handleViewChange(ClusterTopologyManagerImpl.java:579)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> 	at java.lang.reflect.Method.invoke(Unknown Source)
> 	at org.infinispan.notifications.AbstractListenerImpl$ListenerInvocation$1.run(AbstractListenerImpl.java:212)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> 	at java.lang.Thread.run(Unknown Source)
> Sep 29, 2014 3:08:46 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
> INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|19] [LOUAPPWPS984-56770, LOUAPPWPS983-54866]
> Sep 29, 2014 3:12:48 AM org.infinispan.statetransfer.StateProviderImpl getCacheTopology
> WARN: ISPN000211: Transactions were requested by node LOUAPPWPS983-54866 with topology 30, older than the local topology (35)
> Sep 29, 2014 3:12:48 AM org.infinispan.statetransfer.StateProviderImpl getCacheTopology
> WARN: ISPN000212: Segments were requested by node LOUAPPWPS983-54866 with topology 30, older than the local topology (35)
> Sep 29, 2014 12:42:27 PM org.infinispan.transaction.TransactionTable shutDownGracefully
> WARN: ISPN000100: Stopping, but there are 0 local transactions and 16 remote transactions that did not finish in time.
> Sep 29, 2014 12:42:27 PM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
> INFO: ISPN000080: Disconnecting and closing JGroups Channel
> Sep 29, 2014 12:42:29 PM org.jgroups.logging.JDKLogImpl warn
> WARNING: LOUAPPWPS984-56770: failed to collect all ACKs (expected=1) for view [LOUAPPWPS983-54866|20] after 2000ms, missing ACKs from [LOUAPPWPS983-54866]
> Sep 29, 2014 12:42:30 PM org.jgroups.logging.JDKLogImpl error



--
This message was sent by Atlassian JIRA
(v6.3.11#6341)


More information about the infinispan-issues mailing list