[infinispan-issues] [JBoss JIRA] (ISPN-9544) Fork reuse breaks cache topology management

Dan Berindei (JIRA) issues at jboss.org
Wed Sep 26 16:24:00 EDT 2018


Dan Berindei created ISPN-9544:
----------------------------------

             Summary: Fork reuse breaks cache topology management
                 Key: ISPN-9544
                 URL: https://issues.jboss.org/browse/ISPN-9544
             Project: Infinispan
          Issue Type: Bug
          Components: Server
    Affects Versions: 9.4.0.CR3
            Reporter: Dan Berindei
            Assignee: Dan Berindei


Before {{FORK}} was introduced, {{ClusterTopologyManagerImpl}} and {{LocalTopologyManagerImpl}} assumed that the coordinator would always reply to other members' requests. After the introduction of {{FORK}} we added some hacks to work around the fact that the coordinator may not have a {{ForkChannel}} with our ID running **yet**, but we still expect the {{FORK}} setup to become symmetric after a reasonable amount of time.

Stopping a {{FORK}} and starting it again without restarting the underlying channel also doesn't work, because a {{FORK}} start/stop does not trigger a new view. When a node sends a request to the coordinator and receives back a {{CacheNotFoundResponse}}, it assumes that it will also receive a new view. If the {{CacheNotFoundResponse}} was instead a consequence of stopping a single {{DefaultCacheManager}}/{{ForkChannel}}, that view will never arrive and the node waits until it times out.
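
To make the failure mode concrete, here is a minimal sketch of the wait, under stated assumptions. It is a simplified model and not the actual {{LocalTopologyManagerImpl}} code: the class {{ForkViewWaitSketch}} and the methods {{installView}}, {{waitForView}} and {{retryAfterCacheNotFound}} are hypothetical names used only for illustration. The view ids 6 and 7 are taken from the log in this report.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of the behaviour described above; NOT Infinispan code.
// All class and method names are hypothetical.
public class ForkViewWaitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition viewChanged = lock.newCondition();
    private int viewId;

    ForkViewWaitSketch(int initialViewId) {
        this.viewId = initialViewId;
    }

    // Called when the underlying channel installs a new view.
    // In this model a FORK start/stop never calls this method.
    void installView(int newViewId) {
        lock.lock();
        try {
            viewId = newViewId;
            viewChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Blocks until the current view id reaches expectedViewId or the timeout
    // expires. Returns true if the view arrived, false on timeout.
    boolean waitForView(int expectedViewId, long timeoutMillis) throws InterruptedException {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (viewId < expectedViewId) {
                if (remainingNanos <= 0) {
                    return false;
                }
                remainingNanos = viewChanged.awaitNanos(remainingNanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Models a node in view 6 that got a CacheNotFoundResponse from the
    // coordinator and now waits for view 7 before retrying.
    public static boolean retryAfterCacheNotFound(boolean channelRestarted, long timeoutMillis) {
        ForkViewWaitSketch node = new ForkViewWaitSketch(6);
        if (channelRestarted) {
            // The coordinator really left or rejoined: a new view is installed.
            new Thread(() -> node.installView(7)).start();
        }
        // If only the fork was restarted, nobody installs view 7.
        try {
            return node.waitForView(7, timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("channel restarted: view arrived = " + retryAfterCacheNotFound(true, 500));
        System.out.println("fork-only restart: view arrived = " + retryAfterCacheNotFound(false, 500));
    }
}
```

In this model the first call returns as soon as view 7 is installed, while the second can only end by timing out, which corresponds to the {{ISPN000451}} timeout in the log.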

We don't restart individual cache managers in our tests, but the Spark connector test suite does, and it sometimes fails as a result:

{noformat}
2018-09-26 21:18:03,035 INFO  [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel cluster: [server2|6] (3) [server2, server0, server1]
2018-09-26 21:18:05,778 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) server1 sending request 37 to server2: CacheTopologyControlCommand{cache=org.infinispan.spark.suites.DistributedSuite, type=POLICY_GET_STATUS, sender=server1, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=6}
2018-09-26 21:18:05,795 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (jgroups-4,server1) server1 received response for request 37 from server2: CacheNotFoundResponse
2018-09-26 21:18:05,798 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (MSC service thread 1-5) Coordinator left the cluster while querying rebalancing status, retrying
2018-09-26 21:18:05,823 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) server1 sending request 41 to server2: CacheTopologyControlCommand{cache=org.infinispan.spark.suites.DistributedSuite, type=POLICY_GET_STATUS, sender=server1, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=6}
2018-09-26 21:18:05,841 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (jgroups-19,server1) server1 received response for request 41 from server2: CacheNotFoundResponse
2018-09-26 21:18:05,846 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (MSC service thread 1-5) Coordinator left the cluster while querying rebalancing status, retrying
2018-09-26 21:18:05,871 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) Waiting for transaction data for view 7, current view is 6
2018-09-26 21:19:05,779 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-5) MSC000001: Failed to start service jboss.datagrid-infinispan.clustered."org.infinispan.spark.suites.DistributedSuite": org.jboss.msc.service.StartException in service jboss.datagrid-infinispan.clustered."org.infinispan.spark.suites.DistributedSuite": Failed to start service
	at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1728)
	at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1556)
	at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
	at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1985)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1487)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1364)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000451: Timed out waiting for view 7, current view is 6
	at org.infinispan.topology.LocalTopologyManagerImpl.waitForView(LocalTopologyManagerImpl.java:558)
	at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinatorRetry(LocalTopologyManagerImpl.java:598)
	at org.infinispan.topology.LocalTopologyManagerImpl.isCacheRebalancingEnabled(LocalTopologyManagerImpl.java:580)
	at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:233)
	at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1056)
	at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:451)
	at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:653)
	at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:598)
	at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:481)
	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:465)
	at org.infinispan.manager.impl.AbstractDelegatingEmbeddedCacheManager.getCache(AbstractDelegatingEmbeddedCacheManager.java:157)
	at org.infinispan.server.infinispan.SecurityActions.lambda$startCache$4(SecurityActions.java:122)
	at org.infinispan.security.Security.doPrivileged(Security.java:44)
	at org.infinispan.server.infinispan.SecurityActions.doPrivileged(SecurityActions.java:69)
	at org.infinispan.server.infinispan.SecurityActions.startCache(SecurityActions.java:126)
	at org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:87)
	at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1736)
	at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1698)
	... 6 more
{noformat}




--
This message was sent by Atlassian JIRA
(v7.5.0#75005)

