[infinispan-issues] [JBoss JIRA] (ISPN-9544) Asymmetric ForkChannels break cache topology management

Dan Berindei (JIRA) issues at jboss.org
Thu Sep 27 03:41:00 EDT 2018


     [ https://issues.jboss.org/browse/ISPN-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Berindei updated ISPN-9544:
-------------------------------
    Summary: Asymmetric ForkChannels break cache topology management  (was: Fork reuse breaks cache topology management)


> Asymmetric ForkChannels break cache topology management
> -------------------------------------------------------
>
>                 Key: ISPN-9544
>                 URL: https://issues.jboss.org/browse/ISPN-9544
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 9.4.0.CR3
>            Reporter: Dan Berindei
>            Assignee: Dan Berindei
>
> Before {{FORK}} was introduced, {{ClusterTopologyManagerImpl}} and {{LocalTopologyManagerImpl}} assumed that the coordinator would always reply to other members' requests. After the introduction of {{FORK}} we added some hacks to work around the fact that the coordinator may not **yet** have a {{ForkChannel}} with our ID running, but we still expect the {{FORK}} setup to become symmetric after a reasonable amount of time.
> Stopping a {{FORK}} and starting it again without restarting the underlying channel also doesn't work, because a {{FORK}} start/stop does not trigger a new view. When a node sends a request to the coordinator and receives a {{CacheNotFoundResponse}}, it assumes that a new view will follow. If the {{CacheNotFoundResponse}} was instead caused by stopping a single {{DefaultCacheManager}}/{{ForkChannel}}, that view never arrives.
> We don't restart individual cache managers in our own tests, but the Spark connector test suite does, and it sometimes fails because of it:
> {noformat}
> 2018-09-26 21:18:03,035 INFO  [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel cluster: [server2|6] (3) [server2, server0, server1]
> 2018-09-26 21:18:05,778 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) server1 sending request 37 to server2: CacheTopologyControlCommand{cache=org.infinispan.spark.suites.DistributedSuite, type=POLICY_GET_STATUS, sender=server1, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=6}
> 2018-09-26 21:18:05,795 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (jgroups-4,server1) server1 received response for request 37 from server2: CacheNotFoundResponse
> 2018-09-26 21:18:05,798 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (MSC service thread 1-5) Coordinator left the cluster while querying rebalancing status, retrying
> 2018-09-26 21:18:05,823 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) server1 sending request 41 to server2: CacheTopologyControlCommand{cache=org.infinispan.spark.suites.DistributedSuite, type=POLICY_GET_STATUS, sender=server1, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=6}
> 2018-09-26 21:18:05,841 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (jgroups-19,server1) server1 received response for request 41 from server2: CacheNotFoundResponse
> 2018-09-26 21:18:05,846 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (MSC service thread 1-5) Coordinator left the cluster while querying rebalancing status, retrying
> 2018-09-26 21:18:05,871 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) Waiting for transaction data for view 7, current view is 6
> 2018-09-26 21:19:05,779 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-5) MSC000001: Failed to start service jboss.datagrid-infinispan.clustered."org.infinispan.spark.suites.DistributedSuite": org.jboss.msc.service.StartException in service jboss.datagrid-infinispan.clustered."org.infinispan.spark.suites.DistributedSuite": Failed to start service
> 	at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1728)
> 	at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1556)
> 	at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
> 	at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1985)
> 	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1487)
> 	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1364)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000451: Timed out waiting for view 7, current view is 6
> 	at org.infinispan.topology.LocalTopologyManagerImpl.waitForView(LocalTopologyManagerImpl.java:558)
> 	at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinatorRetry(LocalTopologyManagerImpl.java:598)
> 	at org.infinispan.topology.LocalTopologyManagerImpl.isCacheRebalancingEnabled(LocalTopologyManagerImpl.java:580)
> 	at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:233)
> 	at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1056)
> 	at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:451)
> 	at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:653)
> 	at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:598)
> 	at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:481)
> 	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:465)
> 	at org.infinispan.manager.impl.AbstractDelegatingEmbeddedCacheManager.getCache(AbstractDelegatingEmbeddedCacheManager.java:157)
> 	at org.infinispan.server.infinispan.SecurityActions.lambda$startCache$4(SecurityActions.java:122)
> 	at org.infinispan.security.Security.doPrivileged(Security.java:44)
> 	at org.infinispan.server.infinispan.SecurityActions.doPrivileged(SecurityActions.java:69)
> 	at org.infinispan.server.infinispan.SecurityActions.startCache(SecurityActions.java:126)
> 	at org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:87)
> 	at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1736)
> 	at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1698)
> 	... 6 more
> {noformat}
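
The failure mode in the description and log can be illustrated with a minimal, self-contained Java sketch. It is a simplified model and not Infinispan code: the class {{ForkViewWaitSketch}}, the {{Response}} enum, and the methods {{installView}}, {{waitForView}} and {{executeOnCoordinator}} are hypothetical stand-ins that only borrow names from the stack trace. The assumption is that a node retries a coordinator request after a {{CacheNotFoundResponse}} only once a newer view is installed, so a stopped fork that triggers no view change ends in a timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Simplified, hypothetical model of the behaviour described in the issue.
// None of this is Infinispan source code.
final class ForkViewWaitSketch {
    // Stand-in for the coordinator's reply to a topology control command.
    enum Response { SUCCESS, CACHE_NOT_FOUND }

    private int viewId;

    ForkViewWaitSketch(int initialViewId) {
        this.viewId = initialViewId;
    }

    synchronized int currentViewId() {
        return viewId;
    }

    // Called when the underlying channel installs a new view.
    // Starting or stopping a fork never calls this.
    synchronized void installView(int newViewId) {
        viewId = newViewId;
        notifyAll();
    }

    // Blocks until the current view id reaches expectedViewId or the timeout elapses.
    synchronized void waitForView(int expectedViewId, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (viewId < expectedViewId) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new TimeoutException("Timed out waiting for view " + expectedViewId
                        + ", current view is " + viewId);
            }
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
        }
    }

    // Sends the request; on CACHE_NOT_FOUND it assumes the coordinator left
    // and waits for the next view before retrying.
    void executeOnCoordinator(Supplier<Response> coordinator, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        while (true) {
            int requestViewId = currentViewId();
            if (coordinator.get() == Response.SUCCESS) {
                return;
            }
            waitForView(requestViewId + 1, timeoutMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ForkViewWaitSketch node = new ForkViewWaitSketch(6);
        try {
            // The coordinator's fork is stopped, but no view change ever happens.
            node.executeOnCoordinator(() -> Response.CACHE_NOT_FOUND, 200);
        } catch (TimeoutException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the request succeeds only if a new view is installed (or the fork comes back before the first request). A fix would presumably need some other retry trigger for the fork-only case, but that is outside what the sketch shows.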



--
This message was sent by Atlassian JIRA
(v7.5.0#75005)

