Mircea Markus updated ISPN-3129:
--------------------------------
Labels: nbst (was: )
If the status recovery fails for a cache, it stops recovery for all the caches
------------------------------------------------------------------------------
Key: ISPN-3129
URL: https://issues.jboss.org/browse/ISPN-3129
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.2.6.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Labels: nbst
Fix For: 5.2.7.Final, 5.3.0.CR1, 5.3.0.Final
When a node becomes coordinator, it sends a GET_STATUS command to recover the status of
all the caches in the cluster. Then it sends a CH_UPDATE command for each cache, to
install a new topology.
However, if the CH_UPDATE command fails for one cache, the coordinator interrupts the
recovery process, and it doesn't even install the cache topology for the remaining
caches locally. When a new node then joins, the coordinator acts as if the cache had just
been started:
{code}
09:05:22,542 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-1693,shared=udp) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=default-host/clusterbench-granular, type=CH_UPDATE, sender=perf03/web, joinInfo=null, topologyId=21, currentCH=DefaultConsistentHash{numSegments=80, numOwners=2, members=[perf04/web, perf01/web, perf02/web]}, pendingCH=null, throwable=null, viewId=7} [sender=perf03/web]
09:06:21,484 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-2231,shared=udp) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=null, type=GET_STATUS, sender=perf04/web, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=7} [sender=perf04/web]
09:28:21,060 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-22206,shared=udp) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=default-host/clusterbench-granular, type=CH_UPDATE, sender=perf04/web, joinInfo=null, topologyId=1, currentCH=DefaultConsistentHash{numSegments=80, numOwners=2, members=[perf04/web, perf02/web, perf03/web]}, pendingCH=null, throwable=null, viewId=10} [sender=perf04/web]
{code}
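The failure mode above boils down to the shape of the per-cache recovery loop. A minimal sketch (with hypothetical names, not Infinispan's actual API) of the difference between aborting on the first failing cache and isolating each cache's failure:

```java
import java.util.ArrayList;
import java.util.List;

public class RecoveryLoop {
    // Hypothetical stand-in for broadcasting a CH_UPDATE for one cache.
    static void broadcastChUpdate(String cache) {
        if (cache.equals("bad-cache"))
            throw new IllegalArgumentException("The task is already cancelled.");
    }

    // Buggy shape: the first failing cache aborts recovery for every cache after it,
    // and the caches already processed are the only ones with a topology installed.
    static List<String> recoverAll(List<String> caches) {
        List<String> recovered = new ArrayList<>();
        for (String cache : caches) {
            broadcastChUpdate(cache); // an exception here ends the whole loop
            recovered.add(cache);
        }
        return recovered;
    }

    // Fixed shape: isolate each cache's failure so the remaining caches still recover.
    static List<String> recoverAllIsolated(List<String> caches) {
        List<String> recovered = new ArrayList<>();
        for (String cache : caches) {
            try {
                broadcastChUpdate(cache);
                recovered.add(cache);
            } catch (RuntimeException e) {
                // log and continue with the next cache
            }
        }
        return recovered;
    }
}
```

With three caches where the middle one fails, the buggy loop recovers only the first cache before aborting, while the isolated loop recovers both healthy caches.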
This is mitigated by the ISPN-2966 fix
(https://github.com/infinispan/infinispan/pull/1742): since the CH_UPDATE command is now
sent asynchronously, any exceptions on the targets are ignored. On the 5.2.x branch,
however, the CH_UPDATE command is sent synchronously, and it can fail with an exception
like this:
{code}
05:43:56,513 WARN [org.infinispan.topology.CacheTopologyControlCommand] (notification-thread-0) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=default-host/clusterbench, type=CH_UPDATE, sender=perf21/web, joinInfo=null, topologyId=23, currentCH=DefaultConsistentHash{numSegments=80, numOwners=2, members=[perf21/web, perf18/web, perf19/web]}, pendingCH=null, throwable=null, viewId=8}:
java.lang.IllegalArgumentException: The task is already cancelled.
	at org.infinispan.statetransfer.InboundTransferTask.cancelSegments(InboundTransferTask.java:167)
	at org.infinispan.statetransfer.StateConsumerImpl.cancelTransfers(StateConsumerImpl.java:774)
	at org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:314)
	at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:195)
	at org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:61)
	at org.infinispan.statetransfer.StateTransferManagerImpl$1.updateConsistentHash(StateTransferManagerImpl.java:121)
	at org.infinispan.topology.LocalTopologyManagerImpl.handleConsistentHashUpdate(LocalTopologyManagerImpl.java:202)
	at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:165)
	at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137)
	at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:540)
	at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastConsistentHashUpdate(ClusterTopologyManagerImpl.java:332)
	at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheStatusAfterMerge(ClusterTopologyManagerImpl.java:319)
	at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:236)
	at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.handleViewChange(ClusterTopologyManagerImpl.java:579)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.6.0_43]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [rt.jar:1.6.0_43]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_43]
	at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_43]
	at org.infinispan.notifications.AbstractListenerImpl$ListenerInvocation$1.run(AbstractListenerImpl.java:212)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_43]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_43]
	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_43]
{code}
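The reason the asynchronous broadcast mitigates this is simply that the sender never waits for, and therefore never sees, the targets' exceptions. A minimal sketch (hypothetical names, not Infinispan's actual RPC layer) of the difference:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BroadcastModes {
    // Hypothetical stand-in for a target executing the CH_UPDATE command and failing.
    static void performOnTarget() {
        throw new IllegalArgumentException("The task is already cancelled.");
    }

    // Synchronous mode (5.2.x): the caller waits for the target,
    // so the target's exception propagates and aborts the recovery loop.
    static void broadcastSync() {
        performOnTarget();
    }

    // Asynchronous mode (after ISPN-2966): fire-and-forget submission;
    // the returned Future is never inspected, so target exceptions are ignored.
    static void broadcastAsync(ExecutorService executor) {
        executor.submit(BroadcastModes::performOnTarget);
    }
}
```

In the asynchronous variant, the exception still happens on the target, but it only surfaces if someone calls `get()` on the returned `Future`, which the fire-and-forget sender never does.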
If that happens, cluster recovery is interrupted and some caches may end up in a
corrupted state where it's impossible for new members to join.