[infinispan-issues] [Red Hat JIRA] (ISPN-12714) Cluster locks can stay locked by crashed nodes

Monday, 8 February 2021

Dan Berindei created ISPN-12714:
-----------------------------------

             Summary: Cluster locks can stay locked by crashed nodes
                 Key: ISPN-12714
                 URL: https://issues.redhat.com/browse/ISPN-12714
             Project: Infinispan
          Issue Type: Bug
          Components: Clustered Locks
    Affects Versions: 12.0.0.Final
            Reporter: Dan Berindei
             Fix For: 12.1.0.Final

When the node that owns a clustered lock leaves the cluster,
{{ClusteredLockImpl.ClusterChangeListener}} is supposed to release the lock. But if the
{{org.infinispan.LOCKS}} cache is in DEGRADED mode, the lock release fails and an error is
logged:

{noformat}
22:01:29,500 ERROR (jgroups-9,Test-NodeD:[]) [CacheManagerNotifierImpl] ISPN000405: Caught exception while invoking a cache manager listener!
org.infinispan.commons.CacheListenerException: ISPN000280: Caught exception [org.infinispan.partitionhandling.AvailabilityException] while invoking method [public void org.infinispan.lock.impl.lock.ClusteredLockImpl$ClusterChangeListener.viewChange(org.infinispan.notifications.cachemanagerlistener.event.ViewChangedEvent)] on listener instance: org.infinispan.lock.impl.lock.ClusteredLockImpl$ClusterChangeListener@3c91530d
	at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.lambda$invoke$1(AbstractListenerImpl.java:430)
	at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.invoke(AbstractListenerImpl.java:450)
	at org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl.invokeListener(CacheManagerNotifierImpl.java:157)
	at org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl.invokeListeners(CacheManagerNotifierImpl.java:84)
	at org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl.notifyViewChange(CacheManagerNotifierImpl.java:103)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.receiveClusterView(JGroupsTransport.java:737)
        ...
Caused by: org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key 'ClusteredLockKey{name=ConsistentReliabilitySplitBrainTest}' is not available. Not all owners are in this partition
	at org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl.doCheck(PartitionHandlingManagerImpl.java:272)
	at org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl.checkRead(PartitionHandlingManagerImpl.java:114)
	at org.infinispan.factories.InternalCacheFactory$PartitionHandlingCache.get(InternalCacheFactory.java:308)
	at org.infinispan.factories.InternalCacheFactory$PartitionHandlingCache.get(InternalCacheFactory.java:306)
	at org.infinispan.factories.InternalCacheFactory$AbstractGetAdvancedCache.containsKey(InternalCacheFactory.java:257)
	at org.infinispan.cache.impl.AbstractDelegatingCache.containsKey(AbstractDelegatingCache.java:384)
	at org.infinispan.cache.impl.EncoderCache.containsKey(EncoderCache.java:618)
	at org.infinispan.lock.impl.manager.EmbeddedClusteredLockManager.isDefined(EmbeddedClusteredLockManager.java:157)
	at org.infinispan.lock.impl.lock.ClusteredLockImpl$ClusterChangeListener.viewChange(ClusteredLockImpl.java:335)
	at jdk.internal.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.lambda$invoke$1(AbstractListenerImpl.java:424)
{noformat}

When the cache goes back to AVAILABLE mode, nothing re-checks whether the lock owner is
still in the cluster, so the lock can remain owned by a crashed node indefinitely.

E.g. the initial cluster is ABCD, and D owns clustered lock L:
# The cluster splits into 3 partitions: AB, C, D
# LOCKS cache enters DEGRADED mode
# A and B try to unlock L, but fail
# D crashes
# C merges back with AB
# LOCKS cache becomes AVAILABLE
# L remains owned by D
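
The sequence above can be replayed with a small, self-contained model. This is only a sketch of the AB partition's view of the LOCKS cache, not Infinispan code: {{LockTableModel}}, {{CrashedOwnerScenario}} and {{releaseLocksOfMissingOwners}} are hypothetical names, and the model assumes that the view-change handler only releases locks of nodes that left in that particular view change. It shows L staying owned by D after the merge, and how a re-check against the current members once the cache is AVAILABLE again would clear it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the AB partition's view of the LOCKS cache. Not Infinispan code. */
final class LockTableModel {
    enum Mode { AVAILABLE, DEGRADED }

    private final Map<String, String> ownerByLock = new HashMap<>();
    private Set<String> members;
    private Mode mode = Mode.AVAILABLE;

    LockTableModel(Set<String> initialMembers) {
        members = new HashSet<>(initialMembers);
    }

    void lock(String lockName, String node) {
        ownerByLock.putIfAbsent(lockName, node);
    }

    String ownerOf(String lockName) {
        return ownerByLock.get(lockName);
    }

    void setMode(Mode newMode) {
        mode = newMode;
    }

    /**
     * Models the view-change listener (assumption): release locks owned by
     * nodes that left in this view change. Returns false when the release is
     * skipped because the cache is DEGRADED (the AvailabilityException case).
     */
    boolean onViewChange(Set<String> newMembers) {
        Set<String> departed = new HashSet<>(members);
        departed.removeAll(newMembers);
        members = new HashSet<>(newMembers);
        if (mode == Mode.DEGRADED) {
            return false;
        }
        ownerByLock.values().removeIf(departed::contains);
        return true;
    }

    /** Hypothetical extra check: drop every lock whose owner is not a current member. */
    void releaseLocksOfMissingOwners() {
        if (mode != Mode.AVAILABLE) {
            throw new IllegalStateException("LOCKS cache is not available");
        }
        ownerByLock.values().removeIf(owner -> !members.contains(owner));
    }
}

final class CrashedOwnerScenario {
    /** Replays steps 1-6 of the example and returns the resulting state. */
    static LockTableModel replay() {
        LockTableModel locks = new LockTableModel(Set.of("A", "B", "C", "D"));
        locks.lock("L", "D");
        locks.setMode(LockTableModel.Mode.DEGRADED);   // steps 1-2: split, DEGRADED
        locks.onViewChange(Set.of("A", "B"));          // step 3: release fails
        locks.onViewChange(Set.of("A", "B", "C"));     // steps 4-5: D is gone, C merges
        locks.setMode(LockTableModel.Mode.AVAILABLE);  // step 6
        return locks;
    }

    public static void main(String[] args) {
        LockTableModel locks = replay();
        System.out.println("owner after merge: " + locks.ownerOf("L"));
        locks.releaseLocksOfMissingOwners();
        System.out.println("owner after re-check: " + locks.ownerOf("L"));
    }
}
```

In the model, the merge view change has no newly departed members, so nothing triggers a second release attempt. Whether the real fix should hook the availability change, the merge, or something else is left open here.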

Unlocking the locks on cluster view changes is also problematic. Because the LOCKS cache
enters DEGRADED mode *after* the cluster view change, if the LOCKS cache is distributed,
then it is theoretically possible for a lock to be unlocked and then for its owner to
merge back:

E.g. the initial cluster is ABCD, and D owns clustered lock L:
# The cluster splits into 2 partitions: AB and CD
# A and B are the 2 owners of L, and A unlocks L
# The LOCKS cache enters DEGRADED mode
# The partitions merge back
# The LOCKS cache becomes AVAILABLE again
# D thinks it still owns L, but other nodes are able to acquire it
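
The second sequence can be illustrated the same way. The sketch below is a toy model, not Infinispan code: {{SplitLockModel}} and {{StaleOwnerScenario}} are hypothetical names, the cache entry held by the key owners (A and B) is modelled as one map, and each node's belief about the locks it holds is tracked separately. It assumes the forced release on a view change does not notify the departed owner.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model: cache-side lock ownership versus each node's local belief. */
final class SplitLockModel {
    // Entry stored on the key owners (A and B in the example).
    private final Map<String, String> cacheOwnerByLock = new HashMap<>();
    // Locks each node believes it currently holds.
    private final Map<String, Set<String>> heldLocally = new HashMap<>();

    boolean tryLock(String lockName, String node) {
        if (cacheOwnerByLock.putIfAbsent(lockName, node) != null) {
            return false; // already owned according to the cache
        }
        heldLocally.computeIfAbsent(node, n -> new HashSet<>()).add(lockName);
        return true;
    }

    /** Release done by a remaining node on a view change; the old owner is not told. */
    void forceRelease(String lockName) {
        cacheOwnerByLock.remove(lockName);
    }

    String cacheOwner(String lockName) {
        return cacheOwnerByLock.get(lockName);
    }

    boolean believesItHolds(String node, String lockName) {
        return heldLocally.getOrDefault(node, Set.of()).contains(lockName);
    }
}

final class StaleOwnerScenario {
    /** Replays the example: D locks L, A force-releases it, C acquires it after the merge. */
    static SplitLockModel replay() {
        SplitLockModel model = new SplitLockModel();
        model.tryLock("L", "D");   // initial state: D owns L
        model.forceRelease("L");   // step 2: A unlocks L before the cache is DEGRADED
        // steps 3-5: DEGRADED, merge, AVAILABLE again; the cache entry is unchanged
        model.tryLock("L", "C");   // step 6: another node acquires L
        return model;
    }

    public static void main(String[] args) {
        SplitLockModel model = replay();
        System.out.println("cache owner of L: " + model.cacheOwner("L"));
        System.out.println("D believes it holds L: " + model.believesItHolds("D", "L"));
    }
}
```

After the replay, the cache records C as the owner while D's local state still says it holds L, which is the double-ownership window described above.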

--
This message was sent by Atlassian Jira
(v8.13.1#813001)
