Dan Berindei created ISPN-12714:
-----------------------------------
Summary: Cluster locks can stay locked by crashed nodes
Key: ISPN-12714
URL: https://issues.redhat.com/browse/ISPN-12714
Project: Infinispan
Issue Type: Bug
Components: Clustered Locks
Affects Versions: 12.0.0.Final
Reporter: Dan Berindei
Fix For: 12.1.0.Final
When the node that owns a clustered lock leaves the cluster,
{{ClusteredLockImpl.ClusterChangeListener}} is supposed to release the lock. But if the
{{org.infinispan.LOCKS}} cache is in DEGRADED mode, the lock release fails and an error is
logged:
{noformat}
22:01:29,500 ERROR (jgroups-9,Test-NodeD:[]) [CacheManagerNotifierImpl] ISPN000405: Caught exception while invoking a cache manager listener!
org.infinispan.commons.CacheListenerException: ISPN000280: Caught exception [org.infinispan.partitionhandling.AvailabilityException] while invoking method [public void org.infinispan.lock.impl.lock.ClusteredLockImpl$ClusterChangeListener.viewChange(org.infinispan.notifications.cachemanagerlistener.event.ViewChangedEvent)] on listener instance: org.infinispan.lock.impl.lock.ClusteredLockImpl$ClusterChangeListener@3c91530d
    at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.lambda$invoke$1(AbstractListenerImpl.java:430)
    at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.invoke(AbstractListenerImpl.java:450)
    at org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl.invokeListener(CacheManagerNotifierImpl.java:157)
    at org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl.invokeListeners(CacheManagerNotifierImpl.java:84)
    at org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl.notifyViewChange(CacheManagerNotifierImpl.java:103)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.receiveClusterView(JGroupsTransport.java:737)
    ...
Caused by: org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key 'ClusteredLockKey{name=ConsistentReliabilitySplitBrainTest}' is not available. Not all owners are in this partition
    at org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl.doCheck(PartitionHandlingManagerImpl.java:272)
    at org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl.checkRead(PartitionHandlingManagerImpl.java:114)
    at org.infinispan.factories.InternalCacheFactory$PartitionHandlingCache.get(InternalCacheFactory.java:308)
    at org.infinispan.factories.InternalCacheFactory$PartitionHandlingCache.get(InternalCacheFactory.java:306)
    at org.infinispan.factories.InternalCacheFactory$AbstractGetAdvancedCache.containsKey(InternalCacheFactory.java:257)
    at org.infinispan.cache.impl.AbstractDelegatingCache.containsKey(AbstractDelegatingCache.java:384)
    at org.infinispan.cache.impl.EncoderCache.containsKey(EncoderCache.java:618)
    at org.infinispan.lock.impl.manager.EmbeddedClusteredLockManager.isDefined(EmbeddedClusteredLockManager.java:157)
    at org.infinispan.lock.impl.lock.ClusteredLockImpl$ClusterChangeListener.viewChange(ClusteredLockImpl.java:335)
    at jdk.internal.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.lambda$invoke$1(AbstractListenerImpl.java:424)
{noformat}
When the cache returns to AVAILABLE mode, nothing re-checks whether the lock owner has
rejoined the cluster, so the lock may stay owned by a crashed node forever.
E.g. the initial cluster is ABCD, and D owns clustered lock L:
# The cluster splits into 3 partitions: AB, C, D
# LOCKS cache enters DEGRADED mode
# A and B try to unlock L, but fail
# D crashes
# C merges back with AB
# LOCKS cache becomes AVAILABLE
# L remains owned by D
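The sequence above can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions: {{LockTable}}, {{onViewChange}} and {{onBecameAvailable}} are hypothetical names invented for illustration and are not Infinispan APIs. It shows the release failing while DEGRADED, and one possible extra check that re-validates lock owners against the current members once the cache is AVAILABLE again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the LOCKS cache; not Infinispan code.
final class LockTable {
    enum Mode { AVAILABLE, DEGRADED }

    private final Map<String, String> ownerByLock = new HashMap<>();
    private Mode mode = Mode.AVAILABLE;

    void lock(String lockName, String owner) {
        ownerByLock.put(lockName, owner);
    }

    String ownerOf(String lockName) {
        return ownerByLock.get(lockName); // null when the lock is free
    }

    void setMode(Mode newMode) {
        mode = newMode;
    }

    // Models the reported behavior: the release is attempted on a view change
    // and fails while the cache is DEGRADED.
    void onViewChange(Set<String> members) {
        if (mode == Mode.DEGRADED) {
            throw new IllegalStateException("LOCKS cache is DEGRADED, cannot release locks");
        }
        releaseLocksOfMissingOwners(members);
    }

    // Assumed additional check (not in the reported code): when the cache
    // becomes AVAILABLE again, re-validate every owner against the members.
    void onBecameAvailable(Set<String> members) {
        mode = Mode.AVAILABLE;
        releaseLocksOfMissingOwners(members);
    }

    private void releaseLocksOfMissingOwners(Set<String> members) {
        ownerByLock.values().removeIf(owner -> !members.contains(owner));
    }
}

final class CrashedOwnerDemo {
    public static void main(String[] args) {
        LockTable locks = new LockTable();
        locks.lock("L", "D");                         // D owns L in cluster ABCD
        locks.setMode(LockTable.Mode.DEGRADED);       // split into AB, C, D
        try {
            locks.onViewChange(Set.of("A", "B"));     // A and B try to unlock L
        } catch (IllegalStateException e) {
            System.out.println("release failed: " + e.getMessage());
        }
        // D crashes, C merges back with AB, the cache becomes AVAILABLE.
        locks.onBecameAvailable(Set.of("A", "B", "C"));
        System.out.println("owner of L: " + locks.ownerOf("L"));
    }
}
```

Without the call to {{onBecameAvailable}}, the model leaves L owned by D after the merge, which is the stuck state described in the last step.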
Unlocking the locks on cluster view changes is also problematic. Because the LOCKS cache
enters DEGRADED mode *after* the cluster view change, if the LOCKS cache is distributed,
then it is theoretically possible for a lock to be unlocked and then for its owner to
merge back:
E.g. the initial cluster is ABCD, and D owns clustered lock L:
# The cluster splits into 2 partitions: AB and CD
# A and B are the 2 owners of L, and A unlocks L
# The LOCKS cache enters DEGRADED mode
# The partitions merge back
# The LOCKS cache becomes AVAILABLE again
# D thinks it still owns L, but other nodes are able to acquire it
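The second sequence can also be made concrete with a stand-alone model. Again this is a sketch, not Infinispan code: {{SplitBrainLockModel}} and its methods are hypothetical, and the DEGRADED window is not modelled because the premature unlock happens before it. The model keeps the state stored in the LOCKS cache separate from what each node believes it holds locally.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: cache state versus each node's local belief.
final class SplitBrainLockModel {
    private final Map<String, String> cacheOwner = new HashMap<>();
    private final Map<String, Set<String>> heldLocally = new HashMap<>();

    // Acquire the lock if the cache shows it as free.
    boolean tryLock(String node, String lockName) {
        if (cacheOwner.putIfAbsent(lockName, node) != null) {
            return false;
        }
        heldLocally.computeIfAbsent(node, n -> new HashSet<>()).add(lockName);
        return true;
    }

    // View-change handler running before the cache enters DEGRADED mode:
    // it clears the cache entry but cannot reach the absent owner.
    void releaseLocksOfMissingOwners(Set<String> visibleMembers) {
        cacheOwner.values().removeIf(owner -> !visibleMembers.contains(owner));
    }

    boolean believesItOwns(String node, String lockName) {
        return heldLocally.getOrDefault(node, Set.of()).contains(lockName);
    }

    String ownerInCache(String lockName) {
        return cacheOwner.get(lockName);
    }
}

final class PrematureUnlockDemo {
    public static void main(String[] args) {
        SplitBrainLockModel model = new SplitBrainLockModel();
        model.tryLock("D", "L");                              // D owns L in ABCD
        model.releaseLocksOfMissingOwners(Set.of("A", "B"));  // split AB / CD, A unlocks L
        // DEGRADED, merge, AVAILABLE again: no further change to L.
        boolean acquiredByA = model.tryLock("A", "L");
        System.out.println("A acquired L: " + acquiredByA);
        System.out.println("D still believes it owns L: " + model.believesItOwns("D", "L"));
    }
}
```

In this model both A and D end up believing they hold L, which is the mutual-exclusion violation the last step describes.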