[jboss-jira] [JBoss JIRA] (JGRP-2360) DeadLock while acqiring a distributed lock consecutively by the same thread in a loop
Daniel Klosinski (Jira)
issues at jboss.org
Mon Jul 15 16:16:01 EDT 2019
Daniel Klosinski created JGRP-2360:
--------------------------------------
Summary: DeadLock while acqiring a distributed lock consecutively by the same thread in a loop
Key: JGRP-2360
URL: https://issues.jboss.org/browse/JGRP-2360
Project: JGroups
Issue Type: Bug
Affects Versions: 4.1.1, 3.6.18
Environment: JGroups-4.1.1-Final
Red Hat 4.4.7-23
JDK 1.8.0_202
Reporter: Daniel Klosinski
Assignee: Bela Ban
Attachments: DLTest.java, DistributedLockRepoducer.zip, log.log
Deadlock intermittently happens when trying to acquire a distributed lock by the same VM, consecutively by the same thread in a loop. Here is a code snippet for which this issue can occur :
{code}
for(String s : list){
Lock lock=lock_service.getLock("test_lock_name");
lock.lock();
//perform bussines logic
lock.unlock();
}
{code}
During the troubleshooting, I found out that lock_id is not being incremented for the new distributed lock. In the first two loop iterations everything was fine. At the third iteration lock_id didn't get increased:
{code}
2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: GRANT_LOCK[test_lock_name, lock_id=1, owner=svc-2-sps-34594::1]
2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: LOCK_GRANTED[test_lock_name, lock_id=1, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: RELEASE_LOCK[test_lock_name, lock_id=1, owner=svc-2-sps-34594::1]
2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: RELEASE_LOCK_OK[test_lock_name, lock_id=1, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: GRANT_LOCK[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1]
2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: LOCK_GRANTED[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: RELEASE_LOCK[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1]
2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: RELEASE_LOCK_OK[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: GRANT_LOCK[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1]
2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: CREATE_LOCK[test_lock_name, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: LOCK_GRANTED[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
{code}
I've added few extra loggers into Jgroups-4.1.1.Final code and I realized that the second client lock was not removed from the client lock table before the creation of 3rd client lock. The issue lays in below piece of code. Owner consists of address and threadID. If the same thread, on the same VM, creates distributed lock consecutively and if there is an existing entry in the client lock table for the same owner, the new lock won't be created. The old client lock will be used to acquire a new distributed lock :
{code}
protected synchronized ClientLock getLock(String name, Owner owner, boolean create_if_absent) {
Map<Owner,ClientLock> owners=table.get(name);
if(owners == null) {
if(!create_if_absent)
return null;
owners=Util.createConcurrentMap(20);
Map<Owner,ClientLock> existing=table.putIfAbsent(name,owners);
if(existing != null)
owners=existing;
}
ClientLock lock=owners.get(owner);
if(lock == null) {
if(!create_if_absent)
return null;
lock=createLock(name, owner);
owners.put(owner, lock);
}
return lock;
}
{code}
I believe that this issue was introduced by the fix for JGRP-2234 and it is caused by the race condition. The logic that deletes client lock from the client lock table is now executed when the client's VM receives RELEASE_LOCK_OK message from the coordinator. Previously this deletion was executed by the thread in which unlock() method was called. Now, it is executed by the separate thread wich handles RELEASE_LOCK_OK from the coordinator and this is why we have a care condition here.
I am attaching a simple program which can be used to reproduce and generated logs.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
More information about the jboss-jira
mailing list