[jboss-jira] [JBoss JIRA] (JGRP-2234) Unlocked locks stay locked forever

Tue Feb 6 07:24:00 EST 2018

    [ https://issues.jboss.org/browse/JGRP-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529155#comment-13529155 ] 

Bela Ban edited comment on JGRP-2234 at 2/6/18 7:23 AM:
--------------------------------------------------------

Clients need to have the following information:
* Locks they acquired
* Pending lock requests; locks which they want to acquire but for which they haven't yet received a LOCK_GRANTED response
* Pending lock release requests; locks that have been released, but for which no RELEASE_LOCK_OK response has been received
* Ditto for conditions, but we'll tackle them in a second stage
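
A minimal sketch of the client-side bookkeeping listed above, assuming locks are identified by name. The class and method names ({{ClientLockState}}, {{lockRequested}}, etc.) are hypothetical and not taken from the JGroups code base; only LOCK_GRANTED and RELEASE_LOCK_OK come from the protocol.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical client-side bookkeeping; not the actual JGroups implementation. */
final class ClientLockState {
    // Locks this client currently holds
    final Set<String> acquired = new LinkedHashSet<>();
    // Locks requested, but no LOCK_GRANTED received yet
    final Set<String> pendingLocks = new LinkedHashSet<>();
    // Locks released, but no RELEASE_LOCK_OK received yet
    final Set<String> pendingReleases = new LinkedHashSet<>();

    void lockRequested(String name) {
        pendingLocks.add(name);
    }

    // Called when LOCK_GRANTED arrives; ignored if the lock was never requested
    void lockGranted(String name) {
        if (pendingLocks.remove(name))
            acquired.add(name);
    }

    // The lock stays in pendingReleases until the coord confirms the release
    void releaseRequested(String name) {
        if (acquired.remove(name))
            pendingReleases.add(name);
    }

    // Called when RELEASE_LOCK_OK arrives
    void releaseAcked(String name) {
        pendingReleases.remove(name);
    }
}
```

With this state, a release that was sent to a coord which has already left remains in {{pendingReleases}}, so the client can still report it to the new coord.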

The reconciliation protocol queues all new requests on the coord and asks all members for their lock information. Once the coord has received this information from all members, it applies it and then drains the queue of pending requests.

It is important that the requests are ordered per member, i.e. a release(L) from a member cannot be processed before that member's lock(L).
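
The queue-then-drain behaviour and the per-member ordering could be sketched roughly as follows. This is an illustration under simplifying assumptions, not the proposed implementation: all names ({{ReconcilingCoordinator}}, {{onLockInfo}}, {{onRequest}}) are hypothetical, a member's report contains only the locks it holds, and a lock request for a lock that is already held is simply dropped instead of being queued as a waiter.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical coord-side sketch of the reconciliation phase. */
final class ReconcilingCoordinator {
    enum Type { LOCK, RELEASE }

    static final class Request {
        final Type type;
        final String lock;
        final String member;

        Request(Type type, String lock, String member) {
            this.type = type;
            this.lock = lock;
            this.member = member;
        }
    }

    // Members that have not yet sent their lock information
    private final Set<String> awaiting;
    // Lock name -> current owner
    private final Map<String, String> owners = new HashMap<>();
    // A single FIFO queue preserves the arrival order of each member's requests
    private final Queue<Request> queued = new ArrayDeque<>();

    ReconcilingCoordinator(Set<String> members) {
        this.awaiting = new HashSet<>(members);
    }

    /** A member reports the locks it holds; the last report ends reconciliation. */
    void onLockInfo(String member, Set<String> heldLocks) {
        if (!awaiting.remove(member))
            return; // unknown member or duplicate report
        for (String lock : heldLocks)
            owners.putIfAbsent(lock, member);
        if (awaiting.isEmpty()) {
            Request r;
            while ((r = queued.poll()) != null)
                apply(r); // drain in arrival order
        }
    }

    /** New requests are queued while reconciliation is still in progress. */
    void onRequest(Request r) {
        if (!awaiting.isEmpty())
            queued.add(r);
        else
            apply(r);
    }

    private void apply(Request r) {
        if (r.type == Type.LOCK)
            owners.putIfAbsent(r.lock, r.member); // granted only if the lock is free
        else if (r.member.equals(owners.get(r.lock)))
            owners.remove(r.lock); // only the owner can release
    }

    String ownerOf(String lock) {
        return owners.get(lock);
    }
}
```

Because the queue is drained in arrival order, a lock(L) followed by a release(L) from the same member is applied in that order, so the lock is free again for the next requester.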

Since {{CENTRAL_LOCK}} allows for multiple members to hold the same lock in a split brain scenario, we need to think about how to handle merging where the coord detects that multiple members hold the same lock...
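
Detecting such conflicts from the collected lock information is the easy part and might look like the sketch below; how to resolve them is the open question and is deliberately not addressed here. {{MergeConflicts}} and {{find}} are hypothetical names, and members and locks are represented as plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper: finds locks claimed by more than one member after a merge. */
final class MergeConflicts {
    /**
     * @param reports member -> names of the locks that member claims to hold
     * @return lock name -> claimants, only for locks with two or more claimants
     */
    static Map<String, List<String>> find(Map<String, Set<String>> reports) {
        Map<String, List<String>> holders = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : reports.entrySet())
            for (String lock : e.getValue())
                holders.computeIfAbsent(lock, k -> new ArrayList<>()).add(e.getKey());
        // Keep only the locks that are held by multiple members
        holders.values().removeIf(members -> members.size() < 2);
        return holders;
    }
}
```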

> Unlocked locks stay locked forever
> ----------------------------------
>
>                 Key: JGRP-2234
>                 URL: https://issues.jboss.org/browse/JGRP-2234
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Bram Klein Gunnewiek
>            Assignee: Bela Ban
>             Fix For: 4.0.11
>
>         Attachments: ClusterSplitLockTest.java, jg_clusterlock_output_testfail.txt
>
>
> As discussed on the mailing list, we have issues where locks from the central lock protocol stay locked forever when the coordinator of the cluster disconnects. We can reproduce this with the attached ClusterSplitLockTest.java. It's a race condition, and we need to run the test many times (sometimes > 20) before we encounter a failure.
> What we think is happening:
> In a three-node cluster (nodes A, B and C, where node A is the coordinator), unlock requests from B and/or C can be missed when node A leaves and B and/or C don't have the new view installed yet. When, for example, node B takes over coordination, it creates the lock table based on the backups. Let's say node C has locked the lock named 'lockX'. Node C performs an unlock of 'lockX' just after node A (gracefully) leaves, and sends the unlock request to node A since node C doesn't have the correct view installed yet. Node B has recreated the lock table, in which 'lockX' is locked by node C. Node C doesn't resend the unlock request, so 'lockX' stays locked forever.
> Attached are the TestNG test we wrote and the output of a test failure.

--
This message was sent by Atlassian JIRA
(v7.5.0#75005)