[jboss-jira] [JBoss JIRA] (JGRP-2234) Unlocked locks stay locked forever

Mon Jan 22 16:14:00 EST 2018

    [ https://issues.jboss.org/browse/JGRP-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13522178#comment-13522178 ] 

David White commented on JGRP-2234:
-----------------------------------

Wow - I'm glad someone else has run into this issue and has been able to reproduce it with a stand-alone test case.  I too was desperately trying to figure out whether this was just a configuration issue or a bug in JGroups.  We have a 5-node cluster using CENTRAL_LOCK with the Lock Service and one backup node. All 5 nodes compete to lock the same resource.

We see a similar scenario: if the coordinator and backup nodes leave the cluster, an unlock request from a node is sometimes "successfully" processed, yet a dirty lock state is transferred to the node that becomes the new coordinator. From then on, that resource is permanently locked.

It looks like Bela has fixed this in version 4.0.10. Hopefully, our issue is the same one that has now been fixed.

> Unlocked locks stay locked forever
> ----------------------------------
>
>                 Key: JGRP-2234
>                 URL: https://issues.jboss.org/browse/JGRP-2234
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Bram Klein Gunnewiek
>            Assignee: Bela Ban
>             Fix For: 4.0.10
>
>         Attachments: ClusterSplitLockTest.java, jg_clusterlock_output_testfail.txt
>
>
> As discussed on the mailing list, we have issues where locks from the central lock protocol stay locked forever when the coordinator of the cluster disconnects. We can reproduce this with the attached ClusterSplitLockTest.java. It's a race condition, and we need to run the test many times (sometimes > 20) before we encounter a failure.
> What we think is happening: 
> In a three-node cluster (nodes A, B and C, where node A is the coordinator), unlock requests from B and/or C can be missed when node A leaves and B and/or C don't have the new view installed yet. When, for example, node B takes over coordination, it creates the lock table based on the backups. Let's say node C has locked the lock named 'lockX'. Node C performs an unlock of 'lockX' just after node A (gracefully) leaves and sends the unlock request to node A, since node C doesn't have the correct view installed yet. Node B has the new view and has recreated the lock table, in which 'lockX' is locked by node C. Node C doesn't resend the unlock request, so 'lockX' stays locked forever.
> Attached is the TestNG test we wrote and the output of a test failure.
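
For readers following along, the sequence of events in the quoted report can be sketched as a small single-threaded simulation. This is only an illustration under stated assumptions: it does not use JGroups at all, and the names StaleLockRaceSketch, Coordinator, simulate and resendAfterViewChange are hypothetical. It is not the attached ClusterSplitLockTest.java and says nothing about how the fix in 4.0.10 works. It only replays the order of events from the description: A backs up its lock table, A leaves, B rebuilds the table from the backup, and C's unlock goes to A because C still has the old view.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, JGroups-free model of the race described in the report.
class StaleLockRaceSketch {

    /** Stand-in for a coordinator holding a lock table (lock name -> owner). */
    static final class Coordinator {
        final String name;
        final Map<String, String> lockTable = new HashMap<>();
        boolean inCluster = true;

        Coordinator(String name) { this.name = name; }

        /** Grants the lock if this node is still in the cluster and the lock is free. */
        boolean lock(String lockName, String owner) {
            return inCluster && lockTable.putIfAbsent(lockName, owner) == null;
        }

        /** Returns false if the request is lost because this node has already left. */
        boolean unlock(String lockName, String owner) {
            return inCluster && lockTable.remove(lockName, owner);
        }
    }

    /** Replays the scenario; returns true if 'lockX' is still locked at the end. */
    static boolean simulate(boolean resendAfterViewChange) {
        Coordinator a = new Coordinator("A");
        a.lock("lockX", "C");                        // C locks 'lockX' via coordinator A

        // A's lock table is backed up on B while 'lockX' is held by C.
        Map<String, String> backupOnB = new HashMap<>(a.lockTable);

        a.inCluster = false;                         // A leaves gracefully
        Coordinator b = new Coordinator("B");        // B takes over coordination
        b.lockTable.putAll(backupOnB);               // and rebuilds the table from the backup

        Coordinator coordSeenByC = a;                // C has not installed the new view yet
        boolean delivered = coordSeenByC.unlock("lockX", "C"); // goes to A and is lost

        coordSeenByC = b;                            // C finally installs the new view
        if (!delivered && resendAfterViewChange) {
            coordSeenByC.unlock("lockX", "C");       // illustrative resend only
        }
        return b.lockTable.containsKey("lockX");
    }

    public static void main(String[] args) {
        System.out.println("no resend:   lockX still locked = " + simulate(false));
        System.out.println("with resend: lockX still locked = " + simulate(true));
    }
}
```

Without the resend, B's rebuilt table keeps 'lockX' owned by C even though C has already unlocked it, which matches the "locked forever" symptom. The resend branch is there only to contrast with the sentence "Node C doesn't resend the unlock request"; it is not a claim about the actual fix.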

--
This message was sent by Atlassian JIRA
(v7.5.0#75005)