[jboss-jira] [JBoss JIRA] (JGRP-1610) LockingService and rpc on the same cluster, tryLock() hangs

Fri Apr 26 12:20:53 EDT 2013

    [ https://issues.jboss.org/browse/JGRP-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770446#comment-12770446 ] 

Bela Ban commented on JGRP-1610:
--------------------------------

OK, it is clear why this fails:
- A invokes a *synchronous* (blocking) RPC on B
- UNICAST delivers the RPC from A at B
- The RPC tries to acquire the lock from A (who already holds it)
- A sends back a LOCK-DENIED unicast to B
- However, because B is still processing the RPC from A, the LOCK-DENIED message won't get processed until the RPC returns. But the RPC will only return once the lock has been granted or denied
==> A classic deadlock!

SOLUTION:
#1 Invoke the initial RPC as an ASYNC RPC, or hold a future to it
#2 JGroups sends all messages which have no ordering constraints as OOB messages, e.g. LOCK-GRANTED / LOCK-DENIED
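
To illustrate why #2 breaks the cycle, here is a minimal sketch in plain Java. It is not JGroups code: a single-threaded executor stands in for the in-order delivery of regular messages from A at B, and a second executor stands in for OOB delivery. The class and method names are hypothetical, and the handler uses a 500 ms timeout only so that the demo terminates instead of hanging forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of message delivery at node B; not part of JGroups.
class FifoDeadlockSketch {

    /**
     * Simulates B receiving an RPC from A whose handler waits for a lock
     * response that A sends afterwards. Returns true if the handler saw the
     * response, false if it timed out waiting for it.
     */
    public static boolean handlerSeesLockResponse(boolean responseIsOob) {
        // One thread delivers all regular messages from A, strictly in order.
        ExecutorService regular = Executors.newSingleThreadExecutor();
        // OOB messages are delivered by another thread, outside that order.
        ExecutorService oob = Executors.newSingleThreadExecutor();
        CompletableFuture<String> lockResponse = new CompletableFuture<>();
        CompletableFuture<Boolean> rpcResult = new CompletableFuture<>();
        try {
            // Message 1 from A: the RPC. Its handler blocks on the lock response.
            regular.execute(() -> {
                try {
                    lockResponse.get(500, TimeUnit.MILLISECONDS);
                    rpcResult.complete(true);
                } catch (Exception e) {
                    // Timed out: the response never reached the handler.
                    rpcResult.complete(false);
                }
            });
            // Message 2 from A: LOCK-DENIED. As a regular message it queues
            // behind the RPC; as an OOB message it is delivered right away.
            ExecutorService delivery = responseIsOob ? oob : regular;
            delivery.execute(() -> lockResponse.complete("LOCK-DENIED"));
            return rpcResult.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            return false;
        } finally {
            regular.shutdownNow();
            oob.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("regular: " + handlerSeesLockResponse(false));
        System.out.println("OOB:     " + handlerSeesLockResponse(true));
    }
}
```

With regular delivery the handler times out, because the response is stuck behind the very RPC that is waiting for it. With the response delivered on the separate OOB thread, the handler completes.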

> LockingService and rpc on the same cluster, tryLock() hangs
> -----------------------------------------------------------
>
>                 Key: JGRP-1610
>                 URL: https://issues.jboss.org/browse/JGRP-1610
>             Project: JGroups
>          Issue Type: Feature Request
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.3
>
>         Attachments: RpcLockingTest.java
>
>
> Hi,
> Yes, the sequence diagram only depicted the second part of my description. 
> Anyway, I've attached a test file that reproduces the problem. 
> It contains two test cases, one where the coordinator of the lock is the one who 
> sends the message first, and a second case where the non-coordinator sends 
> the message first.
> In the first case the receiver (the non-coordinator) hangs in tryLock. In the second 
> case, though, everything works fine. 
> Regards,
> Daniel Olausson
> On 25 March 2013 16:15, Bela Ban <belaban at yahoo.com> wrote:
>     Hi Daniel,
>     the sequence diagram differs from your description, can you submit a
>     test case (e.g. copy MessageDispatcherRSVPTest and modify it), so I can
>     take a look ?
>     I assume your RPCs are blocking (sync) and non-OOB ? Could be a
>     recursive invocation, where FIFO order (default) leads to a distributed
>     deadlock.
>     A test case would clarify what you want to do, and if I can reproduce
>     the problem, I can fix it.
>     On 3/25/13 1:54 PM, Daniel Olausson wrote:
>     > Hi,
>     >
>     > We are trying to use the same channel for our lockingService and
>     > rpcDispatcher, but we are noticing some weird behavior.
>     >
>     > The end result is that lock.tryLock(lockName) never returns, which it
>     > should always do.
>     >
>     > This happens when we do the following:
>     >
>     > On computer A, we lock the lock.
>     > Do an RPC to a function on computer B; this function tries to take the
>     > lock (lock.tryLock(lockName)), but it can't because the lock is held.
>     > This is correct behavior.
>     > Computer A unlocks the lock.
>     >
>     > On computer B we now do the same procedure: we lock the lock and do an
>     > RPC to computer A, but this is where the strange thing happens. Computer
>     > A tries to take the lock by executing tryLock, but it never returns.
>     >
>     > Here is a sequence diagram:
>     > http://www.websequencediagrams.com/cgi-bin/cdraw?lz=dGl0bGUgQXV0aGVudGljYXRpb24gU2VxdWVuY2UKCkNoYW5uZWwgMSAtPiAABAk6IGNlbnRyYWxMb2NrLnRyeUxvY2soKQAiDS0-KwAoCTI6IHJwY0Rpc3BhdGhlci5jYWxsbWV0aG9kKCJmb28iKQBfCTIAXAwyAFAYbm90ZSByaWdodCBvZiAiAF4JIjogAIEECSBibG9ja3MgZm9yZXZlcgBWDC0-LQCBQAxmb28gcmV0dXJucwoK&s=default
>     >
>     >
>     > In this example we use the standard udp.xml with <CENTRAL_LOCK/> added
>     > at the top of the stack. Everything works if we use PEER_LOCK but then
>     > we need the messages to arrive in the same order everywhere, e.g. atomic
>     > broadcast.
>     >
>     > It also works if we use different clusters for locking and rpc, but it
>     > would be convenient if we could use the same cluster.
>     >
>     >
>     > Is it recommended to use the same channel for different services?
>     >
