Jason T. Greene wrote:
> That should be ok though, because the CCL will still time out on that
> lock. The original problem was that the same thread holding the CCL lock
> was blocking on an FC lock, so the CCL lock would never be released
> (since the FC lock was higher in the stack).
See my previous reply. Yes, it blocked in FC.down() because it didn't
receive credits in up(). But the credits were never passed up, because a
replication message ahead of them in the queue was blocked on the FQN
lock held by the CCL.
So, to tackle this, my suggestions were, in this order:
#1 Don't hold a lock while making a synchronous cluster method call.
That's a big no-no, especially in pre-2.5 releases. We had lots of bugs
in the clustering code caused by this pattern, until Brian cleaned all
of it up... :-)
#2 The timeout mechanism in JGroups, which uses threads. It's ugly, a
hack, and only needed for 2.4. As I argued, this will avoid the
deadlock, but it will constantly time out (assuming some traffic).
The root cause of this is #1.
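The deadlock and suggestion #1 can be sketched roughly as below. This is a hypothetical model, not JGroups or JBoss Cache code: a ReentrantLock stands in for the FQN lock held by the CCL, a single-threaded executor stands in for the thread delivering messages via up() in order, and a CountDownLatch stands in for the credits FC.down() waits on. The class name, method name and the 500 ms lock timeout are assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical model of the deadlock (illustrative names, not real APIs):
 *  - fqnLock stands in for the FQN lock held by the CCL,
 *  - upThread stands in for the single thread delivering messages via up(),
 *  - credits stands in for the flow-control credits that unblock FC.down().
 */
class LockDuringCallDemo {

    /** Returns true if the queued replication message managed to acquire the FQN lock. */
    static boolean run(boolean holdLockDuringCall) throws InterruptedException {
        ReentrantLock fqnLock = new ReentrantLock();
        ExecutorService upThread = Executors.newSingleThreadExecutor(); // FIFO delivery
        CountDownLatch credits = new CountDownLatch(1);
        AtomicBoolean replicationApplied = new AtomicBoolean(false);

        fqnLock.lock();
        try {
            // Incoming messages, delivered in order: replication first, credits second.
            upThread.submit(() -> {
                try {
                    // The replication message needs the FQN lock; give up after a lock timeout.
                    if (fqnLock.tryLock(500, TimeUnit.MILLISECONDS)) {
                        try {
                            replicationApplied.set(true);
                        } finally {
                            fqnLock.unlock();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Credits can only be delivered after the replication message is handled.
            upThread.submit(credits::countDown);

            if (holdLockDuringCall) {
                // Anti-pattern: block on the "synchronous call" while still holding the lock.
                credits.await(5, TimeUnit.SECONDS);
            }
        } finally {
            fqnLock.unlock();
        }
        if (!holdLockDuringCall) {
            // Suggestion #1: release the lock first, then make the blocking call.
            credits.await(5, TimeUnit.SECONDS);
        }
        upThread.shutdown();
        upThread.awaitTermination(5, TimeUnit.SECONDS);
        return replicationApplied.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("lock held during call: replication applied = " + run(true));
        System.out.println("lock released first:   replication applied = " + run(false));
    }
}
```

With the lock held during the blocking call, the replication message can only get past the lock by timing out, and only then are the credits delivered; releasing the lock first lets the replication apply normally. If the replication message used a plain lock() instead of tryLock(), the credits could not arrive until the caller gave up waiting, which is the original deadlock.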
--
Bela Ban
Lead JGroups / JBoss Clustering team
JBoss - a division of Red Hat