[ http://jira.jboss.com/jira/browse/JGRP-770?page=comments#action_12416332 ]
Vladimir Blagojevic commented on JGRP-770:
------------------------------------------
Here is why a deadlock occurs with flush-udp.xml:
An incoming thread T goes up the stack, locks the window in NAKACK or UNICAST, and then
tries to acquire a lock L in GroupRequest#viewChange (for example). At the same time, a user
thread pushing a message down through castMessage has already acquired L in
GroupRequest#doExecute but is blocked in FLUSH#down. FLUSH cannot be unblocked, because the
FLUSH message that would unblock it has to obtain the lock on that same NAKACK window, which
T still holds. T waits for L, the user thread waits for FLUSH, and the FLUSH message waits
for T: a deadlock.
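The wait cycle described above can be reproduced in isolation. The following is a minimal sketch, not JGroups code: `window`, `lockL` and `flushUnblocked` are hypothetical stand-ins for the NAKACK window lock, the lock L in GroupRequest, and the condition that releases FLUSH#down. Timed waits are used so that the program reports the cycle instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class FlushDeadlockSketch {

    /** A step that may be interrupted; keeps the lambdas below short. */
    interface Step { void run() throws InterruptedException; }

    static Thread start(Step step) {
        Thread t = new Thread(() -> {
            try { step.run(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        t.start();
        return t;
    }

    /** Returns how many of the three participants could not make progress. */
    static int stuckParticipants(long timeoutMs) {
        ReentrantLock window = new ReentrantLock();            // assumption: models the NAKACK window lock
        ReentrantLock lockL = new ReentrantLock();             // assumption: models lock L in GroupRequest
        CountDownLatch flushUnblocked = new CountDownLatch(1); // assumption: models FLUSH#down being released
        CountDownLatch held = new CountDownLatch(2);           // both locks have been taken
        CountDownLatch done = new CountDownLatch(3);           // every participant has reported
        AtomicInteger stuck = new AtomicInteger();

        // Incoming thread T: holds the window, then wants L.
        Thread incoming = start(() -> {
            window.lock();
            try {
                held.countDown();
                held.await();
                if (lockL.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) lockL.unlock();
                else stuck.incrementAndGet();
                done.countDown();
                done.await(); // keep holding the window until everyone has reported
            } finally {
                window.unlock();
            }
        });
        // User thread: holds L, then blocks until FLUSH is released.
        Thread user = start(() -> {
            lockL.lock();
            try {
                held.countDown();
                held.await();
                if (!flushUnblocked.await(timeoutMs, TimeUnit.MILLISECONDS)) stuck.incrementAndGet();
                done.countDown();
                done.await(); // keep holding L until everyone has reported
            } finally {
                lockL.unlock();
            }
        });
        // Delivery of the unblocking FLUSH message: needs the window that T holds.
        Thread flushDelivery = start(() -> {
            held.await();
            if (window.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                try { flushUnblocked.countDown(); } finally { window.unlock(); }
            } else {
                stuck.incrementAndGet();
            }
            done.countDown();
        });
        try {
            incoming.join();
            user.join();
            flushDelivery.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stuck.get();
    }

    public static void main(String[] args) {
        System.out.println("stuck participants: " + stuckParticipants(200)); // prints 3
    }
}
```

All three participants time out because each one waits for something that only another blocked participant can provide. With untimed waits, as in the real stack, none of them would ever return.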
This is why ChannelConcurrencyTest using a MessageDispatcher works with udp but not with
flush-udp. For the same reason, the basic ChannelConcurrencyTest without a
MessageDispatcher works with both flush-udp and udp.
As a proof, I reorganized GroupRequest#execute so that the message is sent down without
holding lock L. As expected, ChannelConcurrencyTest started to work consistently.
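The shape of that reorganization can be sketched as follows. This is an illustration under assumptions, not the actual GroupRequest change: the class, both method names and the `sent` field are hypothetical, and `sendDown` only reports whether the caller still holds the lock at the moment it would block in the stack.

```java
import java.util.concurrent.locks.ReentrantLock;

public class SendOutsideLockSketch {
    private final ReentrantLock lock = new ReentrantLock(); // assumption: models lock L
    private boolean sent;                                   // request state guarded by the lock

    /** Original shape: the potentially blocking send happens while L is held. */
    boolean executeHoldingLock() {
        lock.lock();
        try {
            sent = true;
            return sendDown();
        } finally {
            lock.unlock();
        }
    }

    /** Reorganized shape: update state under L, release L, then send. */
    boolean executeWithoutLock() {
        lock.lock();
        try {
            sent = true;
        } finally {
            lock.unlock();
        }
        return sendDown();
    }

    /** Stand-in for the send; returns true if the caller holds L while sending. */
    private boolean sendDown() {
        return lock.isHeldByCurrentThread();
    }

    public static void main(String[] args) {
        SendOutsideLockSketch request = new SendOutsideLockSketch();
        System.out.println(request.executeHoldingLock());  // true: L is held during the send
        System.out.println(request.executeWithoutLock());  // false: L was released first
    }
}
```

If the send blocks in the second variant, an incoming thread can still acquire L, release the window, and let the unblocking message through.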
Concurrent startup of many channels doesn't stabilize
-----------------------------------------------------
Key: JGRP-770
URL: http://jira.jboss.com/jira/browse/JGRP-770
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assigned To: Vladimir Blagojevic
Fix For: 2.7, 2.6.3
[Brian Goose]
JGroups 2.6.2, JDK 1.6u4, and OS "Linux bubba 2.6.24-16-generic #1 SMP
Thu Apr 10 13:23:42 UTC 2008 i686 GNU/Linux"
I have 8 machines that all join a channel when they start up. Usually,
they eventually merge into one view with 8 members, but sometimes
I end up with one view of 7 members and one view of 1 member. The two never
merge, even after waiting for an hour or two.
I've created a unit test that demonstrates this problem while using the
flush-udp stack. About 20% of the time the views never merge into
one big view, but instead stay as two separate views, continually
trying to merge and failing. Do I have something set up wrong here?
Should this test work reliably?
I'm running with these Java switches:
-ea
-Djava.net.preferIPv4Stack=true
-Dorg.apache.commons.logging.LogFactory=org.apache.commons.logging.impl.LogFactoryImpl
The test class is attached.
--
This message is automatically generated by JIRA.