[jboss-jira] [JBoss JIRA] Commented: (JGRP-1028) RpcDispatcher locked
Bela Ban (JIRA)
jira-events at lists.jboss.org
Wed Nov 4 06:03:05 EST 2009
[ https://jira.jboss.org/jira/browse/JGRP-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12493022#action_12493022 ]
Bela Ban commented on JGRP-1028:
--------------------------------
You need to move the JChannel.connect() call *after* the creation of the RpcDispatcher; then this works, e.g.:
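@Test
public void testTransfer() throws Exception {
    JChannel channel=new JChannel(STACK);
    // create the dispatcher first, while the channel is still disconnected
    RpcDispatcher dispatcher=new RpcDispatcher(channel, null, null, this);
    // connect only after the dispatcher exists
    channel.connect("TestChannel");
    // ... rest of the test unchanged
}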
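
The comment above does not say why the order matters. As a rough illustration only, here is a toy model (plain Java, not JGroups code) of one general hazard with this kind of ordering: a component that registers itself on a channel at construction time cannot see events the channel delivered before it existed. ToyChannel, ToyDispatcher and the "view:" event string are hypothetical names invented for this sketch, and whether this is the actual mechanism behind JGRP-1028 is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ConnectOrderSketch {
    /** Hypothetical stand-in for a channel; NOT the JGroups API. */
    static final class ToyChannel {
        private Consumer<String> handler;               // set once a dispatcher registers
        final List<String> dropped = new ArrayList<>(); // events nobody was listening for

        void setHandler(Consumer<String> h) { handler = h; }

        /** In this toy model, connecting immediately produces one initial event. */
        void connect(String cluster) {
            deliver("view:" + cluster);
        }

        private void deliver(String event) {
            if (handler == null) dropped.add(event);    // no handler yet: event is lost
            else handler.accept(event);
        }
    }

    /** Hypothetical dispatcher that registers itself on the channel when constructed. */
    static final class ToyDispatcher {
        final List<String> seen = new ArrayList<>();
        ToyDispatcher(ToyChannel ch) { ch.setHandler(seen::add); }
    }

    /** Order from the bug report: connect first, then create the dispatcher. */
    static List<String> connectThenCreate() {
        ToyChannel ch = new ToyChannel();
        ch.connect("TestChannel");
        return new ToyDispatcher(ch).seen;
    }

    /** Order recommended in the comment: create the dispatcher, then connect. */
    static List<String> createThenConnect() {
        ToyChannel ch = new ToyChannel();
        ToyDispatcher d = new ToyDispatcher(ch);
        ch.connect("TestChannel");
        return d.seen;
    }

    public static void main(String[] args) {
        System.out.println("connect first: " + connectThenCreate());
        System.out.println("create first:  " + createThenConnect());
    }
}
```

In the toy model the dispatcher created after connect() never sees the initial event, while the one created before connect() does.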
> RpcDispatcher locked
> --------------------
>
> Key: JGRP-1028
> URL: https://jira.jboss.org/jira/browse/JGRP-1028
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.7
> Reporter: Holger Rehn
> Assignee: Bela Ban
> Fix For: 2.8
>
> Attachments: JGroupsBlockingTest.zip
>
>
> This follows up forum post https://sourceforge.net/forum/message.php?msg_id=7585933.
> Summary:
> There are circumstances that lead to RpcDispatcher being locked for no obvious reason in the following state:
> "ICP1" daemon prio=10 tid=0x000000002ce25000 nid=0x125c waiting on condition [0x00000000377df000..0x00000000377df810]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x0000000024b45ee0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054)
> at org.jgroups.blocks.GroupRequest.collectResponses(GroupRequest.java:512)
> at org.jgroups.blocks.GroupRequest.execute(GroupRequest.java:266)
> at org.jgroups.blocks.GroupRequest.execute(GroupRequest.java:231)
> at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:576)
> at org.jgroups.blocks.RpcDispatcher.callRemoteMethod(RpcDispatcher.java:323)
> ...
> The problem appears with different versions of JGroups (at least 2.7 and 2.8) and with all the different stack configurations I have tried so far (see the attached project for a few examples).
> As I already described in the mentioned thread, one of our test machines seems to play a special role in reproducing the error. Now that I have a working test for this, I discovered another weird thing: the direction in which data is sent plays an important role. If my workstation is sending data to the test server, JGroups stalls after a few seconds. If the server is sending data to my machine, the problem simply disappears.
> However, I have found another way to make JGroups block that I hope has the same cause. Please find attached an Eclipse project containing a simple test class that you can start either as a Java application or as a TestNG/JUnit test. Simply run it on different machines or as multiple instances on the same machine (I've tested with 4 different machines including a VMware system; any combination of these and every machine on its own showed the described blocking immediately).
> The test just joins a channel and, if it's the first member, goes into "primary" mode, meaning it will start sending byte arrays via RpcDispatcher to any other member of the channel as soon as there is at least one other member. If an instance of the test finds itself to be the last member in the channel, it also switches to primary mode to allow test nodes to be started and stopped arbitrarily.
> The problem(s) shown by this test are:
> 1. If the timing is right and the data sent in the RPC is small enough, JGroups blocks instantly without getting even a single RPC through. This can be reproduced on any machine (remove the Thread.sleep(1) from line 80 if you don't get the error).
> 2. JGroups blocks at some point after a random number of RPC calls with random data have succeeded. After the first blocked call has been aborted by a TimeoutException, every further RPC ends up blocking as well. This seems to depend on the environment and can currently be reproduced reliably here only when the RPC target is a certain machine. It also doesn't seem to happen when using fixed-size RPC data that is known to get through at least once. You could try reproducing this in your environment by commenting/uncommenting a few lines in the test class (see code comments).
> I really hope both problems have the same cause. Since neither of them is exactly what we encounter in our product (no TimeoutException there, and blocking after a fixed number of calls), I can just pray that fixing problem 1 will solve the others as well.
> Cheers,
> momo
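
For readers unfamiliar with the TIMED_WAITING (parking) state in the quoted thread dump: it is what a thread looks like while it sits in a timed wait on a java.util.concurrent Condition, which matches the ConditionObject.await and parkNanos frames in the trace. The following is a minimal sketch of that pattern using only the JDK. The class and method names (TimedWaitSketch, deliver, awaitResponse) are made up for illustration, and it is not the GroupRequest implementation.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class TimedWaitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition responseArrived = lock.newCondition();
    private Object response; // guarded by lock

    /** Called by the receiving side when a reply comes in. */
    void deliver(Object reply) {
        lock.lock();
        try {
            response = reply;
            responseArrived.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Waits up to timeoutMs for a reply; returns null if none arrived in time. */
    Object awaitResponse(long timeoutMs) throws InterruptedException {
        long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (response == null) {
                if (remaining <= 0) return null;                   // timed out
                // the caller is TIMED_WAITING (parking) while inside this call
                remaining = responseArrived.awaitNanos(remaining);
            }
            return response;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        // No reply is ever delivered: the caller waits for the full timeout.
        TimedWaitSketch lost = new TimedWaitSketch();
        System.out.println("no reply:   " + lost.awaitResponse(200));

        // A reply is delivered from another thread: the caller returns early.
        TimedWaitSketch ok = new TimedWaitSketch();
        Thread replier = new Thread(() -> ok.deliver("pong"));
        replier.start();
        System.out.println("with reply: " + ok.awaitResponse(2000));
        replier.join();
    }
}
```

If a reply never reaches the waiting side, for whatever reason, the caller stays parked until the timeout expires, which is consistent with what the dump shows.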