[jboss-jira] [JBoss JIRA] Assigned: (JGRP-775) Race condition in RouterStubTest

Bela Ban (JIRA) jira-events at lists.jboss.org
Tue Jul 15 12:09:34 EDT 2008


     [ http://jira.jboss.com/jira/browse/JGRP-775?page=all ]

Bela Ban reassigned JGRP-775:
-----------------------------

    Assignee: Vladimir Blagojevic  (was: Bela Ban)

> Race condition in RouterStubTest
> --------------------------------
>
>                 Key: JGRP-775
>                 URL: http://jira.jboss.com/jira/browse/JGRP-775
>             Project: JGroups
>          Issue Type: Bug
>         Environment: Fedora 8, x86
>            Reporter: Richard Achmatowicz
>         Assigned To: Vladimir Blagojevic
>            Priority: Minor
>             Fix For: 2.6.4, 2.7
>
>
> An intermittent failure was observed with RouterStubTest on Fedora:
> I've been investigating the problem I had with RouterStubTest on Fedora further. It's a repeatable test failure. I think I have found the problem.
> The test involves a test client using RouterStub to communicate with a running instance of GossipRouter:
> (i) one router stub connects to a group X on the GossipRouter
> (ii) another router stub connects to the same group X on the GossipRouter
> (iii) the first router stub sends a message to X
> (iv) the second router stub receives the message from X
> The test fails when the second router stub blocks on a message that never arrives. This failure happens intermittently.
> When a router stub connects to a group via a GossipRouter, a SocketThread is initialised at the GossipRouter to represent that router stub and process any incoming messages from that router stub. The SocketThreads for the other peers also need to be in place by the time routing decisions are made and messages are sent out to their destinations.
> By looking at the trace log, I found out that it is possible for the SocketThread representing the first router stub to be initialised and for the first message to be sent *before* the second router stub gets initialised, despite the fact that the two router stubs are initialised on the client side before the message is sent. Here is an example trace where this happens:
>     [junit] 6 [INFO] RouterStubTest.test_CONNECT_Route_To_All(): - running test_CONNECT_Route_To_All
>     [junit] 10 [DEBUG] GossipRouter.mainLoop(): - CONNECT(TESTGROUP, 127.0.0.1:32810)
>     [junit] -- my address is 127.0.0.1:32810
>     [junit] -- my address is 127.0.0.1:51927
>     [junit] 22 [DEBUG] GossipRouter$SocketThread.run(): - socket thread(127.0.0.1:32810): blocking on read
>     [junit] 27 [DEBUG] GossipRouter$SocketThread.run(): - request routed by 127.0.0.1:32810
>     [junit] 27 [DEBUG] GossipRouter$SocketThread.run(): - socket thread(127.0.0.1:32810): blocking on read
>     [junit] 28 [DEBUG] GossipRouter.mainLoop(): - CONNECT(TESTGROUP, 127.0.0.1:51927)
>     [junit] 29 [DEBUG] GossipRouter$SocketThread.run(): - socket thread(127.0.0.1:51927): blocking on read
> (the second peer blocks indefinitely here...)
>     [junit] 59964 [DEBUG] GossipRouter.sweep(): - removed 127.0.0.1:32810 (59952 msecs old)
>     [junit] 59965 [DEBUG] GossipRouter.sweep(): - removed 127.0.0.1:51927 (59935 msecs old)
>     [junit] 59966 [DEBUG] GossipRouter.sweep(): - done (removed 2 entries)
>     [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
>     [junit] Test org.jgroups.tests.stack.RouterStubTest FAILED (timeout)
> You can see that 127.0.0.1:32810 blocks on a read after initialization of the SocketThread, reads the sent request and routes it in the next line, and then waits for a further request from the first router stub. After all this happens, the second router stub tries to connect and then blocks waiting for a message which will never arrive.
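The ordering seen in the trace can be modeled without any JGroups code. The following is a minimal sketch under stated assumptions: ToyRouter, register, route and delivered are hypothetical names invented for illustration, and the model assumes that a routed message is handed only to members the router has registered at the moment of routing. It replays the two possible router-side orderings and counts what the second member receives.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types exist in JGroups.
final class RouterRaceSketch {

    /** Hypothetical stand-in for the router's membership table of one group. */
    static final class ToyRouter {
        // member address -> messages delivered to that member
        private final Map<String, List<String>> members = new LinkedHashMap<>();

        /** Models the router finishing setup of a member's socket thread. */
        void register(String member) {
            members.put(member, new ArrayList<>());
        }

        /** Delivers msg to every member that is registered at this instant. */
        void route(String msg) {
            for (List<String> inbox : members.values()) {
                inbox.add(msg);
            }
        }

        /** Number of messages delivered to member so far (0 if unknown). */
        int delivered(String member) {
            List<String> inbox = members.get(member);
            return inbox == null ? 0 : inbox.size();
        }
    }

    /** Replays one router-side ordering and reports what the second member got. */
    static int deliveredToSecond(boolean secondRegisteredBeforeRoute) {
        ToyRouter router = new ToyRouter();
        router.register("127.0.0.1:32810");
        if (secondRegisteredBeforeRoute) {
            router.register("127.0.0.1:51927");
        }
        router.route("hello");
        if (!secondRegisteredBeforeRoute) {
            // Same ordering as the trace: CONNECT is handled after the routing.
            router.register("127.0.0.1:51927");
        }
        return router.delivered("127.0.0.1:51927");
    }

    public static void main(String[] args) {
        System.out.println("registered before route: " + deliveredToSecond(true));
        System.out.println("registered after route:  " + deliveredToSecond(false));
    }
}
```

In this model the second ordering leaves the second member with an empty inbox, which corresponds to the stub that blocks until the test times out.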
> The test failure was fixed by adding a delay of 1 second between the connect calls of the two router stubs and the sending of the first message.
> Bela suggested fixing this in some way, possibly by preventing a RouterStub from returning from a connect(String groupname) call before the GossipRouter has completed initialization of the group member on the GossipRouter.
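One way to picture the suggested direction is a connect call that waits for an acknowledgement from the router. This is only a sketch of the idea, not the actual JGroups change: ToyRouter, ToyStub, the latch-based acknowledgement and the timeout parameter are all assumptions made for illustration. The toy router registers members asynchronously on its own thread, and connect returns only once the router has signalled that the member is routable.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model only: none of these types exist in JGroups.
final class AckedConnectSketch {

    static final class ToyRouter {
        // Registrations are processed asynchronously, like a router accept loop.
        private final ExecutorService acceptor = Executors.newSingleThreadExecutor();
        private final Map<String, BlockingQueue<String>> members = new ConcurrentHashMap<>();

        /** Queues a registration; the returned latch opens once the member is routable. */
        CountDownLatch register(String member) {
            CountDownLatch ack = new CountDownLatch(1);
            acceptor.execute(() -> {
                members.put(member, new LinkedBlockingQueue<>());
                ack.countDown();
            });
            return ack;
        }

        /** Delivers msg to every member registered at this instant. */
        void route(String msg) {
            for (BlockingQueue<String> inbox : members.values()) {
                inbox.add(msg);
            }
        }

        String receive(String member, long timeoutMs) throws InterruptedException {
            BlockingQueue<String> inbox = members.get(member);
            return inbox == null ? null : inbox.poll(timeoutMs, TimeUnit.MILLISECONDS);
        }

        void stop() {
            acceptor.shutdownNow();
        }
    }

    static final class ToyStub {
        private final ToyRouter router;
        private final String address;

        ToyStub(ToyRouter router, String address) {
            this.router = router;
            this.address = address;
        }

        /** Blocks until the router acknowledges the registration or the timeout expires. */
        void connect(long timeoutMs) throws InterruptedException, TimeoutException {
            if (!router.register(address).await(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new TimeoutException("router did not acknowledge " + address);
            }
        }

        void send(String msg) {
            router.route(msg);
        }

        String receive(long timeoutMs) throws InterruptedException {
            return router.receive(address, timeoutMs);
        }
    }

    /** Runs the two-stub scenario once and returns what the second stub received. */
    static String runOnce() {
        ToyRouter router = new ToyRouter();
        try {
            ToyStub first = new ToyStub(router, "A");
            ToyStub second = new ToyStub(router, "B");
            first.connect(1000);
            second.connect(1000);
            first.send("hello");
            return second.receive(1000);
        } catch (InterruptedException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            router.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println("second stub received: " + runOnce());
    }
}
```

Because both connect calls have returned before send is invoked, both members are already in the routing table when the message is routed, so no sleep is needed between connecting and sending.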

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


