[jboss-jira] [JBoss JIRA] Created: (JGRP-775) Race condition in RouterStubTest

Richard Achmatowicz (JIRA) jira-events at lists.jboss.org
Wed Jun 4 09:49:21 EDT 2008


Race condition in RouterStubTest
--------------------------------

                 Key: JGRP-775
                 URL: http://jira.jboss.com/jira/browse/JGRP-775
             Project: JGroups
          Issue Type: Bug
         Environment: Fedora 8, x86
            Reporter: Richard Achmatowicz
         Assigned To: Bela Ban
            Priority: Minor
             Fix For: 2.6.4, 2.7


An intermittent failure was observed with RouterStubTest on Fedora:

I've been investigating the problem I had with RouterStubTest on Fedora further. The failure is intermittent but reproducible, and I think I have found the cause.

The test involves a test client using RouterStub to communicate with a running instance of GossipRouter:
(i) one router stub connects to a group X on the GossipRouter
(ii) another router stub connects to the same group X on the GossipRouter
(iii) the first router stub sends a message to X
(iv) the second router stub receives the message from X
The test fails when the second router stub blocks waiting for a message that never arrives. This failure happens intermittently.

When a router stub connects to a group via a GossipRouter, a SocketThread is initialised at the GossipRouter to represent that router stub and to process any incoming messages from it. For a message to reach its destinations, the SocketThreads for the other peers must already be in place when the routing decision is made and the message is sent out.

Looking at the trace log, I found that it is possible for the SocketThread representing the first router stub to be initialised and for the first message to be routed *before* the SocketThread for the second router stub is initialised, even though both router stubs are connected on the client side before the message is sent. Here is an example trace where this happens:

    [junit] 6 [INFO] RouterStubTest.test_CONNECT_Route_To_All(): - running test_CONNECT_Route_To_All
    [junit] 10 [DEBUG] GossipRouter.mainLoop(): - CONNECT(TESTGROUP, 127.0.0.1:32810)
    [junit] -- my address is 127.0.0.1:32810
    [junit] -- my address is 127.0.0.1:51927
    [junit] 22 [DEBUG] GossipRouter$SocketThread.run(): - socket thread(127.0.0.1:32810): blocking on read
    [junit] 27 [DEBUG] GossipRouter$SocketThread.run(): - request routed by 127.0.0.1:32810
    [junit] 27 [DEBUG] GossipRouter$SocketThread.run(): - socket thread(127.0.0.1:32810): blocking on read
    [junit] 28 [DEBUG] GossipRouter.mainLoop(): - CONNECT(TESTGROUP, 127.0.0.1:51927)
    [junit] 29 [DEBUG] GossipRouter$SocketThread.run(): - socket thread(127.0.0.1:51927): blocking on read
(the second peer blocks indefinitely here...)
    [junit] 59964 [DEBUG] GossipRouter.sweep(): - removed 127.0.0.1:32810 (59952 msecs old)
    [junit] 59965 [DEBUG] GossipRouter.sweep(): - removed 127.0.0.1:51927 (59935 msecs old)
    [junit] 59966 [DEBUG] GossipRouter.sweep(): - done (removed 2 entries)
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] Test org.jgroups.tests.stack.RouterStubTest FAILED (timeout)

You can see that 127.0.0.1:32810 blocks on a read after initialization of the SocketThread, reads the sent request and routes it in the next line, and then waits for a further request from the first router stub. After all this happens, the second router stub tries to connect and then blocks waiting for a message which will never arrive.
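
The ordering problem can be replayed deterministically in a small toy model. The sketch below is an illustration only: ToyRouter, register() and route() are hypothetical names and do not reflect the actual GossipRouter code. It assumes a router that delivers a message only to the members registered at the moment of routing, and that the sender is excluded from delivery.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy model only: ToyRouter is hypothetical and is not the JGroups GossipRouter.
class RouterRaceSketch {

    static class ToyRouter {
        private final Map<String, Queue<String>> members = new ConcurrentHashMap<>();

        // Stands in for the router handling CONNECT and creating the SocketThread.
        Queue<String> register(String address) {
            Queue<String> inbox = new ConcurrentLinkedQueue<>();
            members.put(address, inbox);
            return inbox;
        }

        // Delivers to every member registered right now, except the sender
        // (excluding the sender is an assumption of this sketch).
        void route(String sender, String message) {
            for (Map.Entry<String, Queue<String>> e : members.entrySet()) {
                if (!e.getKey().equals(sender)) {
                    e.getValue().add(message);
                }
            }
        }
    }

    // Order seen in the trace: first peer registered, message routed, second peer registered.
    static String replayTraceOrder() {
        ToyRouter router = new ToyRouter();
        router.register("127.0.0.1:32810");
        router.route("127.0.0.1:32810", "hello");
        Queue<String> second = router.register("127.0.0.1:51927");
        return second.poll(); // null: nothing was queued for the late joiner
    }

    // Order the test assumes: both peers registered before the message is routed.
    static String replayExpectedOrder() {
        ToyRouter router = new ToyRouter();
        router.register("127.0.0.1:32810");
        Queue<String> second = router.register("127.0.0.1:51927");
        router.route("127.0.0.1:32810", "hello");
        return second.poll();
    }

    public static void main(String[] args) {
        System.out.println("trace order:    " + replayTraceOrder());
        System.out.println("expected order: " + replayExpectedOrder());
    }
}
```

In the trace order the second peer's inbox stays empty, which corresponds to the second router stub blocking until the test times out.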

The test failure was fixed by adding a delay of 1 second between the connect calls of the two router stubs and the sending of the first message.
Bela suggested fixing this in some way, possibly by preventing a RouterStub from returning from a connect(String groupname) call before the GossipRouter has completed initialization of the group member.
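
One way to picture the suggested approach is a connect call that waits for an acknowledgement from the router before returning. The following is a minimal sketch under stated assumptions, not the actual fix: ToyRouter, ToyStub, registerAsync() and the timeout parameter are all hypothetical, and the acknowledgement is modelled with a java.util.concurrent.CountDownLatch rather than a message on the wire.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model only: none of these classes are part of JGroups.
class AckedConnectSketch {

    static class ToyRouter {
        private final Map<String, BlockingQueue<String>> members = new ConcurrentHashMap<>();
        private final ExecutorService acceptor = Executors.newCachedThreadPool();

        // Registers the member on another thread, then signals the acknowledgement.
        void registerAsync(String address, BlockingQueue<String> inbox, CountDownLatch ack) {
            acceptor.execute(() -> {
                sleepQuietly(50); // simulated scheduling delay on the router side
                members.put(address, inbox);
                ack.countDown();
            });
        }

        // Delivers to every member registered right now, except the sender.
        void route(String sender, String message) {
            for (Map.Entry<String, BlockingQueue<String>> e : members.entrySet()) {
                if (!e.getKey().equals(sender)) {
                    e.getValue().add(message);
                }
            }
        }

        void shutdown() {
            acceptor.shutdown();
        }
    }

    static class ToyStub {
        final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
        private final String address;
        private final ToyRouter router;

        ToyStub(String address, ToyRouter router) {
            this.address = address;
            this.router = router;
        }

        // Returns only after the router has acknowledged the registration.
        void connect(long timeoutMillis) throws InterruptedException {
            CountDownLatch ack = new CountDownLatch(1);
            router.registerAsync(address, inbox, ack);
            if (!ack.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("no acknowledgement from router");
            }
        }

        void send(String message) {
            router.route(address, message);
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Same steps as the test: two connects, one send, one receive.
    static String runScenario() {
        ToyRouter router = new ToyRouter();
        try {
            ToyStub first = new ToyStub("127.0.0.1:32810", router);
            ToyStub second = new ToyStub("127.0.0.1:51927", router);
            first.connect(1000);
            second.connect(1000);
            first.send("hello");
            return second.inbox.poll(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            router.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("received: " + runScenario());
    }
}
```

Because connect() does not return until the member is registered, the send in this model cannot overtake the second registration, and no fixed delay is needed in the test.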


