[jboss-jira] [JBoss JIRA] Updated: (JGRP-527) MuxChannel stuck

Bryce Alcock (JIRA) jira-events at lists.jboss.org
Thu Jun 14 17:33:11 EDT 2007


     [ http://jira.jboss.com/jira/browse/JGRP-527?page=all ]

Bryce Alcock updated JGRP-527:
------------------------------

    Attachment: dist.tar.gz

Reproduction Steps:

Modify the src/ProtocolStacks.xml file:
everywhere you see 168.162.76.174, put the IP address of your first non-local, non-loopback network device.
If you don't know what that IP is, you can run the command once and look for
the IP that is printed in the GMS output. For example,
# java -jar JGroupsMuliplexerTest.jar s3

will print something like this:
-------------------------------------------------------
GMS: address is 168.162.76.174:7843
-------------------------------------------------------
The IP address shown there is the one that must replace every entry in that
file that is currently set to 168.162.76.174.
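
If you would rather look the address up without starting the test, the following is a minimal sketch of one way to list a candidate. It is not part of the attached dist.tar.gz; the class name FirstNonLoopbackAddress is made up for illustration, and it assumes Java 8 or later and an IPv4 address. Interface ordering is platform dependent, so treat the address printed in the GMS output as the authoritative one if the two differ.

```java
import java.io.UncheckedIOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

// Hypothetical helper, not part of JGroups or the attached test.
public class FirstNonLoopbackAddress {

    /** Returns the first IPv4 address on an interface that is up and not loopback, or null if there is none. */
    static InetAddress find() {
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return null; // no interfaces reported on this machine
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue; // skip interfaces that are down and the loopback device
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address && !addr.isLoopbackAddress()) {
                        return addr;
                    }
                }
            }
            return null;
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InetAddress addr = find();
        System.out.println(addr == null ? "no non-loopback IPv4 address found" : addr.getHostAddress());
    }
}
```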

Once you have done that you are ready.

STEPS:
1.  Open 5 windows (Unix prompts).
2.  In the first, type 'java -jar JGroupsMuliplexerTest.jar s3'
3.  In the second, type 'java -jar JGroupsMuliplexerTest.jar s3 go'
4.  In the third, type 'java -jar JGroupsMuliplexerTest.jar m1'
5.  Wait until you see something like the following (the exact number is not important; 400 or higher is plenty to see the hang):
    WHandler size: 502
    WHandler size: 503
6.  In the fourth window, type 'java -jar JGroupsMuliplexerTest.jar w2'
7.  Kill the job in the second window: very shortly after the job from step 6 starts up and connects, go to the fifth window and type the following commands:
    'ps -auxwww |grep "s3 go" |grep -v grep'
    'kill -9 [pid from above]'
8.  Wait until you see a
    WHandler size: 0
9.  Try to restart the one in the second window with the command:
    'java -jar JGroupsMuliplexerTest.jar s3 go'
10. Type the following commands:
    'ps -auxwww |grep "s3 go" |grep -v grep'
    'kill -3 [pid from above]'

Note: kill -3 will not actually kill the process; it only dumps the
current thread traces.
You can keep doing it repeatedly, and you will see the stuck thread each time.
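
If sending kill -3 from another window is inconvenient, a similar check can be made from inside the JVM. The following is only a sketch under that assumption: StuckThreadFinder is a hypothetical helper, not part of JGroups or the attached test, and it only sees threads of the JVM it runs in, so it would have to be called from the test program itself (for example from a timer thread).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JGroups or the attached test.
public class StuckThreadFinder {

    /** Names of live threads in this JVM whose stack contains a frame for the given class and method. */
    static List<String> threadsIn(String className, String methodName) {
        List<String> names = new ArrayList<String>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className) && frame.getMethodName().equals(methodName)) {
                    names.add(entry.getKey().getName());
                    break; // one matching frame is enough for this thread
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The class and method names are taken from the stack trace in this report.
        System.out.println(threadsIn("org.jgroups.mux.Multiplexer", "fetchServiceInformation"));
    }
}
```

Calling threadsIn repeatedly and seeing the same thread name each time corresponds to repeating kill -3 and seeing the same stuck thread.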

LOOK FOR :
"***************BLA-STUCK THREAD**********************" prio=1 tid=0x08449430 nid=0x107c in Object.wait() [0xb0c82000..0xb0c83030]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x889cc130> (a org.jgroups.util.Promise)
        at org.jgroups.util.Promise.doWait(Promise.java:104)
        at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:60)
        at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:28)
        - locked <0x889cc130> (a org.jgroups.util.Promise)
        at org.jgroups.mux.Multiplexer.fetchServiceInformation(
        at org.jgroups.JChannelFactory.connect(JChannelFactory.java:355)
        - locked <0x88f8c010> (a org.jgroups.JChannelFactory$Entry)
        at org.jgroups.mux.MuxChannel.connect(MuxChannel.java:126)
        - locked <0x889cc1f0> (a org.jgroups.mux.MuxChannel)
        at jgroupsmuliplexertest.GenericMultiplexer.connect(GenericMultiplexer.java:83)
        at jgroupsmuliplexertest.MainTest$1.run(MainTest.java:59)



That is the Stuck Thread....
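
A note on reading the trace: a thread blocked in Object.wait() inside a synchronized method shows both "locked" and "waiting on" for the same object, because wait() releases the monitor while the thread is blocked. Those two lines on the Promise therefore do not by themselves mean the thread is holding a lock it is also trying to acquire. The sketch below illustrates the pattern with a made-up class, SimplePromise. It is not the JGroups org.jgroups.util.Promise source, only an assumption about the general shape of a timed wait.

```java
// Hypothetical illustration of a timed wait; NOT the JGroups Promise implementation.
public class SimplePromise<T> {
    private T result;
    private boolean hasResult;

    /** Blocks until a result is set or timeoutMs elapses; returns null on timeout or interrupt. */
    public synchronized T getResultWithTimeout(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!hasResult) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null; // timed out without a result
            }
            try {
                wait(remaining); // releases this object's monitor while blocked
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return null;
            }
        }
        return result;
    }

    /** Stores the result and wakes up any waiting threads. */
    public synchronized void setResult(T value) {
        result = value;
        hasResult = true;
        notifyAll();
    }
}
```

A thread dump taken while a caller sits in getResultWithTimeout of this sketch would show the same "locked" and "waiting on" pair as the trace above.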



> MuxChannel stuck
> ----------------
>
>                 Key: JGRP-527
>                 URL: http://jira.jboss.com/jira/browse/JGRP-527
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Bela Ban
>         Assigned To: Vladimir Blagojevic
>             Fix For: 2.5
>
>         Attachments: dist.tar.gz
>
>
> [from Bryce Alcock]
> JGroups Users:
> I am getting what appears to me on the surface to be a deadlock.
> Here is the stack trace:
> "***************BLA-STUCK THREAD**********************" prio=1 tid=0x08448ed8 nid=0x78dc in Object.wait() [0xb0c10000..0xb0c110b0]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x88f500c8> (a org.jgroups.util.Promise)
>         at org.jgroups.util.Promise.doWait(Promise.java:104)
>         at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:60)
>         at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:28)
>         - locked <0x88f500c8> (a org.jgroups.util.Promise)
>         at org.jgroups.mux.Multiplexer.fetchServiceInformation(Multiplexer.java:196)
>         at org.jgroups.JChannelFactory.connect(JChannelFactory.java:355)
>         - locked <0x88f37fe0> (a org.jgroups.JChannelFactory$Entry)
>         at org.jgroups.mux.MuxChannel.connect(MuxChannel.java:126)
>         - locked <0x88f63740> (a org.jgroups.mux.MuxChannel)
>         at scheduledtaskexecuteframework.group.GenericMultiplexer.connect(GenericMultiplexer.java:83)
>         at scheduledtaskexecuteframework.schedule.test.MainTest$1.run(MainTest.java:60)
> The scenario is easily reproduced on my system:
> I have 2 members of a MuxChannel join and do some work.
> I then have a 3rd join.
> Then I have the second member quit.
> Wait about 5 mins and have the second member join.
> The second member will get stuck like this.
> However,
> if I don't have the 3rd member join, and just have the second member leave, wait five mins, and come back,
> things work fine every time.
> Here is the line of code that is both holding the mutex and asking for it (apparently in different threads):
>  byte[] state=(byte[])service_state_promise.getResultWithTimeout(2000);
> I am more than willing to give more details about the situation; however,
> I am looking for ideas on how to debug this.
> I am using java 1.5.0_11
> I am using JGroups-2.4.1-sp3
> Bryce

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


