[jboss-jira] [JBoss JIRA] Commented: (JGRP-527) MuxChannel stuck

Bryce Alcock (JIRA) jira-events at lists.jboss.org
Sun Jun 10 01:07:12 EDT 2007


    [ http://jira.jboss.com/jira/browse/JGRP-527?page=comments#action_12364806 ] 
            
Bryce Alcock commented on JGRP-527:
-----------------------------------

OK, I have done some more investigation, and here is my conclusion.

The code below sets up an infinite loop. Here is the scenario:

If the joining member is not the coordinator, execution falls through to
line 196. If that line throws a TimeoutException (which is what happens in
getResultWithTimeout(8000)), the break on line 210 is skipped and control
returns to the top of the while(true) loop on line 179.
As far as I can tell, this repeats forever as long as each request times out.

I am not sure whether this should be restructured, but at least in my case
it causes an infinite loop.

mux.Multiplexer.java   -------------------------------------------------------------------------
    175     /**
    176      * Fetches the map of services and hosts from the coordinator (Multiplexer). No-op if we are the coordinator
    177      */
    178     public void fetchServiceInformation() throws Exception {
    179         while(true) {
    180             Address coord=getCoordinator(), local_address=channel != null? channel.getLocalAddress() : null;
    181             boolean is_coord=coord != null && local_address != null && local_address.equals(coord);
    182             if(is_coord) {
    183                 if(log.isTraceEnabled())
    184                     log.trace("I'm coordinator, will not fetch service state information");
    185                 break;
    186             }
    187 
    188             ServiceInfo si=new ServiceInfo(ServiceInfo.STATE_REQ, null, null, null);
    189             MuxHeader hdr=new MuxHeader(si);
    190             Message state_req=new Message(coord, null, null);
    191             state_req.putHeader(NAME, hdr);
    192             service_state_promise.reset();
    193             channel.send(state_req);
    194 
    195             try {
    196                 byte[] state=(byte[])service_state_promise.getResultWithTimeout(8000);
    197                 if(state != null) {
    198                     Map new_state=(Map)Util.objectFromByteBuffer(state);
    199                     synchronized(service_state) {
    200                         service_state.clear();
    201                         service_state.putAll(new_state);
    202                     }
    203                     if(log.isTraceEnabled())
    204                         log.trace("service state was set successfully (" + service_state.size() + " entries)");
    205                 }
    206                 else {
    207                     if(log.isWarnEnabled())
    208                         log.warn("received service state was null");
    209                 }
    210                 break;
    211             }
    212             catch(TimeoutException e) { 
    213                 if(log.isTraceEnabled())
    214                     log.trace("timed out waiting for service state from " + coord + ", retrying");
    215             }
    216         }
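
One possible restructuring, offered only as a sketch and not as the actual
JGroups fix, is to cap the number of retries so that repeated timeouts
eventually surface as an error instead of spinning forever. The class below is
self-contained and does not use JGroups: BoundedRetry, fetchWithRetries, and
maxAttempts are hypothetical names, and java.util.concurrent.TimeoutException
stands in for the JGroups TimeoutException caught on line 212.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of JGroups: retries an operation a bounded
// number of times and gives up with the last TimeoutException.
public class BoundedRetry {

    /**
     * Runs the attempt up to maxAttempts times. Only a TimeoutException
     * triggers a retry; any other exception propagates immediately.
     */
    public static <T> T fetchWithRetries(Callable<T> attempt, int maxAttempts) throws Exception {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        TimeoutException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                // Success leaves the loop, like the break on line 210.
                return attempt.call();
            }
            catch (TimeoutException e) {
                // Remember the failure and retry, but only while attempts remain.
                last = e;
                System.err.println("attempt " + i + " of " + maxAttempts + " timed out");
            }
        }
        // Every attempt timed out: report it to the caller instead of looping.
        throw last;
    }
}
```

In fetchServiceInformation() the attempt would correspond to the
reset/send/getResultWithTimeout sequence on lines 192 to 196. Whether giving up
after a fixed number of attempts is acceptable for the Multiplexer is a design
question for the maintainers.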

> MuxChannel stuck
> ----------------
>
>                 Key: JGRP-527
>                 URL: http://jira.jboss.com/jira/browse/JGRP-527
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Bela Ban
>         Assigned To: Vladimir Blagojevic
>             Fix For: 2.5
>
>
> [from Bryce Alcock]
> JGroups Users:
> I am getting what appears, on the surface, to be a deadlock.
> Here is the stack trace:
> "***************BLA-STUCK THREAD**********************" prio=1 tid=0x08448ed8 nid=0x78dc in Object.wait() [0xb0c10000..0xb0c110b0]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x88f500c8> (a org.jgroups.util.Promise)
>         at org.jgroups.util.Promise.doWait(Promise.java:104)
>         at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:60)
>         at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:28)
>         - locked <0x88f500c8> (a org.jgroups.util.Promise)
>         at org.jgroups.mux.Multiplexer.fetchServiceInformation(Multiplexer.java:196)
>         at org.jgroups.JChannelFactory.connect(JChannelFactory.java:355)
>         - locked <0x88f37fe0> (a org.jgroups.JChannelFactory$Entry)
>         at org.jgroups.mux.MuxChannel.connect(MuxChannel.java:126)
>         - locked <0x88f63740> (a org.jgroups.mux.MuxChannel)
>         at scheduledtaskexecuteframework.group.GenericMultiplexer.connect(GenericMultiplexer.java:83)
>         at scheduledtaskexecuteframework.schedule.test.MainTest$1.run(MainTest.java:60)
> The scenario is easily reproduced in my system:
> I have 2 members of a MuxChannel join and do some work.
> I then have a 3rd member join.
> Then I have the second member quit.
> I wait about 5 minutes and have the second member join again.
> The second member will get stuck like this.
> However, if I don't have the 3rd member join, and just have the second member leave, wait five minutes, and come back,
> things work fine every time.
> Here is the line of code that is both holding the mutex and asking for it (apparently in different threads):
>  byte[] state=(byte[])service_state_promise.getResultWithTimeout(2000);
> I am more than willing to give more details about the situation; however,
> I am looking for ideas on how to debug this.
> I am using Java 1.5.0_11.
> I am using JGroups-2.4.1-sp3.
> Bryce
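
A note on the trace above: a thread parked in Object.wait() has released the
monitor it is waiting on, so the matching "locked" and "waiting on" addresses
do not by themselves indicate a deadlock. One way to check, as a rough sketch,
is to ask the JVM directly through ThreadMXBean.findMonitorDeadlockedThreads(),
which is available in Java 1.5 and returns null when no threads are deadlocked
on monitors. The class and method names below (WaitIsNotDeadlock,
monitorDeadlockDetected, startTimedWaiter) are made up for illustration, and
the waiter thread only imitates a timed-wait retry loop; it is not JGroups code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic helper, not part of JGroups.
public class WaitIsNotDeadlock {

    /** Returns true if the JVM reports threads deadlocked on object monitors. */
    public static boolean monitorDeadlockDetected() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findMonitorDeadlockedThreads(); // null when there is no deadlock
        return ids != null && ids.length > 0;
    }

    /**
     * Starts a daemon thread that repeatedly does a timed wait on the given
     * lock until interrupted, imitating a retry loop around a timed wait.
     */
    public static Thread startTimedWaiter(final Object lock) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                synchronized (lock) {
                    try {
                        while (true)
                            lock.wait(200); // releases the monitor while waiting
                    }
                    catch (InterruptedException e) {
                        // Interrupted: leave the loop and let the thread end.
                    }
                }
            }
        }, "timed-waiter");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

If such a check reports no deadlock while the thread keeps reappearing in the
same wait() frame across several dumps, a retry loop is a more likely
explanation than a lock cycle.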

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


