[jboss-jira] [JBoss JIRA] Commented: (JGRP-527) MuxChannel stuck

Bryce Alcock (JIRA) jira-events at lists.jboss.org
Sun Jun 10 13:45:11 EDT 2007


    [ http://jira.jboss.com/jira/browse/JGRP-527?page=comments#action_12364808 ] 
            
Bryce Alcock commented on JGRP-527:
-----------------------------------

I made some more interesting discoveries:

I added some System.out.println calls to Multiplexer.java. My changes are shown first, followed by the output they produce.
Note that this output repeats forever. (I actually let it run overnight to check, and the next morning it was still printing the same messages.)

    /**
     * Fetches the map of services and hosts from the coordinator (Multiplexer). No-op if we are the coordinator
     */
    public void fetchServiceInformation() throws Exception {
        while(true) {
            Address coord=getCoordinator(), local_address=channel != null? channel.getLocalAddress() : null;
            boolean is_coord=coord != null && local_address != null && local_address.equals(coord);
            if(is_coord) {
                if(log.isTraceEnabled())
                    log.trace("I'm coordinator, will not fetch service state information");
                break;
            }

            ServiceInfo si=new ServiceInfo(ServiceInfo.STATE_REQ, null, null, null);
            MuxHeader hdr=new MuxHeader(si);
            Message state_req=new Message(coord, null, null);
            state_req.putHeader(NAME, hdr);
            service_state_promise.reset();
            System.out.println("state_req.toString():" +state_req.toString());
            channel.send(state_req);
            System.out.println("channel.toString():" +channel.toString());
            JChannel chan = (JChannel) channel;
            System.out.println("protocolSpec"+chan.printProtocolSpec(true));
            System.out.println("chan.toString(treu) True....."+chan.toString(true));
            System.out.println("chan.toString"+chan.toString());

            try {
                System.out.println("service_state_promise is of type" + service_state_promise.getClass().getName());
                System.out.println("service_state_promise toString()" + service_state_promise.toString());

                byte[] state=(byte[])service_state_promise.getResultWithTimeout(8000);
                if(state != null) {
                    Map new_state=(Map)Util.objectFromByteBuffer(state);
                    synchronized(service_state) {
                        service_state.clear();
                        service_state.putAll(new_state);
                    }
                    if(log.isTraceEnabled())
                        log.trace("service state was set successfully (" + service_state.size() + " entries)");
                }
                else {
                    if(log.isWarnEnabled())
                        log.warn("received service state was null");
                }
                break;
            }
            catch(TimeoutException e) {
                if(log.isTraceEnabled())
                    log.trace("timed out waiting for service state from " + coord + ", retrying");
            }
        }
    }
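
For reference, the loop above has no exit on the timeout path: each TimeoutException is only logged at trace level and the request is sent again, which is consistent with the output repeating overnight. Below is a minimal sketch of a bounded variant. It is not the JGroups code or the eventual fix. `BoundedRetry`, `fetchWithRetries`, and `maxAttempts` are hypothetical names, and it uses `java.util.concurrent.TimeoutException` as a stand-in for the JGroups exception type.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of JGroups: retry a timed operation a fixed number of times.
class BoundedRetry {
    /** Calls attempt up to maxAttempts times, retrying only when it times out. */
    static <T> T fetchWithRetries(Callable<T> attempt, int maxAttempts) throws Exception {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        TimeoutException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            }
            catch (TimeoutException e) {
                last = e; // remember the failure and try again
                System.out.println("attempt " + i + " of " + maxAttempts + " timed out");
            }
        }
        throw last; // every attempt timed out, so surface the problem to the caller
    }

    public static void main(String[] args) throws Exception {
        // Simulates a coordinator that never answers the state request.
        Callable<String> neverAnswers = () -> {
            throw new TimeoutException("no reply from coordinator");
        };
        try {
            fetchWithRetries(neverAnswers, 3);
        }
        catch (TimeoutException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

With a cap like this, a coordinator that never replies shows up as an exception in connect() rather than as a thread that appears stuck.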





state_req.toString():[dst: 192.168.1.77:7820, src: <null> (1 headers), size = 0 bytes]
channel.toString():org.jgroups.JChannel@1977b9b
protocolSpecFRAG2
frag_size=65000

FC
min_threshold=0.10
max_credits=2000000

GMS
shun=true
print_local_addr=true
view_bundling=true
join_timeout=300
join_retry_timeout=2000

STABLE
max_bytes=400000
stability_delay=1000
desired_avg_gossip=50000

NAKACK
max_xmit_size=60000
retransmit_timeout=300,600,1200,2400,4800
use_mcast_xmit=false
discard_delivered_msgs=true
gc_lag=0

VERIFY_SUSPECT
timeout=1500

FD
max_tries=5
shun=true
timeout=10000

FD_SOCK

MERGE2
max_interval=100000
min_interval=2000

TCPPING
port_range=5
num_initial_members=3
initial_hosts=192.168.1.77[7820]
timeout=30

TCP
discard_incompatible_packets=true
sock_conn_timeout=300
bind_addr=192.168.1.77
use_send_queues=false
start_port=7820
recv_buf_size=20000000
skip_suspected_members=true
send_buf_size=640000
use_incoming_packet_handler=true
loopback=fasle


chan.toString(treu) True.....local_addr=192.168.1.77:7821
cluster_name=dispatcher_7840
my_view=[192.168.1.77:7820|6] [192.168.1.77:7820, 192.168.1.77:7822, 192.168.1.77:7821]
connected=true
closed=false
incoming queue size=0
receive_blocks=true
receive_local_msgs=false
auto_reconnect=false
auto_getstate=false
state_transfer_supported=false
props=scheduledtaskexecuteframework.TCP(use_send_queues=false;bind_addr=first-non-local;sock_conn_timeout=300;loopback=fasle;skip_suspected_members=true;discard_incompatible_packets=true;recv_buf_size=20000000;start_port=7820;use_incoming_packet_handler=true;send_buf_size=640000):TCPPING(num_initial_members=3;initial_hosts=192.168.1.77[7820];port_range=5;timeout=30):MERGE2(min_interval=2000;max_interval=100000):FD_SOCK:FD(max_tries=5;timeout=10000;shun=true):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(gc_lag=0;use_mcast_xmit=false;retransmit_timeout=300,600,1200,2400,4800;discard_delivered_msgs=true;max_xmit_size=60000):pbcast.STABLE(desired_avg_gossip=50000;max_bytes=400000;stability_delay=1000):pbcast.GMS(print_local_addr=true;view_bundling=true;join_timeout=300;join_retry_timeout=2000;shun=true):FC(max_credits=2000000;min_threshold=0.10):FRAG2(frag_size=65000)

chan.toStringorg.jgroups.JChannel@1977b9b
service_state_promise is of typeorg.jgroups.util.Promise
service_state_promise toString()hasResult=false,result=null
_gteResultsWithTimeout8000
_gteResultsWithTimeout start:1181496411298
while timeout, time_to_wait, start, current_time, elapsed8000,8000,1181496411298,1181496411298,0
timeout <=0 doWait(timeout) called
doWait(time_to_wait) : 8000
time_to_wait=timeout - (System.currentTimeMillis() - start);
-6=8000 - (1181496419304 - 1181496411298)
while timeout, time_to_wait, start, current_time, elapsed8000,-6,1181496411298,1181496419304,8006
timeout <=0 doWait(timeout) called
******HURRAY timeout_occurred...
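
The last lines of this output follow the usual remaining-time pattern for a timed wait: wait, then recompute time_to_wait = timeout - (now - start), and stop once it is no longer positive. Here the remaining time is -6 after 8006 ms, so the loop exits and the timeout is reported. That suggests the Promise is timing out as intended and that no reply to the state request ever arrives. Below is a minimal sketch of that pattern. `SimplePromise` is a hypothetical stand-in, not the org.jgroups.util.Promise source, and it throws `java.util.concurrent.TimeoutException` instead of the JGroups exception.

```java
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a promise with a timed wait; not the JGroups implementation.
class SimplePromise<T> {
    private T result;
    private boolean hasResult;

    /** Stores the result and wakes up any waiting thread. */
    public synchronized void setResult(T value) {
        result = value;
        hasResult = true;
        notifyAll();
    }

    /** Waits up to timeoutMs for a result, recomputing the remaining time after each wake-up. */
    public synchronized T getResultWithTimeout(long timeoutMs)
            throws TimeoutException, InterruptedException {
        long start = System.currentTimeMillis();
        long timeToWait = timeoutMs;
        while (!hasResult && timeToWait > 0) {
            wait(timeToWait); // releases this object's monitor while waiting
            // Same arithmetic as in the output above: remaining = timeout - elapsed
            timeToWait = timeoutMs - (System.currentTimeMillis() - start);
        }
        if (!hasResult)
            throw new TimeoutException("no result after " + timeoutMs + " ms");
        return result;
    }
}

class TimedWaitDemo {
    public static void main(String[] args) throws Exception {
        SimplePromise<String> promise = new SimplePromise<>();
        try {
            promise.getResultWithTimeout(100); // nobody sets a result, so this times out
        }
        catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
        promise.setResult("state");
        System.out.println("got: " + promise.getResultWithTimeout(100));
    }
}
```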

> MuxChannel stuck
> ----------------
>
>                 Key: JGRP-527
>                 URL: http://jira.jboss.com/jira/browse/JGRP-527
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Bela Ban
>         Assigned To: Vladimir Blagojevic
>             Fix For: 2.5
>
>
> [from Bryce Alcock]
> JGroups Users:
> I am getting what appears, on the surface, to be a deadlock.
> Here is the stack trace:
> "***************BLA-STUCK THREAD**********************" prio=1 tid=0x08448ed8 nid=0x78dc in Object.wait() [0xb0c10000..0xb0c110b0]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x88f500c8> (a org.jgroups.util.Promise)
>         at org.jgroups.util.Promise.doWait(Promise.java:104)
>         at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:60)
>         at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:28)
>         - locked <0x88f500c8> (a org.jgroups.util.Promise)
>         at org.jgroups.mux.Multiplexer.fetchServiceInformation(Multiplexer.java:196)
>         at org.jgroups.JChannelFactory.connect(JChannelFactory.java:355)
>         - locked <0x88f37fe0> (a org.jgroups.JChannelFactory$Entry)
>         at org.jgroups.mux.MuxChannel.connect(MuxChannel.java:126)
>         - locked <0x88f63740> (a org.jgroups.mux.MuxChannel)
>         at scheduledtaskexecuteframework.group.GenericMultiplexer.connect(GenericMultiplexer.java:83)
>         at scheduledtaskexecuteframework.schedule.test.MainTest$1.run(MainTest.java:60)
> The scenario is easily reproduced in my system:
> I have 2 members of a MuxChannel join and do some work.
> I then have a 3rd member join.
> Then I have the second member quit.
> I wait about 5 minutes and have the second member join again.
> The second member gets stuck like this.
> However, if I don't have the 3rd member join, and just have the second member leave, wait five minutes, and come back,
> things work fine every time.
> Here is the line of code that is both holding the mutex and asking for it (apparently in different threads):
>  byte[] state=(byte[])service_state_promise.getResultWithTimeout(2000);
> I am more than willing to give more details about the situation; however,
> I am looking for ideas on how to debug this.
> I am using Java 1.5.0_11
> I am using JGroups-2.4.1-sp3
> Bryce
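
A note on the quoted stack trace: the same Promise appearing under both "waiting on" and "locked" is what a thread dump normally shows for a thread that called Object.wait() while holding that object's monitor. wait() releases the monitor for the duration of the wait, so this pattern alone does not indicate a deadlock. Below is a small sketch that demonstrates the release. `WaitReleasesMonitorDemo` and `otherThreadGotMonitor` are hypothetical names used only for illustration.

```java
import java.util.concurrent.CountDownLatch;

// Illustration only: shows that Object.wait() releases the monitor it was called on.
class WaitReleasesMonitorDemo {
    /** Returns true if a second thread entered synchronized(lock) while this thread was in wait(). */
    static boolean otherThreadGotMonitor(long waitMs) throws InterruptedException {
        final Object lock = new Object();
        final CountDownLatch waiterHoldsLock = new CountDownLatch(1);
        final CountDownLatch acquired = new CountDownLatch(1);

        Thread other = new Thread(() -> {
            try {
                waiterHoldsLock.await(); // start only once the waiter owns the monitor
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (lock) { // can be entered only after the waiter calls wait()
                acquired.countDown();
                lock.notifyAll();
            }
        });
        other.start();

        boolean acquiredDuringWait;
        synchronized (lock) {
            waiterHoldsLock.countDown();
            // A thread dump taken here lists lock as both "locked" and "waiting on".
            lock.wait(waitMs);
            // We still own the monitor, so the other thread can only have got in during wait().
            acquiredDuringWait = acquired.getCount() == 0;
        }
        other.join();
        return acquiredDuringWait;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("other thread acquired monitor during wait: " + otherThreadGotMonitor(2000));
    }
}
```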
