[jboss-jira] [JBoss JIRA] Commented: (JGRP-527) MuxChannel stuck
Bryce Alcock (JIRA)
jira-events at lists.jboss.org
Sun Jun 10 01:07:12 EDT 2007
[ http://jira.jboss.com/jira/browse/JGRP-527?page=comments#action_12364806 ]
Bryce Alcock commented on JGRP-527:
-----------------------------------
OK, I have done some more investigation, and here is my conclusion.
Looking at the code below, it sets up an infinite loop.
Here is the scenario:
If the joining member is not the coordinator, execution falls through to
line 196. If that line throws a TimeoutException (which is what happens
in getResultWithTimeout(8000) when no state arrives in time),
the break on line 210 is skipped, the catch block on lines 212-215 only
logs at trace level, and the while(true) loop starts again from the top.
As far as I can tell, this repeats forever for as long as the coordinator
does not answer.
I am not sure whether this should be restructured, but at least in my case
it is causing an infinite loop.
mux.Multiplexer.java -------------------------------------------------------------------------
/**
176 * Fetches the map of services and hosts from the coordinator (Multiplexer). No-op if we are the coordinator
177 */
178 public void fetchServiceInformation() throws Exception {
179 while(true) {
180 Address coord=getCoordinator(), local_address=channel != null? channel.getLocalAddress() : null;
181 boolean is_coord=coord != null && local_address != null && local_address.equals(coord);
182 if(is_coord) {
183 if(log.isTraceEnabled())
184 log.trace("I'm coordinator, will not fetch service state information");
185 break;
186 }
187
188 ServiceInfo si=new ServiceInfo(ServiceInfo.STATE_REQ, null, null, null);
189 MuxHeader hdr=new MuxHeader(si);
190 Message state_req=new Message(coord, null, null);
191 state_req.putHeader(NAME, hdr);
192 service_state_promise.reset();
193 channel.send(state_req);
194
195 try {
196 byte[] state=(byte[])service_state_promise.getResultWithTimeout(8000);
197 if(state != null) {
198 Map new_state=(Map)Util.objectFromByteBuffer(state);
199 synchronized(service_state) {
200 service_state.clear();
201 service_state.putAll(new_state);
202 }
203 if(log.isTraceEnabled())
204 log.trace("service state was set successfully (" + service_state.size() + " entries)");
205 }
206 else {
207 if(log.isWarnEnabled())
208 log.warn("received service state was null");
209 }
210 break;
211 }
212 catch(TimeoutException e) {
213 if(log.isTraceEnabled())
214 log.trace("timed out waiting for service state from " + coord + ", retrying");
215 }
216 }
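
One possible way to bound the retries is sketched below. This is only an illustration, not the JGroups code or the eventual fix: the class BoundedRetry, the method fetchWithRetries and the maxAttempts parameter are hypothetical names, and a plain Callable stands in for the send-then-wait step on lines 188-196. The idea is that a TimeoutException is retried a limited number of times and then propagated to the caller instead of being swallowed forever.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of JGroups: retries a fetch a bounded number of times.
final class BoundedRetry {

    /**
     * Calls fetch up to maxAttempts times, retrying only when it throws
     * TimeoutException. Returns the first successful result; if every attempt
     * times out, the last TimeoutException is rethrown so the caller can decide
     * what to do instead of looping forever.
     */
    static <T> T fetchWithRetries(Callable<T> fetch, int maxAttempts) throws Exception {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        TimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.call();   // success ends the loop, like the break on line 210
            } catch (TimeoutException e) {
                last = e;              // remember the failure and try again
            }
        }
        throw last;                    // attempts exhausted: surface the timeout
    }

    public static void main(String[] args) throws Exception {
        // Simulated coordinator that answers only on the third request.
        final int[] calls = {0};
        String state = fetchWithRetries(() -> {
            if (++calls[0] < 3)
                throw new TimeoutException("simulated timeout");
            return "service state";
        }, 5);
        System.out.println(state + " after " + calls[0] + " attempts");
    }
}
```

With a limit like this, a joining member whose coordinator never replies would get an exception out of connect() after a bounded time rather than hanging. Whether failing the connect is the right behaviour here is a design decision for the maintainers.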
> MuxChannel stuck
> ----------------
>
> Key: JGRP-527
> URL: http://jira.jboss.com/jira/browse/JGRP-527
> Project: JGroups
> Issue Type: Bug
> Reporter: Bela Ban
> Assigned To: Vladimir Blagojevic
> Fix For: 2.5
>
>
> [from Bryce Alcock]
> JGroups Users:
> I am getting what appears, on the surface, to be a deadlock.
> Here is the stack trace:
> "***************BLA-STUCK THREAD**********************" prio=1 tid=0x08448ed8 nid=0x78dc in Object.wait() [0xb0c10000..0xb0c110b0]
> at java.lang.Object.wait(Native Method)
> - waiting on <0x88f500c8> (a org.jgroups.util.Promise)
> at org.jgroups.util.Promise.doWait(Promise.java:104)
> at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:60)
> at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:28)
> - locked <0x88f500c8> (a org.jgroups.util.Promise)
> at org.jgroups.mux.Multiplexer.fetchServiceInformation(Multiplexer.java:196)
> at org.jgroups.JChannelFactory.connect(JChannelFactory.java:355)
> - locked <0x88f37fe0> (a org.jgroups.JChannelFactory$Entry)
> at org.jgroups.mux.MuxChannel.connect(MuxChannel.java:126)
> - locked <0x88f63740> (a org.jgroups.mux.MuxChannel)
> at scheduledtaskexecuteframework.group.GenericMultiplexer.connect(GenericMultiplexer.java:83)
> at scheduledtaskexecuteframework.schedule.test.MainTest$1.run(MainTest.java:60)
> The scenario is easily reproduced in my system:
> I have 2 members of a MuxChannel join and do some work.
> I then have a 3rd join.
> Then I have the second member quit.
> Wait about 5 minutes and have the second member join again.
> The second member will get stuck like this.
> However,
> if I don't have the 3rd member join, and just have the second member leave, wait five minutes and come back,
> things work fine every time.
> Here is the line of code that is both holding the mutex and asking for it (apparently in different threads):
> byte[] state=(byte[])service_state_promise.getResultWithTimeout(2000);
> I am more than willing to give more details about the situation; however,
> I am looking for ideas on how to debug this.
> I am using Java 1.5.0_11
> I am using JGroups-2.4.1-sp3
> Bryce
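
A note on the "holding the mutex and asking for it" observation in the quoted report: in a Java thread dump, a thread that entered a synchronized block and then called Object.wait() is shown with both "locked <addr>" and "waiting on <addr>" for the same object, because wait() releases the monitor while the thread is parked. That pattern alone therefore does not indicate a deadlock. The sketch below illustrates this with plain JDK classes; the class WaitReleasesMonitor and the method otherThreadGotMonitor are made-up names for the example and have nothing to do with org.jgroups.util.Promise internals.

```java
// Hypothetical demo class: shows that Object.wait() releases the monitor it is called on.
final class WaitReleasesMonitor {

    /**
     * The calling thread takes the monitor of a private lock object and waits on it
     * for at most waitMillis. A second thread then tries to enter the same monitor.
     * Returns true if the second thread got in while the first was still inside
     * its synchronized block, which is only possible because wait() gave up the monitor.
     */
    static boolean otherThreadGotMonitor(long waitMillis) throws InterruptedException {
        final Object lock = new Object();
        final boolean[] acquired = {false};   // guarded by lock

        Thread other = new Thread(() -> {
            synchronized (lock) {             // blocks until the waiter is inside wait()
                acquired[0] = true;
                lock.notifyAll();             // wake the waiter up early
            }
        });

        synchronized (lock) {
            other.start();
            long deadline = System.currentTimeMillis() + waitMillis;
            while (!acquired[0]) {            // loop guards against spurious wakeups
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0)
                    break;
                lock.wait(remaining);         // releases the monitor while waiting
            }
            return acquired[0];
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("other thread entered the monitor: " + otherThreadGotMonitor(2000));
    }
}
```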