[jboss-dev] Clustering bootstrap still taking a long time

Brian Stansberry brian.stansberry at redhat.com
Thu Nov 8 10:22:34 EST 2007


Is your server bound to your VPN interface?

 > GMS: address is 10.11.14.31:32796

This sounds very similar to what the dev90 machine was experiencing when 
doing Hudson testsuite runs. There, multicast wasn't working properly 
on the bound interface. FLUSH counts on getting back a response to its 
own multicast, which it would never get. So every startFlush would 
hang for timeout * retries before giving up.
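
One way to check whether the bound interface has this problem, outside of 
JBoss and JGroups entirely, is to send a multicast datagram on that 
interface and see whether it comes back. The following is only a rough 
sketch using plain java.net. The class and method names are made up for 
illustration, and the group address and port in the usage line are 
placeholders; substitute whatever mcast_addr and mcast_port your UDP 
stack is configured with.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical diagnostic, not part of JGroups or JBoss.
public class MulticastLoopbackCheck {

    /**
     * Sends one probe datagram to the multicast group via the interface that
     * owns bindAddr and returns true if the same probe is received back
     * within timeoutMs.
     */
    static boolean receivesOwnMulticast(InetAddress bindAddr, String group,
                                        int port, int timeoutMs) throws IOException {
        InetAddress groupIp = InetAddress.getByName(group);
        if (!groupIp.isMulticastAddress()) {
            throw new IllegalArgumentException(group + " is not a multicast address");
        }
        NetworkInterface nic = NetworkInterface.getByInetAddress(bindAddr);
        if (nic == null) {
            throw new IllegalArgumentException("no local interface has address " + bindAddr);
        }
        InetSocketAddress groupAddr = new InetSocketAddress(groupIp, port);
        // Unique payload so stray traffic on the group is not mistaken for our probe.
        byte[] probe = ("probe-" + System.nanoTime()).getBytes(StandardCharsets.UTF_8);

        try (MulticastSocket sock = new MulticastSocket(port)) {
            sock.setNetworkInterface(nic);   // send on the interface under test
            sock.setSoTimeout(timeoutMs);    // receive() gives up after this long
            sock.joinGroup(groupAddr, nic);  // receive on the same interface
            try {
                sock.send(new DatagramPacket(probe, probe.length, groupAddr));
                long deadline = System.currentTimeMillis() + timeoutMs;
                byte[] buf = new byte[256];
                while (System.currentTimeMillis() < deadline) {
                    DatagramPacket in = new DatagramPacket(buf, buf.length);
                    sock.receive(in);
                    byte[] got = Arrays.copyOf(in.getData(), in.getLength());
                    if (Arrays.equals(got, probe)) {
                        return true;
                    }
                }
                return false;
            } catch (SocketTimeoutException e) {
                return false;                // nothing came back in time
            } finally {
                sock.leaveGroup(groupAddr, nic);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: MulticastLoopbackCheck <bind-address> <mcast-group> <mcast-port>");
            return;
        }
        boolean ok = receivesOwnMulticast(InetAddress.getByName(args[0]), args[1],
                                          Integer.parseInt(args[2]), 3000);
        System.out.println(ok ? "own multicast received" : "own multicast NOT received");
    }
}
```

Run it with the address the server is bound to, for example 
"java MulticastLoopbackCheck 10.11.14.31 228.1.2.3 45566" (group and port 
here are placeholders). If the probe is not received, that is consistent 
with the FLUSH hang described above; a successful result does not by 
itself prove the JGroups stack will work.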

This gives me a chance to bring up something with the JGroups guys:

The behavior we see when FLUSH fails seems not so great: a long delay 
and then a WARN.  At least in some cases, like initial channel 
connection, it would be better if the connect() just failed, since the 
channel is not usable.

Adrian wrote:
> The problem with the hang when booting the all configuration
> seems to be gone now, but I'm still seeing the clustering
> take a long time to bootstrap.
> 
> It always seems to be stuck here:
> 
> "main" prio=1 tid=0x80b31a10 nid=0x44a4 in Object.wait()
> [0x804d4000..0x804d6fb0]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x98e62928> (a org.jgroups.util.Promise)
>         at org.jgroups.util.Promise.doWait(Promise.java:104)
>         at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:60)
>         at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:28)
>         - locked <0x98e62928> (a org.jgroups.util.Promise)
>         at org.jgroups.protocols.pbcast.FLUSH.startFlush(FLUSH.java:207)
> 
> It is doing this multiple times which leads to
> a long total boot time:
> 
> 15:37:50,931 INFO  [ServerImpl] JBoss (Microcontainer) [5.0.0.Beta3
> (build: SVNTag=JBoss_5_0_0_Beta3 date=200711081439)] Started in
> 5m:14s:880ms
> 
> Is there a problem with "flush" when there are no other members in the
> cluster?
> 
> e.g.
> 
> 15:34:14,466 WARN  [JChannelFactory] Flush failed at 10.11.14.31:32795
> DefaultPartition-JMS-CTRL
> 15:34:19,527 INFO  [STDOUT]                  
> -------------------------------------------------------
> GMS: address is 10.11.14.31:32796
> -------------------------------------------------------
> 15:34:43,545 WARN  [JChannelFactory] Flush failed at 10.11.14.31:32796
> DefaultPartition-JMS-DATA
> 

-- 
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry at redhat.com


