[jboss-dev] Clustering bootstrap still taking a long time
Brian Stansberry
brian.stansberry at redhat.com
Thu Nov 8 10:22:34 EST 2007
Is your server bound to your VPN interface?
> GMS: address is 10.11.14.31:32796
This sounds very similar to what the dev90 machine was experiencing during
Hudson testsuite runs. There, multicast wasn't working properly
on the bound interface. FLUSH counts on getting back a response to its
own multicast, which it would never get. So every startFlush() call would
hang for timeout * retries.
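The hang can be modelled in a few lines of Java. This is a minimal sketch, not JGroups code: `FlushHangModel`, `attemptFlush` and `startFlush` are hypothetical names, a `CompletableFuture` stands in for the `org.jgroups.util.Promise` visible in the thread dump, and the timeout and retry values are illustrative rather than FLUSH's actual defaults. If the member's own multicast never comes back, each attempt waits out its full timeout, so the call blocks for roughly timeout * retries before reporting failure.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical model (not JGroups code) of a flush that waits for a response
 * to our own multicast and retries on timeout.
 */
public class FlushHangModel {

    /** One attempt: block until the ack arrives or the timeout expires. */
    static boolean attemptFlush(CompletableFuture<Boolean> ownMulticastAck, long timeoutMs) {
        try {
            return Boolean.TRUE.equals(ownMulticastAck.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException | ExecutionException e) {
            return false; // no usable response within the timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    /** Retry up to 'retries' times; worst case blocks for about timeoutMs * retries. */
    static boolean startFlush(CompletableFuture<Boolean> ownMulticastAck, long timeoutMs, int retries) {
        for (int i = 0; i < retries && !Thread.currentThread().isInterrupted(); i++) {
            if (attemptFlush(ownMulticastAck, timeoutMs)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long timeoutMs = 200; // illustrative, not the FLUSH default
        int retries = 3;      // illustrative, not the FLUSH default

        // Multicast broken on the bound interface: the ack never completes.
        CompletableFuture<Boolean> neverAcked = new CompletableFuture<>();
        long start = System.nanoTime();
        boolean ok = startFlush(neverAcked, timeoutMs, retries);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("broken interface: ok=" + ok + ", blocked ~" + elapsedMs
                + " ms (timeout * retries = " + (timeoutMs * retries) + " ms)");

        // Multicast working: our own message comes back and the first attempt succeeds.
        CompletableFuture<Boolean> acked = CompletableFuture.completedFuture(true);
        System.out.println("working interface: ok=" + startFlush(acked, timeoutMs, retries));
    }
}
```

Nothing in the retry loop can distinguish a slow response from one that will never arrive, so each affected channel pays the full timeout * retries before giving up.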
This gives me a chance to bring up something with the JGroups guys:
The behavior we see when FLUSH fails is not great: a long delay
and then a WARN. At least in some cases, like the initial channel
connection, it would be better if connect() just failed. The
channel is not usable.
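The suggestion that connect() should simply fail can be sketched as two policies side by side. Everything here is hypothetical: `ConnectPolicy`, `connectWarnOnly`, `connectFailFast`, `ChannelNotUsableException` and the `BooleanSupplier` standing in for the flush are illustrative names, not part of the JGroups or JBoss API.

```java
import java.util.function.BooleanSupplier;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not the JGroups/JBoss API) contrasting warn-and-continue
 * with fail-fast handling of a failed flush during the initial connect.
 */
public class ConnectPolicy {
    private static final Logger LOG = Logger.getLogger(ConnectPolicy.class.getName());

    /** Hypothetical exception signalling that the channel is unusable. */
    static class ChannelNotUsableException extends RuntimeException {
        ChannelNotUsableException(String msg) {
            super(msg);
        }
    }

    /** Behavior described above: after the delay, log a WARN and carry on. */
    static void connectWarnOnly(String cluster, BooleanSupplier flush) {
        if (!flush.getAsBoolean()) {
            LOG.warning("Flush failed for " + cluster);
        }
    }

    /** Proposed behavior: a failed flush on initial connect fails the connect. */
    static void connectFailFast(String cluster, BooleanSupplier flush) {
        if (!flush.getAsBoolean()) {
            throw new ChannelNotUsableException(
                    "Initial flush failed for " + cluster + "; channel is not usable");
        }
    }

    public static void main(String[] args) {
        BooleanSupplier brokenFlush = () -> false; // e.g. own multicast never received
        connectWarnOnly("DefaultPartition-JMS-CTRL", brokenFlush); // logs and keeps going
        try {
            connectFailFast("DefaultPartition-JMS-CTRL", brokenFlush);
        } catch (ChannelNotUsableException e) {
            System.out.println("connect() failed: " + e.getMessage());
        }
    }
}
```

The fail-fast variant still waits for the flush to give up, but the caller gets an exception it must handle instead of a WARN and a channel that looks connected but cannot be used.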
Adrian wrote:
> The problem with the hang when booting the "all" configuration
> seems to be gone now, but I'm still seeing the clustering
> take a long time to bootstrap.
>
> It always seems to be stuck here:
>
> "main" prio=1 tid=0x80b31a10 nid=0x44a4 in Object.wait() [0x804d4000..0x804d6fb0]
> at java.lang.Object.wait(Native Method)
> - waiting on <0x98e62928> (a org.jgroups.util.Promise)
> at org.jgroups.util.Promise.doWait(Promise.java:104)
> at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:60)
> at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:28)
> - locked <0x98e62928> (a org.jgroups.util.Promise)
> at org.jgroups.protocols.pbcast.FLUSH.startFlush(FLUSH.java:207)
>
> It is doing this multiple times, which leads to
> a long total boot time:
>
> 15:37:50,931 INFO [ServerImpl] JBoss (Microcontainer) [5.0.0.Beta3 (build: SVNTag=JBoss_5_0_0_Beta3 date=200711081439)] Started in 5m:14s:880ms
>
> Is there a problem with "flush" when there are no other members in the
> cluster?
>
> e.g.
>
> 15:34:14,466 WARN [JChannelFactory] Flush failed at 10.11.14.31:32795 DefaultPartition-JMS-CTRL
> 15:34:19,527 INFO [STDOUT]
> -------------------------------------------------------
> GMS: address is 10.11.14.31:32796
> -------------------------------------------------------
> 15:34:43,545 WARN [JChannelFactory] Flush failed at 10.11.14.31:32796 DefaultPartition-JMS-DATA
>
--
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry at redhat.com