.
Vladimir's written a unit test that shows the combination of the way JBM
uses a shared channel and the other AS services use it leads to the
behavior we're seeing.
The JBM guys fixing JBMESSAGING-1120 will make the problem go away. I've
asked Vladimir look into why the JBM usage caused Channel.getState() to
incorrectly return true, which is what led to the hang.
Brian Stansberry wrote:
Adrian wrote:
> On Tue, 2007-10-23 at 09:58 -0500, Brian Stansberry wrote:
>> Adrian wrote:
>>> When I look at this code, it looks like it is doing an infinte wait,
>>> shouldn't this have a timeout?
>>>
>>>
http://fisheye.jboss.org/browse/JBossCache/core/tags/2.0.0.GA/src/org/jbo...
>>>
>>>
>> Arguably yes, although only as a second line of defense against a
>> bug. Which there appears to be here. You only get to the wait call if
>> channel.getState() returns true (line 1270 in the above linked rev.)
>> The getState call has a timeout. In your case it should *not* have
>> returned true, as your node is the only cluster member.
>>
>> Vladimir/Manik, does anything here ring a bell? E.g. was there any
>> change in the Channel.getState() behavior in JG 2.5.0? Adrian's
>> seeing this in AS trunk, which is using JBC 2.0.0.GA and JG 2.5.0.GA.
>>
>
> Make sure you do an rm -rf thirdparty or use a clean checkout
> so you can be sure you have all the latest jars.
>
Yeah, I did.
>> Adrian, has some kind of parallelization of deployment been introduced?
>
> No. But each service is free to do some startup asynch by forking
> a thread.
>
>> Your logging seems to show JBoss Messaging deployment proceeding in
>> parallel with deployment of cluster-beans.xml. When I run "./run.sh
>> -b localhost -c all" I don't see this problem (and the JBM logging
>> occurs after the cluster-beans.xml stuff.)
>
> So you think this could be a race? I don't see anything in the
> DEBUG messages (or the thread dump) that suggests both are
> active at the same time.
>
> Here's the DEBUG from the last JBoss Messaging logging:
>
Thanks for this. I'd misread your earlier log; saw two sets of ASCII
art showing channels starting, and thought one was JBM deployment, one
from JBC. Actually, they were both from JBM, which creates 2 channels.
The JBC that's hanging actually shares the 2nd channel with JBM. This
is likely where the problem is; some defect in the state transfer
handling with shared channels. When I launch AS, the JBC deployment is
happening before JBM; for whatever reason for you it's JBM then JBC.
I'll add a depends or something to force JBM to go first on my setup and
I bet I'll see the same thing you do.
> 2007-10-23 14:28:34,335 DEBUG
> [org.jboss.jms.server.connectionfactory.ConnectionFactory] Started
> jboss.messaging.connectionfactory:service=ConnectionFactory
> 2007-10-23 14:28:34,336 DEBUG
> [org.jboss.system.microcontainer.jmx.ServiceControllerLifecycleCallback]
> Registered MBean jboss.jgroups:service=MultiplexerChannelFactory
> 2007-10-23 14:28:34,337 DEBUG
> [org.jboss.ha.framework.server.JChannelFactory] Ignoring create call;
> current state is Started
> 2007-10-23 14:28:34,337 DEBUG
> [org.jboss.ha.framework.server.JChannelFactory] Ignoring start call;
> current state is Started
> 2007-10-23 14:28:34,362 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
> Registered in JMX under jboss.cache:service=EJB3SFSBClusteredCache
> 2007-10-23 14:28:34,362 DEBUG [org.jboss.system.ServiceController]
> starting service jboss.cache:service=EJB3SFSBClusteredCache
> 2007-10-23 14:28:34,362 DEBUG [org.jboss.system.ServiceController]
> Registering service jboss.cache:service=EJB3SFSBClusteredCache
> 2007-10-23 14:28:34,373 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
> Registered in JMX under
> jboss.cache:partitionName=DefaultPartition,service=HAPartitionCache
> 2007-10-23 14:28:34,374 DEBUG [org.jboss.system.ServiceController]
> starting service
> jboss.cache:service=HAPartitionCache,partitionName=DefaultPartition
> 2007-10-23 14:28:34,374 DEBUG [org.jboss.system.ServiceController]
> Registering service
> jboss.cache:service=HAPartitionCache,partitionName=DefaultPartition
> 2007-10-23 14:28:34,384 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
> Registered in JMX under jboss.cache:service=ClusteredSSOCache
> 2007-10-23 14:28:34,384 DEBUG [org.jboss.system.ServiceController]
> starting service jboss.cache:service=ClusteredSSOCache
> 2007-10-23 14:28:34,384 DEBUG [org.jboss.system.ServiceController]
> Registering service jboss.cache:service=ClusteredSSOCache
> 2007-10-23 14:28:34,390 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
> Constructing Cache
> 2007-10-23 14:28:34,797 DEBUG
> [org.jboss.cache.factories.InterceptorChainFactory] interceptor chain
> is:
> class org.jboss.cache.interceptors.CallInterceptor
> class org.jboss.cache.interceptors.UnlockInterceptor
> class org.jboss.cache.interceptors.PessimisticLockInterceptor
> class org.jboss.cache.interceptors.ReplicationInterceptor
> class org.jboss.cache.interceptors.NotificationInterceptor
> class org.jboss.cache.interceptors.TxInterceptor
> class org.jboss.cache.interceptors.CacheMgmtInterceptor
> class org.jboss.cache.interceptors.InvocationContextInterceptor
> 2007-10-23 14:28:34,806 DEBUG
> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Not using
> an EvictionPolicy
> 2007-10-23 14:28:34,815 DEBUG
> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] cache
> mode is REPL_ASYNC
> 2007-10-23 14:28:34,815 DEBUG
> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Created
> Multiplexer Channel for cache cluster DefaultPartition-ClusteredSS
> OCache using stack udp
> 2007-10-23 14:28:34,844 DEBUG
> [org.jboss.cache.marshall.VersionAwareMarshaller] Initialised with
> version 2.0.0 and versionInt 20
> 2007-10-23 14:28:34,844 DEBUG
> [org.jboss.cache.marshall.VersionAwareMarshaller] Using default
> marshaller class org.jboss.cache.marshall.CacheMarshaller200
> 2007-10-23 14:28:34,844 DEBUG
> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Using
> marshaller org.jboss.cache.marshall.VersionAwareMarshaller
> 2007-10-23 14:28:34,845 DEBUG
> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Block
> received at 127.0.0.1:32787
> 2007-10-23 14:28:34,899 DEBUG
> [org.jboss.cache.interceptors.InvocationContextInterceptor] Invoked on
> cache instance [127.0.0.1:32787] and InvocationContext [Invocation
> Context{methodCall=MethodName: _block; MethodIdInteger: 43; Args:
> ()transaction=null, globalTransaction=null,
> optionOverrides=Option{failSilently=false, cacheModeLocal
> =false, dataVersion=null, suppressLocking=false,
> forceDataGravitation=false, skipDataGravitation=false},
> originLocal=true, txHasMods=false}]
> 2007-10-23 14:28:34,899 DEBUG
> [org.jboss.cache.interceptors.InvocationContextInterceptor] Setting up
> transactional context.
> 2007-10-23 14:28:34,899 DEBUG
> [org.jboss.cache.interceptors.InvocationContextInterceptor] Setting tx
> as null and gtx as null
> 2007-10-23 14:28:34,900 DEBUG
> [org.jboss.cache.interceptors.ReplicationInterceptor]
> isLocalCommitOrRollback? false; gtx = null
> 2007-10-23 14:28:34,900 DEBUG
> [org.jboss.cache.interceptors.PessimisticLockInterceptor]
> PessimisticLockInterceptor invoked for method MethodName: _block;
> MethodIdInteg
> er: 43; Args: ()
> 2007-10-23 14:28:34,900 DEBUG
> [org.jboss.cache.interceptors.PessimisticLockInterceptor] bypassed
> locking as method _block() doesn't require locking
> 2007-10-23 14:28:34,900 DEBUG
> [org.jboss.cache.interceptors.CallInterceptor] Passing up method
> MethodName: _block; MethodIdInteger: 43; Args: () so it gets invoked
> on cache.
> 2007-10-23 14:28:34,900 DEBUG
> [org.jboss.cache.interceptors.UnlockInterceptor] Attempting to release
> locks on current thread. Lock table is {}
> 2007-10-23 14:28:34,900 DEBUG
> [org.jboss.cache.interceptors.ReplicationInterceptor] Non-tx and non
> crud meth
> 2007-10-23 14:28:34,900 DEBUG
> [org.jboss.cache.interceptors.InvocationContextInterceptor] Resetting
> invocation-scope options
> 2007-10-23 14:28:34,900 DEBUG
> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Block
> processed at 127.0.0.1:32787
> 2007-10-23 14:28:34,902 INFO
> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache]
> viewAccepted(): [127.0.0.1:32787|0] [127.0.0.1:32787]
> 2007-10-23 14:28:34,902 DEBUG
> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] UnBlock
> received at 127.0.0.1:32787
> 2007-10-23 14:28:34,903 DEBUG
> [org.jboss.cache.interceptors.InvocationContextInterceptor] Invoked on
> cache instance [127.0.0.1:32787] and InvocationContext [Invocation
> Context{methodCall=MethodName: _unblock; MethodIdInteger: 44; Args:
> ()transaction=null, globalTransaction=null,
> optionOverrides=Option{failSilently=false, cacheModeLoc
> al=false, dataVersion=null, suppressLocking=false,
> forceDataGravitation=false, skipDataGravitation=false},
> originLocal=true, txHasMods=false}]
> 2007-10-23 14:28:34,903 DEBUG
> [org.jboss.cache.interceptors.InvocationContextInterceptor] Setting up
> transactional context.
> 2007-10-23 14:28:34,903 DEBUG
> [org.jboss.cache.interceptors.InvocationContextInterceptor] Setting tx
> as null and gtx as null
> 2007-10-23 14:28:34,903 DEBUG
> [org.jboss.cache.interceptors.ReplicationInterceptor]
> isLocalCommitOrRollback? false; gtx = null
> 2007-10-23 14:28:34,903 DEBUG
> [org.jboss.cache.interceptors.PessimisticLockInterceptor]
> PessimisticLockInterceptor invoked for method MethodName: _unblock;
> MethodIdInt
> eger: 44; Args: ()
> 2007-10-23 14:28:34,903 DEBUG
> [org.jboss.cache.interceptors.PessimisticLockInterceptor] bypassed
> locking as method _unblock() doesn't require locking
> 2007-10-23 14:28:34,903 DEBUG
> [org.jboss.cache.interceptors.CallInterceptor] Passing up method
> MethodName: _unblock; MethodIdInteger: 44; Args: () so it gets invoked o
> n cache.
> 2007-10-23 14:28:34,903 DEBUG
> [org.jboss.cache.interceptors.UnlockInterceptor] Attempting to release
> locks on current thread. Lock table is {}
> 2007-10-23 14:28:34,903 DEBUG
> [org.jboss.cache.interceptors.ReplicationInterceptor] Non-tx and non
> crud meth
> 2007-10-23 14:28:34,903 DEBUG
> [org.jboss.cache.interceptors.InvocationContextInterceptor] Resetting
> invocation-scope options
> 2007-10-23 14:28:34,903 DEBUG
> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] UnBlock
> processed at 127.0.0.1:32787
> 2007-10-23 14:28:34,903 INFO
> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] CacheImpl
> local address is 127.0.0.1:32787
>
> Got bored waiting here. ;-)
>
> 2007-10-23 14:32:30,094 INFO
> [org.jboss.bootstrap.microcontainer.ServerImpl] JBoss SHUTDOWN
>
>
> There's no JBoss Cache logging before this, only some generic
> clustering logging (I've included where jms is using clustering
> but otherwise stripped its logging):
>
>
> 2007-10-23 14:28:27,360 DEBUG [org.jboss.system.ServiceController]
> starting service jboss.jgroups:service=MultiplexerChannelFactory
> 2007-10-23 14:28:27,360 DEBUG [org.jboss.system.ServiceController]
> Registering service jboss.jgroups:service=MultiplexerChannelFactory
> 2007-10-23 14:28:27,364 DEBUG
> [org.jboss.ha.framework.server.JChannelFactory] Ignoring create call;
> current state is Stopped
>
> ...
>
> 2007-10-23 14:28:27,367 DEBUG
> [org.jboss.ha.framework.server.JChannelFactory] Starting JChannelFactory
> 2007-10-23 14:28:27,367 DEBUG
> [org.jboss.ha.framework.server.JChannelFactory] Started JChannelFactory
> 2007-10-23 14:28:27,368 DEBUG
> [org.jboss.messaging.core.jmx.MessagingPostOfficeService] Starting
> jboss.messaging:service=PostOffice
> 2007-10-23 14:28:27,378 DEBUG
> [org.jboss.messaging.core.jmx.MessagingPostOfficeService]
> org.jboss.messaging.core.jmx.MessagingPostOfficeService@14b7042 uses
> Multiplexe
> rJChannelFactory
>
> 2007-10-23 14:28:28,110 DEBUG
> [org.jboss.ha.framework.server.JChannelFactory] Passing unique node id
> 127.0.0.1:1099 to the channel as additional data
> 2007-10-23 14:28:28,121 DEBUG
> [org.jboss.ha.framework.server.JChannelFactory] Passing unique node id
> 127.0.0.1:1099 to the channel as additional data
> 2007-10-23 14:28:28,214 INFO [STDOUT]
> -------------------------------------------------------
> GMS: address is 127.0.0.1:32786
> -------------------------------------------------------
> 2007-10-23 14:28:30,306 DEBUG
> [org.jboss.messaging.core.impl.postoffice.GroupMember]
> org.jboss.messaging.core.impl.postoffice.GroupMember
> $ControlMembershipListener@334
> cb9 got new view [127.0.0.1:32786|0] [127.0.0.1:32786], old view is null
> 2007-10-23 14:28:30,386 DEBUG
> [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice]
> org.jboss.messaging.core.impl.postoffice.MessagingPostOffice@1c86116:
> 127.
> 0.0.1:32786 joined
> 2007-10-23 14:28:30,386 DEBUG
> [org.jboss.messaging.core.impl.postoffice.GroupMember] First view
> arrived
> 2007-10-23 14:28:30,387 DEBUG
> [org.jboss.messaging.core.impl.postoffice.GroupMember] We are the first
> member of the group so no need to wait for state
> 2007-10-23 14:28:30,391 INFO [STDOUT]
> -------------------------------------------------------
> GMS: address is 127.0.0.1:32787
> -------------------------------------------------------
> 2007-10-23 14:28:32,412 DEBUG
> [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice] Updated
> failover map:
>
> 0->0
>
> 2007-10-23 14:28:32,427 DEBUG
> [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice]
> org.jboss.messaging.core.impl.postoffice.MessagingPostOffice@1c86116
> puts replicant locally: JVMID->6-fnu8e48f-1-yvy7e48f-0gpx6x-d24o4c5
>
> ...
>
> 2007-10-23 14:28:32,872 DEBUG
> [org.jboss.jms.server.endpoint.ServerConnectionFactoryEndpoint]
> updateClusteredClients being called!!! clientFactoriesToUpdate.size = 0
>
> ,,,
>
> 2007-10-23 14:28:34,335 DEBUG
> [org.jboss.jms.server.connectionfactory.ConnectionFactory] Started
> jboss.messaging.connectionfactory:service=ConnectionFactory
> 2007-10-23 14:28:34,336 DEBUG
> [org.jboss.system.microcontainer.jmx.ServiceControllerLifecycleCallback]
> Registered MBean jboss.jgroups:service=MultiplexerChannelFactory
> 2007-10-23 14:28:34,337 DEBUG
> [org.jboss.ha.framework.server.JChannelFactory] Ignoring create call;
> current state is Started
> 2007-10-23 14:28:34,337 DEBUG
> [org.jboss.ha.framework.server.JChannelFactory] Ignoring start call;
> current state is Started
>
--
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry(a)redhat.com