Are you sure getState() isn't being called? I need to check, but JBM
should always be calling getState() after joining a channel.
(Also, the erroneous code comes from here, where Brian explained to us
how to use a mux channel. I think it was just copied and pasted ;) )
Brian Stansberry wrote:
I've added a workaround to the AS[1] that should allow it to start
properly until JBMESSAGING-1120 is fixed.
[1] http://jira.jboss.com/jira/browse/JBAS-4909
Brian Stansberry wrote:
> The cause of this is
> http://jira.jboss.com/jira/browse/JBMESSAGING-1120. Vladimir's
> written a unit test that shows the combination of the way JBM uses a
> shared channel and the other AS services use it leads to the behavior
> we're seeing.
>
> The JBM guys fixing JBMESSAGING-1120 will make the problem go away.
> I've asked Vladimir to look into why the JBM usage caused
> Channel.getState() to incorrectly return true, which is what led to
> the hang.
>
> Brian Stansberry wrote:
>> Adrian wrote:
>>> On Tue, 2007-10-23 at 09:58 -0500, Brian Stansberry wrote:
>>>> Adrian wrote:
>>>>> When I look at this code, it looks like it is doing an infinite wait;
>>>>> shouldn't this have a timeout?
>>>>>
>>>>>
>>>>> http://fisheye.jboss.org/browse/JBossCache/core/tags/2.0.0.GA/src/org/jbo...
>>>>>
>>>>>
>>>> Arguably yes, although only as a second line of defense against a
>>>> bug. Which there appears to be here. You only get to the wait call
>>>> if channel.getState() returns true (line 1270 in the above linked
>>>> rev.) The getState call has a timeout. In your case it should
>>>> *not* have returned true, as your node is the only cluster member.
>>>>
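A minimal sketch of the "second line of defense" timeout discussed above, in
plain Java. This is not the JBoss Cache or JGroups code: StateTransferGate,
awaitState and stateReceived are hypothetical names, and the boolean argument
merely stands in for the result of channel.getState(). The idea is that even
if getState() wrongly reports that state is on its way, a bounded wait gives
up and reports failure instead of hanging startup.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; illustrates a bounded wait for a state transfer. */
final class StateTransferGate {
    // Released once by the (hypothetical) state-transfer callback.
    private final CountDownLatch stateArrived = new CountDownLatch(1);

    /** Call this from the callback that applies the transferred state. */
    void stateReceived() {
        stateArrived.countDown();
    }

    /**
     * @param stateRequested stand-in for the boolean returned by the state
     *                       request; false means nobody will send state
     * @param timeoutMillis  upper bound on how long to block
     * @return true if there was nothing to wait for or the state arrived in
     *         time; false if the wait timed out or was interrupted
     */
    boolean awaitState(boolean stateRequested, long timeoutMillis) {
        if (!stateRequested) {
            return true; // e.g. we are the only cluster member
        }
        try {
            return stateArrived.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        // Request claimed success, but no state ever arrives.
        StateTransferGate stuck = new StateTransferGate();
        System.out.println("state arrived: " + stuck.awaitState(true, 200));

        // State arrives before the caller starts waiting.
        StateTransferGate ok = new StateTransferGate();
        ok.stateReceived();
        System.out.println("state arrived: " + ok.awaitState(true, 200));
    }
}
```

A caller that gets false back can log the timeout and fail the service start
cleanly, which is easier to diagnose than a thread parked forever.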
>>>> Vladimir/Manik, does anything here ring a bell? E.g. was there
>>>> any change in the Channel.getState() behavior in JG 2.5.0?
>>>> Adrian's seeing this in AS trunk, which is using JBC 2.0.0.GA and
>>>> JG 2.5.0.GA.
>>>>
>>>
>>> Make sure you do an rm -rf thirdparty or use a clean checkout
>>> so you can be sure you have all the latest jars.
>>>
>>
>> Yeah, I did.
>>
>>>> Adrian, has some kind of parallelization of deployment been
>>>> introduced?
>>>
>>> No. But each service is free to do some startup asynch by forking
>>> a thread.
>>>
>>>> Your logging seems to show JBoss Messaging deployment proceeding
>>>> in parallel with deployment of cluster-beans.xml. When I run
>>>> "./run.sh -b localhost -c all" I don't see this problem (and the
>>>> JBM logging occurs after the cluster-beans.xml stuff.)
>>>
>>> So you think this could be a race? I don't see anything in the
>>> DEBUG messages (or the thread dump) that suggests both are
>>> active at the same time.
>>>
>>> Here's the DEBUG from the last JBoss Messaging logging:
>>>
>>
>> Thanks for this. I'd misread your earlier log; saw two sets of
>> ASCII art showing channels starting, and thought one was JBM
>> deployment, one from JBC. Actually, they were both from JBM, which
>> creates 2 channels.
>>
>> The JBC that's hanging actually shares the 2nd channel with JBM.
>> This is likely where the problem is; some defect in the state
>> transfer handling with shared channels. When I launch AS, the JBC
>> deployment is happening before JBM; for whatever reason for you it's
>> JBM then JBC. I'll add a depends or something to force JBM to go
>> first on my setup and I bet I'll see the same thing you do.
>>
>>> 2007-10-23 14:28:34,335 DEBUG
>>> [org.jboss.jms.server.connectionfactory.ConnectionFactory] Started
>>> jboss.messaging.connectionfactory:service=ConnectionFactory
>>> 2007-10-23 14:28:34,336 DEBUG
>>> [org.jboss.system.microcontainer.jmx.ServiceControllerLifecycleCallback]
>>> Registered MBean jboss.jgroups:service=MultiplexerChannelFactory
>>> 2007-10-23 14:28:34,337 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring create call;
>>> current state is Started
>>> 2007-10-23 14:28:34,337 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring start call;
>>> current state is Started
>>> 2007-10-23 14:28:34,362 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
>>> Registered in JMX under jboss.cache:service=EJB3SFSBClusteredCache
>>> 2007-10-23 14:28:34,362 DEBUG [org.jboss.system.ServiceController]
>>> starting service jboss.cache:service=EJB3SFSBClusteredCache
>>> 2007-10-23 14:28:34,362 DEBUG [org.jboss.system.ServiceController]
>>> Registering service jboss.cache:service=EJB3SFSBClusteredCache
>>> 2007-10-23 14:28:34,373 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
>>> Registered in JMX under
>>> jboss.cache:partitionName=DefaultPartition,service=HAPartitionCache
>>> 2007-10-23 14:28:34,374 DEBUG [org.jboss.system.ServiceController]
>>> starting service
>>> jboss.cache:service=HAPartitionCache,partitionName=DefaultPartition
>>> 2007-10-23 14:28:34,374 DEBUG [org.jboss.system.ServiceController]
>>> Registering service
>>> jboss.cache:service=HAPartitionCache,partitionName=DefaultPartition
>>> 2007-10-23 14:28:34,384 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
>>> Registered in JMX under jboss.cache:service=ClusteredSSOCache
>>> 2007-10-23 14:28:34,384 DEBUG [org.jboss.system.ServiceController]
>>> starting service jboss.cache:service=ClusteredSSOCache
>>> 2007-10-23 14:28:34,384 DEBUG [org.jboss.system.ServiceController]
>>> Registering service jboss.cache:service=ClusteredSSOCache
>>> 2007-10-23 14:28:34,390 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
>>> Constructing Cache
>>> 2007-10-23 14:28:34,797 DEBUG
>>> [org.jboss.cache.factories.InterceptorChainFactory] interceptor chain
>>> is:
>>> class org.jboss.cache.interceptors.CallInterceptor
>>> class org.jboss.cache.interceptors.UnlockInterceptor
>>> class org.jboss.cache.interceptors.PessimisticLockInterceptor
>>> class org.jboss.cache.interceptors.ReplicationInterceptor
>>> class org.jboss.cache.interceptors.NotificationInterceptor
>>> class org.jboss.cache.interceptors.TxInterceptor
>>> class org.jboss.cache.interceptors.CacheMgmtInterceptor
>>> class org.jboss.cache.interceptors.InvocationContextInterceptor
>>> 2007-10-23 14:28:34,806 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Not
>>> using an EvictionPolicy
>>> 2007-10-23 14:28:34,815 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] cache
>>> mode is REPL_ASYNC
>>> 2007-10-23 14:28:34,815 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Created
>>> Multiplexer Channel for cache cluster
>>> DefaultPartition-ClusteredSSOCache using stack udp
>>> 2007-10-23 14:28:34,844 DEBUG
>>> [org.jboss.cache.marshall.VersionAwareMarshaller] Initialised with
>>> version 2.0.0 and versionInt 20
>>> 2007-10-23 14:28:34,844 DEBUG
>>> [org.jboss.cache.marshall.VersionAwareMarshaller] Using default
>>> marshaller class org.jboss.cache.marshall.CacheMarshaller200
>>> 2007-10-23 14:28:34,844 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Using
>>> marshaller org.jboss.cache.marshall.VersionAwareMarshaller
>>> 2007-10-23 14:28:34,845 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Block
>>> received at 127.0.0.1:32787
>>> 2007-10-23 14:28:34,899 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Invoked on
>>> cache instance [127.0.0.1:32787] and InvocationContext
>>> [InvocationContext{methodCall=MethodName: _block; MethodIdInteger: 43;
>>> Args: ()transaction=null, globalTransaction=null,
>>> optionOverrides=Option{failSilently=false, cacheModeLocal=false,
>>> dataVersion=null, suppressLocking=false,
>>> forceDataGravitation=false, skipDataGravitation=false},
>>> originLocal=true, txHasMods=false}]
>>> 2007-10-23 14:28:34,899 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Setting up
>>> transactional context.
>>> 2007-10-23 14:28:34,899 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Setting tx
>>> as null and gtx as null
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.ReplicationInterceptor]
>>> isLocalCommitOrRollback? false; gtx = null
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.PessimisticLockInterceptor]
>>> PessimisticLockInterceptor invoked for method MethodName: _block;
>>> MethodIdInteger: 43; Args: ()
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.PessimisticLockInterceptor] bypassed
>>> locking as method _block() doesn't require locking
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.CallInterceptor] Passing up method
>>> MethodName: _block; MethodIdInteger: 43; Args: () so it gets
>>> invoked on cache.
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.UnlockInterceptor] Attempting to release
>>> locks on current thread. Lock table is {}
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.ReplicationInterceptor] Non-tx and non
>>> crud meth
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Resetting
>>> invocation-scope options
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Block
>>> processed at 127.0.0.1:32787
>>> 2007-10-23 14:28:34,902 INFO
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache]
>>> viewAccepted(): [127.0.0.1:32787|0] [127.0.0.1:32787]
>>> 2007-10-23 14:28:34,902 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] UnBlock
>>> received at 127.0.0.1:32787
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Invoked on
>>> cache instance [127.0.0.1:32787] and InvocationContext
>>> [InvocationContext{methodCall=MethodName: _unblock; MethodIdInteger: 44;
>>> Args: ()transaction=null, globalTransaction=null,
>>> optionOverrides=Option{failSilently=false, cacheModeLocal=false,
>>> dataVersion=null, suppressLocking=false,
>>> forceDataGravitation=false, skipDataGravitation=false},
>>> originLocal=true, txHasMods=false}]
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Setting up
>>> transactional context.
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Setting tx
>>> as null and gtx as null
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.ReplicationInterceptor]
>>> isLocalCommitOrRollback? false; gtx = null
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.PessimisticLockInterceptor]
>>> PessimisticLockInterceptor invoked for method MethodName: _unblock;
>>> MethodIdInteger: 44; Args: ()
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.PessimisticLockInterceptor] bypassed
>>> locking as method _unblock() doesn't require locking
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.CallInterceptor] Passing up method
>>> MethodName: _unblock; MethodIdInteger: 44; Args: () so it gets
>>> invoked on cache.
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.UnlockInterceptor] Attempting to release
>>> locks on current thread. Lock table is {}
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.ReplicationInterceptor] Non-tx and non
>>> crud meth
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Resetting
>>> invocation-scope options
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] UnBlock
>>> processed at 127.0.0.1:32787
>>> 2007-10-23 14:28:34,903 INFO
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache]
>>> CacheImpl local address is 127.0.0.1:32787
>>>
>>> Got bored waiting here. ;-)
>>>
>>> 2007-10-23 14:32:30,094 INFO
>>> [org.jboss.bootstrap.microcontainer.ServerImpl] JBoss SHUTDOWN
>>>
>>>
>>> There's no JBoss Cache logging before this, only some generic
>>> clustering logging (I've included where jms is using clustering
>>> but otherwise stripped its logging):
>>>
>>>
>>> 2007-10-23 14:28:27,360 DEBUG [org.jboss.system.ServiceController]
>>> starting service jboss.jgroups:service=MultiplexerChannelFactory
>>> 2007-10-23 14:28:27,360 DEBUG [org.jboss.system.ServiceController]
>>> Registering service jboss.jgroups:service=MultiplexerChannelFactory
>>> 2007-10-23 14:28:27,364 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring create call;
>>> current state is Stopped
>>>
>>> ...
>>>
>>> 2007-10-23 14:28:27,367 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Starting
>>> JChannelFactory
>>> 2007-10-23 14:28:27,367 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Started
>>> JChannelFactory
>>> 2007-10-23 14:28:27,368 DEBUG
>>> [org.jboss.messaging.core.jmx.MessagingPostOfficeService] Starting
>>> jboss.messaging:service=PostOffice
>>> 2007-10-23 14:28:27,378 DEBUG
>>> [org.jboss.messaging.core.jmx.MessagingPostOfficeService]
>>> org.jboss.messaging.core.jmx.MessagingPostOfficeService@14b7042 uses
>>> MultiplexerJChannelFactory
>>>
>>> 2007-10-23 14:28:28,110 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Passing unique node id
>>> 127.0.0.1:1099 to the channel as additional data
>>> 2007-10-23 14:28:28,121 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Passing unique node id
>>> 127.0.0.1:1099 to the channel as additional data
>>> 2007-10-23 14:28:28,214 INFO [STDOUT]
>>> -------------------------------------------------------
>>> GMS: address is 127.0.0.1:32786
>>> -------------------------------------------------------
>>> 2007-10-23 14:28:30,306 DEBUG
>>> [org.jboss.messaging.core.impl.postoffice.GroupMember]
>>> org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener@334cb9
>>> got new view [127.0.0.1:32786|0] [127.0.0.1:32786], old view is null
>>> 2007-10-23 14:28:30,386 DEBUG
>>> [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice]
>>> org.jboss.messaging.core.impl.postoffice.MessagingPostOffice@1c86116:
>>> 127.0.0.1:32786 joined
>>> 2007-10-23 14:28:30,386 DEBUG
>>> [org.jboss.messaging.core.impl.postoffice.GroupMember] First view
>>> arrived
>>> 2007-10-23 14:28:30,387 DEBUG
>>> [org.jboss.messaging.core.impl.postoffice.GroupMember] We are the first
>>> member of the group so no need to wait for state
>>> 2007-10-23 14:28:30,391 INFO [STDOUT]
>>> -------------------------------------------------------
>>> GMS: address is 127.0.0.1:32787
>>> -------------------------------------------------------
>>> 2007-10-23 14:28:32,412 DEBUG
>>> [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice] Updated
>>> failover map:
>>>
>>> 0->0
>>>
>>> 2007-10-23 14:28:32,427 DEBUG
>>> [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice]
>>> org.jboss.messaging.core.impl.postoffice.MessagingPostOffice@1c86116
>>> puts replicant locally: JVMID->6-fnu8e48f-1-yvy7e48f-0gpx6x-d24o4c5
>>>
>>> ...
>>>
>>> 2007-10-23 14:28:32,872 DEBUG
>>> [org.jboss.jms.server.endpoint.ServerConnectionFactoryEndpoint]
>>> updateClusteredClients being called!!! clientFactoriesToUpdate.size
>>> = 0
>>>
>>> ...
>>>
>>> 2007-10-23 14:28:34,335 DEBUG
>>> [org.jboss.jms.server.connectionfactory.ConnectionFactory] Started
>>> jboss.messaging.connectionfactory:service=ConnectionFactory
>>> 2007-10-23 14:28:34,336 DEBUG
>>> [org.jboss.system.microcontainer.jmx.ServiceControllerLifecycleCallback]
>>> Registered MBean jboss.jgroups:service=MultiplexerChannelFactory
>>> 2007-10-23 14:28:34,337 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring create call;
>>> current state is Started
>>> 2007-10-23 14:28:34,337 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring start call;
>>> current state is Started
>>>
>>
>