Aaah...you're talking about the data channel, you're right: we don't
call getState() on that, we only use the control channel for state
management :)
Tim Fox wrote:
Are you sure getState() isn't being called? I need to check, but
JBM
should always be calling getState() after joining a channel.
(Also.. the erroneous code comes from here
http://www.jboss.com/index.html?module=bb&op=viewtopic&t=118300 where
Brian explained to us how to use a mux channel, I think it was just
copy and pasted ;) )
Brian Stansberry wrote:
> I've added a workaround to the AS[1] that should allow it to start
> properly until JBMESSAGING-1120 is fixed.
>
> [1]
http://jira.jboss.com/jira/browse/JBAS-4909
>
> Brian Stansberry wrote:
>> The cause of this is
>>
http://jira.jboss.com/jira/browse/JBMESSAGING-1120. Vladimir's
>> written a unit test that shows the combination of the way JBM uses a
>> shared channel and the other AS services use it leads to the
>> behavior we're seeing.
>>
>> The JBM guys fixing JBMESSAGING-1120 will make the problem go away.
>> I've asked Vladimir look into why the JBM usage caused
>> Channel.getState() to incorrectly return true, which is what led to
>> the hang.
>>
>> Brian Stansberry wrote:
>>> Adrian wrote:
>>>> On Tue, 2007-10-23 at 09:58 -0500, Brian Stansberry wrote:
>>>>> Adrian wrote:
>>>>>> When I look at this code, it looks like it is doing an infinte
>>>>>> wait,
>>>>>> shouldn't this have a timeout?
>>>>>>
>>>>>>
http://fisheye.jboss.org/browse/JBossCache/core/tags/2.0.0.GA/src/org/jbo...
>>>>>>
>>>>>>
>>>>> Arguably yes, although only as a second line of defense against a
>>>>> bug. Which there appears to be here. You only get to the wait
>>>>> call if channel.getState() returns true (line 1270 in the above
>>>>> linked rev.) The getState call has a timeout. In your case it
>>>>> should *not* have returned true, as your node is the only cluster
>>>>> member.
>>>>>
>>>>> Vladimir/Manik, does anything here ring a bell? E.g. was there
>>>>> any change in the Channel.getState() behavior in JG 2.5.0?
>>>>> Adrian's seeing this in AS trunk, which is using JBC 2.0.0.GA and
>>>>> JG 2.5.0.GA.
>>>>>
>>>>
>>>> Make sure you do an rm -rf thirdparty or use a clean checkout
>>>> so you can be sure you have all the latest jars.
>>>>
>>>
>>> Yeah, I did.
>>>
>>>>> Adrian, has some kind of parallelization of deployment been
>>>>> introduced?
>>>>
>>>> No. But each service is free to do some startup asynch by forking
>>>> a thread.
>>>>
>>>>> Your logging seems to show JBoss Messaging deployment proceeding
>>>>> in parallel with deployment of cluster-beans.xml. When I run
>>>>> "./run.sh -b localhost -c all" I don't see this problem
(and the
>>>>> JBM logging occurs after the cluster-beans.xml stuff.)
>>>>
>>>> So you think this could be a race? I don't see anything in the
>>>> DEBUG messages (or the thread dump) that suggests both are
>>>> active at the same time.
>>>>
>>>> Here's the DEBUG from the last JBoss Messaging logging:
>>>>
>>>
>>> Thanks for this. I'd misread your earlier log; saw two sets of
>>> ASCII art showing channels starting, and thought one was JBM
>>> deployment, one from JBC. Actually, they were both from JBM, which
>>> creates 2 channels.
>>>
>>> The JBC that's hanging actually shares the 2nd channel with JBM.
>>> This is likely where the problem is; some defect in the state
>>> transfer handling with shared channels. When I launch AS, the JBC
>>> deployment is happening before JBM; for whatever reason for you
>>> it's JBM then JBC. I'll add a depends or something to force JBM to
>>> go first on my setup and I bet I'll see the same thing you do.
>>>
>>>> 2007-10-23 14:28:34,335 DEBUG
>>>> [org.jboss.jms.server.connectionfactory.ConnectionFactory] Started
>>>> jboss.messaging.connectionfactory:service=ConnectionFactory
>>>> 2007-10-23 14:28:34,336 DEBUG
>>>> [org.jboss.system.microcontainer.jmx.ServiceControllerLifecycleCallback]
>>>>
>>>> Registered MBean jboss.jgroups:service=MultiplexerChannelFactory
>>>> 2007-10-23 14:28:34,337 DEBUG
>>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring create call;
>>>> current state is Started
>>>> 2007-10-23 14:28:34,337 DEBUG
>>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring start call;
>>>> current state is Started
>>>> 2007-10-23 14:28:34,362 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
>>>> Registered in JMX under jboss.cache:service=EJB3SFSBClusteredCache
>>>> 2007-10-23 14:28:34,362 DEBUG [org.jboss.system.ServiceController]
>>>> starting service jboss.cache:service=EJB3SFSBClusteredCache
>>>> 2007-10-23 14:28:34,362 DEBUG [org.jboss.system.ServiceController]
>>>> Registering service jboss.cache:service=EJB3SFSBClusteredCache
>>>> 2007-10-23 14:28:34,373 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
>>>> Registered in JMX under
>>>> jboss.cache:partitionName=DefaultPartition,service=HAPartitionCache
>>>> 2007-10-23 14:28:34,374 DEBUG [org.jboss.system.ServiceController]
>>>> starting service
>>>> jboss.cache:service=HAPartitionCache,partitionName=DefaultPartition
>>>> 2007-10-23 14:28:34,374 DEBUG [org.jboss.system.ServiceController]
>>>> Registering service
>>>> jboss.cache:service=HAPartitionCache,partitionName=DefaultPartition
>>>> 2007-10-23 14:28:34,384 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
>>>> Registered in JMX under jboss.cache:service=ClusteredSSOCache
>>>> 2007-10-23 14:28:34,384 DEBUG [org.jboss.system.ServiceController]
>>>> starting service jboss.cache:service=ClusteredSSOCache
>>>> 2007-10-23 14:28:34,384 DEBUG [org.jboss.system.ServiceController]
>>>> Registering service jboss.cache:service=ClusteredSSOCache
>>>> 2007-10-23 14:28:34,390 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
>>>> Constructing Cache
>>>> 2007-10-23 14:28:34,797 DEBUG
>>>> [org.jboss.cache.factories.InterceptorChainFactory] interceptor chain
>>>> is:
>>>> class org.jboss.cache.interceptors.CallInterceptor
>>>> class org.jboss.cache.interceptors.UnlockInterceptor
>>>> class org.jboss.cache.interceptors.PessimisticLockInterceptor
>>>> class org.jboss.cache.interceptors.ReplicationInterceptor
>>>> class org.jboss.cache.interceptors.NotificationInterceptor
>>>> class org.jboss.cache.interceptors.TxInterceptor
>>>> class org.jboss.cache.interceptors.CacheMgmtInterceptor
>>>> class org.jboss.cache.interceptors.InvocationContextInterceptor
>>>> 2007-10-23 14:28:34,806 DEBUG
>>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Not
>>>> using
>>>> an EvictionPolicy
>>>> 2007-10-23 14:28:34,815 DEBUG
>>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] cache
>>>> mode is REPL_ASYNC
>>>> 2007-10-23 14:28:34,815 DEBUG
>>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache]
>>>> Created
>>>> Multiplexer Channel for cache cluster DefaultPartition-ClusteredSS
>>>> OCache using stack udp
>>>> 2007-10-23 14:28:34,844 DEBUG
>>>> [org.jboss.cache.marshall.VersionAwareMarshaller] Initialised with
>>>> version 2.0.0 and versionInt 20
>>>> 2007-10-23 14:28:34,844 DEBUG
>>>> [org.jboss.cache.marshall.VersionAwareMarshaller] Using default
>>>> marshaller class org.jboss.cache.marshall.CacheMarshaller200
>>>> 2007-10-23 14:28:34,844 DEBUG
>>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Using
>>>> marshaller org.jboss.cache.marshall.VersionAwareMarshaller
>>>> 2007-10-23 14:28:34,845 DEBUG
>>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Block
>>>> received at 127.0.0.1:32787
>>>> 2007-10-23 14:28:34,899 DEBUG
>>>> [org.jboss.cache.interceptors.InvocationContextInterceptor]
>>>> Invoked on
>>>> cache instance [127.0.0.1:32787] and InvocationContext [Invocation
>>>> Context{methodCall=MethodName: _block; MethodIdInteger: 43; Args:
>>>> ()transaction=null, globalTransaction=null,
>>>> optionOverrides=Option{failSilently=false, cacheModeLocal
>>>> =false, dataVersion=null, suppressLocking=false,
>>>> forceDataGravitation=false, skipDataGravitation=false},
>>>> originLocal=true, txHasMods=false}]
>>>> 2007-10-23 14:28:34,899 DEBUG
>>>> [org.jboss.cache.interceptors.InvocationContextInterceptor]
>>>> Setting up
>>>> transactional context.
>>>> 2007-10-23 14:28:34,899 DEBUG
>>>> [org.jboss.cache.interceptors.InvocationContextInterceptor]
>>>> Setting tx
>>>> as null and gtx as null
>>>> 2007-10-23 14:28:34,900 DEBUG
>>>> [org.jboss.cache.interceptors.ReplicationInterceptor]
>>>> isLocalCommitOrRollback? false; gtx = null
>>>> 2007-10-23 14:28:34,900 DEBUG
>>>> [org.jboss.cache.interceptors.PessimisticLockInterceptor]
>>>> PessimisticLockInterceptor invoked for method MethodName: _block;
>>>> MethodIdInteg
>>>> er: 43; Args: ()
>>>> 2007-10-23 14:28:34,900 DEBUG
>>>> [org.jboss.cache.interceptors.PessimisticLockInterceptor] bypassed
>>>> locking as method _block() doesn't require locking
>>>> 2007-10-23 14:28:34,900 DEBUG
>>>> [org.jboss.cache.interceptors.CallInterceptor] Passing up method
>>>> MethodName: _block; MethodIdInteger: 43; Args: () so it gets
>>>> invoked on cache.
>>>> 2007-10-23 14:28:34,900 DEBUG
>>>> [org.jboss.cache.interceptors.UnlockInterceptor] Attempting to
>>>> release
>>>> locks on current thread. Lock table is {}
>>>> 2007-10-23 14:28:34,900 DEBUG
>>>> [org.jboss.cache.interceptors.ReplicationInterceptor] Non-tx and non
>>>> crud meth
>>>> 2007-10-23 14:28:34,900 DEBUG
>>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Resetting
>>>> invocation-scope options
>>>> 2007-10-23 14:28:34,900 DEBUG
>>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Block
>>>> processed at 127.0.0.1:32787
>>>> 2007-10-23 14:28:34,902 INFO
>>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache]
>>>> viewAccepted(): [127.0.0.1:32787|0] [127.0.0.1:32787]
>>>> 2007-10-23 14:28:34,902 DEBUG
>>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache]
>>>> UnBlock
>>>> received at 127.0.0.1:32787
>>>> 2007-10-23 14:28:34,903 DEBUG
>>>> [org.jboss.cache.interceptors.InvocationContextInterceptor]
>>>> Invoked on
>>>> cache instance [127.0.0.1:32787] and InvocationContext [Invocation
>>>> Context{methodCall=MethodName: _unblock; MethodIdInteger: 44; Args:
>>>> ()transaction=null, globalTransaction=null,
>>>> optionOverrides=Option{failSilently=false, cacheModeLoc
>>>> al=false, dataVersion=null, suppressLocking=false,
>>>> forceDataGravitation=false, skipDataGravitation=false},
>>>> originLocal=true, txHasMods=false}]
>>>> 2007-10-23 14:28:34,903 DEBUG
>>>> [org.jboss.cache.interceptors.InvocationContextInterceptor]
>>>> Setting up
>>>> transactional context.
>>>> 2007-10-23 14:28:34,903 DEBUG
>>>> [org.jboss.cache.interceptors.InvocationContextInterceptor]
>>>> Setting tx
>>>> as null and gtx as null
>>>> 2007-10-23 14:28:34,903 DEBUG
>>>> [org.jboss.cache.interceptors.ReplicationInterceptor]
>>>> isLocalCommitOrRollback? false; gtx = null
>>>> 2007-10-23 14:28:34,903 DEBUG
>>>> [org.jboss.cache.interceptors.PessimisticLockInterceptor]
>>>> PessimisticLockInterceptor invoked for method MethodName: _unblock;
>>>> MethodIdInt
>>>> eger: 44; Args: ()
>>>> 2007-10-23 14:28:34,903 DEBUG
>>>> [org.jboss.cache.interceptors.PessimisticLockInterceptor] bypassed
>>>> locking as method _unblock() doesn't require locking
>>>> 2007-10-23 14:28:34,903 DEBUG
>>>> [org.jboss.cache.interceptors.CallInterceptor] Passing up method
>>>> MethodName: _unblock; MethodIdInteger: 44; Args: () so it gets
>>>> invoked o
>>>> n cache.
>>>> 2007-10-23 14:28:34,903 DEBUG
>>>> [org.jboss.cache.interceptors.UnlockInterceptor] Attempting to
>>>> release
>>>> locks on current thread. Lock table is {}
>>>> 2007-10-23 14:28:34,903 DEBUG
>>>> [org.jboss.cache.interceptors.ReplicationInterceptor] Non-tx and non
>>>> crud meth
>>>> 2007-10-23 14:28:34,903 DEBUG
>>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Resetting
>>>> invocation-scope options
>>>> 2007-10-23 14:28:34,903 DEBUG
>>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache]
>>>> UnBlock
>>>> processed at 127.0.0.1:32787
>>>> 2007-10-23 14:28:34,903 INFO
>>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache]
>>>> CacheImpl
>>>> local address is 127.0.0.1:32787
>>>>
>>>> Got bored waiting here. ;-)
>>>>
>>>> 2007-10-23 14:32:30,094 INFO
>>>> [org.jboss.bootstrap.microcontainer.ServerImpl] JBoss SHUTDOWN
>>>>
>>>>
>>>> There's no JBoss Cache logging before this, only some generic
>>>> clustering logging (I've included where jms is using clustering
>>>> but otherwise stripped its logging):
>>>>
>>>>
>>>> 2007-10-23 14:28:27,360 DEBUG [org.jboss.system.ServiceController]
>>>> starting service jboss.jgroups:service=MultiplexerChannelFactory
>>>> 2007-10-23 14:28:27,360 DEBUG [org.jboss.system.ServiceController]
>>>> Registering service jboss.jgroups:service=MultiplexerChannelFactory
>>>> 2007-10-23 14:28:27,364 DEBUG
>>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring create call;
>>>> current state is Stopped
>>>>
>>>> ...
>>>>
>>>> 2007-10-23 14:28:27,367 DEBUG
>>>> [org.jboss.ha.framework.server.JChannelFactory] Starting
>>>> JChannelFactory
>>>> 2007-10-23 14:28:27,367 DEBUG
>>>> [org.jboss.ha.framework.server.JChannelFactory] Started
>>>> JChannelFactory
>>>> 2007-10-23 14:28:27,368 DEBUG
>>>> [org.jboss.messaging.core.jmx.MessagingPostOfficeService] Starting
>>>> jboss.messaging:service=PostOffice
>>>> 2007-10-23 14:28:27,378 DEBUG
>>>> [org.jboss.messaging.core.jmx.MessagingPostOfficeService]
>>>> org.jboss.messaging.core.jmx.MessagingPostOfficeService@14b7042 uses
>>>> Multiplexe
>>>> rJChannelFactory
>>>>
>>>> 2007-10-23 14:28:28,110 DEBUG
>>>> [org.jboss.ha.framework.server.JChannelFactory] Passing unique
>>>> node id
>>>> 127.0.0.1:1099 to the channel as additional data
>>>> 2007-10-23 14:28:28,121 DEBUG
>>>> [org.jboss.ha.framework.server.JChannelFactory] Passing unique
>>>> node id
>>>> 127.0.0.1:1099 to the channel as additional data
>>>> 2007-10-23 14:28:28,214 INFO [STDOUT]
>>>> -------------------------------------------------------
>>>> GMS: address is 127.0.0.1:32786
>>>> -------------------------------------------------------
>>>> 2007-10-23 14:28:30,306 DEBUG
>>>> [org.jboss.messaging.core.impl.postoffice.GroupMember]
>>>> org.jboss.messaging.core.impl.postoffice.GroupMember
>>>> $ControlMembershipListener@334
>>>> cb9 got new view [127.0.0.1:32786|0] [127.0.0.1:32786], old view
>>>> is null
>>>> 2007-10-23 14:28:30,386 DEBUG
>>>> [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice]
>>>> org.jboss.messaging.core.impl.postoffice.MessagingPostOffice@1c86116:
>>>> 127.
>>>> 0.0.1:32786 joined
>>>> 2007-10-23 14:28:30,386 DEBUG
>>>> [org.jboss.messaging.core.impl.postoffice.GroupMember] First view
>>>> arrived
>>>> 2007-10-23 14:28:30,387 DEBUG
>>>> [org.jboss.messaging.core.impl.postoffice.GroupMember] We are the
>>>> first
>>>> member of the group so no need to wait for state
>>>> 2007-10-23 14:28:30,391 INFO [STDOUT]
>>>> -------------------------------------------------------
>>>> GMS: address is 127.0.0.1:32787
>>>> -------------------------------------------------------
>>>> 2007-10-23 14:28:32,412 DEBUG
>>>> [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice]
>>>> Updated
>>>> failover map:
>>>>
>>>> 0->0
>>>>
>>>> 2007-10-23 14:28:32,427 DEBUG
>>>> [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice]
>>>> org.jboss.messaging.core.impl.postoffice.MessagingPostOffice@1c86116
>>>> puts replicant locally: JVMID->6-fnu8e48f-1-yvy7e48f-0gpx6x-d24o4c5
>>>>
>>>> ...
>>>>
>>>> 2007-10-23 14:28:32,872 DEBUG
>>>> [org.jboss.jms.server.endpoint.ServerConnectionFactoryEndpoint]
>>>> updateClusteredClients being called!!!
>>>> clientFactoriesToUpdate.size = 0
>>>>
>>>> ,,,
>>>>
>>>> 2007-10-23 14:28:34,335 DEBUG
>>>> [org.jboss.jms.server.connectionfactory.ConnectionFactory] Started
>>>> jboss.messaging.connectionfactory:service=ConnectionFactory
>>>> 2007-10-23 14:28:34,336 DEBUG
>>>> [org.jboss.system.microcontainer.jmx.ServiceControllerLifecycleCallback]
>>>>
>>>> Registered MBean jboss.jgroups:service=MultiplexerChannelFactory
>>>> 2007-10-23 14:28:34,337 DEBUG
>>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring create call;
>>>> current state is Started
>>>> 2007-10-23 14:28:34,337 DEBUG
>>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring start call;
>>>> current state is Started
>>>>
>>>
>>
>