[jboss-dev] AS5 testsuite failures

Brian Stansberry brian.stansberry at redhat.com
Wed Oct 24 15:37:43 EDT 2007


I've added a workaround to the AS[1] that should allow it to start 
properly until JBMESSAGING-1120 is fixed.

[1] http://jira.jboss.com/jira/browse/JBAS-4909

Brian Stansberry wrote:
> The cause of this is http://jira.jboss.com/jira/browse/JBMESSAGING-1120. 
> Vladimir's written a unit test that shows the combination of the way JBM 
> uses a shared channel and the other AS services use it leads to the 
> behavior we're seeing.
> 
> The JBM guys fixing JBMESSAGING-1120 will make the problem go away. I've 
> asked Vladimir look into why the JBM usage caused Channel.getState() to 
> incorrectly return true, which is what led to the hang.
> 
> Brian Stansberry wrote:
>> Adrian wrote:
>>> On Tue, 2007-10-23 at 09:58 -0500, Brian Stansberry wrote:
>>>> Adrian wrote:
>>>>> When I look at this code, it looks like it is doing an infinte wait,
>>>>> shouldn't this have a timeout?
>>>>>
>>>>> http://fisheye.jboss.org/browse/JBossCache/core/tags/2.0.0.GA/src/org/jboss/cache/CacheImpl.java?r=4237 
>>>>>
>>>>>
>>>> Arguably yes, although only as a second line of defense against a 
>>>> bug. Which there appears to be here. You only get to the wait call 
>>>> if channel.getState() returns true (line 1270 in the above linked 
>>>> rev.) The getState call has a timeout.  In your case it should *not* 
>>>> have returned true, as your node is the only cluster member.
>>>>
>>>> Vladimir/Manik, does anything here ring a bell?  E.g. was there any 
>>>> change in the Channel.getState() behavior in JG 2.5.0?  Adrian's 
>>>> seeing this in AS trunk, which is using JBC 2.0.0.GA and JG 2.5.0.GA.
>>>>
>>>
>>> Make sure you do an rm -rf thirdparty or use a clean checkout
>>> so you can be sure you have all the latest jars.
>>>
>>
>> Yeah, I did.
>>
>>>> Adrian, has some kind of parallelization of deployment been introduced? 
>>>
>>> No. But each service is free to do some startup asynch by forking
>>> a thread.
>>>
>>>> Your logging seems to show JBoss Messaging deployment proceeding in 
>>>> parallel with deployment of cluster-beans.xml.  When I run "./run.sh 
>>>> -b localhost -c all" I don't see this problem (and the JBM logging 
>>>> occurs after the cluster-beans.xml stuff.)
>>>
>>> So you think this could be a race? I don't see anything in the
>>> DEBUG messages (or the thread dump) that suggests both are
>>> active at the same time.
>>>
>>> Here's the DEBUG from the last JBoss Messaging logging:
>>>
>>
>> Thanks for this.  I'd misread your earlier log; saw two sets of ASCII 
>> art showing channels starting, and thought one was JBM deployment, one 
>> from JBC.  Actually, they were both from JBM, which creates 2 channels.
>>
>> The JBC that's hanging actually shares the 2nd channel with JBM.  This 
>> is likely where the problem is; some defect in the state transfer 
>> handling with shared channels. When I launch AS, the JBC deployment is 
>> happening before JBM; for whatever reason for you it's JBM then JBC. 
>> I'll add a depends or something to force JBM to go first on my setup 
>> and I bet I'll see the same thing you do.
>>
>>> 2007-10-23 14:28:34,335 DEBUG
>>> [org.jboss.jms.server.connectionfactory.ConnectionFactory] Started
>>> jboss.messaging.connectionfactory:service=ConnectionFactory
>>> 2007-10-23 14:28:34,336 DEBUG
>>> [org.jboss.system.microcontainer.jmx.ServiceControllerLifecycleCallback]
>>> Registered MBean jboss.jgroups:service=MultiplexerChannelFactory
>>> 2007-10-23 14:28:34,337 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring create call;
>>> current state is Started
>>> 2007-10-23 14:28:34,337 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring start call;
>>> current state is Started
>>> 2007-10-23 14:28:34,362 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
>>> Registered in JMX under jboss.cache:service=EJB3SFSBClusteredCache
>>> 2007-10-23 14:28:34,362 DEBUG [org.jboss.system.ServiceController]
>>> starting service jboss.cache:service=EJB3SFSBClusteredCache
>>> 2007-10-23 14:28:34,362 DEBUG [org.jboss.system.ServiceController]
>>> Registering service jboss.cache:service=EJB3SFSBClusteredCache
>>> 2007-10-23 14:28:34,373 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
>>> Registered in JMX under
>>> jboss.cache:partitionName=DefaultPartition,service=HAPartitionCache
>>> 2007-10-23 14:28:34,374 DEBUG [org.jboss.system.ServiceController]
>>> starting service
>>> jboss.cache:service=HAPartitionCache,partitionName=DefaultPartition
>>> 2007-10-23 14:28:34,374 DEBUG [org.jboss.system.ServiceController]
>>> Registering service
>>> jboss.cache:service=HAPartitionCache,partitionName=DefaultPartition
>>> 2007-10-23 14:28:34,384 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
>>> Registered in JMX under jboss.cache:service=ClusteredSSOCache
>>> 2007-10-23 14:28:34,384 DEBUG [org.jboss.system.ServiceController]
>>> starting service jboss.cache:service=ClusteredSSOCache
>>> 2007-10-23 14:28:34,384 DEBUG [org.jboss.system.ServiceController]
>>> Registering service jboss.cache:service=ClusteredSSOCache
>>> 2007-10-23 14:28:34,390 DEBUG [org.jboss.cache.jmx.CacheJmxWrapper]
>>> Constructing Cache
>>> 2007-10-23 14:28:34,797 DEBUG
>>> [org.jboss.cache.factories.InterceptorChainFactory] interceptor chain
>>> is:
>>> class org.jboss.cache.interceptors.CallInterceptor
>>> class org.jboss.cache.interceptors.UnlockInterceptor
>>> class org.jboss.cache.interceptors.PessimisticLockInterceptor
>>> class org.jboss.cache.interceptors.ReplicationInterceptor
>>> class org.jboss.cache.interceptors.NotificationInterceptor
>>> class org.jboss.cache.interceptors.TxInterceptor
>>> class org.jboss.cache.interceptors.CacheMgmtInterceptor
>>> class org.jboss.cache.interceptors.InvocationContextInterceptor
>>> 2007-10-23 14:28:34,806 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Not using
>>> an EvictionPolicy
>>> 2007-10-23 14:28:34,815 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] cache
>>> mode is REPL_ASYNC
>>> 2007-10-23 14:28:34,815 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Created
>>> Multiplexer Channel for cache cluster DefaultPartition-ClusteredSS
>>> OCache using stack udp
>>> 2007-10-23 14:28:34,844 DEBUG
>>> [org.jboss.cache.marshall.VersionAwareMarshaller] Initialised with
>>> version 2.0.0 and versionInt 20
>>> 2007-10-23 14:28:34,844 DEBUG
>>> [org.jboss.cache.marshall.VersionAwareMarshaller] Using default
>>> marshaller class org.jboss.cache.marshall.CacheMarshaller200
>>> 2007-10-23 14:28:34,844 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Using
>>> marshaller org.jboss.cache.marshall.VersionAwareMarshaller
>>> 2007-10-23 14:28:34,845 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Block
>>> received at 127.0.0.1:32787
>>> 2007-10-23 14:28:34,899 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Invoked on
>>> cache instance [127.0.0.1:32787] and InvocationContext [Invocation
>>> Context{methodCall=MethodName: _block; MethodIdInteger: 43; Args:
>>> ()transaction=null, globalTransaction=null,
>>> optionOverrides=Option{failSilently=false, cacheModeLocal
>>> =false, dataVersion=null, suppressLocking=false,
>>> forceDataGravitation=false, skipDataGravitation=false},
>>> originLocal=true, txHasMods=false}]
>>> 2007-10-23 14:28:34,899 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Setting up
>>> transactional context.
>>> 2007-10-23 14:28:34,899 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Setting tx
>>> as null and gtx as null
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.ReplicationInterceptor]
>>> isLocalCommitOrRollback? false; gtx = null
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.PessimisticLockInterceptor]
>>> PessimisticLockInterceptor invoked for method MethodName: _block;
>>> MethodIdInteg
>>> er: 43; Args: ()
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.PessimisticLockInterceptor] bypassed
>>> locking as method _block() doesn't require locking
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.CallInterceptor] Passing up method
>>> MethodName: _block; MethodIdInteger: 43; Args: () so it gets invoked 
>>> on cache.
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.UnlockInterceptor] Attempting to release
>>> locks on current thread.  Lock table is {}
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.ReplicationInterceptor] Non-tx and non
>>> crud meth
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Resetting
>>> invocation-scope options
>>> 2007-10-23 14:28:34,900 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] Block
>>> processed at 127.0.0.1:32787
>>> 2007-10-23 14:28:34,902 INFO
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache]
>>> viewAccepted(): [127.0.0.1:32787|0] [127.0.0.1:32787]
>>> 2007-10-23 14:28:34,902 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] UnBlock
>>> received at 127.0.0.1:32787
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Invoked on
>>> cache instance [127.0.0.1:32787] and InvocationContext [Invocation
>>> Context{methodCall=MethodName: _unblock; MethodIdInteger: 44; Args:
>>> ()transaction=null, globalTransaction=null,
>>> optionOverrides=Option{failSilently=false, cacheModeLoc
>>> al=false, dataVersion=null, suppressLocking=false,
>>> forceDataGravitation=false, skipDataGravitation=false},
>>> originLocal=true, txHasMods=false}]
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Setting up
>>> transactional context.
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Setting tx
>>> as null and gtx as null
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.ReplicationInterceptor]
>>> isLocalCommitOrRollback? false; gtx = null
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.PessimisticLockInterceptor]
>>> PessimisticLockInterceptor invoked for method MethodName: _unblock;
>>> MethodIdInt
>>> eger: 44; Args: ()
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.PessimisticLockInterceptor] bypassed
>>> locking as method _unblock() doesn't require locking
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.CallInterceptor] Passing up method
>>> MethodName: _unblock; MethodIdInteger: 44; Args: () so it gets invoked o
>>> n cache.
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.UnlockInterceptor] Attempting to release
>>> locks on current thread.  Lock table is {}
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.ReplicationInterceptor] Non-tx and non
>>> crud meth
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.interceptors.InvocationContextInterceptor] Resetting
>>> invocation-scope options
>>> 2007-10-23 14:28:34,903 DEBUG
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] UnBlock
>>> processed at 127.0.0.1:32787
>>> 2007-10-23 14:28:34,903 INFO
>>> [org.jboss.cache.CacheImpl.DefaultPartition-ClusteredSSOCache] CacheImpl
>>> local address is 127.0.0.1:32787
>>>
>>> Got bored waiting here. ;-)
>>>
>>> 2007-10-23 14:32:30,094 INFO
>>> [org.jboss.bootstrap.microcontainer.ServerImpl] JBoss SHUTDOWN
>>>
>>>
>>> There's no JBoss Cache logging before this, only some generic
>>> clustering logging (I've included where jms is using clustering
>>> but otherwise stripped its logging):
>>>
>>>
>>> 2007-10-23 14:28:27,360 DEBUG [org.jboss.system.ServiceController]
>>> starting service jboss.jgroups:service=MultiplexerChannelFactory
>>> 2007-10-23 14:28:27,360 DEBUG [org.jboss.system.ServiceController]
>>> Registering service jboss.jgroups:service=MultiplexerChannelFactory
>>> 2007-10-23 14:28:27,364 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring create call;
>>> current state is Stopped
>>>
>>> ...
>>>
>>> 2007-10-23 14:28:27,367 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Starting JChannelFactory
>>> 2007-10-23 14:28:27,367 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Started JChannelFactory
>>> 2007-10-23 14:28:27,368 DEBUG
>>> [org.jboss.messaging.core.jmx.MessagingPostOfficeService] Starting
>>> jboss.messaging:service=PostOffice
>>> 2007-10-23 14:28:27,378 DEBUG
>>> [org.jboss.messaging.core.jmx.MessagingPostOfficeService]
>>> org.jboss.messaging.core.jmx.MessagingPostOfficeService at 14b7042 uses
>>> Multiplexe
>>> rJChannelFactory
>>>
>>> 2007-10-23 14:28:28,110 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Passing unique node id
>>> 127.0.0.1:1099 to the channel as additional data
>>> 2007-10-23 14:28:28,121 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Passing unique node id
>>> 127.0.0.1:1099 to the channel as additional data
>>> 2007-10-23 14:28:28,214 INFO  [STDOUT] 
>>> -------------------------------------------------------
>>> GMS: address is 127.0.0.1:32786
>>> -------------------------------------------------------
>>> 2007-10-23 14:28:30,306 DEBUG
>>> [org.jboss.messaging.core.impl.postoffice.GroupMember]
>>> org.jboss.messaging.core.impl.postoffice.GroupMember
>>> $ControlMembershipListener at 334
>>> cb9 got new view [127.0.0.1:32786|0] [127.0.0.1:32786], old view is null
>>> 2007-10-23 14:28:30,386 DEBUG
>>> [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice]
>>> org.jboss.messaging.core.impl.postoffice.MessagingPostOffice at 1c86116:
>>> 127.
>>> 0.0.1:32786 joined
>>> 2007-10-23 14:28:30,386 DEBUG
>>> [org.jboss.messaging.core.impl.postoffice.GroupMember] First view
>>> arrived
>>> 2007-10-23 14:28:30,387 DEBUG
>>> [org.jboss.messaging.core.impl.postoffice.GroupMember] We are the first
>>> member of the group so no need to wait for state
>>> 2007-10-23 14:28:30,391 INFO  [STDOUT] 
>>> -------------------------------------------------------
>>> GMS: address is 127.0.0.1:32787
>>> -------------------------------------------------------
>>> 2007-10-23 14:28:32,412 DEBUG
>>> [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice] Updated
>>> failover map:
>>>
>>>              0->0
>>>
>>> 2007-10-23 14:28:32,427 DEBUG
>>> [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice]
>>> org.jboss.messaging.core.impl.postoffice.MessagingPostOffice at 1c86116
>>> puts replicant locally: JVMID->6-fnu8e48f-1-yvy7e48f-0gpx6x-d24o4c5
>>>
>>> ...
>>>
>>> 2007-10-23 14:28:32,872 DEBUG
>>> [org.jboss.jms.server.endpoint.ServerConnectionFactoryEndpoint]
>>> updateClusteredClients being called!!! clientFactoriesToUpdate.size = 0
>>>
>>> ,,,
>>>
>>> 2007-10-23 14:28:34,335 DEBUG
>>> [org.jboss.jms.server.connectionfactory.ConnectionFactory] Started
>>> jboss.messaging.connectionfactory:service=ConnectionFactory
>>> 2007-10-23 14:28:34,336 DEBUG
>>> [org.jboss.system.microcontainer.jmx.ServiceControllerLifecycleCallback]
>>> Registered MBean jboss.jgroups:service=MultiplexerChannelFactory
>>> 2007-10-23 14:28:34,337 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring create call;
>>> current state is Started
>>> 2007-10-23 14:28:34,337 DEBUG
>>> [org.jboss.ha.framework.server.JChannelFactory] Ignoring start call;
>>> current state is Started
>>>
>>
> 

-- 
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry at redhat.com



More information about the jboss-development mailing list