[jboss-jira] [JBoss JIRA] Created: (JBCACHE-1252) problem in CacheStoreInterceptor

Wed Dec 26 00:46:46 EST 2007

problem in CacheStoreInterceptor
--------------------------------

                 Key: JBCACHE-1252
                 URL: http://jira.jboss.com/jira/browse/JBCACHE-1252
             Project: JBoss Cache
          Issue Type: Bug
      Security Level: Public (Everyone can see)
    Affects Versions: 1.4.1.SP1
         Environment: Linux 2.6.9-5.ELsmp, JBOSS 4.0.5GA
            Reporter: Nadav Fried
         Assigned To: Manik Surtani

Hi,

When running a load test on a 4-node clustered JBoss (4.0.5.GA) environment, one of the nodes occasionally runs out of memory.
In the memory dump I see a lot of org.jgroups.Message objects, and after inspecting their contents I'm fairly sure they are related to JBoss Cache. Also, in the stack traces one thread is always stuck trying to write to a HashMap in CacheStoreInterceptor.
It seems that this HashMap is not synchronized. Here is the relevant part of the stack trace:

"MessageDispatcher up processing thread" daemon prio=10 tid=0x6a840000 nid=0x7a7c runnable [0x68cfe000..0x68cff040]
   java.lang.Thread.State: RUNNABLE
        at java.util.HashMap.put(HashMap.java:376)
        at org.jboss.cache.interceptors.CacheStoreInterceptor.prepareCacheLoader(CacheStoreInterceptor.java:299)
        at org.jboss.cache.interceptors.CacheStoreInterceptor.invoke(CacheStoreInterceptor.java:130)
        at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
        at org.jboss.cache.interceptors.TxInterceptor.handleCommitRollback(TxInterceptor.java:660)
        at org.jboss.cache.interceptors.TxInterceptor.runCommitPhase(TxInterceptor.java:702)
        at org.jboss.cache.interceptors.TxInterceptor$RemoteSynchronizationHandler.afterCompletion(TxInterceptor.java:1007)
        at org.jboss.cache.interceptors.OrderedSynchronizationHandler.afterCompletion(OrderedSynchronizationHandler.java:83)
        at org.jboss.tm.TransactionImpl.doAfterCompletion(TransactionImpl.java:1526)
        at org.jboss.tm.TransactionImpl.completeTransaction(TransactionImpl.java:1198)
        at org.jboss.tm.TransactionImpl.commit(TransactionImpl.java:377)
        at org.jboss.cache.interceptors.TxInterceptor.handlePessimisticPrepare(TxInterceptor.java:479)
        at org.jboss.cache.interceptors.TxInterceptor.handleRemotePrepare(TxInterceptor.java:304)
        at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:127)
        at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
        at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:183)
        at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5520)
        at org.jboss.cache.TreeCache._replicate(TreeCache.java:4827)
        at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:236)
        at org.jgroups.blocks.RpcDispatcher.handle(RpcDispatcher.java:220)
        at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:615)
        at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:512)
        at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:326)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUp(MessageDispatcher.java:722)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.access$300(MessageDispatcher.java:554)
        at org.jgroups.blocks.MessageDispatcher$1.run(MessageDispatcher.java:691)
        at java.lang.Thread.run(Thread.java:619)
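A thread that stays RUNNABLE inside `HashMap.put` is consistent with a well-known failure mode of an unsynchronized `java.util.HashMap` on older JDKs (this trace appears to come from a Java 6 runtime): concurrent resizes can corrupt a bucket chain into a cycle, after which `put` loops forever. The following is a minimal sketch, not the CacheStoreInterceptor source. It assumes the interceptor keeps per-transaction modifications in a map that several threads write to, and shows that map backed by `ConcurrentHashMap` instead. `TxModificationRegistry` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical per-transaction bookkeeping; an illustration, not JBoss Cache code. */
class TxModificationRegistry {
    // ConcurrentHashMap tolerates concurrent put/remove from several threads
    // without corrupting its table, unlike an unsynchronized HashMap.
    private final Map<Object, List<String>> modsByTx = new ConcurrentHashMap<>();

    /** Records a transaction's modifications during prepare. */
    void prepare(Object tx, List<String> modifications) {
        modsByTx.put(tx, Collections.unmodifiableList(new ArrayList<>(modifications)));
    }

    /** Removes and returns the modifications on commit/rollback, or null if unknown. */
    List<String> complete(Object tx) {
        return modsByTx.remove(tx);
    }

    int pending() {
        return modsByTx.size();
    }

    /** Runs prepare/complete from many threads at once; returns entries left behind. */
    static int stress(int threads, int txPerThread) {
        TxModificationRegistry registry = new TxModificationRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int t = 0; t < threads; t++) {
                final int id = t;
                pool.execute(() -> {
                    for (int i = 0; i < txPerThread; i++) {
                        String tx = "tx-" + id + "-" + i;
                        registry.prepare(tx, Collections.singletonList("put /node/" + i));
                        registry.complete(tx);
                    }
                });
            }
        } finally {
            pool.shutdown(); // already-submitted tasks still run to completion
        }
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return registry.pending();
    }
}
```

For example, `TxModificationRegistry.stress(8, 5000)` returns 0 because every prepared transaction is also completed. Wrapping the map with `Collections.synchronizedMap` would also be safe. `ConcurrentHashMap` avoids putting a single lock on the replication thread's hot path.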

Here is our configuration (the same on all 4 nodes). I know it still uses the old configuration style, but it should still work:

    <!-- ==================================================================== -->
    <!-- Defines TreeCache configuration                                      -->
    <!-- ==================================================================== -->

    <mbean code="org.jboss.cache.TreeCache"
        name="MessagingAS:service=MASClusterCache">

        <depends>jboss:service=Naming</depends>
        <depends>jboss:service=TransactionManager</depends>

        <!--
            Configure the TransactionManager
        -->
        <attribute name="TransactionManagerLookupClass">org.jboss.cache.JBossTransactionManagerLookup</attribute>

        <!--
            Node locking level : SERIALIZABLE
                                 REPEATABLE_READ (default)
                                 READ_COMMITTED
                                 READ_UNCOMMITTED
                                 NONE
        -->

       <attribute name="CacheLoaderClass">org.jboss.cache.loader.FileCacheLoader</attribute>
       <attribute name="CacheLoaderConfig" replace="false">location=/home/TEMP/jbossCache_poller</attribute>

       <!-- this will be the default in the future (elih)
       <attribute name="CacheLoaderConfig" replace="false">
                 location=/HOME/tmp
       </attribute>
       -->

        <attribute name="IsolationLevel">READ_UNCOMMITTED</attribute>

        <!--
             Valid modes are LOCAL
                             REPL_ASYNC
                             REPL_SYNC
                             INVALIDATION_ASYNC
                             INVALIDATION_SYNC
        -->
        <attribute name="CacheMode">REPL_ASYNC</attribute>

        <!-- Name of the cluster. Needs to be the same on all nodes in order
             for them to find each other
        -->
        <attribute name="ClusterName">MASLOAD2</attribute>

        <!-- JGroups protocol stack properties. Can also be a URL,
             e.g. file:/home/bela/default.xml

           <attribute name="ClusterProperties"></attribute>
        -->

        <attribute name="ClusterConfig">
            <config>
                <!-- UDP: if you have a multihomed machine,
                set the bind_addr attribute to the appropriate NIC IP address -->
                <!-- UDP: On Windows machines, because of the media sense feature
                 being broken with multicast (even after disabling media sense)
                 set the loopback attribute to true -->
                <UDP mcast_addr="228.1.2.5" mcast_port="45577"
                    ip_ttl="64" ip_mcast="true"
                    mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
                    ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
                    loopback="true"/>
                <PING timeout="2000" num_initial_members="3"
                    up_thread="false" down_thread="false"/>
                <MERGE2 min_interval="10000" max_interval="20000"/>
                <FD timeout="10000" max_tries="5" shun="true" up_thread="true" down_thread="true"/>
                <VERIFY_SUSPECT timeout="1500"
                    up_thread="false" down_thread="false"/>
                <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
                    up_thread="false" down_thread="false"/>
                <pbcast.STABLE desired_avg_gossip="20000"
                    up_thread="false" down_thread="false"/>
                <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
                    down_thread="false"/>
                <FRAG frag_size="8192"
                    down_thread="false" up_thread="false"/>
                <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
                    shun="true" print_local_addr="true"/>
                <pbcast.STATE_TRANSFER up_thread="false" down_thread="false"/>
            </config>
        </attribute>

        <!--
            The max amount of time (in milliseconds) we wait until the
            initial state (i.e. the contents of the cache) is retrieved from
            existing members in a clustered environment
        -->
        <attribute name="InitialStateRetrievalTimeout">20000</attribute>

        <!--
            Number of milliseconds to wait until all responses for a
            synchronous call have been received.
        -->
        <attribute name="SyncReplTimeout">15000</attribute>

        <!-- Max number of milliseconds to wait for a lock acquisition -->
        <attribute name="LockAcquisitionTimeout">10000</attribute>
   </mbean>
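The memory symptom fits this setup. With `CacheMode` set to `REPL_ASYNC`, the other nodes do not wait for responses from the stuck node. The stack trace shows remote prepare and commit running on the "MessageDispatcher up processing thread". If that thread never leaves `HashMap.put`, incoming messages can only accumulate. The sketch below is a simplified model under that assumption, not JGroups code. An unbounded `LinkedBlockingQueue` stands in for JGroups' internal buffering, and `StalledUpThreadModel` is a hypothetical name. The handler blocks instead of spinning, which has the same effect on the queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

/** Simplified model (not JGroups code) of one up thread draining incoming messages. */
class StalledUpThreadModel {

    /** Returns how many messages remain queued after `sent` arrive while the handler is stuck. */
    static int backlogWhileStalled(int sent, int payloadBytes) {
        BlockingQueue<byte[]> incoming = new LinkedBlockingQueue<>(); // unbounded buffer
        CountDownLatch handlerEntered = new CountDownLatch(1);
        CountDownLatch handlerDone = new CountDownLatch(1); // never counted down: handler never returns

        Thread up = new Thread(() -> {
            try {
                incoming.take();            // dequeue the first message
                handlerEntered.countDown();
                handlerDone.await();        // stand-in for a handler stuck in HashMap.put
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // lets the model shut down cleanly
            }
        }, "model-up-thread");
        up.setDaemon(true);
        up.start();

        try {
            incoming.put(new byte[payloadBytes]);
            handlerEntered.await();         // make sure the handler is now stuck
            for (int i = 0; i < sent; i++) {
                incoming.put(new byte[payloadBytes]); // async senders keep replicating
            }
            return incoming.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            up.interrupt();
        }
    }
}
```

For example, `StalledUpThreadModel.backlogWhileStalled(1000, 64)` returns 1000. Every message that arrives after the handler gets stuck stays in memory, which matches the pile of `org.jgroups.Message` objects in the dump.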

Thanks,
Nadav

-- 
This message is automatically generated by JIRA.