problem is CacheStoreInterceptor
--------------------------------
Key: JBCACHE-1252
URL: http://jira.jboss.com/jira/browse/JBCACHE-1252
Project: JBoss Cache
Issue Type: Bug
Security Level: Public (Everyone can see)
Affects Versions: 1.4.1.SP1
Environment: Linux 2.6.9-5.ELsmp, JBOSS 4.0.5GA
Reporter: Nadav Fried
Assigned To: Manik Surtani
Hi,
When running load on a 4-node clustered JBoss (4.0.5.GA) environment, one of the nodes
occasionally runs out of memory.
When looking at the memory dump, I see a lot of org.jgroups.Message objects. After
inspecting their contents, I'm pretty sure these objects are related to JBoss Cache.
Also, in the stack trace I see that one thread is always stuck trying to write to a
HashMap in CacheStoreInterceptor.
It seems that the HashMap is not synchronized. Here is the relevant part of the stack trace:
"MessageDispatcher up processing thread" daemon prio=10 tid=0x6a840000 nid=0x7a7c runnable [0x68cfe000..0x68cff040]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.put(HashMap.java:376)
at org.jboss.cache.interceptors.CacheStoreInterceptor.prepareCacheLoader(CacheStoreInterceptor.java:299)
at org.jboss.cache.interceptors.CacheStoreInterceptor.invoke(CacheStoreInterceptor.java:130)
at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
at org.jboss.cache.interceptors.TxInterceptor.handleCommitRollback(TxInterceptor.java:660)
at org.jboss.cache.interceptors.TxInterceptor.runCommitPhase(TxInterceptor.java:702)
at org.jboss.cache.interceptors.TxInterceptor$RemoteSynchronizationHandler.afterCompletion(TxInterceptor.java:1007)
at org.jboss.cache.interceptors.OrderedSynchronizationHandler.afterCompletion(OrderedSynchronizationHandler.java:83)
at org.jboss.tm.TransactionImpl.doAfterCompletion(TransactionImpl.java:1526)
at org.jboss.tm.TransactionImpl.completeTransaction(TransactionImpl.java:1198)
at org.jboss.tm.TransactionImpl.commit(TransactionImpl.java:377)
at org.jboss.cache.interceptors.TxInterceptor.handlePessimisticPrepare(TxInterceptor.java:479)
at org.jboss.cache.interceptors.TxInterceptor.handleRemotePrepare(TxInterceptor.java:304)
at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:127)
at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:183)
at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5520)
at org.jboss.cache.TreeCache._replicate(TreeCache.java:4827)
at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:236)
at org.jgroups.blocks.RpcDispatcher.handle(RpcDispatcher.java:220)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:615)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:512)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:326)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUp(MessageDispatcher.java:722)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.access$300(MessageDispatcher.java:554)
at org.jgroups.blocks.MessageDispatcher$1.run(MessageDispatcher.java:691)
at java.lang.Thread.run(Thread.java:619)
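A thread that stays RUNNABLE inside HashMap.put, as in the trace above, is consistent with several threads writing to a plain java.util.HashMap without synchronization, which on older JDKs can corrupt a bucket chain and leave put() spinning. This is only a guess at the cause, not a confirmed diagnosis. Below is a minimal sketch of the usual remedy, assuming the interceptor keeps a map shared across threads and keyed by transaction. PreparedTxRegistry, its methods, and the String/Integer key and value types are hypothetical names for illustration and are not taken from the JBoss Cache source.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a per-transaction map shared by many threads.
// Not the actual CacheStoreInterceptor code.
class PreparedTxRegistry {
    // ConcurrentHashMap tolerates concurrent put/remove; a plain HashMap does not.
    private final Map<String, Integer> preparedTxs = new ConcurrentHashMap<>();

    // Record a prepared transaction (assumed to be called from message-handling threads).
    public void prepare(String txId, int modificationCount) {
        preparedTxs.put(txId, modificationCount);
    }

    // Remove the entry on commit or rollback; returns null if the id is unknown.
    public Integer commit(String txId) {
        return preparedTxs.remove(txId);
    }

    public int size() {
        return preparedTxs.size();
    }

    // Drives concurrent prepares with distinct keys and returns the final map size.
    public static int runConcurrentPrepares(int threads, int txPerThread) {
        PreparedTxRegistry registry = new PreparedTxRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            final int threadId = t;
            pool.execute(() -> {
                try {
                    for (int i = 0; i < txPerThread; i++) {
                        registry.prepare("tx-" + threadId + "-" + i, i);
                    }
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        pool.shutdown();
        return registry.size();
    }
}
```

Wrapping the existing map with Collections.synchronizedMap would be an alternative with a smaller code change, at the cost of serializing all access on a single lock.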
Here is our configuration (the same on all 4 nodes). I know it still uses the old
configuration style, but it should still work:
<!-- ==================================================================== -->
<!-- Defines TreeCache configuration -->
<!-- ==================================================================== -->
<mbean code="org.jboss.cache.TreeCache"
name="MessagingAS:service=MASClusterCache">
<depends>jboss:service=Naming</depends>
<depends>jboss:service=TransactionManager</depends>
<!--
Configure the TransactionManager
-->
<attribute
name="TransactionManagerLookupClass">org.jboss.cache.JBossTransactionManagerLookup</attribute>
<!--
Node locking level : SERIALIZABLE
REPEATABLE_READ (default)
READ_COMMITTED
READ_UNCOMMITTED
NONE
-->
<attribute
name="CacheLoaderClass">org.jboss.cache.loader.FileCacheLoader</attribute>
<attribute name="CacheLoaderConfig"
replace="false">location=/home/TEMP/jbossCache_poller</attribute>
<!-- this will be the default in the future (elih)
<attribute name="CacheLoaderConfig" replace="false">
location=/HOME/tmp
</attribute>
-->
<attribute
name="IsolationLevel">READ_UNCOMMITTED</attribute>
<!--
Valid modes are LOCAL
REPL_ASYNC
REPL_SYNC
INVALIDATION_ASYNC
INVALIDATION_SYNC
-->
<attribute name="CacheMode">REPL_ASYNC</attribute>
<!-- Name of cluster. Needs to be the same for all clusters, in order
to find each other
-->
<attribute name="ClusterName">MASLOAD2</attribute>
<!-- JGroups protocol stack properties. Can also be a URL,
e.g. file:/home/bela/default.xml
<attribute name="ClusterProperties"></attribute>
-->
<attribute name="ClusterConfig">
<config>
<!-- UDP: if you have a multihomed machine,
set the bind_addr attribute to the appropriate NIC IP address -->
<!-- UDP: On Windows machines, because of the media sense feature
being broken with multicast (even after disabling media sense)
set the loopback attribute to true -->
<UDP mcast_addr="228.1.2.5" mcast_port="45577"
ip_ttl="64" ip_mcast="true"
mcast_send_buf_size="150000"
mcast_recv_buf_size="80000"
ucast_send_buf_size="150000"
ucast_recv_buf_size="80000"
loopback="true"/>
<PING timeout="2000" num_initial_members="3"
up_thread="false" down_thread="false"/>
<MERGE2 min_interval="10000"
max_interval="20000"/>
<FD timeout="10000" max_tries="5"
shun="true" up_thread="true" down_thread="true"/>
<VERIFY_SUSPECT timeout="1500"
up_thread="false" down_thread="false"/>
<pbcast.NAKACK gc_lag="50"
retransmit_timeout="600,1200,2400,4800"
up_thread="false" down_thread="false"/>
<pbcast.STABLE desired_avg_gossip="20000"
up_thread="false" down_thread="false"/>
<UNICAST timeout="600,1200,2400" window_size="100"
min_threshold="10"
down_thread="false"/>
<FRAG frag_size="8192"
down_thread="false" up_thread="false"/>
<pbcast.GMS join_timeout="5000"
join_retry_timeout="2000"
shun="true" print_local_addr="true"/>
<pbcast.STATE_TRANSFER up_thread="false"
down_thread="false"/>
</config>
</attribute>
<!--
The max amount of time (in milliseconds) we wait until the
initial state (ie. the contents of the cache) are retrieved from
existing members in a clustered environment
-->
<attribute
name="InitialStateRetrievalTimeout">20000</attribute>
<!--
Number of milliseconds to wait until all responses for a
synchronous call have been received.
-->
<attribute name="SyncReplTimeout">15000</attribute>
<!-- Max number of milliseconds to wait for a lock acquisition -->
<attribute name="LockAcquisitionTimeout">10000</attribute>
</mbean>
Thanks,
Nadav