[jboss-user] [JBossCache] - State transfer out of Memory.

Wed Aug 29 04:36:30 EDT 2007

I'm using jboss-cache1.4.1-SPA (TreeCache) and the jgroups 1.4.1 it ships with, in a clustered environment.
My application runs with a heap size of 2G. When the cache size is
372529026 bytes (JBoss Cache prints this value in my log), I get an
OutOfMemoryError from JGroups while the slave node is fetching state. Here's my log:

2007-08-23 12:41:59,164 INFO  [ztc.cache.JBossCachePool] FREE MEM: 1209736176
2007-08-23 12:41:59,164 INFO  [ztc.cache.JBossCachePool] TOTAL MEM: 2029518848
2007-08-23 12:42:04,071 INFO  [ztc.tftpd.TFTPServerWrapper] processRequest(), RRQ by 127.0.0.1
2007-08-23 12:42:04,073 INFO  [PROFILE] configure:1 msecs
2007-08-23 12:42:04,073 INFO  [ztc.tftpd.TFTPServerWrapper] File "/000000000000" not found for client 127.0.0.1.34616
2007-08-23 12:42:14,081 INFO  [ztc.tftpd.TFTPServerWrapper] processRequest(), RRQ by 127.0.0.1
2007-08-23 12:42:14,083 INFO  [PROFILE] configure:1 msecs
2007-08-23 12:42:14,083 INFO  [ztc.tftpd.TFTPServerWrapper] File "/000000000000" not found for client 127.0.0.1.34616
2007-08-23 12:42:24,007 INFO  [org.jboss.cache.TreeCache] viewAccepted(): [192.168.1.249:34224|3] [192.168.1.249:34224, 192.168.1.250:32789]
2007-08-23 12:42:24,115 INFO  [org.jboss.cache.TreeCache] locking the subtree at / to transfer state
2007-08-23 12:42:29,457 INFO  [org.jboss.cache.statetransfer.StateTransferGenerator_140] returning the state for tree rooted in /(372529026 bytes)
2007-08-23 12:42:34,514 ERROR [org.jgroups.stack.DownHandler] DownHandler (FRAG) caught exception
java.lang.OutOfMemoryError
2007-08-23 12:42:34,514 INFO  [ztc.tftpd.TFTPServerWrapper] processRequest(), RRQ by 127.0.0.1
2007-08-23 12:42:34,538 INFO  [PROFILE] configure:23 msecs

Note that just before the error, my memory checker thread reports nearly
1.2G of free memory!
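
To put those numbers side by side, here is a minimal sketch of the arithmetic. It assumes the FREE MEM and TOTAL MEM lines come from Runtime.freeMemory() and Runtime.totalMemory() (the post does not say how the checker thread measures them), and the class and method names are made up for illustration. It only shows how many whole copies of the serialized state would fit in the reported free memory; whether the stack really makes several copies is a guess, not something the log proves.

```java
public class HeapHeadroom {
    // Serialized state size reported by StateTransferGenerator in the log.
    static final long STATE_BYTES = 372529026L;

    /** Heap still obtainable: the unused part of the current heap plus what the JVM may still grow by. */
    static long availableHeap(long free, long total, long max) {
        return free + (max - total);
    }

    /** Number of whole copies of the state that fit in the given number of bytes. */
    static long copiesThatFit(long availableBytes, long stateBytes) {
        return availableBytes / stateBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long available = availableHeap(rt.freeMemory(), rt.totalMemory(), rt.maxMemory());
        System.out.println("available heap now: " + available);
        System.out.println("state copies that fit now: " + copiesThatFit(available, STATE_BYTES));

        // Using the FREE MEM value from the log (1209736176 bytes):
        System.out.println("copies that fit in logged free memory: "
                + copiesThatFit(1209736176L, STATE_BYTES)); // 3
    }
}
```

So the logged free memory has room for about three full copies of the state. If the state is buffered or copied more than that on its way down the stack, the heap could run out without any leak being involved.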

With a smaller cache, everything works fine.

Debugging your code, I found that the slave hangs on the following JGroups call:

	boolean rc = channel.getState(null, state_fetch_timeout);

So at first glance it seems to me that JGroups has a memory leak, but it may be a protocol problem.
In my configuration file, the part related to JGroups looks like this:

                <UDP mcast_addr="229.1.2.4" mcast_port="45555"
                        ip_ttl="64" ip_mcast="true"
                        bind_addr="192.168.1.250"
                        mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
                        ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
                        loopback="false" />
                <PING timeout="2000" num_initial_members="3"
                        up_thread="true" down_thread="true" />
                <MERGE2 min_interval="5000" max_interval="10000" />
                <FD_SOCK/>
                <VERIFY_SUSPECT timeout="3000" num_msgs="3"
                        up_thread="true" down_thread="true" />
                <pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"
                       up_thread="true" down_thread="true" />
                <pbcast.STABLE desired_avg_gossip="20000"
                       up_thread="true" down_thread="true" />
                <UNICAST timeout="5000" window_size="100" min_threshold="10"
                        down_thread="true" />
                <FRAG frag_size="8192"
                        down_thread="true" up_thread="true" />
                <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
                        shun="true" print_local_addr="true" />
                <pbcast.STATE_TRANSFER down_thread="false" up_thread="false"/>
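
For scale, here is a rough sketch of what frag_size="8192" means for a state of this size. It assumes FRAG simply cuts the payload into pieces of at most frag_size bytes and ignores any per-fragment headers; the class and method names are hypothetical and this is not JGroups code.

```java
public class FragCount {
    /** Number of fragments needed to carry payloadBytes in pieces of at most fragSize bytes. */
    static long fragmentCount(long payloadBytes, long fragSize) {
        return (payloadBytes + fragSize - 1) / fragSize; // ceiling division
    }

    /** Size of the final fragment for a non-empty payload (equals fragSize when it divides evenly). */
    static long lastFragmentSize(long payloadBytes, long fragSize) {
        long remainder = payloadBytes % fragSize;
        return remainder == 0 ? fragSize : remainder;
    }

    public static void main(String[] args) {
        long stateBytes = 372529026L; // state size from the log
        long fragSize = 8192L;        // frag_size from the FRAG element

        System.out.println(fragmentCount(stateBytes, fragSize));    // 45475
        System.out.println(lastFragmentSize(stateBytes, fragSize)); // 6018
    }
}
```

Under that assumption the single state message turns into 45475 fragments, which is the point in the stack (DownHandler (FRAG)) where the log shows the OutOfMemoryError.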

I read in the JGroups user guide that pbcast.STATE_TRANSFER consumes a lot of memory, so pbcast.STREAMING_STATE_TRANSFER is better for big caches.
I replaced <pbcast.STATE_TRANSFER down_thread="false" up_thread="false"/> with <pbcast.STREAMING_STATE_TRANSFER down_thread="false" up_thread="false"/>,
but the slave hangs and I see no attempt to transfer state in the master log. (Note that the cache is currently empty, and in that case everything works fine with STATE_TRANSFER.)

Do you have any suggestions?

Many thanks, Fabrizio

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4079034#4079034
