[jboss-jira] [JBoss JIRA] Resolved: (JBCACHE-1106) STREAMING_STATE_TRANSFER fails between two JBoss Cache instances when state is large (several hundred MB).

Thu Jun 14 12:07:11 EDT 2007

     [ http://jira.jboss.com/jira/browse/JBCACHE-1106?page=all ]

Vladimir Blagojevic resolved JBCACHE-1106.
------------------------------------------

    Resolution: Done

Related commit:
CacheImpl.java 1.91

> STREAMING_STATE_TRANSFER fails between two JBoss Cache instances when state is large (several hundred MB).
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: JBCACHE-1106
>                 URL: http://jira.jboss.com/jira/browse/JBCACHE-1106
>             Project: JBoss Cache
>          Issue Type: Bug
>      Security Level: Public(Everyone can see) 
>    Affects Versions: 2.0.0.CR1
>         Environment: Linux + jdk1.5.0_09
>                      Windows XP + jdk1.5.0_09
>                      JBossCache 2.0.0.CR1 (Habanero)
>                      Tested with JGroups 2.4.1 SP3 and 2.5
>            Reporter: hernan silberman
>         Assigned To: Vladimir Blagojevic
>             Fix For: 2.0.0.GA
>
>
> Initial state transfer using STREAMING_STATE_TRANSFER fails between two JBoss Cache instances when the state being transferred is large. I'm not a JGroups expert, but what I'm seeing is the following pattern:
> I build a large cache with a BDBJE cache loader, alone in a cluster, with a REPL_SYNC policy, STREAMING_STATE_TRANSFER, and fetchPersistentState set to true.  This first cache starts up perfectly and is fed several hundred MB of cache entries.
> I then start a second cache with a similar configuration. It finds the first cache and requests a full transfer of the persistent state from its cache loader, as expected.  When the state is a few hundred MB and takes less than 20 seconds to transmit, this works fine, but I hit a limit when it is around 800MB and takes longer than 20 seconds to transmit.  I then see a generic CacheException, as shown below.
> org.jboss.cache.CacheException: Unable to fetch state on startup
> 	at org.jboss.cache.CacheImpl.start(CacheImpl.java:791)
> 	at org.jboss.cache.DefaultCacheFactory.createCache(DefaultCacheFactory.java:87)
> 	at org.jboss.cache.DefaultCacheFactory.createCache(DefaultCacheFactory.java:58)
> 	at org.jboss.cache.DefaultCacheFactory.createCache(DefaultCacheFactory.java:51)
> 	at jboss.StreamingStateXferTest.<init>(StreamingStateXferTest.java:25)
> 	at jboss.StreamingStateXferTest.main(StreamingStateXferTest.java:41)
> When I tail the DEBUG logs as these caches talk, the pattern I've noticed is that the second cache (the client in the state transfer) decides to leave the cluster 20 seconds into the state transfer:
> DEBUG 14:51.27 [main] STABLE - resuming message garbage collection
> DEBUG 14:51.27 [main] ParticipantGmsImpl - sending LEAVE request to 10.98.35.69:3298 (local_addr=10.98.35.69:3300)
> It's all downhill from there and the CacheException is thrown.
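
For a rough sense of the numbers described above, the arithmetic can be sketched like this. The 20 second threshold and the two state sizes are taken from the report; the throughput of 35 MB/s is purely hypothetical (nothing in the report states a measured rate), and the class and method names are made up for this illustration.

```java
// Back-of-the-envelope estimate: does a state transfer of a given size
// finish before a given threshold? The throughput value is an assumption.
class StateTransferEstimate {

    /** Estimated transfer time in milliseconds for sizeMb at mbPerSec. */
    static long estimateMillis(long sizeMb, double mbPerSec) {
        return Math.round(sizeMb / mbPerSec * 1000.0);
    }

    /** True if the estimated transfer time is longer than thresholdMillis. */
    static boolean exceeds(long sizeMb, double mbPerSec, long thresholdMillis) {
        return estimateMillis(sizeMb, mbPerSec) > thresholdMillis;
    }

    public static void main(String[] args) {
        long thresholdMillis = 20000L;   // the ~20 seconds observed in the report
        double mbPerSec = 35.0;          // hypothetical throughput, not measured
        long[] sizesMb = {300L, 800L};   // "a few hundred MB" vs. "around 800MB"
        for (long sizeMb : sizesMb) {
            System.out.println(sizeMb + " MB: ~" + estimateMillis(sizeMb, mbPerSec)
                + " ms, exceeds threshold: " + exceeds(sizeMb, mbPerSec, thresholdMillis));
        }
    }
}
```

Under that assumed rate the smaller state stays well under the threshold and the 800 MB state crosses it, which is consistent with the behaviour described, though it says nothing about which component enforces the limit.
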
> I've come up with a couple of simple programs to test this: the first one instantiates the first cache and loads it with test data.  The second program instantiates both caches so that the state transfer runs:
> ----------------------------------------------------------------------------------------------------------
> import java.util.Properties;
> import java.util.UUID;
> import javax.naming.Context;
> import javax.naming.InitialContext;
> import javax.transaction.UserTransaction;
> import org.jboss.cache.Cache;
> import org.jboss.cache.CacheFactory;
> import org.jboss.cache.DefaultCacheFactory;
> import org.jboss.cache.Fqn;
> import org.jboss.cache.Node;
> import org.jboss.cache.transaction.DummyTransactionManager;
> public class BuildTestCache {
>     private Cache cache1;
>     private static final int CacheSizeInMb = 800;
>     
>     public BuildTestCache() {
>         
>         CacheFactory factory = DefaultCacheFactory.getInstance();
>         // Set up transaction manager (otherwise our BDBJE cache loader will not
>         // sync to disk for each cache update, leading to a possible OutOfMemoryError).
>         DummyTransactionManager.getInstance();
>         Properties prop = new Properties();
>         prop.put(Context.INITIAL_CONTEXT_FACTORY, "org.jboss.cache.transaction.DummyContextFactory");
>         // Build the first cache and load it with data.
>         cache1 = factory.createCache("cache1-configuration.xml");
>         
>         System.out.println("------------------ Loading Cache 1 ---------------------");
>         byte[] value = new byte[1024*1024];
>         for( int i=0; i<CacheSizeInMb; i++ ) {
>             UserTransaction tx = null;
>             try {
>                 tx = (UserTransaction)new InitialContext(prop).lookup("UserTransaction");
>                 tx.begin();
>                 StringBuilder key = new StringBuilder("/").append(UUID.randomUUID().toString());
>                 Node newNode = cache1.getRoot().addChild(Fqn.fromString(key.toString()));
>                 newNode.put("payload", value);
>                 tx.commit();
>             } catch(Exception e) {
>                 e.printStackTrace();
>                 if(null!=tx) {
>                     try {
>                         tx.rollback();
>                     } catch(Exception ignore){}
>                 }
>             }
>         }
>         System.out.println("------------------ Cache 1 loaded ---------------------");
>         if(null!=cache1) {
>             cache1.stop();
>         }
>     }
>     
>     public static final void main(String[] args) {
>         new BuildTestCache();
>     }
> }
> ----------------------------------------------------------------------------------------------------------
> import org.jboss.cache.Cache;
> import org.jboss.cache.CacheException;
> import org.jboss.cache.CacheFactory;
> import org.jboss.cache.DefaultCacheFactory;
> public class StreamingStateXferTest {
>     private Cache cache1;
>     private Cache cache2;
>     
>     public StreamingStateXferTest() {
>         
>         CacheFactory factory = DefaultCacheFactory.getInstance();
>         // Create the first cache.
>         System.out.println("------------------ Building Cache 1 ---------------------");
>         cache1 = factory.createCache("cache1-configuration.xml");
>         // Now create the second cache.  It should contact the first and ask
>         // for an initial state transfer which fails after about 20 seconds.
>         try {
>             System.out.println("------------------ Building Cache 2 ---------------------");
>             cache2 = factory.createCache("cache2-configuration.xml");
>         } catch(CacheException e) {
>             System.out.println("CacheException encountered.");
>             e.printStackTrace();
>         }
>         if(null!=cache1) {
>             cache1.stop();
>         }
>         if(null!=cache2) {
>             cache2.stop();
>         }
>     }
>     
>     public static final void main(String[] args) {
>         new StreamingStateXferTest();
>     }
> }
> ----------------------------------------------------------------------------------------------------------
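
To pin down how long the second cache runs before it fails, the call that creates it could be wrapped in a small timing helper. The sketch below is plain JDK code and is not part of JBoss Cache; `TimedStart` and `Result` are hypothetical names, and the task in `main` is a stand-in that simulates a failure instead of starting a real cache.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of JBoss Cache): runs a task and records how
// long it took and whether it threw, so the failure time can be compared
// against the configured timeouts.
class TimedStart {

    /** Outcome of one timed call. */
    static final class Result {
        final long elapsedMillis;
        final Exception failure;   // null if the task completed normally

        Result(long elapsedMillis, Exception failure) {
            this.elapsedMillis = elapsedMillis;
            this.failure = failure;
        }
    }

    /** Runs the task once and captures elapsed time plus any exception. */
    static Result time(Callable<?> task) {
        long start = System.nanoTime();
        Exception failure = null;
        try {
            task.call();
        } catch (Exception e) {
            failure = e;
        }
        long elapsedMillis = (System.nanoTime() - start) / 1000000L;
        return new Result(elapsedMillis, failure);
    }

    public static void main(String[] args) {
        // Stand-in task. In the test program above, the body of call() would be
        // the factory.createCache("cache2-configuration.xml") line.
        Result r = time(new Callable<Object>() {
            public Object call() throws Exception {
                Thread.sleep(50);
                throw new IllegalStateException("simulated failure");
            }
        });
        System.out.println("elapsed ms: " + r.elapsedMillis
            + ", failed: " + (r.failure != null));
    }
}
```

Running the real test with a few different state sizes and logging the elapsed time for each would show whether the failure always lands near the same mark.
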
> cache1-configuration.xml
> <?xml version="1.0" encoding="UTF-8"?>
> <server>
>    <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar"/>
>    <mbean code="org.jboss.cache.CacheImpl" name="jboss.cache:service=Cache">
>       <depends>jboss:service=Naming</depends>
>       <depends>jboss:service=TransactionManager</depends>
>       <!-- Configure the TransactionManager -->
>       <attribute name="TransactionManagerLookupClass">org.jboss.cache.transaction.DummyTransactionManagerLookup</attribute>
>       <!--
>           Node locking level : SERIALIZABLE
>                                REPEATABLE_READ (default)
>                                READ_COMMITTED
>                                READ_UNCOMMITTED
>                                NONE
>       -->
>       <attribute name="IsolationLevel">READ_COMMITTED</attribute>
>       <!-- Lock parent before doing node additions/removes -->
>       <attribute name="LockParentForChildInsertRemove">false</attribute>
>       <!--
>            Valid modes are LOCAL
>                            REPL_ASYNC
>                            REPL_SYNC
>                            INVALIDATION_ASYNC
>                            INVALIDATION_SYNC
>       -->
>       <attribute name="CacheMode">REPL_SYNC</attribute>
>       <!-- Name of cluster. Needs to be the same for all JBoss Cache nodes in a
>            cluster in order to find each other. -->
>       <attribute name="ClusterName">ACS_CLUSTER</attribute>
>       <!--Uncomment next three statements to enable JGroups multiplexer.
>          This configuration is dependent on the JGroups multiplexer being
>          registered in an MBean server such as JBossAS.  -->
>       <!--
>       <depends>jgroups.mux:name=Multiplexer</depends>
>       <attribute name="MultiplexerService">jgroups.mux:name=Multiplexer</attribute>
>       <attribute name="MultiplexerStack">fc-fast-minimalthreads</attribute>
>       -->
>       <!-- JGroups protocol stack properties.
>          ClusterConfig isn't used if the multiplexer is enabled and successfully initialized.
>       -->
>       <attribute name="ClusterConfig">
>          <config>
>             <!-- UDP: if you have a multihomed machine,
>   set the bind_addr attribute to the appropriate NIC IP address -->
>             <!-- UDP: On Windows machines, because of the media sense feature
>       being broken with multicast (even after disabling media sense)
>       set the loopback attribute to true -->
>             <UDP mcast_addr="228.1.2.3" mcast_port="48866"
>                  ip_ttl="64" ip_mcast="true"
>                  mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
>                  ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
>                  loopback="true"/>
>             <PING timeout="2000" num_initial_members="3"
>                   up_thread="false" down_thread="false"/>
>             <MERGE2 min_interval="10000" max_interval="20000"/>
>             <!--
>             <FD shun="true" up_thread="true" down_thread="true"/>
>             -->
>             <FD_SOCK/>
>             <VERIFY_SUSPECT timeout="1500"
>                             up_thread="false" down_thread="false"/>
>             <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
>                            max_xmit_size="8192" up_thread="false" down_thread="false"/>
>             <UNICAST timeout="600,1200,2400" down_thread="false"/>
>             <pbcast.STABLE desired_avg_gossip="20000"
>                            up_thread="false" down_thread="false"/>
>             <FRAG frag_size="8192"
>                   down_thread="false" up_thread="false"/>
>             <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
>                         shun="true" print_local_addr="true"/>
> 			<!--
>             <pbcast.STATE_TRANSFER up_thread="false" down_thread="false"/>
>             -->
> 			<pbcast.STREAMING_STATE_TRANSFER use_reading_thread="true"/>            
>          </config>
>       </attribute>
>       <!--
>           The max amount of time (in milliseconds) we wait until the
>           initial state (ie. the contents of the cache) are retrieved from
>           existing members in a clustered environment
>       -->
>       <attribute name="InitialStateRetrievalTimeout">60000</attribute>
>       <!--
>           Number of milliseconds to wait until all responses for a
>           synchronous call have been received.
>       -->
>       <attribute name="SyncReplTimeout">20000</attribute>
>       <!-- Max number of milliseconds to wait for a lock acquisition -->
>       <attribute name="LockAcquisitionTimeout">15000</attribute>
>       <!-- Specific eviction policy configurations.-->
>       <attribute name="EvictionPolicyConfig">
>          <config>
>             <attribute name="wakeUpIntervalSeconds">1</attribute>
>             <!-- This defaults to 200000 if not specified -->
>             <attribute name="eventQueueSize">200000</attribute>
>             <attribute name="policyClass">org.jboss.cache.eviction.FIFOPolicy</attribute>
>             <!-- Cache wide default -->
>             <region name="/_default_">
>                <attribute name="maxNodes">300</attribute>
>             </region>
>          </config>
>       </attribute>
> 	<attribute name="CacheLoaderConfiguration">
> 	<config>
> 		<passivation>false</passivation>
> 		<!--<preload>/</preload>-->
> 		<shared>false</shared>
> 		<cacheloader>
> 			<class>org.jboss.cache.loader.bdbje.BdbjeCacheLoader</class>
> 			<properties>
> 			location=/tmp/filestore2
> 			</properties>
> 			<fetchPersistentState>true</fetchPersistentState>
> 			<async>false</async>
> 			<ignoreModifications>false</ignoreModifications>
> 			<purgeOnStartup>true</purgeOnStartup>
> 		</cacheloader>
> 	</config>
> 	</attribute>
>    </mbean>
> </server>
> ----------------------------------------------------------------------------------------------------------
> cache2-configuration.xml
> <?xml version="1.0" encoding="UTF-8"?>
> <server>
>    <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar"/>
>    <mbean code="org.jboss.cache.CacheImpl" name="jboss.cache:service=Cache">
>       <depends>jboss:service=Naming</depends>
>       <depends>jboss:service=TransactionManager</depends>
>       <!-- Configure the TransactionManager -->
>       <attribute name="TransactionManagerLookupClass">org.jboss.cache.transaction.DummyTransactionManagerLookup</attribute>
>       <!-- Use our custom TxManagerLookup class -->
>       <!--
>       <attribute name="TransactionManagerLookupClass">org.jboss.cache.transaction.GenericTransactionManagerLookup</attribute>
>       <attribute name="TransactionManagerLookupClass">com.scea.scne.acs.shard.tx.CustomTxManagerLookup</attribute>
> 	  -->
> 	  
>       <!--
>           Node locking level : SERIALIZABLE
>                                REPEATABLE_READ (default)
>                                READ_COMMITTED
>                                READ_UNCOMMITTED
>                                NONE
>       -->
>       <attribute name="IsolationLevel">READ_COMMITTED</attribute>
>       <!-- Lock parent before doing node additions/removes -->
>       <attribute name="LockParentForChildInsertRemove">false</attribute>
>       <!--
>            Valid modes are LOCAL
>                            REPL_ASYNC
>                            REPL_SYNC
>                            INVALIDATION_ASYNC
>                            INVALIDATION_SYNC
>       -->
>       <attribute name="CacheMode">REPL_SYNC</attribute>
>       <!-- Name of cluster. Needs to be the same for all JBoss Cache nodes in a
>            cluster in order to find each other. -->
>       <attribute name="ClusterName">ACS_CLUSTER</attribute>
>       <!--Uncomment next three statements to enable JGroups multiplexer.
>          This configuration is dependent on the JGroups multiplexer being
>          registered in an MBean server such as JBossAS.  -->
>       <!--
>       <depends>jgroups.mux:name=Multiplexer</depends>
>       <attribute name="MultiplexerService">jgroups.mux:name=Multiplexer</attribute>
>       <attribute name="MultiplexerStack">fc-fast-minimalthreads</attribute>
>       -->
>       <!-- JGroups protocol stack properties.
>          ClusterConfig isn't used if the multiplexer is enabled and successfully initialized.
>       -->
>       <attribute name="ClusterConfig">
>          <config>
>             <!-- UDP: if you have a multihomed machine,
>   set the bind_addr attribute to the appropriate NIC IP address -->
>             <!-- UDP: On Windows machines, because of the media sense feature
>       being broken with multicast (even after disabling media sense)
>       set the loopback attribute to true -->
>             <UDP mcast_addr="228.1.2.3" mcast_port="48866"
>                  ip_ttl="64" ip_mcast="true"
>                  mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
>                  ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
>                  loopback="true"/>
>             <PING timeout="2000" num_initial_members="3"
>                   up_thread="false" down_thread="false"/>
>             <MERGE2 min_interval="10000" max_interval="20000"/>
>             <!--
>             <FD shun="true" up_thread="true" down_thread="true"/>
>             -->
>             <FD_SOCK/>
>             <VERIFY_SUSPECT timeout="1500"
>                             up_thread="false" down_thread="false"/>
>             <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
>                            max_xmit_size="8192" up_thread="false" down_thread="false"/>
>             <UNICAST timeout="600,1200,2400" down_thread="false"/>
>             <pbcast.STABLE desired_avg_gossip="20000"
>                            up_thread="false" down_thread="false"/>
>             <FRAG frag_size="8192"
>                   down_thread="false" up_thread="false"/>
>             <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
>                         shun="true" print_local_addr="true"/>
> 			<!--
>             <pbcast.STATE_TRANSFER up_thread="false" down_thread="false"/>
>             -->
> 			<pbcast.STREAMING_STATE_TRANSFER use_reading_thread="true"/>            
>          </config>
>       </attribute>
>       <!--
>           The max amount of time (in milliseconds) we wait until the
>           initial state (ie. the contents of the cache) are retrieved from
>           existing members in a clustered environment
>       -->
>       <attribute name="InitialStateRetrievalTimeout">60000</attribute>
>       <!--
>           Number of milliseconds to wait until all responses for a
>           synchronous call have been received.
>       -->
>       <attribute name="SyncReplTimeout">20000</attribute>
>       <!-- Max number of milliseconds to wait for a lock acquisition -->
>       <attribute name="LockAcquisitionTimeout">15000</attribute>
>       <!-- Specific eviction policy configurations.-->
>       <attribute name="EvictionPolicyConfig">
>          <config>
>             <attribute name="wakeUpIntervalSeconds">1</attribute>
>             <!-- This defaults to 200000 if not specified -->
>             <attribute name="eventQueueSize">200000</attribute>
>             <attribute name="policyClass">org.jboss.cache.eviction.FIFOPolicy</attribute>
>             <!-- Cache wide default -->
>             <region name="/_default_">
>                <attribute name="maxNodes">300</attribute>
>             </region>
>          </config>
>       </attribute>
> 	<attribute name="CacheLoaderConfiguration">
> 	<config>
> 		<passivation>false</passivation>
> 		<!--<preload>/</preload>-->
> 		<shared>false</shared>
> 		<cacheloader>
> 			<class>org.jboss.cache.loader.bdbje.BdbjeCacheLoader</class>
> 			<properties>
> 			location=/tmp/filestore2
> 			</properties>
> 			<fetchPersistentState>true</fetchPersistentState>
> 			<async>false</async>
> 			<ignoreModifications>false</ignoreModifications>
> 			<purgeOnStartup>true</purgeOnStartup>
> 		</cacheloader>
> 	</config>
> 	</attribute>
>    </mbean>
> </server>
> Please let me know if there's any additional information I can provide.
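
Both configuration files above set three timeout attributes. A short sketch using the JDK's DOM parser can list them side by side. The class name is made up, and the XML fragment in `main` is a trimmed stand-in that copies only the timeout values from the configs above. SyncReplTimeout is 20000 ms, which coincides with the roughly 20 seconds observed; whether that attribute is actually responsible is not established in this report.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects every <attribute name="...Timeout"> element
// from a JBoss Cache style XML configuration, in document order.
class TimeoutAttributes {

    static Map<String, Long> timeouts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Long> result = new LinkedHashMap<String, Long>();
            NodeList attrs = doc.getElementsByTagName("attribute");
            for (int i = 0; i < attrs.getLength(); i++) {
                Element e = (Element) attrs.item(i);
                String name = e.getAttribute("name");
                if (name.endsWith("Timeout")) {
                    result.put(name, Long.parseLong(e.getTextContent().trim()));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read configuration", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed stand-in for cache1-configuration.xml / cache2-configuration.xml.
        String xml =
              "<server><mbean>"
            + "<attribute name=\"CacheMode\">REPL_SYNC</attribute>"
            + "<attribute name=\"InitialStateRetrievalTimeout\">60000</attribute>"
            + "<attribute name=\"SyncReplTimeout\">20000</attribute>"
            + "<attribute name=\"LockAcquisitionTimeout\">15000</attribute>"
            + "</mbean></server>";
        for (Map.Entry<String, Long> e : timeouts(xml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue() + " ms");
        }
    }
}
```
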
