[jboss-user] [JBossCache] - Distributed cache crash after four days of operation

xxxz do-not-reply at jboss.com
Wed Oct 31 04:41:52 EDT 2007


Hi,

We are using JBoss Cache in a cluster of two machines. The caches in the cluster propagate asynchronous invalidations.
After four days in the production environment we observed strange behavior: one cache instance sent 8 MB (8388608 bytes) of state information and the other instance crashed. We suspect that the second instance crashed because the first one sent so much data. What is odd is that the first instance sent 8 MB of state at all. How is this possible when we only use invalidation?

The logs from the two machines:

MACHINE A

  | [10/29/07 13:26:39:745 CET] 00000029 TreeCache     I org.jboss.cache.TreeCache viewAccepted viewAccepted(): [192.168.200.33:4045|5] [192.168.200.33:4045, 192.168.200.32:1211]
  | [10/29/07 13:26:46:917 CET] 00000029 StateTransfer I org.jboss.cache.statetransfer.StateTransferGenerator_140 generateStateTransfer returning the state for tree rooted in /(8388608 bytes)
  | 

MACHINE B

  | [10/29/07 13:26:38:836 CET] 00000079 TreeCache     I org.jboss.cache.TreeCache viewAccepted viewAccepted(): [192.168.200.33:4045|5] [192.168.200.33:4045, 192.168.200.32:1211]
  | [10/29/07 13:26:39:945 CET] 00000075 JChannel      I org.jgroups.JChannel$CloserThread run fetching the state (auto_getstate=true)
  | [10/29/07 13:26:44:961 CET] 00000075 JChannel      I org.jgroups.JChannel$CloserThread run state transfer failed
  | [10/29/07 13:26:59:788 CET] 00000078 STATE_TRANSFE W org.jgroups.protocols.pbcast.STATE_TRANSFER handleViewChange discovered that the state provider (192.168.200.33:4045) crashed; will return null state to application
  | [10/29/07 13:26:59:788 CET] 00000078 STATE_TRANSFE W org.jgroups.protocols.pbcast.STATE_TRANSFER handleStateRsp digest received from 192.168.200.32:1211 is null, skipping setting digest !
  | [10/29/07 13:26:59:788 CET] 00000078 STATE_TRANSFE W org.jgroups.protocols.pbcast.STATE_TRANSFER handleStateRsp state received from 192.168.200.32:1211 is null, will return null state to application
  | [10/29/07 13:26:59:788 CET] 00000078 TreeCache     I org.jboss.cache.TreeCache viewAccepted viewAccepted(): [192.168.200.32:1211|6] [192.168.200.32:1211]
  | 

Cache configuration:


  | <server>
  |   <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar" />
  |   <mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=ISRTreeCache">
  |     <attribute name="TransactionManagerLookupClass">org.jboss.cache.GenericTransactionManagerLookup</attribute>
  | 
  |     <!-- depends>jboss:service=Naming</depends>
  |     <depends>jboss:service=TransactionManager</depends -->
  |     
  |     <!--
  |       Node locking scheme :
  |       PESSIMISTIC (default)
  |       OPTIMISTIC
  |     -->
  |     <attribute name="NodeLockingScheme">OPTIMISTIC</attribute>
  |     <!--
  |       Node locking isolation level :
  |       SERIALIZABLE
  |       REPEATABLE_READ (default)
  |       READ_COMMITTED
  |       READ_UNCOMMITTED
  |       NONE
  |       (ignored if NodeLockingScheme is OPTIMISTIC)
  |     -->
  |     <attribute name="IsolationLevel">REPEATABLE_READ</attribute>
  |     <!-- Lock parent before doing node additions/removes -->
  |     <attribute name="LockParentForChildInsertRemove">true</attribute>
  |     <!-- Valid modes are LOCAL
  |       REPL_ASYNC
  |       REPL_SYNC
  |       INVALIDATION_ASYNC
  |       INVALIDATION_SYNC
  |     -->
  |     <attribute name="CacheMode">INVALIDATION_ASYNC</attribute>
  |     <!-- Name of cluster. Needs to be the same for all TreeCache nodes in a
  |       cluster, in order to find each other -->
  |     <attribute name="ClusterName">ISR</attribute>
  |     <!-- Whether each interceptor should have an mbean
  |       registered to capture and display its statistics. -->
  |     <attribute name="UseInterceptorMbeans">false</attribute>
  | 
  |     <attribute name="ClusterConfig">
  |      <config>
  |       	<!-- UDP: if you have a multihomed machine,
  |           set the bind_addr attribute to the appropriate NIC IP address
  |           bind_addr="192.168.200.32"
  |         -->
  |         <!-- UDP: On Windows machines, because of the media sense feature
  |           being broken with multicast (even after disabling media sense)
  |           set the loopback attribute to true
  |         -->
  |       	<UDP mcast_port="45454" mcast_addr="228.1.2.3" tos="16"
  |       		ucast_recv_buf_size="20000000" ucast_send_buf_size="640000"
  |       		mcast_recv_buf_size="25000000" mcast_send_buf_size="640000"
  |       		loopback="true" discard_incompatible_packets="true"
  |       		max_bundle_size="10000" max_bundle_timeout="30"
  |       		use_incoming_packet_handler="true"
  |       		use_outgoing_packet_handler="false" ip_ttl="2"
  |       		enable_diagnostics="false" down_thread="false" up_thread="false"
  |       		enable_bundling="true" />
  |       	<PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false" />
  |         <MERGE2 min_interval="10000" max_interval="20000" />
  |         <FD shun="true" up_thread="true" down_thread="true" />
  |         <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />
  |         <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800" max_xmit_size="8192" up_thread="false"
  |           down_thread="false" />
  |         <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false" />
  |         <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" />
  |         <FRAG frag_size="8192" down_thread="false" up_thread="false" />
  |         <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true" />
  |         <pbcast.STATE_TRANSFER up_thread="false" down_thread="false" />
  |       </config>
  | 
  |     </attribute>
  | 
  |    ...
  |    ...
  |    
  |   </mbean>
  | </server>
  | 
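State transfer in JBoss Cache 1.4.x appears to be controlled independently of CacheMode, so a cache running in INVALIDATION_ASYNC mode may still send its full in-memory state to a joining member. The excerpt above elides some attributes ("..."), so it does not show how this is set. A minimal sketch, assuming the TreeCache attributes FetchInMemoryState and InitialStateRetrievalTimeout behave as documented for 1.4.x (the values are illustrative and not taken from the configuration above):

```xml
<!-- Sketch only: attribute names assumed from the JBoss Cache 1.4.x TreeCache MBean.
     They would sit inside the <mbean> element, next to CacheMode. -->

<!-- Whether a joining member fetches the in-memory state from an existing member -->
<attribute name="FetchInMemoryState">false</attribute>

<!-- Milliseconds to wait for the initial state transfer (illustrative value) -->
<attribute name="InitialStateRetrievalTimeout">20000</attribute>
```

With invalidation, each node can reload data from its own backing source, so skipping the in-memory state is usually acceptable. Whether this setting explains the 8388608-byte transfer in the log is not confirmed.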

JBoss Cache version: 1.4.1.SP4
JGroups version: 2.4.1

Does anyone have any idea what is going on? Any help is appreciated.

martin

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4100531#4100531
