[JBossCache] - Distributed cache crash after four days of operation - jboss-user

Wednesday, 31 October 2007

Hi,

We are using JBoss Cache in a cluster of two machines. The caches in the cluster
propagate asynchronous invalidations.
After four days in the production environment we observed strange behavior: one cache
instance sent 8 MB of state info and the other cache instance crashed. We suspect that
the second instance crashed because the first one sent so much data. What is odd is
that the first instance sent 8 MB of state at all. How is this possible when we only
use invalidation?

The logs from the two machines:

MACHINE A

  | [10/29/07 13:26:39:745 CET] 00000029 TreeCache     I org.jboss.cache.TreeCache viewAccepted viewAccepted(): [192.168.200.33:4045|5] [192.168.200.33:4045, 192.168.200.32:1211]
  | [10/29/07 13:26:46:917 CET] 00000029 StateTransfer I org.jboss.cache.statetransfer.StateTransferGenerator_140 generateStateTransfer returning the state for tree rooted in /(8388608 bytes)
  | 

MACHINE B

  | [10/29/07 13:26:38:836 CET] 00000079 TreeCache     I org.jboss.cache.TreeCache viewAccepted viewAccepted(): [192.168.200.33:4045|5] [192.168.200.33:4045, 192.168.200.32:1211]
  | [10/29/07 13:26:39:945 CET] 00000075 JChannel      I org.jgroups.JChannel$CloserThread run fetching the state (auto_getstate=true)
  | [10/29/07 13:26:44:961 CET] 00000075 JChannel      I org.jgroups.JChannel$CloserThread run state transfer failed
  | [10/29/07 13:26:59:788 CET] 00000078 STATE_TRANSFE W org.jgroups.protocols.pbcast.STATE_TRANSFER handleViewChange discovered that the state provider (192.168.200.33:4045) crashed; will return null state to application
  | [10/29/07 13:26:59:788 CET] 00000078 STATE_TRANSFE W org.jgroups.protocols.pbcast.STATE_TRANSFER handleStateRsp digest received from 192.168.200.32:1211 is null, skipping setting digest !
  | [10/29/07 13:26:59:788 CET] 00000078 STATE_TRANSFE W org.jgroups.protocols.pbcast.STATE_TRANSFER handleStateRsp state received from 192.168.200.32:1211 is null, will return null state to application
  | [10/29/07 13:26:59:788 CET] 00000078 TreeCache     I org.jboss.cache.TreeCache viewAccepted viewAccepted(): [192.168.200.32:1211|6] [192.168.200.32:1211]
  | 
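
To make the ordering of events in the two logs easier to see, here is a small sketch that computes the intervals between the entries quoted above. It only uses timestamps and the byte count copied from the logs; the variable names are made up for illustration. It assumes the clocks on the two machines are roughly in sync, which may not hold exactly (the viewAccepted entries for the same view differ by about 0.9 s between A and B), so the cross-machine figures are approximate.

```python
from datetime import datetime

# Timestamp layout used in the logs above: MM/DD/YY HH:MM:SS:mmm
LOG_TIME_FORMAT = "%m/%d/%y %H:%M:%S:%f"


def parse(stamp):
    """Parse a log timestamp; %f reads the 3-digit field as milliseconds."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


# Values copied from the log excerpts (the CET suffix is dropped; both hosts log in CET).
b_fetch_started = parse("10/29/07 13:26:39:945")  # B: "fetching the state"
b_fetch_failed = parse("10/29/07 13:26:44:961")   # B: "state transfer failed"
a_state_ready = parse("10/29/07 13:26:46:917")    # A: "returning the state"
b_alone_in_view = parse("10/29/07 13:26:59:788")  # B: new view with one member

state_bytes = 8388608  # size reported by StateTransferGenerator on A

# Same-machine interval: how long B waited before reporting failure.
fetch_wait = (b_fetch_failed - b_fetch_started).total_seconds()
# Cross-machine interval (approximate): A finished generating state after B gave up.
ready_after_failure = (a_state_ready - b_fetch_failed).total_seconds()
# Same-machine interval: time from B's failure until B saw itself alone in the view.
failure_to_new_view = (b_alone_in_view - b_fetch_failed).total_seconds()
state_mib = state_bytes / (1024 * 1024)

print(f"B waited for state:              {fetch_wait:.3f} s")
print(f"A state ready after B's failure: {ready_after_failure:.3f} s (approx.)")
print(f"B failure to single-member view: {failure_to_new_view:.3f} s")
print(f"State size:                      {state_mib} MiB")
```

Under these assumptions, B reported the failure about 5 seconds after it started fetching, roughly 2 seconds before A logged that the state was ready, and the reported state size is exactly 8 MiB (2^23 bytes).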

Cache configuration:

  | <server>
  |   <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar" />
  |   <mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=ISRTreeCache">
  |     <attribute name="TransactionManagerLookupClass">org.jboss.cache.GenericTransactionManagerLookup</attribute>
  | 
  |     <!-- depends>jboss:service=Naming</depends>
  |     <depends>jboss:service=TransactionManager</depends -->
  |     
  |     <!--
  |       Node locking scheme :
  |       PESSIMISTIC (default)
  |       OPTIMISTIC
  |     -->
  |     <attribute name="NodeLockingScheme">OPTIMISTIC</attribute>
  |     <!--
  |       Node locking isolation level :
  |       SERIALIZABLE
  |       REPEATABLE_READ (default)
  |       READ_COMMITTED
  |       READ_UNCOMMITTED
  |       NONE
  |       (ignored if NodeLockingScheme is OPTIMISTIC)
  |     -->
  |     <attribute name="IsolationLevel">REPEATABLE_READ</attribute>
  |     <!-- Lock parent before doing node additions/removes -->
  |     <attribute name="LockParentForChildInsertRemove">true</attribute>
  |     <!-- Valid modes are LOCAL
  |       REPL_ASYNC
  |       REPL_SYNC
  |       INVALIDATION_ASYNC
  |       INVALIDATION_SYNC
  |     -->
  |     <attribute name="CacheMode">INVALIDATION_ASYNC</attribute>
  |     <!-- Name of cluster. Needs to be the same for all TreeCache nodes in a
  |       cluster, in order to find each other -->
  |     <attribute name="ClusterName">ISR</attribute>
  |     <!-- Whether each interceptor should have an mbean
  |       registered to capture and display its statistics. -->
  |     <attribute name="UseInterceptorMbeans">false</attribute>
  | 
  |     <attribute name="ClusterConfig">
  |      <config>
  |       	<!-- UDP: if you have a multihomed machine,
  |           set the bind_addr attribute to the appropriate NIC IP address
  |           bind_addr="192.168.200.32"
  |         -->
  |         <!-- UDP: On Windows machines, because of the media sense feature
  |           being broken with multicast (even after disabling media sense)
  |           set the loopback attribute to true
  |         -->
  |         <UDP mcast_port="45454" mcast_addr="228.1.2.3" tos="16"
  |           ucast_recv_buf_size="20000000" ucast_send_buf_size="640000"
  |           mcast_recv_buf_size="25000000" mcast_send_buf_size="640000"
  |           loopback="true" discard_incompatible_packets="true"
  |           max_bundle_size="10000" max_bundle_timeout="30"
  |           use_incoming_packet_handler="true"
  |           use_outgoing_packet_handler="false" ip_ttl="2"
  |           enable_diagnostics="false" down_thread="false" up_thread="false"
  |           enable_bundling="true" />
  |         <PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false" />
  |         <MERGE2 min_interval="10000" max_interval="20000" />
  |         <FD shun="true" up_thread="true" down_thread="true" />
  |         <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />
  |         <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800" max_xmit_size="8192"
  |           up_thread="false" down_thread="false" />
  |         <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false" />
  |         <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" />
  |         <FRAG frag_size="8192" down_thread="false" up_thread="false" />
  |         <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true" />
  |         <pbcast.STATE_TRANSFER up_thread="false" down_thread="false" />
  |       </config>
  | 
  |     </attribute>
  | 
  |    ...
  |    ...
  |    
  |   </mbean>
  | </server>
  | 

JBoss Cache version: 1.4.1.SP4
JGroups version: 2.4.1

Does anyone have any idea what is going on? Any help is appreciated.

martin

View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4100531#...

Reply to the post :
http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&a...
