[jboss-user] [JBoss Cache: Core Edition] - FD Issue

Thu Mar 26 05:32:00 EDT 2009

We are using JBoss Cache 1.4.1 SP6 with JGroups 2.4.x.
We have a cluster of cache instances running on two Sun Solaris machines and multiple RHEL machines.

When one of the RHEL instances is restarted, the view held by the cache instances on the Solaris machines isn't updated,
i.e. the view passed to viewAccepted() still contains the old RHEL instance along with the new RHEL instance (the one that was restarted).

e.g.: [172.16.11.200:65261, 172.16.11.12:50903, 172.16.11.10:41912, 172.16.11.20:51156, 172.16.11.10:43789, 172.16.11.20:57771, 172.16.11.10:51722, 172.16.11.20:35858, 172.16.11.11:51210]

172.16.11.10 - RHEL Instance 1
172.16.11.20 - RHEL Instance 2
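
To make the stale entries easier to spot, here is a minimal sketch that groups the members of a view by host and reports hosts that appear more often than expected. It works on plain "host:port" strings copied from the view above rather than on JGroups Address objects. The class name ViewInspector, its two methods, and the assumption of one cache instance per host are illustrative only and are not part of JBoss Cache or JGroups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JBoss Cache or JGroups.
class ViewInspector {

    // Groups "host:port" member strings by host, preserving view order.
    public static Map<String, List<Integer>> portsByHost(List<String> members) {
        Map<String, List<Integer>> byHost = new LinkedHashMap<>();
        for (String member : members) {
            int colon = member.lastIndexOf(':');
            String host = member.substring(0, colon);
            int port = Integer.parseInt(member.substring(colon + 1));
            byHost.computeIfAbsent(host, h -> new ArrayList<>()).add(port);
        }
        return byHost;
    }

    // Returns the hosts that appear more than expectedPerHost times.
    // expectedPerHost is an assumption about the deployment, not something
    // the view itself can tell us.
    public static List<String> suspiciousHosts(List<String> members, int expectedPerHost) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : portsByHost(members).entrySet()) {
            if (e.getValue().size() > expectedPerHost) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Member list copied from the view shown above.
        List<String> view = Arrays.asList(
            "172.16.11.200:65261", "172.16.11.12:50903", "172.16.11.10:41912",
            "172.16.11.20:51156", "172.16.11.10:43789", "172.16.11.20:57771",
            "172.16.11.10:51722", "172.16.11.20:35858", "172.16.11.11:51210");
        System.out.println(suspiciousHosts(view, 1));
    }
}
```

With one instance expected per host, both RHEL hosts (172.16.11.10 and 172.16.11.20) are reported, each with three addresses in the view.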

Our assumption is that when a cache instance goes down, the view should be updated immediately if FD_SOCK is configured. But it wasn't updated as expected.

Instead, viewAccepted() was called with only the active members, and the problem resolved itself, only after some hours.

In the meantime we got a ReplicationException timeout:

Received Throwable from remote node org.jboss.cache.ReplicationException: rsp=sender=172.16.11.10:41912, retval=null, received=false, suspected=false

The cluster configuration is as follows:

<attribute name="ClusterConfig">
    <config>
        <!-- UDP: if you have a multihomed machine,
        set the bind_addr attribute to the appropriate NIC IP address, e.g bind_addr="192.168.0.2"
        -->
        <!-- UDP: On Windows machines, because of the media sense feature
         being broken with multicast (even after disabling media sense)
         set the loopback attribute to true -->
        <UDP mcast_addr="224.7.8.9" mcast_port="45567"
            ip_ttl="64" ip_mcast="true"
            mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
            ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
            loopback="true" bind_addr="16.150.24.69"/>
        <PING timeout="2000" num_initial_members="3"
            up_thread="false" down_thread="false"/>
        <MERGE2 min_interval="10000" max_interval="20000"/>
        <!--        <FD shun="true" up_thread="true" down_thread="true" />-->
        <FD_SOCK/>
        <VERIFY_SUSPECT timeout="1500"
            up_thread="false" down_thread="false"/>
        <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
            max_xmit_size="8192" up_thread="false" down_thread="false"/>
        <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
            down_thread="false"/>
        <pbcast.STABLE desired_avg_gossip="20000"
            up_thread="false" down_thread="false"/>
        <FRAG frag_size="8192"
            down_thread="false" up_thread="false"/>
        <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
            shun="true" print_local_addr="true"/>
        <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
    </config>
</attribute>

From the exception message we infer that 172.16.11.10:41912 is the cache instance that was restarted, and that the currently active instance on that machine is 172.16.11.10:51722.

View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4221191#4221191