[JBoss Cache: Core Edition] - FD Issue - jboss-user

Thursday, 26 March 2009

We are using JBoss Cache 1.4.1 SP6 with JGroups 2.4.x.
We have a cluster of cache instances running on two Sun Solaris machines and multiple RHEL machines.

When one of the RHEL instances is restarted, the view of the cache instances on the Solaris
machines is not updated.
That is, the view passed to viewAccepted() still contains the old RHEL instance along with
the new RHEL instance (the one that was restarted).

For example:
[172.16.11.200:65261, 172.16.11.12:50903, 172.16.11.10:41912, 172.16.11.20:51156,
172.16.11.10:43789, 172.16.11.20:57771, 172.16.11.10:51722, 172.16.11.20:35858,
172.16.11.11:51210]

172.16.11.10 - RHEL Instance 1
172.16.11.20 - RHEL Instance 2
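
To make the stale entries in the view above easier to spot, here is a minimal sketch that groups the members by host. It is plain Java with no JGroups dependency; the class `StaleViewCheck` and its methods are hypothetical names made up for this illustration, and it assumes that each host runs only one cache instance, so any host that appears more than once probably has stale members.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of JBoss Cache or JGroups.
final class StaleViewCheck {

    // Groups "host:port" member strings by host, preserving view order.
    static Map<String, List<String>> groupByHost(List<String> members) {
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        for (String member : members) {
            String m = member.trim();
            int colon = m.lastIndexOf(':');
            String host = colon < 0 ? m : m.substring(0, colon);
            byHost.computeIfAbsent(host, k -> new ArrayList<>()).add(m);
        }
        return byHost;
    }

    // Returns the hosts that contribute more than one member to the view.
    // Assumption: one cache instance per host, so extras are likely stale.
    static List<String> hostsWithMultipleMembers(List<String> members) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groupByHost(members).entrySet()) {
            if (e.getValue().size() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The view reported in this post.
        List<String> view = Arrays.asList(
            "172.16.11.200:65261", "172.16.11.12:50903", "172.16.11.10:41912",
            "172.16.11.20:51156", "172.16.11.10:43789", "172.16.11.20:57771",
            "172.16.11.10:51722", "172.16.11.20:35858", "172.16.11.11:51210");
        // prints [172.16.11.10, 172.16.11.20]
        System.out.println(hostsWithMultipleMembers(view));
    }
}
```

In a real deployment the member strings would come from the view handed to viewAccepted(); here they are simply copied from the example above. Both RHEL hosts show three addresses each, which is consistent with old instances never having been removed from the view.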

We assumed that when a cache instance goes down, the view would be updated immediately
because FD_SOCK is configured. However, it was not updated as expected.

Instead, the view passed to viewAccepted() was updated to contain only the active members,
and the problem resolved itself, only after some hours.

In the meantime we got a ReplicationException caused by a timeout:

Received Throwable from remote node org.jboss.cache.ReplicationException:
rsp=sender=172.16.11.10:41912, retval=null, received=false, suspected=false

The cluster configuration is as follows:

<attribute name="ClusterConfig">
    <config>
        <!-- UDP: if you have a multihomed machine,
        set the bind_addr attribute to the appropriate NIC IP address, e.g
        bind_addr="192.168.0.2"
        -->
        <!-- UDP: On Windows machines, because of the media sense feature
         being broken with multicast (even after disabling media sense)
         set the loopback attribute to true -->
        <UDP mcast_addr="224.7.8.9" mcast_port="45567"
            ip_ttl="64" ip_mcast="true"
            mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
            ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
            loopback="true" bind_addr="16.150.24.69"/>
        <PING timeout="2000" num_initial_members="3"
            up_thread="false" down_thread="false"/>
        <MERGE2 min_interval="10000" max_interval="20000"/>
        <!--        <FD shun="true" up_thread="true" down_thread="true" />-->
        <FD_SOCK/>
        <VERIFY_SUSPECT timeout="1500"
            up_thread="false" down_thread="false"/>
        <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
            max_xmit_size="8192" up_thread="false" down_thread="false"/>
        <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
            down_thread="false"/>
        <pbcast.STABLE desired_avg_gossip="20000"
            up_thread="false" down_thread="false"/>
        <FRAG frag_size="8192"
            down_thread="false" up_thread="false"/>
        <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
            shun="true" print_local_addr="true"/>
        <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
    </config>
</attribute>

...
From the exception message we infer that the cache instance 172.16.11.10:41912 had been
restarted, and that the current active instance was 172.16.11.10:51722.
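
The inference above rests on the fields of the rsp line in the exception. As a rough sketch, the line can be split into key/value pairs to check which member failed to answer and whether it was ever suspected. The class `RspLine` and its methods are hypothetical names for this illustration only, and the parsing assumes the exact "rsp=key=value, key=value" layout shown in the message above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of JBoss Cache or JGroups.
final class RspLine {

    // Splits "rsp=k1=v1, k2=v2, ..." into an ordered key/value map.
    static Map<String, String> parse(String line) {
        String s = line.trim();
        if (s.startsWith("rsp=")) {
            s = s.substring("rsp=".length());
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : s.split(",")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                fields.put(part.substring(0, eq).trim(),
                           part.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    // True when no response arrived and the member was not marked suspected.
    static boolean silentButNotSuspected(Map<String, String> fields) {
        return "false".equals(fields.get("received"))
            && "false".equals(fields.get("suspected"));
    }

    public static void main(String[] args) {
        Map<String, String> rsp = parse(
            "rsp=sender=172.16.11.10:41912, retval=null, received=false, suspected=false");
        System.out.println(rsp.get("sender"));          // prints 172.16.11.10:41912
        System.out.println(silentButNotSuspected(rsp)); // prints true
    }
}
```

A member that is still in the view, did not respond, and was not suspected fits the picture of a dead instance that failure detection never flagged, although the message alone does not prove that.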

View the original post :
http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4221191#...

