We are using JBoss Cache 1.4.1 SP6 with JGroups 2.4.x.
We have a cluster of cache instances running on two Sun Solaris machines and multiple RHEL machines.
When one of the RHEL instances is restarted, the view of the cache instances on the Solaris
machines isn't updated.
i.e. the view passed to viewAccepted() still contains the old RHEL instance along with the new RHEL
instance (the one that was restarted)
e.g.: [172.16.11.200:65261, 172.16.11.12:50903, 172.16.11.10:41912, 172.16.11.20:51156,
172.16.11.10:43789, 172.16.11.20:57771, 172.16.11.10:51722, 172.16.11.20:35858,
172.16.11.11:51210]
172.16.11.10 - RHEL Instance 1
172.16.11.20 - RHEL Instance 2
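As a rough way to spot the stale entries in a view like the one above, here is a small sketch that groups the members by host and reports every host that appears more than once. It is not part of our application: the class and method names (StaleViewCheck, groupByHost, hostsWithDuplicates) are made up for illustration, it works on the printed "host:port" strings rather than on JGroups address objects, and it assumes only one cache instance runs per host.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class StaleViewCheck {

    // Groups "host:port" member strings by host, keeping the order of the view.
    static Map<String, List<Integer>> groupByHost(List<String> members) {
        Map<String, List<Integer>> byHost = new LinkedHashMap<String, List<Integer>>();
        for (String member : members) {
            int sep = member.lastIndexOf(':');
            String host = member.substring(0, sep);
            int port = Integer.parseInt(member.substring(sep + 1));
            List<Integer> ports = byHost.get(host);
            if (ports == null) {
                ports = new ArrayList<Integer>();
                byHost.put(host, ports);
            }
            ports.add(port);
        }
        return byHost;
    }

    // Returns the hosts that occur more than once in the view.
    // Assumption: one cache instance per host, so extra entries are suspect.
    static List<String> hostsWithDuplicates(List<String> members) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, List<Integer>> e : groupByHost(members).entrySet()) {
            if (e.getValue().size() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The view quoted in this post.
        List<String> view = Arrays.asList(
            "172.16.11.200:65261", "172.16.11.12:50903", "172.16.11.10:41912",
            "172.16.11.20:51156", "172.16.11.10:43789", "172.16.11.20:57771",
            "172.16.11.10:51722", "172.16.11.20:35858", "172.16.11.11:51210");
        Map<String, List<Integer>> byHost = groupByHost(view);
        for (String host : hostsWithDuplicates(view)) {
            System.out.println(host + " -> " + byHost.get(host));
        }
    }
}
```

For the view above this flags 172.16.11.10 and 172.16.11.20, each with three ports, which matches the two RHEL hosts.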
We assumed that, with FD_SOCK configured, the view would be updated immediately when a
cache instance goes down. But it wasn't updated as expected.
The view passed to viewAccepted() did eventually contain only the active members, but the
problem resolved itself only after some hours.
We also got a ReplicationException timeout:
Received Throwable from remote node org.jboss.cache.ReplicationException:
rsp=sender=172.16.11.10:41912, retval=null, received=false, suspected=false
The cluster configuration is as follows:
<attribute name="ClusterConfig">
  <config>
    <!-- UDP: if you have a multihomed machine,
         set the bind_addr attribute to the appropriate NIC IP address, e.g
         bind_addr="192.168.0.2"
    -->
    <!-- UDP: On Windows machines, because of the media sense feature
         being broken with multicast (even after disabling media sense)
         set the loopback attribute to true -->
    <UDP mcast_addr="224.7.8.9" mcast_port="45567"
         ip_ttl="64" ip_mcast="true"
         mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
         ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
         loopback="true" bind_addr="16.150.24.69"/>
    <PING timeout="2000" num_initial_members="3"
          up_thread="false" down_thread="false"/>
    <MERGE2 min_interval="10000" max_interval="20000"/>
    <!-- <FD shun="true" up_thread="true" down_thread="true" />-->
    <FD_SOCK/>
    <VERIFY_SUSPECT timeout="1500"
                    up_thread="false" down_thread="false"/>
    <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
                   max_xmit_size="8192" up_thread="false" down_thread="false"/>
    <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
             down_thread="false"/>
    <pbcast.STABLE desired_avg_gossip="20000"
                   up_thread="false" down_thread="false"/>
    <FRAG frag_size="8192"
          down_thread="false" up_thread="false"/>
    <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
                shun="true" print_local_addr="true"/>
    <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
  </config>
</attribute>
From the exception message we infer that 172.16.11.10:41912 is the cache instance that
was restarted, and that the currently active instance is 172.16.11.10:51722.