[jboss-user] [JBoss Cache: Core Edition] - FD Issue
karnivas
do-not-reply at jboss.com
Thu Mar 26 05:32:00 EDT 2009
We are using JBoss Cache 1.4.1 SP6 with JGroups 2.4.x.
We have a cluster of cache instances running on two Sun Solaris machines and multiple RHEL machines.
When one of the RHEL instances is restarted, the view of the cache instances on the Solaris machines is not updated,
i.e. the view passed to viewAccepted() still contains the old RHEL instance along with the new RHEL instance (the one that was restarted).
e.g. [172.16.11.200:65261, 172.16.11.12:50903, 172.16.11.10:41912, 172.16.11.20:51156, 172.16.11.10:43789, 172.16.11.20:57771, 172.16.11.10:51722, 172.16.11.20:35858, 172.16.11.11:51210]
172.16.11.10 - RHEL Instance 1
172.16.11.20 - RHEL Instance 2
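To make the stale entries in the view above easier to spot, here is a minimal sketch that groups the members by host. It assumes one cache instance per host, so any host listed more than once is suspicious. The class and method names (StaleViewCheck, portsByHost) are hypothetical, and plain "host:port" strings stand in for the JGroups address objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JBoss Cache or JGroups.
class StaleViewCheck {

    // Groups "host:port" member strings by host, keeping the order of the view.
    static Map<String, List<Integer>> portsByHost(List<String> members) {
        Map<String, List<Integer>> byHost = new LinkedHashMap<>();
        for (String member : members) {
            int colon = member.lastIndexOf(':');
            String host = member.substring(0, colon);
            int port = Integer.parseInt(member.substring(colon + 1));
            byHost.computeIfAbsent(host, h -> new ArrayList<>()).add(port);
        }
        return byHost;
    }

    public static void main(String[] args) {
        // The view reported in this post.
        List<String> view = Arrays.asList(
                "172.16.11.200:65261", "172.16.11.12:50903", "172.16.11.10:41912",
                "172.16.11.20:51156", "172.16.11.10:43789", "172.16.11.20:57771",
                "172.16.11.10:51722", "172.16.11.20:35858", "172.16.11.11:51210");

        // Assumption: one instance per host, so more than one port means stale members.
        for (Map.Entry<String, List<Integer>> entry : portsByHost(view).entrySet()) {
            if (entry.getValue().size() > 1) {
                System.out.println(entry.getKey() + " is listed "
                        + entry.getValue().size() + " times, ports " + entry.getValue());
            }
        }
    }
}
```

If several cache instances legitimately run on one host, the check would need a different key than the host address.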
Our assumption was that, with FD_SOCK configured, the view would be updated immediately when a cache instance goes down. But it wasn't updated as expected.
Instead, viewAccepted() was called with a view containing only the active members, and the problem resolved itself, only after some hours.
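To pin down when a stale member finally leaves, it may help to log the difference between consecutive views received in viewAccepted(). The following is only a sketch: ViewDiff, joined and left are hypothetical names, plain strings stand in for JGroups addresses, and the two sample views are illustrative values built from addresses in this post, not captured logs.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of JBoss Cache or JGroups.
class ViewDiff {

    // Members present in the new view but not in the old one.
    static Set<String> joined(List<String> oldView, List<String> newView) {
        Set<String> result = new LinkedHashSet<>(newView);
        result.removeAll(oldView);
        return result;
    }

    // Members present in the old view but missing from the new one.
    static Set<String> left(List<String> oldView, List<String> newView) {
        Set<String> result = new LinkedHashSet<>(oldView);
        result.removeAll(newView);
        return result;
    }

    public static void main(String[] args) {
        // Illustrative views only: the old RHEL member is still listed in the first one.
        List<String> oldView = Arrays.asList(
                "172.16.11.200:65261", "172.16.11.10:41912", "172.16.11.10:51722");
        List<String> newView = Arrays.asList(
                "172.16.11.200:65261", "172.16.11.10:51722");

        System.out.println("joined: " + joined(oldView, newView));
        System.out.println("left:   " + left(oldView, newView));
    }
}
```

Calling such a helper from the application's viewAccepted() callback with a timestamp would show how long each stale member stays in the view on the Solaris nodes.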
We also got a ReplicationException timeout:
Received Throwable from remote node org.jboss.cache.ReplicationException: rsp=sender=172.16.11.10:41912, retval=null, received=false, suspected=false
The cluster configuration is as follows:
<attribute name="ClusterConfig">
    <config>
        <!-- UDP: if you have a multihomed machine,
             set the bind_addr attribute to the appropriate NIC IP address, e.g bind_addr="192.168.0.2"
        -->
        <!-- UDP: On Windows machines, because of the media sense feature
             being broken with multicast (even after disabling media sense)
             set the loopback attribute to true -->
        <UDP mcast_addr="224.7.8.9" mcast_port="45567"
             ip_ttl="64" ip_mcast="true"
             mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
             ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
             loopback="true" bind_addr="16.150.24.69"/>
        <PING timeout="2000" num_initial_members="3"
              up_thread="false" down_thread="false"/>
        <MERGE2 min_interval="10000" max_interval="20000"/>
        <!-- <FD shun="true" up_thread="true" down_thread="true" />-->
        <FD_SOCK/>
        <VERIFY_SUSPECT timeout="1500"
                        up_thread="false" down_thread="false"/>
        <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
                       max_xmit_size="8192" up_thread="false" down_thread="false"/>
        <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
                 down_thread="false"/>
        <pbcast.STABLE desired_avg_gossip="20000"
                       up_thread="false" down_thread="false"/>
        <FRAG frag_size="8192"
              down_thread="false" up_thread="false"/>
        <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
                    shun="true" print_local_addr="true"/>
        <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
    </config>
</attribute>
From the exception message we infer that the cache instance 172.16.11.10:41912 is the one that was restarted, and that the currently active instance on that host is 172.16.11.10:51722.
View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4221191#4221191