[jboss-user] [JBoss Cache: Core Edition] - Cluster hanging when a member is non responsive
ajassal
do-not-reply at jboss.com
Tue Dec 30 00:58:04 EST 2008
I have a JBoss Cache cluster with 2 nodes. When one of the nodes is overloaded or runs into OOM (OutOfMemoryError) issues, the other node also becomes unresponsive. A thread dump on the healthy (non-OOM) instance shows JBoss Cache threads waiting on a lock (excerpt below).
Do I need to tweak the failure detection protocol somehow?
Configuration:
Version: 2.2.1.GA
Codename: Poblano
Replication mode: REPL_ASYNC
<!-- UDP: if you have a multihomed machine,
     set the bind_addr attribute to the appropriate NIC IP address -->
<!-- UDP: On Windows machines, because of the media sense feature
     being broken with multicast (even after disabling media sense)
     set the loopback attribute to true -->
<UDP mcast_addr="228.8.8.8" mcast_port="45567"
     bind_addr="127.0.0.1" bind_to_all_interfaces="false"
     ip_ttl="64" ip_mcast="true" mcast_send_buf_size="150000"
     mcast_recv_buf_size="80000" ucast_send_buf_size="150000"
     ucast_recv_buf_size="80000" loopback="false" />
<PING timeout="2000" num_initial_members="3" />
<MERGE2 min_interval="10000" max_interval="20000" />
<FD_SOCK />
<VERIFY_SUSPECT timeout="1500" />
<pbcast.NAKACK gc_lag="50"
               retransmit_timeout="600,1200,2400,4800" />
<pbcast.STABLE desired_avg_gossip="20000" />
<FRAG frag_size="8192" />
<pbcast.GMS join_timeout="5000" shun="true" print_local_addr="true" />
<pbcast.STATE_TRANSFER />
Thread dump:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aaacd330d30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1889)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
at org.jgroups.blocks.BasicConnectionTable$Connection.send(BasicConnectionTable.java:499)
at org.jgroups.blocks.BasicConnectionTable.send(BasicConnectionTable.java:322)
at org.jgroups.protocols.TCP.send(TCP.java:55)
at org.jgroups.protocols.BasicTCP.sendToSingleMember(BasicTCP.java:209)
at org.jgroups.protocols.BasicTCP.sendToAllMembers(BasicTCP.java:194)
at org.jgroups.protocols.TP.doSend(TP.java:1476)
at org.jgroups.protocols.TP.send(TP.java:1466)
at org.jgroups.protocols.TP.down(TP.java:1187)
at org.jgroups.protocols.Discovery.down(Discovery.java:373)
at org.jgroups.protocols.MERGE2.down(MERGE2.java:175)
at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:367)
at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:95)
at org.jgroups.protocols.pbcast.NAKACK.send(NAKACK.java:787)
at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:589)
at org.jgroups.protocols.UNICAST.down(UNICAST.java:462)
at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:316)
at org.jgroups.protocols.FRAG.down(FRAG.java:136)
at org.jgroups.protocols.pbcast.GMS.down(GMS.java:858)
at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:200)
at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:457)
at org.jgroups.JChannel.downcall(JChannel.java:1474)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:780)
at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:303)
at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:545)
at org.jgroups.blocks.GroupRequest.execute(GroupRequest.java:228)
at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:457)
at org.jboss.cache.marshall.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:102)
at org.jboss.cache.RPCManagerImpl.callRemoteMethods(RPCManagerImpl.java:403)
at org.jboss.cache.RPCManagerImpl.callRemoteMethods(RPCManagerImpl.java:375)
at org.jboss.cache.RPCManagerImpl.callRemoteMethods(RPCManagerImpl.java:380)
at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:143)
at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:117)
at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:89)
at org.jboss.cache.interceptors.ReplicationInterceptor.handleCrudMethod(ReplicationInterceptor.java:139)
at org.jboss.cache.interceptors.ReplicationInterceptor.visitPutKeyValueCommand(ReplicationInterceptor.java:86)
at org.jboss.cache.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:92)
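The top frames of the dump show the replicating thread parked inside LinkedBlockingQueue.put(), called from BasicConnectionTable$Connection.send(). A put() on a bounded queue blocks for as long as the queue is full, so if nothing drains the send queue for the stalled peer, the caller waits indefinitely. The following is a minimal sketch of that behaviour only, not the JGroups code: the class name, method names, and the queue capacity of 1 are hypothetical, chosen for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedSendQueueSketch {

    // Hypothetical stand-in for a per-connection send queue that nobody drains.
    private static LinkedBlockingQueue<String> fullQueue() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
        queue.offer("msg-1"); // succeeds immediately and fills the queue
        return queue;
    }

    // Returns true if a put() on the full queue is still blocked after waitMillis.
    public static boolean putBlocksWhenFull(long waitMillis) {
        LinkedBlockingQueue<String> queue = fullQueue();
        Thread sender = new Thread(() -> {
            try {
                queue.put("msg-2"); // parks here, like the thread in the dump
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.setDaemon(true);
        sender.start();
        try {
            sender.join(waitMillis); // wait a bounded time for the sender to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean stillBlocked = sender.isAlive();
        sender.interrupt(); // release the parked sender so the sketch can exit
        return stillBlocked;
    }

    // Returns whether a timed offer() on the full queue was accepted.
    public static boolean offerAccepted(long timeoutMillis) {
        LinkedBlockingQueue<String> queue = fullQueue();
        try {
            return queue.offer("msg-2", timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("put() still blocked: " + putBlocksWhenFull(200));
        System.out.println("offer() accepted: " + offerAccepted(200));
    }
}
```

In this sketch put() stays blocked while the timed offer() gives up and returns false. Whether the JGroups send path can be configured to behave like the second case is something I have not verified.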
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4198856#4198856