[jboss-user] [JBoss Cache: Core Edition] - Cluster hanging when a member is non responsive

ajassal do-not-reply at jboss.com
Tue Dec 30 00:58:04 EST 2008


I have a JBoss Cache cluster with 2 nodes. When one of the nodes is overloaded or runs into OOM (OutOfMemoryError) issues, the other node also becomes unresponsive. A thread dump of the healthy (non-OOM) instance shows JBoss Cache threads waiting on a lock (excerpt below).

Do I need to tweak the failure detection protocol somehow?

Configuration:

Version: 2.2.1.GA
Codename: Poblano

Replication mode: REPL_ASYNC

        
            
                <!-- UDP: if you have a multihomed machine,
                    set the bind_addr attribute to the appropriate NIC IP address -->
                <!-- UDP: On Windows machines, because of the media sense feature
                    being broken with multicast (even after disabling media sense)
                    set the loopback attribute to true -->
                <UDP mcast_addr="228.8.8.8" mcast_port="45567"
                    bind_addr="127.0.0.1" bind_to_all_interfaces="false"
                    ip_ttl="64" ip_mcast="true" mcast_send_buf_size="150000"
                    mcast_recv_buf_size="80000" ucast_send_buf_size="150000"
                    ucast_recv_buf_size="80000" loopback="false" />
                <PING timeout="2000" num_initial_members="3" />
                <MERGE2 min_interval="10000" max_interval="20000" />
                <FD_SOCK/>
                <VERIFY_SUSPECT timeout="1500" />
                <pbcast.NAKACK gc_lag="50"
                    retransmit_timeout="600,1200,2400,4800" />
                
                <pbcast.STABLE desired_avg_gossip="20000" />
                <FRAG frag_size="8192" />
                <pbcast.GMS join_timeout="5000" shun="true" print_local_addr="true" />
                <pbcast.STATE_TRANSFER />
            
        


Thread dump:

   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00002aaacd330d30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1889)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
    at org.jgroups.blocks.BasicConnectionTable$Connection.send(BasicConnectionTable.java:499)
    at org.jgroups.blocks.BasicConnectionTable.send(BasicConnectionTable.java:322)
    at org.jgroups.protocols.TCP.send(TCP.java:55)
    at org.jgroups.protocols.BasicTCP.sendToSingleMember(BasicTCP.java:209)
    at org.jgroups.protocols.BasicTCP.sendToAllMembers(BasicTCP.java:194)
    at org.jgroups.protocols.TP.doSend(TP.java:1476)
    at org.jgroups.protocols.TP.send(TP.java:1466)
    at org.jgroups.protocols.TP.down(TP.java:1187)
    at org.jgroups.protocols.Discovery.down(Discovery.java:373)
    at org.jgroups.protocols.MERGE2.down(MERGE2.java:175)
    at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:367)
    at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:95)
    at org.jgroups.protocols.pbcast.NAKACK.send(NAKACK.java:787)
    at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:589)
    at org.jgroups.protocols.UNICAST.down(UNICAST.java:462)
    at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:316)
    at org.jgroups.protocols.FRAG.down(FRAG.java:136)
    at org.jgroups.protocols.pbcast.GMS.down(GMS.java:858)
    at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:200)
    at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:457)
    at org.jgroups.JChannel.downcall(JChannel.java:1474)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:780)
    at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:303)
    at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:545)
    at org.jgroups.blocks.GroupRequest.execute(GroupRequest.java:228)
    at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:457)
    at org.jboss.cache.marshall.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:102)
    at org.jboss.cache.RPCManagerImpl.callRemoteMethods(RPCManagerImpl.java:403)
    at org.jboss.cache.RPCManagerImpl.callRemoteMethods(RPCManagerImpl.java:375)
    at org.jboss.cache.RPCManagerImpl.callRemoteMethods(RPCManagerImpl.java:380)
    at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:143)
    at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:117)
    at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:89)
    at org.jboss.cache.interceptors.ReplicationInterceptor.handleCrudMethod(ReplicationInterceptor.java:139)
    at org.jboss.cache.interceptors.ReplicationInterceptor.visitPutKeyValueCommand(ReplicationInterceptor.java:86)
    at org.jboss.cache.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:92)
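
The top frames of the dump show the replicating thread parked inside LinkedBlockingQueue.put, called from BasicConnectionTable$Connection.send. The following is a minimal sketch of that kind of stall using only java.util.concurrent. It is not JGroups code: the class name, the method names and the queue capacity are hypothetical, and it assumes the send queue is bounded and that the remote peer has stopped draining it.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedSendQueueDemo {

    /** Fills a bounded queue to capacity, standing in for a peer that no longer reads. */
    static LinkedBlockingQueue<byte[]> fullQueue(int capacity) {
        LinkedBlockingQueue<byte[]> queue = new LinkedBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            queue.offer(new byte[0]);
        }
        return queue;
    }

    /** Returns true if a thread calling put() on the queue is still blocked after waitMillis. */
    static boolean putStaysBlocked(LinkedBlockingQueue<byte[]> queue, long waitMillis)
            throws InterruptedException {
        Thread sender = new Thread(() -> {
            try {
                queue.put(new byte[0]); // waits for free space; no timeout
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.setDaemon(true);
        sender.start();
        sender.join(waitMillis);           // give the sender time to finish, if it can
        boolean blocked = sender.isAlive();
        sender.interrupt();                // release the parked thread
        sender.join();                     // wait until it has really exited
        return blocked;
    }

    /** Timed alternative: gives up after timeoutMillis instead of waiting indefinitely. */
    static boolean offerWithTimeout(LinkedBlockingQueue<byte[]> queue, long timeoutMillis)
            throws InterruptedException {
        return queue.offer(new byte[0], timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<byte[]> queue = fullQueue(2);
        System.out.println("put() still blocked: " + putStaysBlocked(queue, 200));
        System.out.println("offer() accepted:    " + offerWithTimeout(queue, 200));
    }
}
```

With nothing consuming from the queue, put() waits for space with no upper bound, while the timed offer() returns control to the caller. Whether the JGroups version in use exposes a comparable timeout or queue-size setting is something I have not confirmed.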




View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4198856#4198856
