[jboss-user] [JBoss Cache] - SuspectException seen by one node when the other node in the cluster goes down

Aditi Andhare do-not-reply at jboss.com
Tue Nov 9 05:14:18 EST 2010


Aditi Andhare [http://community.jboss.org/people/aditi.andhare] created the discussion

"SuspectException seen by one node when the other node in the cluster goes down"

To view the discussion, visit: http://community.jboss.org/message/570224#570224

--------------------------------------------------------------
Hi all,

We are using the following configuration:
JGroups 3.2.0 GA
JBoss Cache 3.2.0 GA
JBoss AS 5.1.0 GA

I have two nodes in my clustered setup. Consider the case where both nodes are under continuous load and I then stop one of them. The node that is still active throws a SuspectException for a small number of transactions, so those transactions on the active node fail. Here is the stack trace:

org.jboss.cache.SuspectException: Suspected member: 10.17.221.19:59378
        at org.jboss.cache.RPCManagerImpl.callRemoteMethods(RPCManagerImpl.java:764)
        at org.jboss.cache.RPCManagerImpl.callRemoteMethods(RPCManagerImpl.java:716)
        at org.jboss.cache.RPCManagerImpl.callRemoteMethods(RPCManagerImpl.java:721)
        at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:161)
        at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:135)
        at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:107)
        at org.jboss.cache.interceptors.ReplicationInterceptor.handleCrudMethod(ReplicationInterceptor.java:160)
        at org.jboss.cache.interceptors.ReplicationInterceptor.visitPutDataMapCommand(ReplicationInterceptor.java:113)
        at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
        at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
        at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131)
        at org.jboss.cache.commands.AbstractVisitor.visitPutDataMapCommand(AbstractVisitor.java:60)
        at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
        at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
        at org.jboss.cache.interceptors.TxInterceptor.attachGtxAndPassUpChain(TxInterceptor.java:301)
        at org.jboss.cache.interceptors.TxInterceptor.handleDefault(TxInterceptor.java:283)
        at org.jboss.cache.commands.AbstractVisitor.visitPutDataMapCommand(AbstractVisitor.java:60)
        at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
        at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
        at org.jboss.cache.interceptors.CacheMgmtInterceptor.visitPutDataMapCommand(CacheMgmtInterceptor.java:97)
        at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
        at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
        at org.jboss.cache.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:178)
        at org.jboss.cache.interceptors.InvocationContextInterceptor.visitPutDataMapCommand(InvocationContextInterceptor.java:64)
        at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
        at org.jboss.cache.interceptors.InterceptorChain.invoke(InterceptorChain.java:287)
        at org.jboss.cache.invocation.CacheInvocationDelegate.invokePut(CacheInvocationDelegate.java:705)
        at org.jboss.cache.invocation.CacheInvocationDelegate.put(CacheInvocationDelegate.java:519)
        at org.jboss.cache.invocation.NodeInvocationDelegate.addChild(NodeInvocationDelegate.java:337)
        at com.openwave.servicebroker.util.CacheUtil.createNode(CacheUtil.java:204)
        at com.openwave.servicebroker.ServiceBrokerImpl.serviceRequest(ServiceBrokerImpl.java:838)
        at servicebroker.ServiceBroker$Processor$serviceRequest.process(ServiceBroker.java:809)
        at servicebroker.ServiceBroker$Processor.process(ServiceBroker.java:626)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

The statement that triggers the above exception is not running inside a transaction. Here are the failure detection (FD) settings from our configuration files:
         <FD_SOCK/>
         <FD max_tries="20" shun="false" timeout="60000"/>
         <VERIFY_SUSPECT timeout="1500"/>
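As a rough sanity check on these values, the sketch below works out how long heartbeat-based detection alone could take. It assumes that FD suspects a member only after max_tries consecutive heartbeats have each gone unanswered for timeout milliseconds, and that VERIFY_SUSPECT then waits up to its own timeout before confirming the suspicion. The class and method names are hypothetical and not part of JGroups. FD_SOCK is not modelled, because it reacts to a closed socket rather than to a timer.

```java
// Hypothetical helper, not part of JGroups: estimates the worst-case time
// for heartbeat-based (FD) failure detection from the configured values.
final class FdTiming {
    private FdTiming() {}

    /** Worst-case milliseconds before FD alone raises a suspicion. */
    static long fdSuspectMillis(long timeoutMillis, int maxTries) {
        return timeoutMillis * maxTries;
    }

    /** Adds the VERIFY_SUSPECT wait on top of the FD detection time. */
    static long confirmedSuspectMillis(long fdTimeoutMillis, int maxTries,
                                       long verifyTimeoutMillis) {
        return fdSuspectMillis(fdTimeoutMillis, maxTries) + verifyTimeoutMillis;
    }

    public static void main(String[] args) {
        // Values taken from the FD and VERIFY_SUSPECT elements above.
        long fd = fdSuspectMillis(60_000, 20);
        long total = confirmedSuspectMillis(60_000, 20, 1_500);
        System.out.println("FD alone: " + fd + " ms");
        System.out.println("FD + VERIFY_SUSPECT: " + total + " ms");
    }
}
```

Under these assumptions FD alone would need up to 20 minutes (1,200,000 ms), so when a node is stopped the suspicion is more likely raised by FD_SOCK noticing the closed socket.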

From the logs I can see that this exception always occurs in the short interval between the moment the stopped node is suspected and the moment the cluster receives a new view:

2010-10-29 *03:04:12,480* INFO  [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.ClusterOne] (VERIFY_SUSPECT.TimerThread,ClusterOne,10.17.221.18:48782) Suspected member: 10.17.221.19:34650
2010-10-29 *03:04:12,534* INFO  [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.ClusterOne] (Incoming-17,10.17.221.18:48782) New cluster view for partition ClusterOne (id: 4, delta: -1) : [10.17.221.18:1099]
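To put a number on that interval, the sketch below parses the two timestamps from the log lines above (with the emphasis asterisks removed) and prints the difference. The class name is hypothetical, and it uses java.time, which needs Java 8 or later, so it is meant as an offline log-analysis aid rather than something to run inside the Java 6 server shown in the stack trace.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical log-analysis helper: measures the time between the
// "Suspected member" line and the "New cluster view" line.
final class SuspectWindow {
    // Matches the timestamp layout used in the server log, e.g. 2010-10-29 03:04:12,480
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    private SuspectWindow() {}

    /** Returns the number of milliseconds from the earlier to the later timestamp. */
    static long gapMillis(String earlier, String later) {
        LocalDateTime start = LocalDateTime.parse(earlier, LOG_TIME);
        LocalDateTime end = LocalDateTime.parse(later, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        long gap = gapMillis("2010-10-29 03:04:12,480", "2010-10-29 03:04:12,534");
        System.out.println("Suspect-to-view gap: " + gap + " ms");
    }
}
```

For the two lines above the gap is 54 ms, so only calls that are replicating during that window would be affected.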

So my questions are:
1. Is this expected behaviour?
2. Whenever a member of the cluster goes down, will all transactions that are in flight on the other active members fail with a SuspectException?
3. Or is there a configuration setting that I am missing here?
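If no configuration setting closes the window, one application-level workaround would be to retry the failing call once the new view is installed. The sketch below is only an illustration of that idea under stated assumptions: RetryOnSuspect is a hypothetical helper, the demo uses IllegalStateException as a stand-in so that it runs without JBoss Cache on the classpath, and in the real application the exception type passed in would presumably be org.jboss.cache.SuspectException. It also assumes the wrapped operation is safe to repeat, which has not been verified for the cache call in the stack trace.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retries an action when it fails with a given exception type.
final class RetryOnSuspect {
    private RetryOnSuspect() {}

    /**
     * Runs the action and returns its result. If it throws an exception of
     * type retryOn, waits pauseMillis and tries again, up to maxAttempts
     * attempts in total. Any other exception propagates immediately.
     */
    static <T> T call(Callable<T> action, Class<? extends Exception> retryOn,
                      int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Give up on unrelated failures or once the attempts are used up.
                if (!retryOn.isInstance(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(pauseMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated operation: fails twice, then succeeds.
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("Suspected member (simulated)");
            }
            return "stored";
        }, IllegalStateException.class, 5, 100);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The pause only needs to outlast the gap between the suspicion and the new view, which was a few tens of milliseconds in the logs above.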

Thanks in advance for all your help.
Aditi
--------------------------------------------------------------
