Aditi Andhare [http://community.jboss.org/people/aditi.andhare] created the discussion
"SuspectException seen by one node when the other node in the cluster goes down"
To view the discussion, visit:
http://community.jboss.org/message/570224#570224
--------------------------------------------------------------
Hi all,
We are using the following configuration:
JGroups 3.2.0 GA
JBoss Cache 3.2.0 GA
JBoss AS 5.1.0 GA
I have two nodes in my clustered setup. Consider the case where there is continuous load on
both nodes. Now I stop one of the nodes in the cluster. The node that is still active sees
a SuspectException for a small number of transactions, which affects the transactions
performed by the active node. Here is the stack trace:
org.jboss.cache.SuspectException: Suspected member: 10.17.221.19:59378
at org.jboss.cache.RPCManagerImpl.callRemoteMethods(RPCManagerImpl.java:764)
at org.jboss.cache.RPCManagerImpl.callRemoteMethods(RPCManagerImpl.java:716)
at org.jboss.cache.RPCManagerImpl.callRemoteMethods(RPCManagerImpl.java:721)
at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:161)
at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:135)
at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:107)
at org.jboss.cache.interceptors.ReplicationInterceptor.handleCrudMethod(ReplicationInterceptor.java:160)
at org.jboss.cache.interceptors.ReplicationInterceptor.visitPutDataMapCommand(ReplicationInterceptor.java:113)
at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131)
at org.jboss.cache.commands.AbstractVisitor.visitPutDataMapCommand(AbstractVisitor.java:60)
at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
at org.jboss.cache.interceptors.TxInterceptor.attachGtxAndPassUpChain(TxInterceptor.java:301)
at org.jboss.cache.interceptors.TxInterceptor.handleDefault(TxInterceptor.java:283)
at org.jboss.cache.commands.AbstractVisitor.visitPutDataMapCommand(AbstractVisitor.java:60)
at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
at org.jboss.cache.interceptors.CacheMgmtInterceptor.visitPutDataMapCommand(CacheMgmtInterceptor.java:97)
at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
at org.jboss.cache.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:178)
at org.jboss.cache.interceptors.InvocationContextInterceptor.visitPutDataMapCommand(InvocationContextInterceptor.java:64)
at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
at org.jboss.cache.interceptors.InterceptorChain.invoke(InterceptorChain.java:287)
at org.jboss.cache.invocation.CacheInvocationDelegate.invokePut(CacheInvocationDelegate.java:705)
at org.jboss.cache.invocation.CacheInvocationDelegate.put(CacheInvocationDelegate.java:519)
at org.jboss.cache.invocation.NodeInvocationDelegate.addChild(NodeInvocationDelegate.java:337)
at com.openwave.servicebroker.util.CacheUtil.createNode(CacheUtil.java:204)
at com.openwave.servicebroker.ServiceBrokerImpl.serviceRequest(ServiceBrokerImpl.java:838)
at servicebroker.ServiceBroker$Processor$serviceRequest.process(ServiceBroker.java:809)
at servicebroker.ServiceBroker$Processor.process(ServiceBroker.java:626)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
The statement that triggers the above exception does not run inside a transaction. Here are
the failure-detection (FD) settings from our configuration files:
<FD_SOCK/>
<FD max_tries="20" shun="false" timeout="60000"/>
<VERIFY_SUSPECT timeout="1500"/>
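For reference, the values above imply a rough timing budget. The following is a back-of-the-envelope sketch only. It assumes that FD suspects a member after max_tries consecutive missed heartbeats, each waited for up to timeout milliseconds. The class and method names are made up for illustration and are not part of JGroups.

```java
public class FdDetectionBudget {

    /**
     * Worst-case heartbeat detection time in milliseconds.
     * ASSUMPTION: a member is suspected after maxTries missed heartbeats,
     * each of which is waited for up to timeoutMs.
     */
    static long heartbeatDetectionMillis(long timeoutMs, int maxTries) {
        return timeoutMs * maxTries;
    }

    public static void main(String[] args) {
        // Values copied from the <FD/> and <VERIFY_SUSPECT/> elements above.
        long fdMs = heartbeatDetectionMillis(60_000L, 20);
        long verifyMs = 1_500L;

        System.out.println("FD heartbeat budget: " + fdMs + " ms ("
                + fdMs / 60_000 + " min)");
        System.out.println("VERIFY_SUSPECT wait: " + verifyMs + " ms");
    }
}
```

Under that assumption, heartbeat-based detection alone could take up to 20 minutes, so a suspicion raised shortly after a node is stopped is more likely to originate from FD_SOCK than from FD heartbeats. That reading is an assumption and is not confirmed by anything in this post.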
Analysing the logs, I see that this exception always occurs in the short gap between the
moment one node is suspected and the moment the cluster receives a new view:
2010-10-29 *03:04:12,480* INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.ClusterOne] (VERIFY_SUSPECT.TimerThread,ClusterOne,10.17.221.18:48782) Suspected member: 10.17.221.19:34650
2010-10-29 *03:04:12,534* INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.ClusterOne] (Incoming-17,10.17.221.18:48782) New cluster view for partition ClusterOne (id: 4, delta: -1) : [10.17.221.18:1099]
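To put a number on that gap, the two timestamps can be subtracted directly. This is a minimal sketch using only the JDK. The class name SuspectWindow and the method gapMillis are hypothetical helpers, and it assumes both log entries fall on the same day.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class SuspectWindow {

    // Matches the time-of-day part of the log timestamps, e.g. 03:04:12,480
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Milliseconds elapsed between two same-day log timestamps. */
    static long gapMillis(String earlier, String later) {
        LocalTime start = LocalTime.parse(earlier, LOG_TIME);
        LocalTime end = LocalTime.parse(later, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // Timestamps taken from the two log entries above.
        long gap = gapMillis("03:04:12,480", "03:04:12,534");
        System.out.println("Window between suspicion and new view: " + gap + " ms");
    }
}
```

For the two entries above, the window is 54 ms.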
So my questions are:
1. Is this expected behaviour?
2. Whenever a member of a cluster goes down, will all active transactions on the other
active members fail with a SuspectException?
3. Or are there configuration settings that I am missing here?
Thanks in advance for all your help.
Aditi
--------------------------------------------------------------