[infinispan-issues] [JBoss JIRA] (ISPN-2402) Cache operations or transactions should never fail with SuspectException

Fri Feb 15 06:36:56 EST 2013

    [ https://issues.jboss.org/browse/ISPN-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754383#comment-12754383 ] 

Sanne Grinovero commented on ISPN-2402:
---------------------------------------

I've also hit failures on simple Cache operations because of:

{code}2013-02-14 18:50:08,520 WARN  [CacheTopologyControlCommand] (OOB-1,Infinispan-Query-Cluster,NodeA-17773) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=LuceneIndexesData, type=REBALANCE_CONFIRM, sender=NodeD-64761, joinInfo=null, topologyId=8, currentCH=null, pendingCH=null, throwable=null, viewId=3}
org.infinispan.CacheException: Received invalid rebalance confirmation from NodeD-64761 for cache LuceneIndexesData, we don't have a rebalance in progress
	at org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:206)
	at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:160)
	at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:253)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:220)
	at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:484)
	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:391)
	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:249)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
	at org.jgroups.JChannel.up(JChannel.java:707)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
	at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
	at org.jgroups.protocols.FC.up(FC.java:479)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:245)
	at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:765)
	at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:420)
	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
	at org.jgroups.protocols.Discovery.up(Discovery.java:359)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1263)
	at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1825)
	at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1798)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886){code}

Dan, is this part of the same issue or should I open a separate one?

> Cache operations or transactions should never fail with SuspectException
> ------------------------------------------------------------------------
>
>                 Key: ISPN-2402
>                 URL: https://issues.jboss.org/browse/ISPN-2402
>             Project: Infinispan
>          Issue Type: Task
>          Components: RPC, State transfer
>    Affects Versions: 5.2.0.Beta2
>            Reporter: Dan Berindei
>            Assignee: Dan Berindei
>             Fix For: 5.3.0.Alpha1
>
>         Attachments: vrstt.log
>
>
> This is an extension of ISPN-1896 of sorts, but for all the cache operations that are visible to the user.
> After a node leaves, the other nodes that have sent commands to that node should either ignore SuspectExceptions or, if not possible, they should retry the operation (e.g. if they didn't get any response back).
> For example, VersionReplStateTransferTest quite often fails on my machine with a SuspectException, because the versioned prepare command expects a response from the coordinator and the coordinator has just left.
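
The "retry the operation" behaviour described in the issue could look roughly like the sketch below. It is only an illustration under stated assumptions: `SuspectException` here is a local stand-in class rather than Infinispan's own, and `retryOnSuspect` and `maxAttempts` are hypothetical names that do not appear in the Infinispan code base.

```java
import java.util.concurrent.Callable;

public class SuspectRetrySketch {

    /** Local stand-in for illustration only; not Infinispan's SuspectException. */
    static class SuspectException extends RuntimeException {
        SuspectException(String message) {
            super(message);
        }
    }

    /**
     * Runs the operation and retries it when the target node was suspected.
     * Any other exception is propagated immediately.
     * Both the method name and maxAttempts are hypothetical.
     */
    static <T> T retryOnSuspect(Callable<T> operation, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SuspectException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (SuspectException e) {
                // The node left before answering; remember the failure and try again.
                last = e;
            }
        }
        // Every attempt hit a suspected node, so surface the last failure.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = retryOnSuspect(() -> {
            // Simulate the coordinator leaving during the first attempt only.
            if (calls[0]++ == 0) {
                throw new SuspectException("coordinator left");
            }
            return "committed";
        }, 3);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "committed after 2 attempts"
    }
}
```

A real fix would also have to pick the new target from the updated cluster view before retrying, which this sketch leaves out.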
