[infinispan-issues] [JBoss JIRA] Commented: (ISPN-1182) Failure after TimeoutException during the restart of HotRod Server

Wed Jun 15 10:15:29 EDT 2011

    [ https://issues.jboss.org/browse/ISPN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608859#comment-12608859 ] 

Galder Zamarreño commented on ISPN-1182:
----------------------------------------

@Jacek, I've investigated ISPN-448 further. It is not straightforward to implement, so I think it is too late in the 5.0.0 development cycle to take it on. The scope of this JIRA will therefore be limited to fixing the issue you reported.

By the way, remember that you can control the topology cache replication timeout and lock timeout via the infinispan.server.topology.repl_timeout and infinispan.server.topology.lock_timeout properties respectively, so you may want to tune them if you expect concurrent startups.
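
As a rough illustration of the two properties named above, the sketch below collects them into a java.util.Properties object. Only the two property names come from this comment. The TopologyTimeouts object, the build helper, the example values, and the assumption that the values are in milliseconds are all hypothetical, and how the properties are handed to the server at startup is not shown here.

```scala
import java.util.Properties

// Hypothetical helper, not part of Infinispan.
object TopologyTimeouts {
  // Property names as given in the comment above.
  val ReplTimeoutKey = "infinispan.server.topology.repl_timeout"
  val LockTimeoutKey = "infinispan.server.topology.lock_timeout"

  // Assumption: both values are expressed in milliseconds.
  def build(replTimeoutMs: Long, lockTimeoutMs: Long): Properties = {
    val props = new Properties()
    props.setProperty(ReplTimeoutKey, replTimeoutMs.toString)
    props.setProperty(LockTimeoutKey, lockTimeoutMs.toString)
    props
  }
}
```

For example, TopologyTimeouts.build(30000L, 20000L) would yield a Properties object with both keys set. The numbers are placeholders, not recommended values.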

> Failure after TimeoutException during the restart of HotRod Server
> ------------------------------------------------------------------
>
>                 Key: ISPN-1182
>                 URL: https://issues.jboss.org/browse/ISPN-1182
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Cache Server
>    Affects Versions: 5.0.0.CR4
>            Reporter: Jacek Gerbszt
>            Assignee: Galder Zamarreño
>             Fix For: 5.0.0.CR7
>
>         Attachments: hotrodexception.txt
>
>
> Sometimes, when restarting 3 or more HotRod nodes in a 25-node cluster, I receive a replication timeout exception, after which the node is unusable.
> The timeout comes from replacing the view in HotRodServer.addSelfToTopologyView. If 3 nodes try to replace the same cache entry at the same time, it is no surprise that they end up in some kind of deadlock, which is correctly detected and broken after the timeout. Unfortunately, the resulting exception is not handled, and it aborts the HotRodServer start procedure. I suggest catching it in addSelfToTopologyView like this:
>             var updated = false
>             try {
>                 updated = topologyCache.replace("view", currentView, newView)
>             } catch {
>                 // Leave updated as false so the caller can try again
>                 case e: TimeoutException => logUnableToReplaceView
>             }
> This way the exception is not thrown from the containing closure, and the updateTopologyView method gets a chance to replace the view again.
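
The retry behaviour described in the quoted report can be sketched in isolation as follows. This is a minimal, self-contained illustration and not the actual HotRodServer code: ViewCache, FlakyCache, addSelf and the attempt limit are hypothetical names, and java.util.concurrent.TimeoutException stands in for whichever timeout exception the real cache throws. A timed-out replace is treated as "not updated", so the loop re-reads the view and tries again instead of aborting startup.

```scala
import java.util.concurrent.{ConcurrentHashMap, TimeoutException}
import scala.annotation.tailrec

object ViewUpdateSketch {
  // Hypothetical stand-in for the topology cache.
  trait ViewCache {
    def get(key: String): Set[String]
    def replace(key: String, oldValue: Set[String], newValue: Set[String]): Boolean
  }

  // Test double: throws a simulated timeout for the first `failures` replace calls.
  final class FlakyCache(initial: Set[String], failures: Int) extends ViewCache {
    private val store = new ConcurrentHashMap[String, Set[String]]()
    store.put("view", initial)
    private var remainingFailures = failures

    def get(key: String): Set[String] = store.get(key)

    def replace(key: String, oldValue: Set[String], newValue: Set[String]): Boolean = {
      if (remainingFailures > 0) {
        remainingFailures -= 1
        throw new TimeoutException("simulated replication timeout")
      }
      store.replace(key, oldValue, newValue)
    }
  }

  // Adds `self` to the view, retrying up to `attemptsLeft` times.
  // Returns true once a replace succeeds, false if all attempts are used up.
  @tailrec
  def addSelf(cache: ViewCache, self: String, attemptsLeft: Int): Boolean = {
    if (attemptsLeft <= 0) false
    else {
      val currentView = cache.get("view")
      val newView = currentView + self
      val updated =
        try cache.replace("view", currentView, newView)
        catch { case _: TimeoutException => false } // swallow the timeout and retry
      if (updated) true else addSelf(cache, self, attemptsLeft - 1)
    }
  }
}
```

Bounding the number of attempts is a design choice of this sketch. The quoted suggestion only says that the view replacement should get another chance after a timeout.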

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira