[infinispan-issues] [JBoss JIRA] Updated: (ISPN-1182) Failure after TimeoutException during the restart of HotRod Server
Dan Berindei (JIRA)
jira-events at lists.jboss.org
Fri Jun 17 04:49:23 EDT 2011
[ https://issues.jboss.org/browse/ISPN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Berindei updated ISPN-1182:
-------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
* Suspect and timeout exceptions are now handled when trying to update the topology view on startup so that it can be retried.
* The amount of time to wait for topology to be updated is now configurable.
* Removing self from the topology on stop is now done with a 0 lock timeout and silent failure, to avoid blocking the stop procedure. The crashed member detection listener can deal with nodes that have issues at stop time.
* Tests have been enhanced so that if a failure is received, the error message is printed in the failure message, giving more clues about the error.
> Failure after TimeoutException during the restart of HotRod Server
> ------------------------------------------------------------------
>
> Key: ISPN-1182
> URL: https://issues.jboss.org/browse/ISPN-1182
> Project: Infinispan
> Issue Type: Bug
> Components: Cache Server
> Affects Versions: 5.0.0.CR4
> Reporter: Jacek Gerbszt
> Assignee: Galder Zamarreño
> Fix For: 5.0.0.CR6
>
> Attachments: hotrodexception.txt
>
>
> Sometimes, when restarting 3 or more HotRod nodes in a 25-node cluster, I get a replication timeout exception, after which the node is unusable.
> The timeout comes from replacing the view in HotRodServer.addSelfToTopologyView. If 3 nodes try to replace the same cache entry at the same time, it is no surprise that they fall into some kind of deadlock, which is correctly detected and broken by the timeout. Unfortunately, the resulting exception is not handled, and it aborts the HotRodServer start procedure. I suggest catching it in addSelfToTopologyView like this:
> var updated = false
> try {
>   updated = topologyCache.replace("view", currentView, newView)
> } catch {
>   case e: TimeoutException => logUnableToReplaceView
> }
> This way the exception is not thrown from the containing closure, and the updateTopologyView method gets a chance to replace the view again.
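The catch-and-retry idea described above can be sketched in isolation. This is a minimal illustration, not the actual HotRodServer code: the TimeoutException class, the TopologyRetrySketch object, the maxAttempts parameter and the injected replace function are all assumptions made for the example, and a plain ConcurrentHashMap stands in for the topology cache.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for the cache's replication timeout exception.
class TimeoutException(msg: String) extends RuntimeException(msg)

object TopologyRetrySketch {
  // Stand-in for the topology cache; assumed to already hold a "view" entry.
  val topologyCache = new ConcurrentHashMap[String, Set[String]]()

  // One attempt: a timeout is reported as "not updated" instead of escaping.
  def tryReplace(current: Set[String], next: Set[String],
                 replace: (Set[String], Set[String]) => Boolean): Boolean =
    try replace(current, next)
    catch {
      case e: TimeoutException =>
        println("Unable to replace view: " + e.getMessage)
        false
    }

  // Re-read the view and retry until the replace succeeds or attempts run out.
  def addSelfToTopologyView(self: String, maxAttempts: Int,
                            replace: (Set[String], Set[String]) => Boolean): Boolean = {
    var attempt = 0
    var updated = false
    while (!updated && attempt < maxAttempts) {
      attempt += 1
      val currentView = topologyCache.get("view")
      updated = tryReplace(currentView, currentView + self, replace)
    }
    updated
  }
}
```

Passing the replace operation in as a function makes it possible to simulate a timeout on the first attempt and check that the second attempt still updates the view. Bounding the loop with maxAttempts is a choice made for this sketch only.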
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira