[ https://issues.jboss.org/browse/ISPN-1182?page=com.atlassian.jira.plugin.... ]
Jacek Gerbszt commented on ISPN-1182:
-------------------------------------
I've played with repl_timeout, but without good results: the exception appears sooner,
but the node still stops working, just as before. To work around the issue, I've applied a
quick-and-dirty patch based on the fix proposed in the description, and the problem is gone (for now).
Thanks, Galder, for your work on the issue. I'm waiting for the release :)
Failure after TimeoutException during the restart of HotRod Server
------------------------------------------------------------------
Key: ISPN-1182
URL: https://issues.jboss.org/browse/ISPN-1182
Project: Infinispan
Issue Type: Bug
Components: Cache Server
Affects Versions: 5.0.0.CR4
Reporter: Jacek Gerbszt
Assignee: Galder Zamarreño
Fix For: 5.0.0.CR6
Attachments: hotrodexception.txt
Sometimes, when restarting 3 or more HotRod nodes in a 25-node cluster, I get a
replication timeout exception, after which the node is unusable.
The timeout comes from replacing the view in HotRodServer.addSelfToTopologyView. If 3
nodes try to replace the same cache entry at the same time, it's no big surprise that
they run into some kind of deadlock, which is correctly detected and broken once the
timeout expires. Unfortunately, the exception that breaks the deadlock is not handled,
so it aborts the HotRodServer start-up procedure. I suggest catching it in
addSelfToTopologyView, like this:
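// Treat a replication timeout like an unsuccessful replace, so that
// updateTopologyView can retry instead of aborting server start-up
var updated = false
try {
  updated = topologyCache.replace("view", currentView, newView)
} catch {
  case _: TimeoutException => logUnableToReplaceView
}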
This way, the exception is no longer thrown out of the enclosing closure, and the
updateTopologyView method gets a chance to try replacing the view again.
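The retry behaviour this fix relies on can be illustrated with a small, self-contained Scala sketch. It is not Infinispan code: FlakyTopologyCache, tryAddSelf and the injected timeouts are hypothetical, a ConcurrentHashMap stands in for topologyCache, and java.util.concurrent.TimeoutException stands in for Infinispan's replication timeout. Three threads play the restarting nodes; each reads the current view, attempts a conditional replace, and on either a lost race or a timeout simply tries again.

```scala
import java.util.concurrent.{ConcurrentHashMap, TimeoutException}
import java.util.concurrent.atomic.AtomicInteger

object TopologyRetrySketch {

  // Hypothetical stand-in for the replicated topology cache. The first
  // `timeoutsToInject` replace calls throw, simulating replication timeouts.
  class FlakyTopologyCache(timeoutsToInject: Int) {
    private val map = new ConcurrentHashMap[String, Set[String]]()
    private val remainingTimeouts = new AtomicInteger(timeoutsToInject)

    map.put("view", Set.empty)

    def get(key: String): Set[String] = map.get(key)

    // Conditional replace: succeeds only if the stored value still equals
    // `expected` (ConcurrentHashMap compares with equals()).
    def replace(key: String, expected: Set[String], updated: Set[String]): Boolean = {
      if (remainingTimeouts.getAndDecrement() > 0)
        throw new TimeoutException("simulated replication timeout")
      map.replace(key, expected, updated)
    }
  }

  // Hypothetical analogue of addSelfToTopologyView: a single attempt in which
  // a timeout is logged and reported as "not updated" instead of propagating.
  def tryAddSelf(cache: FlakyTopologyCache, self: String): Boolean = {
    val currentView = cache.get("view")
    val newView = currentView + self
    try cache.replace("view", currentView, newView)
    catch {
      case _: TimeoutException =>
        println(s"$self: unable to replace view, retrying")
        false
    }
  }

  // Hypothetical analogue of updateTopologyView: retry until the replace lands.
  def updateTopologyView(cache: FlakyTopologyCache, self: String): Unit = {
    var updated = false
    while (!updated) updated = tryAddSelf(cache, self)
  }

  // Start three "nodes" concurrently and return the final view.
  def run(): Set[String] = {
    val cache = new FlakyTopologyCache(timeoutsToInject = 3)
    val nodes = Seq("A", "B", "C").map { name =>
      new Thread(() => updateTopologyView(cache, name))
    }
    nodes.foreach(_.start())
    nodes.foreach(_.join())
    cache.get("view")
  }
}
```

Calling TopologyRetrySketch.run() should return a view containing all three members, regardless of which thread hits the simulated timeouts. Without the catch in tryAddSelf, a timeout would escape updateTopologyView and end that thread, so the corresponding "node" would never be added to the view, which mirrors the aborted start-up described in the issue.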
--
This message is automatically generated by JIRA.