[jboss-dev-forums] [Design of Messaging on JBoss (Messaging/JBoss)] - Re: Channel.AUTO_RECONNECT

bstansberry@jboss.com do-not-reply at jboss.com
Tue Mar 25 10:33:26 EDT 2008


The failure detection protocol on another node decides that the node has failed [1], and the VERIFY_SUSPECT protocol on the coordinator concurs [2]. The coordinator then publishes a new view that excludes the suspect node. But the node hasn't actually failed; it starts working again and tries to send messages to a group of which it is no longer a member.
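The sequence above can be modeled in a few lines of Java. This is a hypothetical, self-contained sketch, not JGroups code. `GroupView`, `Coordinator`, `Receiver` and the `stillUnresponsive` check are illustrative names. The rule that a receiver drops messages from senders outside its current view is an assumption made for the sketch, not a statement of how JGroups handles such messages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of a group view: a view id plus an ordered member list.
// This is NOT the JGroups View class; the names are illustrative only.
final class GroupView {
    final long id;
    final List<String> members;

    GroupView(long id, List<String> members) {
        this.id = id;
        this.members = Collections.unmodifiableList(new ArrayList<>(members));
    }

    // Convention assumed in this sketch: the first member is the coordinator.
    String coordinator() {
        return members.get(0);
    }

    boolean contains(String member) {
        return members.contains(member);
    }
}

// Hypothetical coordinator that installs a new view once a suspicion is confirmed.
final class Coordinator {
    private GroupView view;
    // Stand-in for a VERIFY_SUSPECT-style check: returns true if the suspect
    // still fails to respond when the coordinator checks it directly.
    private final Predicate<String> stillUnresponsive;

    Coordinator(GroupView initial, Predicate<String> stillUnresponsive) {
        this.view = initial;
        this.stillUnresponsive = stillUnresponsive;
    }

    // Called when another member's failure detector reports a suspect.
    GroupView onSuspect(String suspect) {
        if (view.contains(suspect) && stillUnresponsive.test(suspect)) {
            List<String> remaining = new ArrayList<>(view.members);
            remaining.remove(suspect);                     // drop the suspect from the membership
            view = new GroupView(view.id + 1, remaining);  // publish the next view
        }
        return view;
    }
}

final class Receiver {
    // Assumption for this sketch: a member accepts a group message only
    // if the sender is in the receiver's current view.
    static boolean accept(GroupView current, String sender) {
        return current.contains(sender);
    }
}
```

Start with view 1 `[A, B, C]` and a check that confirms C is unresponsive. `onSuspect("C")` then yields view 2 with members `[A, B]`. When C later recovers and sends, `Receiver.accept(view2, "C")` is false, which is the position the node described above ends up in. If the coordinator's own check finds C alive, the view is left unchanged.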

Typically something happens on the node that prevents it from responding to FD heartbeat requests for a while: perhaps the CPU is pegged at 100%, or something is blocking all the threads that pass messages up from the transport protocol. (In JGroups 2.4 there is only one such thread, so if it gets blocked in the application for a while, the node can be suspected.)
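The heartbeat mechanism and the single-thread bottleneck can be illustrated with a small deterministic simulation. This is a hypothetical sketch in the spirit of FD's heartbeats, not JGroups code. `HeartbeatSim`, `suspectTime`, `timeoutMs` and `maxTries` are names invented here. The model ignores network latency and treats only heartbeat replies as evidence that the peer is alive.

```java
// Deterministic simulation of heartbeat-based failure detection; no real
// threads or sockets. All names here are hypothetical.
final class HeartbeatSim {
    /**
     * Returns the time (ms) at which the monitor suspects the peer, or -1 if
     * it never does before horizonMs.
     *
     * A heartbeat is sent every timeoutMs. The peer's single up-thread handles
     * heartbeats in arrival order. While it is blocked in [blockStart, blockEnd)
     * (for example, stuck in application code), any heartbeat arriving then is
     * only handled, and answered, at blockEnd. A reply later than timeoutMs
     * counts as a miss; maxTries consecutive misses trigger suspicion.
     */
    static long suspectTime(long timeoutMs, int maxTries,
                            long blockStart, long blockEnd, long horizonMs) {
        int missed = 0;
        for (long sent = 0; sent < horizonMs; sent += timeoutMs) {
            boolean blocked = sent >= blockStart && sent < blockEnd;
            long answered = blocked ? blockEnd : sent; // network latency ignored
            if (answered > sent + timeoutMs) {
                missed++;
                if (missed >= maxTries) {
                    return sent + timeoutMs; // deadline of the last missed heartbeat
                }
            } else {
                missed = 0; // a timely reply resets the count
            }
        }
        return -1;
    }
}
```

With `timeoutMs = 3000` and `maxTries = 3`, a stall from 1000 ms to 8000 ms is tolerated (`suspectTime` returns -1). A stall from 1000 ms to 20000 ms leads to suspicion at 12000 ms, even though the node resumes at 20000 ms. That recovered but excluded node is the scenario described above.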


[1] Details on the most common failure detection protocols: http://wiki.jboss.org/wiki/Wiki.jsp?page=JGroupsFD and http://wiki.jboss.org/wiki/Wiki.jsp?page=JGroupsFD_SOCK, as well as http://wiki.jboss.org/wiki/Wiki.jsp?page=FDVersusFD_SOCK

[2] http://wiki.jboss.org/wiki/Wiki.jsp?page=JGroupsVERIFY_SUSPECT

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4138730#4138730
