[jboss-dev-forums] [Design of Messaging on JBoss (Messaging/JBoss)] - Re: Channel.AUTO_RECONNECT
bstansberry@jboss.com
do-not-reply at jboss.com
Tue Mar 25 10:33:26 EDT 2008
The failure detection protocol on another node considers the node to have failed [1] and the VERIFY_SUSPECT protocol on the coordinator node concurs [2]. The coordinator then publishes a new view that doesn't contain the suspect node. But the node hasn't entirely failed; it begins working again and tries to send messages to the group, of which it is no longer a member.
Typically this happens when something on the node prevents it from responding to FD heartbeat requests for a period: maybe a 100% CPU situation, or something blocking all the threads that pass messages up from the transport protocol. (In JG 2.4 there is just one such thread, so if it gets blocked for a while in the app, the node can be suspected.)
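To make the sequence concrete, here is a toy model of it in plain Java. It is not JGroups code and uses no JGroups classes. The node names (A as coordinator, B as the monitoring node, C as the stalled node), the tick-based timing, and the parameters stallTicks and maxMissed are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy model of heartbeat-based suspicion; hypothetical, not the JGroups implementation. */
public class SuspectSimulation {

    /** Outcome of one simulated run. */
    public static final class Result {
        public final TreeSet<String> finalView;   // members in the last installed view
        public final boolean messageAccepted;     // was C's post-stall message delivered?
        public final List<String> log;            // human-readable trace of events

        Result(TreeSet<String> finalView, boolean messageAccepted, List<String> log) {
            this.finalView = finalView;
            this.messageAccepted = messageAccepted;
            this.log = log;
        }
    }

    /**
     * @param stallTicks number of ticks during which node C answers no heartbeats
     * @param maxMissed  missed heartbeats tolerated before C is suspected (assumed knob)
     */
    public static Result run(int stallTicks, int maxMissed) {
        // Initial view: A (coordinator), B (monitors C), C (the node that stalls).
        TreeSet<String> view = new TreeSet<>(Arrays.asList("A", "B", "C"));
        List<String> log = new ArrayList<>();
        int missed = 0;

        for (int tick = 1; tick <= stallTicks; tick++) {
            // C is stalled for the whole loop, so B's heartbeat request goes unanswered.
            missed++;
            if (missed >= maxMissed && view.contains("C")) {
                log.add("tick " + tick + ": B suspects C");
                // The coordinator double-checks; C is still stalled, so it concurs
                // and publishes a new view without C.
                log.add("tick " + tick + ": A confirms and installs a view without C");
                view.remove("C");
            }
        }

        // C starts working again and sends to the group, unaware of any new view.
        boolean accepted = view.contains("C");
        log.add(accepted
                ? "C's message delivered"
                : "C's message discarded: sender is not in view " + view);
        return new Result(view, accepted, log);
    }

    public static void main(String[] args) {
        // A stall shorter than the tolerance goes unnoticed; a longer one gets C excluded.
        for (int stall : new int[] {2, 5}) {
            System.out.println("stall of " + stall + " ticks, tolerance 3:");
            for (String line : run(stall, 3).log) {
                System.out.println("  " + line);
            }
        }
    }
}
```

The point of the sketch is only the ordering of events: the exclusion is decided while C is unresponsive, so C learns nothing of it and keeps sending to a group that no longer counts it as a member.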
[1] Details on the most common failure detection protocols: http://wiki.jboss.org/wiki/Wiki.jsp?page=JGroupsFD and http://wiki.jboss.org/wiki/Wiki.jsp?page=JGroupsFD_SOCK as well as http://wiki.jboss.org/wiki/Wiki.jsp?page=FDVersusFD_SOCK
[2] http://wiki.jboss.org/wiki/Wiki.jsp?page=JGroupsVERIFY_SUSPECT
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4138730#4138730