[jboss-dev-forums] [Design of Clustering on JBoss (Clusters/JBoss)] - Re: JBPAPP-863 -- FC blocks during slow failure detection

bela@jboss.com do-not-reply at jboss.com
Wed Jun 25 04:57:39 EDT 2008


FC is not the only thing that can block on crashed members (crashed by pulling the plug). FC blocks waiting for credits from them, and every cluster-wide call made with GET_ALL will also either block or time out (if a timeout is configured), because it never gets a response from the 'pulled' members. Of course this doesn't apply to GET_FIRST or to other modes, e.g. a response filter: a response filter could handle SUSPECT messages and terminate a call once the only responses still missing are those from suspected members.
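
To illustrate the response filter idea, here is a minimal sketch in plain Java. It deliberately does not use the JGroups API: the class SuspectAwareCollector, its method names, and the use of String for member addresses are all hypothetical, chosen only to show the bookkeeping. A real filter would have to be wired into the group request and into however SUSPECT notifications are delivered.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch, not the JGroups API: tracks which members still owe
 * a response and stops waiting once only suspected members are outstanding.
 */
class SuspectAwareCollector {
    // Members we have not yet heard from.
    private final Set<String> pending = new HashSet<>();
    // Members the failure detector has reported as suspected.
    private final Set<String> suspected = new HashSet<>();

    SuspectAwareCollector(Set<String> members) {
        pending.addAll(members);
    }

    /** Record a response from the given member. */
    synchronized void onResponse(String sender) {
        pending.remove(sender);
    }

    /** Record a SUSPECT notification for the given member. */
    synchronized void onSuspect(String member) {
        suspected.add(member);
    }

    /** True while at least one non-suspected member has not responded. */
    synchronized boolean needMoreResponses() {
        return !suspected.containsAll(pending);
    }
}
```

With members A, B and C, the call keeps waiting after A and B respond, and is released as soon as C is suspected, instead of blocking until C responds or the call times out.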

If we decrease that timeout, we might run into false suspicions again. However, in 2.5 and later versions heartbeats are sent as out-of-band (OOB) messages, which are handled by a separate thread pool, so we should at least no longer see a heartbeat go unhandled because it is stuck behind a regular message.
So in this light, +1 for reducing the timeout in FD, BUT ONLY in 2.6.x, *NOT* 2.4.x !
Changing to FD_ALL from FD in a future version will also help.
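
To make the timeout trade-off concrete: FD suspects a member only after max_tries heartbeats in a row go unanswered, each waited on for timeout milliseconds, so the time during which FC and GET_ALL calls can block is roughly timeout * max_tries. The sketch below only does that arithmetic. The class name is hypothetical, and the numbers in the usage note are made-up examples, not values from any shipped configuration.

```java
/** Hypothetical helper: rough estimate of how long FD takes to suspect a dead member. */
class FdDetectionEstimate {
    /**
     * Approximate detection time in milliseconds, assuming the member is
     * suspected after maxTries unanswered heartbeats of timeoutMillis each.
     */
    static long worstCaseMillis(long timeoutMillis, int maxTries) {
        return timeoutMillis * maxTries;
    }
}
```

For example, an assumed timeout of 10000 ms with max_tries of 5 gives about 50 seconds of potential blocking, while 2000 ms with max_tries of 3 gives about 6 seconds, at the cost of a higher risk of false suspicions.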

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4160466#4160466

