FC is not the only thing that might block waiting for credits from members that
crashed (because the plug was pulled): all cluster-wide calls with GET_ALL will also
either block or time out (if a timeout is configured), because they won't get a
response from the 'pulled' members. Note that this of course doesn't apply to
GET_FIRST or other modes, e.g. a response filter: a response filter could handle
SUSPECT messages and terminate a call if the only responses still missing are from
suspected members.
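
To illustrate the response filter idea, here is a minimal sketch of the bookkeeping
such a filter would need. It is NOT the JGroups API: the class name, the method names
and the use of plain strings for member addresses are all hypothetical, and how
SUSPECT notifications actually reach the filter is left open.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not the JGroups API): tracks which members still owe a
 * response and stops waiting once only suspected members are missing.
 */
public class SuspectAwareFilter {
    private final Set<String> pending = new HashSet<>();   // members that have not answered yet
    private final Set<String> suspected = new HashSet<>(); // members reported as suspected

    public SuspectAwareFilter(Set<String> members) {
        pending.addAll(members);
    }

    /** Record a response from a member. */
    public void onResponse(String sender) {
        pending.remove(sender);
    }

    /** Record that failure detection suspects a member. */
    public void onSuspect(String member) {
        suspected.add(member);
    }

    /** True while at least one non-suspected member has not responded. */
    public boolean needMoreResponses() {
        return !suspected.containsAll(pending);
    }
}
```

With members A, B and C, the call would keep waiting after A responds and C is
suspected, and would terminate as soon as B responds, since the only missing response
is then from a suspected member.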
If we decrease that timeout, we might run into false suspicions again. However, in 2.5
and later versions heartbeats are sent as out-of-band (OOB) messages, which are
handled by a separate thread pool, so we should at least no longer run into the
problem that a heartbeat is not handled because it is stuck behind a regular message.
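
The effect of the separate pool can be sketched with two plain java.util.concurrent
executors. This is only an illustration of the idea, not how JGroups implements its
thread pools; the class and method names are made up for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Illustration only: a heartbeat on its own pool is not stuck behind a regular message. */
public class OobPoolSketch {

    /** Returns true if the heartbeat was handled while a regular message was still blocked. */
    public static boolean heartbeatHandledWhileRegularBlocked() {
        ExecutorService regularPool = Executors.newSingleThreadExecutor();
        ExecutorService oobPool = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);          // unblocks the regular message
        CountDownLatch regularStarted = new CountDownLatch(1);   // regular message is running
        CountDownLatch heartbeatHandled = new CountDownLatch(1); // heartbeat was processed
        try {
            // A regular message that occupies the only regular thread until released.
            regularPool.execute(() -> {
                regularStarted.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            regularStarted.await();

            // The heartbeat goes to the separate pool, so it does not queue behind it.
            oobPool.execute(heartbeatHandled::countDown);
            return heartbeatHandled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();
            regularPool.shutdown();
            oobPool.shutdown();
        }
    }
}
```

Had the heartbeat been submitted to regularPool instead, it would have waited in the
queue until the blocked message was released.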
So in this light, +1 for reducing the timeout in FD, BUT ONLY in 2.6.x, *NOT* in 2.4.x!
Changing to FD_ALL from FD in a future version will also help.