IIRC, the 1-minute deployment scenario was caused by a deadlock: the AS code made an RPC
from the JGroups up_handler, which blocked it and so prevented the RPC response from arriving.
Wasn't this a bug that was fixed?
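
To make that kind of deadlock concrete, here is a rough sketch. It does not use the JGroups API at all: a single-threaded executor stands in for the one thread that delivers incoming messages, and the class name, method names and timeouts are all made up for illustration. The first method makes a blocking "RPC" from the delivery thread itself, so the response is queued behind the caller and is not delivered before the timeout; the second makes the same call from a separate thread and gets its response.

```java
import java.util.concurrent.*;

class UpHandlerDeadlockSketch {

    // Blocking call made ON the delivery thread. Returns true if the
    // response arrived before the timeout. (Hypothetical stand-in, not JGroups code.)
    static boolean callFromDeliveryThread(long timeoutMs) throws Exception {
        // One thread delivers every incoming message, responses included.
        ExecutorService delivery = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> outcome = delivery.submit(() -> {
                CompletableFuture<String> response = new CompletableFuture<>();
                // The response is queued for the same thread that is about to block.
                delivery.execute(() -> response.complete("pong"));
                try {
                    response.get(timeoutMs, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false; // delivery thread was busy waiting on itself
                }
            });
            return outcome.get();
        } finally {
            delivery.shutdownNow();
        }
    }

    // Same call made from a separate thread: the delivery thread stays free.
    static boolean callFromSeparateThread(long timeoutMs) throws Exception {
        ExecutorService delivery = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> response = new CompletableFuture<>();
            delivery.execute(() -> response.complete("pong"));
            try {
                response.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false;
            }
        } finally {
            delivery.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("from delivery thread: " + callFromDeliveryThread(200));
        System.out.println("from separate thread: " + callFromSeparateThread(2000));
    }
}
```

In the sketch the caller gives up after its timeout, which is roughly how a deployment could stall for a fixed period and then continue rather than hang forever.
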
In that case, it was the node sending the RPC that was faulty. In a different case, where a
remote node "isn't responding", all you could do is send it a message telling it
to "commit suicide" -- there's no mechanism to evict a node from the group
outside of JGroups' own failure detection. But if the node isn't responding to
RPCs, it likely wouldn't respond to the "commit suicide" message either.
In principle, I could see some benefit in a self-healing approach where cluster
members detect faults and either restart themselves or send commands telling other members to
restart. But this would take a lot of thought.