Paul Ferraro wrote:
>>
>> If you are running mod_cluster via HAModClusterService and node1 crashed
>> (instead of a clean shut down), node2 will not send any MCMP messages to
>> httpd on behalf of node1 when it receives a view change.
> Ah, yes, I remember now ! I just killed -9 node1 and mod-cluster-manager
> didn't remove node1 from its worker list.
>
> Why don't we do this ? We cannot rely on a node being shut down
> gracefully !
Because currently a given node cannot distinguish between a crashed
member and a network partition.
OK, so {A,B} would remove C and D and {C,D} would remove A and B...
How about we send an MCMP SUSPECT message to httpd ? This would cause
httpd to double-check, e.g. via the regular AJP cping/cpong or socket
connection checks, and if the double-check fails, remove the worker ?
For the regular case (a crash), this would be faster than having to wait
until httpd checks the connection.
So a view change would do nothing more than trigger an immediate
connection check on httpd.
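
To make the idea concrete, here is a rough sketch in Java. It is only an
illustration: SUSPECT is the message proposed above, not an existing MCMP
command, and ProxySketch, ViewChangeSketch and the "reachable" predicate
are made-up names standing in for httpd's worker table and its
cping/cpong or socket check. No real mod_cluster or JGroups API is used.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical stand-in for httpd's worker table; not mod_cluster code. */
class ProxySketch {
    private final Set<String> workers = new LinkedHashSet<>();
    // Assumption: stands in for an AJP cping/cpong or a socket connect check.
    private final Predicate<String> reachable;

    ProxySketch(Set<String> initialWorkers, Predicate<String> reachable) {
        this.workers.addAll(initialWorkers);
        this.reachable = reachable;
    }

    /** Hypothetical SUSPECT handler: check the worker now, drop it only if the check fails. */
    boolean suspect(String worker) {
        if (workers.contains(worker) && !reachable.test(worker)) {
            workers.remove(worker);
            return true;
        }
        return false;
    }

    /** Returns a copy of the current worker list. */
    Set<String> workers() {
        return new LinkedHashSet<>(workers);
    }
}

class ViewChangeSketch {
    /** Members present in the old view but missing from the new one. */
    static List<String> departed(List<String> oldView, List<String> newView) {
        List<String> gone = new ArrayList<>(oldView);
        gone.removeAll(newView);
        return gone;
    }

    /** On a view change, suspect each departed member instead of removing it outright. */
    static void onViewChange(List<String> oldView, List<String> newView, ProxySketch proxy) {
        for (String member : departed(oldView, newView)) {
            proxy.suspect(member);
        }
    }
}
```

With the partition example from above: if {A,B} sees C and D leave the
view, it suspects both. If C really crashed but D is merely partitioned
from A and B and is still reachable from httpd, only C's check fails, so
only C is removed. httpd stays the single arbiter, which is why neither
side of a split can evict the other.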
--
Bela Ban
Lead JGroups / Clustering Team
JBoss