Ah - this solves the partition problem nicely.
I think a more primitive command, like PING <jvmRoute>, is preferable,
since it simplifies the proxy by keeping the suspect/remove logic on the
server side.
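
To illustrate what keeping the suspect/remove logic on the server side
could look like, here is a minimal sketch. It is not mod_cluster code:
handle_view_change, ping and remove are hypothetical names, and it
assumes the proxy answers a PING <jvmRoute> with a plain alive/not-alive
result.

```python
from typing import Callable, Iterable, List, Set


def departed_members(old_view: Iterable[str], new_view: Iterable[str]) -> Set[str]:
    """Return the jvmRoutes present in the old view but missing from the new one."""
    return set(old_view) - set(new_view)


def handle_view_change(
    old_view: Iterable[str],
    new_view: Iterable[str],
    ping: Callable[[str], bool],    # hypothetical: sends PING <jvmRoute>, True if the proxy can reach it
    remove: Callable[[str], None],  # hypothetical: asks the proxy to drop that worker
) -> List[str]:
    """On a view change, ask the proxy about each departed member.

    A member the proxy can still reach is treated as partitioned and left
    alone; a member the proxy cannot reach is treated as crashed and removed.
    Returns the jvmRoutes that were removed.
    """
    removed: List[str] = []
    for jvm_route in sorted(departed_members(old_view, new_view)):
        if not ping(jvm_route):
            remove(jvm_route)
            removed.append(jvm_route)
    return removed
```

In the partition case, {A,B} would ping C and D, get a positive answer
for both, and remove nothing. In the crash case the ping fails and the
surviving node removes the dead worker itself.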
On Wed, 2009-07-22 at 12:01 -0400, Brian Stansberry wrote:
+1.
The MCMP SUSPECT should get a response indicating which suspected nodes
are and aren't still alive. That information can then be made available to
management tools.
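
As a rough sketch of the proxy side of that exchange: the function
below double-checks each suspected worker, drops the ones that fail,
and returns a per-node alive/dead map as the response. The names
(handle_suspect, is_reachable, the workers table) are made up for
illustration, and a plain TCP connect stands in for the AJP cping/cpong
check that httpd would actually perform.

```python
import socket
from typing import Callable, Dict, Iterable, Tuple


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Socket connection check; a stand-in for an AJP cping/cpong round trip."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def handle_suspect(
    suspected: Iterable[str],
    workers: Dict[str, Tuple[str, int]],  # hypothetical worker table: jvmRoute -> (host, port)
    check: Callable[[str, int], bool] = is_reachable,
) -> Dict[str, bool]:
    """Double-check each suspected jvmRoute and report which are still alive.

    Workers that fail the check (or are unknown) are removed from the table.
    The returned map is the response body: jvmRoute -> True if alive.
    """
    result: Dict[str, bool] = {}
    for jvm_route in suspected:
        address = workers.get(jvm_route)
        alive = address is not None and check(*address)
        if not alive:
            workers.pop(jvm_route, None)
        result[jvm_route] = alive
    return result
```

The returned map is what a management tool would consume to show which
suspicions were confirmed.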
Bela Ban wrote:
>
> Paul Ferraro wrote:
>>>> If you are running mod_cluster via HAModClusterService and node1 crashed
>>>> (instead of a clean shut down), node2 will not send any MCMP messages to
>>>> httpd on behalf of node1 when it receives a view change.
>>> Ah, yes, I remember now ! I just killed -9 node1 and mod-cluster-manager
>>> didn't remove node1 from its worker list.
>>>
>>> Why don't we do this ? We cannot rely on a node being shut down
>>> gracefully !
>> Because currently a given node cannot distinguish between a crashed
>> member and a network partition.
>
> OK, so {A,B} would remove C and D and {C,D} would remove A and B...
>
> How about we send an MCMP SUSPECT message to httpd ? This would cause
> httpd to double-check, e.g. via the regular AJP cping/cpong or socket
> connection checks, and if the double-check fails, remove a worker ?
>
> For the regular case (a crash), this would be faster than having to wait
> until httpd checks the connection.
>
> So a view change would do nothing more than trigger an immediate
> connection check on httpd.