[mod_cluster-dev] Configuring checking a crashed worker

Thu Aug 13 12:18:57 EDT 2009

jean-frederic clere wrote:
> On 08/13/2009 04:56 PM, Bela Ban wrote:
>> But this works in mod_jk, doesn't it?
>
> mod_jk has a static configuration, so it is not going to remove anything 
> from the configuration.
>
>>
>> Besides, when we have *non-clustered* workers, there is no HA singleton
>> telling httpd to remove the crashed worker, so httpd has to do it 
>> itself.
>>
>> I mean, why can't httpd simply remove a worker W if the socket
>> connection to W is closed (by W crashing)?
>
> First, a corresponding entry shouldn't disturb mod_cluster; see 
> MODCLUSTER-92 about reporting that it is broken. Of course it would be 
> possible to remove broken workers after a while, but I think it must be 
> a switchable option and the delay should be a parameter... Please 
> create a JIRA.

I think what should happen is:
- httpd periodically tests the connection to worker W
- httpd detects connection loss to W
- httpd removes W from its worker tables, so requests are no longer 
  dispatched to W
- httpd also starts a timer, firing at configurable intervals, which 
  tests W again. If W comes back up, httpd adds it again

Actually, I think the last point is not necessary, because when W is 
started again, mod_cluster/JBoss will tell httpd! We should actually 
*not* implement the last point, because W might never get started again!
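
To make the proposal concrete, here is a rough sketch of the first three 
steps only (periodic test, detect loss, remove from the table), with 
re-adding left entirely to the worker announcing itself. It is not 
mod_cluster code: WorkerTable, register, check_once and the plain TCP 
connect used as the liveness test are all hypothetical names and 
assumptions, written in Python purely for illustration.

```python
import socket


class WorkerTable:
    """Hypothetical in-memory stand-in for httpd's worker table."""

    def __init__(self, probe=None, timeout=1.0):
        self._workers = {}  # worker name -> (host, port)
        self._timeout = timeout
        # The probe is injectable so the liveness test can be swapped out.
        self._probe = probe or self._tcp_probe

    def register(self, name, host, port):
        # Called when a worker announces itself, e.g. after a restart.
        self._workers[name] = (host, port)

    def active(self):
        # Workers that requests may currently be dispatched to.
        return sorted(self._workers)

    def _tcp_probe(self, host, port):
        # Assumption: a worker is alive if a TCP connection can be opened.
        try:
            with socket.create_connection((host, port), timeout=self._timeout):
                return True
        except OSError:
            return False

    def check_once(self):
        # One pass of the periodic test: drop every worker failing the probe.
        # There is deliberately no retry timer; a removed worker only comes
        # back through register().
        removed = [name for name, (host, port) in self._workers.items()
                   if not self._probe(host, port)]
        for name in removed:
            del self._workers[name]
        return removed
```

A caller would run check_once() at whatever interval is configured. 
Since a crashed W is simply dropped and only returns via register(), 
httpd never keeps polling a worker that may never be started again.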

WDYT ?

-- 
Bela Ban
Lead JGroups / Clustering Team
JBoss