[mod_cluster-issues] [JBoss JIRA] (MODCLUSTER-434) Enable workers to tell the balancer they are overloaded at a certain threshold

Mon Aug 17 08:05:27 EDT 2015

     [ https://issues.jboss.org/browse/MODCLUSTER-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Husar reassigned MODCLUSTER-434:
-----------------------------------------

    Assignee: Radoslav Husar  (was: Michal Karm Babacek)


> Enable workers to tell the balancer they are overloaded at a certain threshold
> ------------------------------------------------------------------------------
>
>                 Key: MODCLUSTER-434
>                 URL: https://issues.jboss.org/browse/MODCLUSTER-434
>             Project: mod_cluster
>          Issue Type: Enhancement
>    Affects Versions: 1.2.10.Final, 1.3.1.Final
>            Reporter: Michal Karm Babacek
>            Assignee: Radoslav Husar
>            Priority: Minor
>
> The discussion on [Server load threshold|https://developer.jboss.org/message/907660], initiated by [~leaqui] and [~godiedelrio], revealed that there might be a need for a way to let workers trigger failover at a certain load threshold. For instance, one might want to consider nodes in the cluster desperately overloaded at Load 40 and have failover triggered at that point.
> Formerly, similar behavior could be achieved with a custom load metric returning Load 0, which marked the node as a "stand-by" node. This is no longer possible, though: [MODCLUSTER-279 mod_cluster returns 503s after STATUS|https://github.com/modcluster/mod_cluster/commit/1fbff789481f48409a483462e258e94dc3e7f897]
> I see several viable ways to add the described logic to the current code base:
> h3. 1. Leveraging Load -1 (worker in error) or Load 0 (stand-by worker)
> We could simply modify [DynamicLoadBalanceFactorProvider.java|https://github.com/modcluster/mod_cluster/blob/93d021c6ba4a0276011106c97886516dcbe74e9b/core/src/main/java/org/jboss/modcluster/load/impl/DynamicLoadBalanceFactorProvider.java#L106] so that it doesn't apply the ceiling/floor normalization {{if load == -1}}. The question is what should happen when we want the node back. Would returning Load > 0 make the balancer change the node's status from *NOTOK* back to *OK*, even after the removal period? I think yes, because the node would keep responding to cping/cpong. This approach seems legit, IMHO.
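A minimal sketch of the pass-through described in option 1. It is written as a standalone Java class, not as a patch to the provider. The names {{LoadFactorNormalizer}} and {{normalize}} are hypothetical. The [1, 100] clamping bounds are an assumption standing in for whatever ceiling/floor logic the linked line actually applies. Only the error marker -1 escapes normalization, so a computed 0 is still floored and cannot slip through as a stand-by marker.

```java
/**
 * Hypothetical sketch of option 1: normalize a computed load factor but let
 * the "worker in error" marker (-1) through untouched. Not the real
 * DynamicLoadBalanceFactorProvider code; names and bounds are assumptions.
 */
final class LoadFactorNormalizer {
    /** Load value that means "worker in error" in a STATUS report. */
    static final int LOAD_ERROR = -1;
    /** Assumed clamping bounds for a healthy worker's load factor. */
    static final int MIN_LOAD = 1;
    static final int MAX_LOAD = 100;

    private LoadFactorNormalizer() {}

    /** Clamps into [MIN_LOAD, MAX_LOAD] unless the value is the error marker. */
    static int normalize(int computedLoad) {
        if (computedLoad == LOAD_ERROR) {
            return LOAD_ERROR; // skip ceiling/floor normalization entirely
        }
        return Math.max(MIN_LOAD, Math.min(MAX_LOAD, computedLoad));
    }

    public static void main(String[] args) {
        System.out.println(normalize(-1));  // -1: reported as "worker in error"
        System.out.println(normalize(0));   // 1: the floor still applies
        System.out.println(normalize(250)); // 100: the ceiling still applies
        System.out.println(normalize(40));  // 40: unchanged
    }
}
```

This sketch does not settle what produces the -1 upstream (e.g. a metric detecting the threshold), just as the proposal leaves it open.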
>  
> h3. 2. {{STOP-APP}} message
> Well, hey, don't we actually have a message for this? :-) What if we allow a custom load metric to access the mod_cluster subsystem API so that it could send a {{STOP-APP}} message with {{Range=NODE}}? One could then send {{ENABLE-APP}} as soon as the load settles below the threshold.
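A sketch of how a custom load metric might drive option 2, assuming such a subsystem hook existed. {{NodeCommandSender}}, {{OverloadGuard}} and their methods are hypothetical stand-ins, not mod_cluster API. A real implementation would have to send {{STOP-APP}}/{{ENABLE-APP}} with {{Range=NODE}}.

The sketch uses two thresholds (stop above one, re-enable below a lower one) rather than a single value. This is a design choice to keep a node whose load hovers around the threshold from flapping between stopped and enabled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hook for node-wide MCMP commands; not a real mod_cluster API. */
interface NodeCommandSender {
    void stopApp();   // would send STOP-APP with Range=NODE
    void enableApp(); // would send ENABLE-APP with Range=NODE
}

/**
 * Sketch of option 2: stop the node when utilization rises above a high-water
 * mark, re-enable it only once utilization falls below a lower mark.
 */
final class OverloadGuard {
    private final double stopAbove;   // e.g. 0.9 = 90 % utilization
    private final double enableBelow; // must be lower than stopAbove
    private final NodeCommandSender sender;
    private boolean stopped = false;

    OverloadGuard(double stopAbove, double enableBelow, NodeCommandSender sender) {
        if (enableBelow >= stopAbove) {
            throw new IllegalArgumentException("enableBelow must be < stopAbove");
        }
        this.stopAbove = stopAbove;
        this.enableBelow = enableBelow;
        this.sender = sender;
    }

    /** Called once per load-computation cycle with utilization in [0, 1]. */
    void onLoadSample(double utilization) {
        if (!stopped && utilization > stopAbove) {
            sender.stopApp();
            stopped = true;
        } else if (stopped && utilization < enableBelow) {
            sender.enableApp();
            stopped = false;
        }
    }

    boolean isStopped() {
        return stopped;
    }
}

final class OverloadGuardDemo {
    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        NodeCommandSender recorder = new NodeCommandSender() {
            @Override public void stopApp() { sent.add("STOP-APP"); }
            @Override public void enableApp() { sent.add("ENABLE-APP"); }
        };
        OverloadGuard guard = new OverloadGuard(0.9, 0.7, recorder);
        // 0.85 and 0.92 fall while stopped, 0.8 after re-enabling: no extra commands
        for (double u : new double[] {0.5, 0.95, 0.85, 0.92, 0.6, 0.8}) {
            guard.onLoadSample(u);
        }
        System.out.println(sent); // prints [STOP-APP, ENABLE-APP]
    }
}
```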
> h3. 3. An additional _threshold_ attribute
> I don't favor this one, but it's still an option. We might add a configuration attribute {{threshold}} to both the mod_cluster subsystem and the native mod_proxy_cluster code, allowing a threshold to be set on a per-worker basis. Then, in the function [internal_find_best_byrequests|https://github.com/modcluster/mod_cluster/blob/93d021c6ba4a0276011106c97886516dcbe74e9b/native/mod_proxy_cluster/mod_proxy_cluster.c#L2142] we could access this threshold and decide whether the worker's lbfactor is below it.
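A Java model of where the check in option 3 could sit. It is not a reproduction of the C code of internal_find_best_byrequests. It mimics a byrequests-style weighted round robin: each usable worker's running score grows by its lbfactor, the highest score wins, and the winner's score is reduced by the total.

The {{threshold}} field is the hypothetical attribute from the proposal. The model assumes that a higher lbfactor means more spare capacity. It also simply skips error (-1) and stand-by (0) workers, which is a simplification rather than the real stand-by semantics.

```java
import java.util.Arrays;
import java.util.List;

/** Java model (not the C code) of a byrequests-style pick with a per-worker threshold. */
final class ThresholdBalancer {
    static final class Worker {
        final String route;
        final int lbfactor;  // last reported load factor; higher = more spare capacity (assumed)
        final int threshold; // hypothetical attribute: skip the worker while lbfactor is below it
        int lbstatus = 0;    // running weighted-round-robin score

        Worker(String route, int lbfactor, int threshold) {
            this.route = route;
            this.lbfactor = lbfactor;
            this.threshold = threshold;
        }

        /** Simplification: error (-1) and stand-by (0) workers are skipped too. */
        boolean usable() {
            return lbfactor > 0 && lbfactor >= threshold;
        }
    }

    /** Returns the chosen worker, or null when every worker is filtered out. */
    static Worker findBest(List<Worker> workers) {
        Worker best = null;
        int total = 0;
        for (Worker w : workers) {
            if (!w.usable()) {
                continue; // treated as overloaded: takes no part in this round
            }
            w.lbstatus += w.lbfactor;
            total += w.lbfactor;
            if (best == null || w.lbstatus > best.lbstatus) {
                best = w;
            }
        }
        if (best != null) {
            best.lbstatus -= total;
        }
        return best;
    }

    public static void main(String[] args) {
        List<Worker> workers = Arrays.asList(
                new Worker("node1", 80, 40),
                new Worker("node2", 30, 40), // below its threshold: never chosen
                new Worker("node3", 50, 40));
        StringBuilder picks = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            picks.append(findBest(workers).route).append(' ');
        }
        System.out.println(picks.toString().trim()); // prints "node1 node3 node1 node3"
    }
}
```

Returning null when no worker passes its threshold is the point where the real balancer would have to decide between failing the request and falling back to an overloaded node. The proposal does not specify which.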
> WDYT? 


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)