[mod_cluster-issues] [JBoss JIRA] (MODCLUSTER-434) Enable workers to tell the balancer they are overloaded at a certain threshold

Michal Babacek (JIRA) issues at jboss.org
Wed Oct 22 06:52:35 EDT 2014


     [ https://issues.jboss.org/browse/MODCLUSTER-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michal Babacek updated MODCLUSTER-434:
--------------------------------------
    Description: 
The discussion on [Server load threshold|https://developer.jboss.org/message/907660], initiated by [~leaqui] and [~godiedelrio], revealed that there might be a need for a way to let workers trigger failover once a certain load threshold is reached. For instance, one might want to consider nodes in the cluster desperately overloaded at Load 40 and have failover triggered.

Formerly, similar behavior could be achieved with a custom load metric returning Load 0, thus marking the node as a "stand-by" one. This is no longer possible though: [MODCLUSTER-279 mod_cluster returns 503s after STATUS|https://github.com/modcluster/mod_cluster/commit/1fbff789481f48409a483462e258e94dc3e7f897]
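
For the record, this is roughly what that old workaround looked like. A minimal sketch only, assuming the current {{LoadMetric}} SPI ({{getLoad(Engine)}} plus the weight/capacity properties inherited from {{AbstractLoadMetric}}); {{readLocalBusyness()}} and the 0.95 cut-off are made-up placeholders:
{code:java}
import org.jboss.modcluster.container.Engine;
import org.jboss.modcluster.load.metric.impl.AbstractLoadMetric;

/**
 * Sketch of the pre-MODCLUSTER-279 workaround: once a local reading crosses
 * a hard cut-off, report full capacity so the computed load-balance factor
 * drops to 0 and the node effectively becomes a stand-by one.
 */
public class OverloadCutoffMetric extends AbstractLoadMetric {

    private static final double CUTOFF = 0.95; // made-up "desperately overloaded" mark

    @Override
    public double getLoad(Engine engine) {
        double busyness = readLocalBusyness();
        // Reporting the full capacity makes this metric's load ratio 1.0,
        // pushing the resulting load-balance factor towards 0.
        return busyness >= CUTOFF ? getCapacity() : busyness * getCapacity();
    }

    private double readLocalBusyness() {
        // Placeholder: plug in busy threads / max threads, heap usage, CPU, etc.
        return 0.5;
    }
}
{code}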

I see several viable ways of adding the described logic to the current code base:

h3. 1. Leveraging Load: -1 (worker in error) or Load: 0 (stand-by worker)
We could simply modify [DynamicLoadBalanceFactorProvider.java|https://github.com/modcluster/mod_cluster/blob/93d021c6ba4a0276011106c97886516dcbe74e9b/core/src/main/java/org/jboss/modcluster/load/impl/DynamicLoadBalanceFactorProvider.java#L106] so that it doesn't apply ceiling|floor normalization {{if load == -1}}. The question is what should happen when we want the node back. Would returning Load > 0 make the balancer change the node's status from *NOTOK* back to *OK*, even after the removal period? I think yes, because the node would keep responding to cping/cpong. This way seems legit, IMHO.
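
To illustrate the proposed bypass (a simplified stand-in, not the actual code at the linked line), the final clamping step would just let the error marker pass through:
{code:java}
// Simplified stand-in for the normalization step in DynamicLoadBalanceFactorProvider.
private static int normalize(int load) {
    // Proposed bypass: a metric reporting -1 marks the worker as "in error",
    // so skip the usual floor/ceiling treatment and pass it through as-is.
    if (load == -1) {
        return -1;
    }
    // Existing behavior: clamp the aggregated factor into the 0..100 range.
    return Math.max(0, Math.min(100, load));
}
{code}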
 
h3. 2. {{STOP-APP}} message
Well, hey, don't we actually have a message for this? :-) What if we allowed a custom load metric to access the mod_cluster subsystem API so that it could send a {{STOP-APP}} message with {{Range=NODE}}? One could send {{ENABLE-APP}} as soon as the load settles back below the threshold.
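
To make the intent concrete, a rough sketch of what such a hook could look like from the metric's side. {{NodeLifecycle}} and both of its methods are hypothetical, since exposing exactly this kind of API is what this option proposes; the threshold semantics follow the discussion above, where a lower load-balance factor means a busier node:
{code:java}
/** Hypothetical handle the subsystem would hand to a custom load metric (does not exist today). */
interface NodeLifecycle {
    void stopNode();   // would translate to STOP-APP with Range=NODE
    void enableNode(); // would translate to ENABLE-APP with Range=NODE
}

/** Threshold watcher a custom metric could drive on every load sample. */
class OverloadGuard {
    private final NodeLifecycle lifecycle;
    private final int threshold; // e.g. 40, per the discussion above
    private boolean stopped;

    OverloadGuard(NodeLifecycle lifecycle, int threshold) {
        this.lifecycle = lifecycle;
        this.threshold = threshold;
    }

    void onLoadSample(int loadBalanceFactor) {
        if (!stopped && loadBalanceFactor <= threshold) {
            lifecycle.stopNode();   // overloaded: take the whole node out of rotation
            stopped = true;
        } else if (stopped && loadBalanceFactor > threshold) {
            lifecycle.enableNode(); // factor climbed back above the threshold: bring the node back
            stopped = false;
        }
    }
}
{code}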

h3. 3. An additional _threshold_ attribute
I don't favor this one, but it's still an option. We might add a configuration attribute {{threshold}} to both the mod_cluster subsystem and the native mod_proxy_cluster code, allowing us to set a certain threshold on a per-worker basis. Then, in the [internal_find_best_byrequests|https://github.com/modcluster/mod_cluster/blob/93d021c6ba4a0276011106c97886516dcbe74e9b/native/mod_proxy_cluster/mod_proxy_cluster.c#L2142] function, we could consult this threshold and decide whether the worker's lbfactor has fallen below it.
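
The real change would live in the C function linked above; just to show the shape of the decision, here is a sketch (in Java, with heavily simplified byrequests bookkeeping and made-up field names):
{code:java}
import java.util.List;

/** Illustration only; the actual change would go into mod_proxy_cluster.c. */
class ByRequestsThresholdSketch {

    /** Minimal stand-in for the per-worker data the native code keeps. */
    static class Worker {
        int lbfactor;   // load-balance factor last reported by the node
        int lbstatus;   // running byrequests bookkeeping value
        int threshold;  // the proposed new per-worker attribute
    }

    /** Candidate selection with the proposed threshold filter added. */
    static Worker findBest(List<Worker> candidates) {
        Worker best = null;
        for (Worker w : candidates) {
            // Proposed check: treat a worker whose lbfactor dropped below its
            // configured threshold as unavailable for new traffic.
            if (w.lbfactor < w.threshold) {
                continue;
            }
            // Heavily simplified stand-in for the usual byrequests accounting.
            if (best == null || w.lbstatus + w.lbfactor > best.lbstatus + best.lbfactor) {
                best = w;
            }
        }
        return best;
    }
}
{code}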


WDYT? 


> Enable workers to tell the balancer they are overloaded at a certain threshold
> ------------------------------------------------------------------------------
>
>                 Key: MODCLUSTER-434
>                 URL: https://issues.jboss.org/browse/MODCLUSTER-434
>             Project: mod_cluster
>          Issue Type: Enhancement
>            Reporter: Michal Babacek
>            Assignee: Michal Babacek
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.1#6329)

