[JBoss JIRA] (MODCLUSTER-528) Re-think catalina/tomcat dependencies so that it is possible to build without the Tomcat 7 API
by Michal Karm Babacek (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-528?page=com.atlassian.jira.pl... ]
Michal Karm Babacek commented on MODCLUSTER-528:
------------------------------------------------
...[~rhusar] since you've asked what could be improved with Java libs, here you go :)
> Re-think catalina/tomcat dependencies so that it is possible to build without the Tomcat 7 API
> ----------------------------------------------------------------------------------------------
>
> Key: MODCLUSTER-528
> URL: https://issues.jboss.org/browse/MODCLUSTER-528
> Project: mod_cluster
> Issue Type: Enhancement
> Affects Versions: 1.3.3.Final
> Reporter: Michal Karm Babacek
> Assignee: Radoslav Husar
> Fix For: 2.0.0.Alpha1
>
>
> It is not possible (or I'm doing something wrong) to build the mod_cluster Tomcat libs against a Tomcat 8-only API.
> I hit this problem with the RPM build on Fedora, where I cannot download arbitrary artifacts from Maven Central and must specify all dependencies beforehand.
> h3. Description
> * container/catalina contains calls to API that is deprecated in Tomcat 7 and missing in Tomcat 8, so I have to apply this patch before building: [MODCLUSTER-528-CatalinaContext.java.patch|https://github.com/Karm/mod_clu...]
> * profiles, test dependencies, and Servlet API dependencies also need hacks to build without Tomcat 7: see the other patches in [mod_cluster-fedora-packages/|https://github.com/Karm/mod_cluster-fedora-p...]
> h3. Suggestion
> * re-architect the dependencies so that it is possible to build mod_cluster with the Tomcat 8 API only (and with the Tomcat 7 API only)
> * drop Tomcat 6 and JBossWeb from 1.3.x as they are not used in 1.3.x anyway (the last branch to use Tomcat 6 and JBossWeb is 1.2.x)
> h3. Fedora 24 offers only Tomcat 8
> ...so does it even make sense to offer users a Tomcat 7 libs RPM? At the moment, I have it disabled. Note: the former mod_cluster RPM (1.2.6) did not distribute any usable Tomcat integration; it shipped the AS7/JBossWeb one.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (MODCLUSTER-528) Re-think catalina/tomcat dependencies so that it is possible to build without the Tomcat 7 API
by Michal Karm Babacek (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-528?page=com.atlassian.jira.pl... ]
Michal Karm Babacek reassigned MODCLUSTER-528:
----------------------------------------------
Assignee: Radoslav Husar (was: Michal Karm Babacek)
> Re-think catalina/tomcat dependencies so that it is possible to build without the Tomcat 7 API
> ----------------------------------------------------------------------------------------------
>
> Key: MODCLUSTER-528
> URL: https://issues.jboss.org/browse/MODCLUSTER-528
> Project: mod_cluster
> Issue Type: Enhancement
> Affects Versions: 1.3.3.Final
> Reporter: Michal Karm Babacek
> Assignee: Radoslav Husar
> Fix For: 2.0.0.Alpha1
>
>
> It is not possible (or I'm doing something wrong) to build the mod_cluster Tomcat libs against a Tomcat 8-only API.
> I hit this problem with the RPM build on Fedora, where I cannot download arbitrary artifacts from Maven Central and must specify all dependencies beforehand.
> h3. Description
> * container/catalina contains calls to API that is deprecated in Tomcat 7 and missing in Tomcat 8, so I have to apply this patch before building: [MODCLUSTER-528-CatalinaContext.java.patch|https://github.com/Karm/mod_clu...]
> * profiles, test dependencies, and Servlet API dependencies also need hacks to build without Tomcat 7: see the other patches in [mod_cluster-fedora-packages/|https://github.com/Karm/mod_cluster-fedora-p...]
> h3. Suggestion
> * re-architect the dependencies so that it is possible to build mod_cluster with the Tomcat 8 API only (and with the Tomcat 7 API only)
> * drop Tomcat 6 and JBossWeb from 1.3.x as they are not used in 1.3.x anyway (the last branch to use Tomcat 6 and JBossWeb is 1.2.x)
> h3. Fedora 24 offers only Tomcat 8
> ...so does it even make sense to offer users a Tomcat 7 libs RPM? At the moment, I have it disabled. Note: the former mod_cluster RPM (1.2.6) did not distribute any usable Tomcat integration; it shipped the AS7/JBossWeb one.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (MODCLUSTER-528) Re-think catalina/tomcat dependencies so that it is possible to build without the Tomcat 7 API
by Michal Karm Babacek (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-528?page=com.atlassian.jira.pl... ]
Michal Karm Babacek updated MODCLUSTER-528:
-------------------------------------------
Description:
It is not possible (or I'm doing something wrong) to build the mod_cluster Tomcat libs against a Tomcat 8-only API.
I hit this problem with the RPM build on Fedora, where I cannot download arbitrary artifacts from Maven Central and must specify all dependencies beforehand.
h3. Description
* container/catalina contains calls to API that is deprecated in Tomcat 7 and missing in Tomcat 8, so I have to apply this patch before building: [MODCLUSTER-528-CatalinaContext.java.patch|https://github.com/Karm/mod_clu...]
* profiles, test dependencies, and Servlet API dependencies also need hacks to build without Tomcat 7: see the other patches in [mod_cluster-fedora-packages/|https://github.com/Karm/mod_cluster-fedora-p...]
h3. Suggestion
* re-architect the dependencies so that it is possible to build mod_cluster with the Tomcat 8 API only (and with the Tomcat 7 API only)
* drop Tomcat 6 and JBossWeb from 1.3.x as they are not used in 1.3.x anyway (the last branch to use Tomcat 6 and JBossWeb is 1.2.x)
h3. Fedora 24 offers only Tomcat 8
...so does it even make sense to offer users a Tomcat 7 libs RPM? At the moment, I have it disabled. Note: the former mod_cluster RPM (1.2.6) did not distribute any usable Tomcat integration; it shipped the AS7/JBossWeb one.
was:TBD
> Re-think catalina/tomcat dependencies so that it is possible to build without the Tomcat 7 API
> ----------------------------------------------------------------------------------------------
>
> Key: MODCLUSTER-528
> URL: https://issues.jboss.org/browse/MODCLUSTER-528
> Project: mod_cluster
> Issue Type: Enhancement
> Affects Versions: 1.3.3.Final
> Reporter: Michal Karm Babacek
> Assignee: Michal Karm Babacek
> Fix For: 2.0.0.Alpha1
>
>
> It is not possible (or I'm doing something wrong) to build the mod_cluster Tomcat libs against a Tomcat 8-only API.
> I hit this problem with the RPM build on Fedora, where I cannot download arbitrary artifacts from Maven Central and must specify all dependencies beforehand.
> h3. Description
> * container/catalina contains calls to API that is deprecated in Tomcat 7 and missing in Tomcat 8, so I have to apply this patch before building: [MODCLUSTER-528-CatalinaContext.java.patch|https://github.com/Karm/mod_clu...]
> * profiles, test dependencies, and Servlet API dependencies also need hacks to build without Tomcat 7: see the other patches in [mod_cluster-fedora-packages/|https://github.com/Karm/mod_cluster-fedora-p...]
> h3. Suggestion
> * re-architect the dependencies so that it is possible to build mod_cluster with the Tomcat 8 API only (and with the Tomcat 7 API only)
> * drop Tomcat 6 and JBossWeb from 1.3.x as they are not used in 1.3.x anyway (the last branch to use Tomcat 6 and JBossWeb is 1.2.x)
> h3. Fedora 24 offers only Tomcat 8
> ...so does it even make sense to offer users a Tomcat 7 libs RPM? At the moment, I have it disabled. Note: the former mod_cluster RPM (1.2.6) did not distribute any usable Tomcat integration; it shipped the AS7/JBossWeb one.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (MODCLUSTER-527) Load Balancing logic can fail production environments
by Sean Cavanagh (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-527?page=com.atlassian.jira.pl... ]
Sean Cavanagh edited comment on MODCLUSTER-527 at 7/25/16 12:07 PM:
--------------------------------------------------------------------
3) Good point
As for the existing test, yes, update_workers_lbstatus does seem to recognize that if the worker hasn't won any elections then the worker might be in trouble, and it does a bunch of extra tests to make sure it's up.
However, oldelected is set 3 lines earlier:
{code:c}
ou->mess.oldelected = elected;
if (stat->lbfactor > 0)
    stat->lbstatus = ((elected - oldelected) * 1000) / stat->lbfactor;
if (elected == oldelected) {
{code}
And as soon as that happens, a worker has a 0 numerator, and might steal an election.
So right, clearly the goal is to make sure that (elected - oldelected) > 0, so there are two ideas on the table:
1. Add 1 to the numerator
{code:c}
stat->lbstatus = ((elected - oldelected + 1) * 1000) / stat->lbfactor;
{code}
2. Test for the bad condition and correct it somehow, e.g.
{code:c}
if (elected == oldelected) {
    stat->lbstatus = 1000 / stat->lbfactor;
} else {
    stat->lbstatus = ((elected - oldelected) * 1000) / stat->lbfactor;
}
{code}
I'm not terribly opinionated either way.
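To make the comparison concrete, here is a minimal, self-contained sketch of how the two ideas behave when a recalculation window passes with no elections. This is not the actual mod_proxy_cluster code; the worker_stat struct and the concrete numbers are made up for the example:
{code:c}
#include <stdio.h>

/* Hypothetical stand-in for the fields of the real worker-status struct. */
struct worker_stat {
    int lbfactor;
    int lbstatus;
};

/* Idea 1: credit one synthetic election so the numerator never hits 0. */
static void recalc_add_one(struct worker_stat *stat, int elected, int oldelected)
{
    if (stat->lbfactor > 0)
        stat->lbstatus = ((elected - oldelected + 1) * 1000) / stat->lbfactor;
}

/* Idea 2: special-case the idle window instead. */
static void recalc_special_case(struct worker_stat *stat, int elected, int oldelected)
{
    if (stat->lbfactor > 0) {
        if (elected == oldelected)
            stat->lbstatus = 1000 / stat->lbfactor;
        else
            stat->lbstatus = ((elected - oldelected) * 1000) / stat->lbfactor;
    }
}

int main(void)
{
    struct worker_stat a = { 50, 0 };
    struct worker_stat b = { 50, 0 };

    /* An idle window: elected == oldelected == 7 for both variants. */
    recalc_add_one(&a, 7, 7);
    recalc_special_case(&b, 7, 7);

    /* Both variants yield 1000/lbfactor = 20 rather than 0, so an idle
     * worker no longer carries an lbstatus of 0 into the next election. */
    printf("idea 1: %d, idea 2: %d\n", a.lbstatus, b.lbstatus);
    return 0;
}
{code}
The two ideas only diverge once elected > oldelected: idea 1 inflates every worker's numerator by one election, while idea 2 leaves busy workers' scores untouched.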
was (Author: sean.cavanagh):
3) Good point
As for the existing test, yes, update_workers_lbstatus does seem to recognize that if the worker hasn't won any elections then the worker might be in trouble, and it does a bunch of extra tests to make sure it's up.
However, oldelected is set 3 lines earlier:
{code:c}
ou->mess.oldelected = elected;
if (stat->lbfactor > 0)
    stat->lbstatus = ((elected - oldelected) * 1000) / stat->lbfactor;
if (elected == oldelected) {
{code}
And as soon as that happens, a worker has a 0 numerator, and might steal an election.
So right, clearly the goal is to make sure that (elected - oldelected) > 0, so there are two ideas on the table:
1. Add 1 to the numerator
{code:c}
stat->lbstatus = ((elected - oldelected + 1) * 1000) / stat->lbfactor;
{code}
2. Test for the bad condition and correct it somehow, e.g.
{code:c}
if (elected == oldelected) {
    stat->lbstatus = 1000 / stat->lbfactor;
} else {
    stat->lbstatus = ((elected - oldelected) * 1000) / stat->lbfactor;
}
{code}
I'm not terribly opinionated either way.
> Load Balancing logic can fail production environments
> -----------------------------------------------------
>
> Key: MODCLUSTER-527
> URL: https://issues.jboss.org/browse/MODCLUSTER-527
> Project: mod_cluster
> Issue Type: Bug
> Components: Native (httpd modules)
> Environment:
> Reporter: Sean Cavanagh
> Assignee: Jean-Frederic Clere
>
> We suffered the same problem as MODCLUSTER-100 in our production environment.
> If any balance pool sees 0 requests during 2*LBstatusRecalTime, the next request will go to the first worker in the pool. If an application consistently sends requests less often than once per 2*LBstatusRecalTime, then all requests will be sent to the same node, and _balancing will be completely broken_.
> Less worrisome, but still not ideal: if one worker is weighted more lightly than the others (e.g. its load factor has been manually set to '1' in order to offload it; a somewhat common practice), that worker will still get a session once every 2*LBstatusRecalTime.
> I know it seems odd to hear about an application which can load a server while sending fewer than 1 request every 10 seconds; however, if one is using sticky sessions, one is not balancing _requests_, one is balancing _logins_. The usage pattern for our enterprise application sees our users log in relatively slowly, and once a user has their JSESSIONID, those HTTP requests never see the mod_proxy_cluster balancing algorithm again.
> It's nice that there is a config parameter now which will allow us to balance properly again, but I would offer that, since the behaviour of external users can completely break the algorithm, this bug requires a more sophisticated fix.
> If authoritative parties agree it's a good idea, I'm happy to write a patch that does the following:
> 1) Improves the documentation, more clearly explaining how the algorithm functions and alerting users to this parameter
> 2) Cleans up a couple of straight-up documentation errors
> 3) Modifies the internal_find_best_byrequest formula from:
> {code:c}
> status = lbstatus + (elected - oldelected) * 1000)/lbfactor;
> {code}
> to:
> {code:c}
> status = lbstatus + (elected - oldelected + 1) * 1000)/lbfactor
> {code}
> It seems to me that the current formula assumes that each node would be winning at least some elections during the LBstatusRecalTime window, so by adding 1 we can ensure that assumption is always true, and no worker ever gets a score of 0.
> I did a bit of algebra and I'm pretty confident that this change won't influence the outcome of any elections. That could only happen if:
> {noformat}
> lfc - lfw > lfc*lsw - lfw*lsc
> {noformat}
> where lfc is the client load factor, lsc is the client's lbstatus, and similarly for lfw and lsw.
> Since the election numbers are multiplied by 1000, and the max difference between two load factors is 99, I conclude it is impossible for the addition of a single election to both workers to change the outcome of the election.
> So yeah, let me know.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (MODCLUSTER-527) Load Balancing logic can fail production environments
by Sean Cavanagh (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-527?page=com.atlassian.jira.pl... ]
Sean Cavanagh edited comment on MODCLUSTER-527 at 7/25/16 12:07 PM:
--------------------------------------------------------------------
3) Good point
As for the existing test, yes, update_workers_lbstatus does seem to recognize that if the worker hasn't won any elections then the worker might be in trouble, and it does a bunch of extra tests to make sure it's up.
However, oldelected is set 3 lines earlier:
{code:c}
ou->mess.oldelected = elected;
if (stat->lbfactor > 0)
    stat->lbstatus = ((elected - oldelected) * 1000) / stat->lbfactor;
if (elected == oldelected) {
{code}
And as soon as that happens, a worker has a 0 numerator, and might steal an election.
So right, clearly the goal is to make sure that (elected - oldelected) > 0, so there are two ideas on the table:
1. Add 1 to the numerator
{code:c}
stat->lbstatus = ((elected - oldelected + 1) * 1000) / stat->lbfactor;
{code}
2. Test for the bad condition and correct it somehow, e.g.
{code:c}
if (elected == oldelected) {
    stat->lbstatus = 1000 / stat->lbfactor;
} else {
    stat->lbstatus = ((elected - oldelected) * 1000) / stat->lbfactor;
}
{code}
I'm not terribly opinionated either way.
was (Author: sean.cavanagh):
3) Good point
As for the existing test, yes, update_workers_lbstatus does seem to recognize that if the worker hasn't won any elections then the worker might be in trouble, and it does a bunch of extra tests to make sure it's up.
However, oldelected is set 3 lines earlier:
{code:c}
ou->mess.oldelected = elected;
if (stat->lbfactor > 0)
    stat->lbstatus = ((elected - oldelected) * 1000) / stat->lbfactor;
if (elected == oldelected) {
{code}
And as soon as that happens, a worker has a 0 numerator, and might steal an election.
So right, clearly the goal is to make sure that {noformat}(elected - oldelected) > 0{noformat}, so I see two ideas:
1. Add 1 to the numerator
{code:c}
stat->lbstatus = ((elected - oldelected + 1) * 1000) / stat->lbfactor;
{code}
2. Test for the bad condition and correct it somehow, e.g.
{code:c}
if (elected == oldelected) {
    stat->lbstatus = 1000 / stat->lbfactor;
} else {
    stat->lbstatus = ((elected - oldelected) * 1000) / stat->lbfactor;
}
{code}
I'm not terribly opinionated either way.
> Load Balancing logic can fail production environments
> -----------------------------------------------------
>
> Key: MODCLUSTER-527
> URL: https://issues.jboss.org/browse/MODCLUSTER-527
> Project: mod_cluster
> Issue Type: Bug
> Components: Native (httpd modules)
> Environment:
> Reporter: Sean Cavanagh
> Assignee: Jean-Frederic Clere
>
> We suffered the same problem as MODCLUSTER-100 in our production environment.
> If any balance pool sees 0 requests during 2*LBstatusRecalTime, the next request will go to the first worker in the pool. If an application consistently sends requests less often than once per 2*LBstatusRecalTime, then all requests will be sent to the same node, and _balancing will be completely broken_.
> Less worrisome, but still not ideal: if one worker is weighted more lightly than the others (e.g. its load factor has been manually set to '1' in order to offload it; a somewhat common practice), that worker will still get a session once every 2*LBstatusRecalTime.
> I know it seems odd to hear about an application which can load a server while sending fewer than 1 request every 10 seconds; however, if one is using sticky sessions, one is not balancing _requests_, one is balancing _logins_. The usage pattern for our enterprise application sees our users log in relatively slowly, and once a user has their JSESSIONID, those HTTP requests never see the mod_proxy_cluster balancing algorithm again.
> It's nice that there is a config parameter now which will allow us to balance properly again, but I would offer that, since the behaviour of external users can completely break the algorithm, this bug requires a more sophisticated fix.
> If authoritative parties agree it's a good idea, I'm happy to write a patch that does the following:
> 1) Improves the documentation, more clearly explaining how the algorithm functions and alerting users to this parameter
> 2) Cleans up a couple of straight-up documentation errors
> 3) Modifies the internal_find_best_byrequest formula from:
> {code:c}
> status = lbstatus + (elected - oldelected) * 1000)/lbfactor;
> {code}
> to:
> {code:c}
> status = lbstatus + (elected - oldelected + 1) * 1000)/lbfactor
> {code}
> It seems to me that the current formula assumes that each node would be winning at least some elections during the LBstatusRecalTime window, so by adding 1 we can ensure that assumption is always true, and no worker ever gets a score of 0.
> I did a bit of algebra and I'm pretty confident that this change won't influence the outcome of any elections. That could only happen if:
> {noformat}
> lfc - lfw > lfc*lsw - lfw*lsc
> {noformat}
> where lfc is the client load factor, lsc is the client's lbstatus, and similarly for lfw and lsw.
> Since the election numbers are multiplied by 1000, and the max difference between two load factors is 99, I conclude it is impossible for the addition of a single election to both workers to change the outcome of the election.
> So yeah, let me know.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (MODCLUSTER-527) Load Balancing logic can fail production environments
by Sean Cavanagh (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-527?page=com.atlassian.jira.pl... ]
Sean Cavanagh commented on MODCLUSTER-527:
------------------------------------------
3) Good point
As for the existing test, yes, update_workers_lbstatus does seem to recognize that if the worker hasn't won any elections then the worker might be in trouble, and it does a bunch of extra tests to make sure it's up.
However, oldelected is set 3 lines earlier:
{code:c}
ou->mess.oldelected = elected;
if (stat->lbfactor > 0)
    stat->lbstatus = ((elected - oldelected) * 1000) / stat->lbfactor;
if (elected == oldelected) {
{code}
And as soon as that happens, a worker has a 0 numerator, and might steal an election.
So right, clearly the goal is to make sure that {noformat}(elected - oldelected) > 0{noformat}, so I see two ideas:
1. Add 1 to the numerator
{code:c}
stat->lbstatus = ((elected - oldelected + 1) * 1000) / stat->lbfactor;
{code}
2. Test for the bad condition and correct it somehow, e.g.
{code:c}
if (elected == oldelected) {
    stat->lbstatus = 1000 / stat->lbfactor;
} else {
    stat->lbstatus = ((elected - oldelected) * 1000) / stat->lbfactor;
}
{code}
I'm not terribly opinionated either way.
> Load Balancing logic can fail production environments
> -----------------------------------------------------
>
> Key: MODCLUSTER-527
> URL: https://issues.jboss.org/browse/MODCLUSTER-527
> Project: mod_cluster
> Issue Type: Bug
> Components: Native (httpd modules)
> Environment:
> Reporter: Sean Cavanagh
> Assignee: Jean-Frederic Clere
>
> We suffered the same problem as MODCLUSTER-100 in our production environment.
> If any balance pool sees 0 requests during 2*LBstatusRecalTime, the next request will go to the first worker in the pool. If an application consistently sends requests less often than once per 2*LBstatusRecalTime, then all requests will be sent to the same node, and _balancing will be completely broken_.
> Less worrisome, but still not ideal: if one worker is weighted more lightly than the others (e.g. its load factor has been manually set to '1' in order to offload it; a somewhat common practice), that worker will still get a session once every 2*LBstatusRecalTime.
> I know it seems odd to hear about an application which can load a server while sending fewer than 1 request every 10 seconds; however, if one is using sticky sessions, one is not balancing _requests_, one is balancing _logins_. The usage pattern for our enterprise application sees our users log in relatively slowly, and once a user has their JSESSIONID, those HTTP requests never see the mod_proxy_cluster balancing algorithm again.
> It's nice that there is a config parameter now which will allow us to balance properly again, but I would offer that, since the behaviour of external users can completely break the algorithm, this bug requires a more sophisticated fix.
> If authoritative parties agree it's a good idea, I'm happy to write a patch that does the following:
> 1) Improves the documentation, more clearly explaining how the algorithm functions and alerting users to this parameter
> 2) Cleans up a couple of straight-up documentation errors
> 3) Modifies the internal_find_best_byrequest formula from:
> {code:c}
> status = lbstatus + (elected - oldelected) * 1000)/lbfactor;
> {code}
> to:
> {code:c}
> status = lbstatus + (elected - oldelected + 1) * 1000)/lbfactor
> {code}
> It seems to me that the current formula assumes that each node would be winning at least some elections during the LBstatusRecalTime window, so by adding 1 we can ensure that assumption is always true, and no worker ever gets a score of 0.
> I did a bit of algebra and I'm pretty confident that this change won't influence the outcome of any elections. That could only happen if:
> {noformat}
> lfc - lfw > lfc*lsw - lfw*lsc
> {noformat}
> where lfc is the client load factor, lsc is the client's lbstatus, and similarly for lfw and lsw.
> Since the election numbers are multiplied by 1000, and the max difference between two load factors is 99, I conclude it is impossible for the addition of a single election to both workers to change the outcome of the election.
> So yeah, let me know.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (MODCLUSTER-527) Load Balancing logic can fail production environments
by Jean-Frederic Clere (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-527?page=com.atlassian.jira.pl... ]
Jean-Frederic Clere commented on MODCLUSTER-527:
------------------------------------------------
1) That is always welcome.
2) That is always welcome too.
3) Neither of your formulas is valid (there is a missing '(').
If you look at the code at https://github.com/modcluster/mod_proxy_cluster you will see that there is already a test for elected == oldelected (probably we need two?).
> Load Balancing logic can fail production environments
> -----------------------------------------------------
>
> Key: MODCLUSTER-527
> URL: https://issues.jboss.org/browse/MODCLUSTER-527
> Project: mod_cluster
> Issue Type: Bug
> Components: Native (httpd modules)
> Environment:
> Reporter: Sean Cavanagh
> Assignee: Jean-Frederic Clere
>
> We suffered the same problem as MODCLUSTER-100 in our production environment.
> If any balance pool sees 0 requests during 2*LBstatusRecalTime, the next request will go to the first worker in the pool. If an application consistently sends requests less often than once per 2*LBstatusRecalTime, then all requests will be sent to the same node, and _balancing will be completely broken_.
> Less worrisome, but still not ideal: if one worker is weighted more lightly than the others (e.g. its load factor has been manually set to '1' in order to offload it; a somewhat common practice), that worker will still get a session once every 2*LBstatusRecalTime.
> I know it seems odd to hear about an application which can load a server while sending fewer than 1 request every 10 seconds; however, if one is using sticky sessions, one is not balancing _requests_, one is balancing _logins_. The usage pattern for our enterprise application sees our users log in relatively slowly, and once a user has their JSESSIONID, those HTTP requests never see the mod_proxy_cluster balancing algorithm again.
> It's nice that there is a config parameter now which will allow us to balance properly again, but I would offer that, since the behaviour of external users can completely break the algorithm, this bug requires a more sophisticated fix.
> If authoritative parties agree it's a good idea, I'm happy to write a patch that does the following:
> 1) Improves the documentation, more clearly explaining how the algorithm functions and alerting users to this parameter
> 2) Cleans up a couple of straight-up documentation errors
> 3) Modifies the internal_find_best_byrequest formula from:
> {code:c}
> status = lbstatus + (elected - oldelected) * 1000)/lbfactor;
> {code}
> to:
> {code:c}
> status = lbstatus + (elected - oldelected + 1) * 1000)/lbfactor
> {code}
> It seems to me that the current formula assumes that each node would be winning at least some elections during the LBstatusRecalTime window, so by adding 1 we can ensure that assumption is always true, and no worker ever gets a score of 0.
> I did a bit of algebra and I'm pretty confident that this change won't influence the outcome of any elections. That could only happen if:
> {noformat}
> lfc - lfw > lfc*lsw - lfw*lsc
> {noformat}
> where lfc is the client load factor, lsc is the client's lbstatus, and similarly for lfw and lsw.
> Since the election numbers are multiplied by 1000, and the max difference between two load factors is 99, I conclude it is impossible for the addition of a single election to both workers to change the outcome of the election.
> So yeah, let me know.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (MODCLUSTER-527) Load Balancing logic can fail production environments
by Radoslav Husar (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-527?page=com.atlassian.jira.pl... ]
Radoslav Husar updated MODCLUSTER-527:
--------------------------------------
Tester: Michal Karm Babacek
Environment:
was:
> Load Balancing logic can fail production environments
> -----------------------------------------------------
>
> Key: MODCLUSTER-527
> URL: https://issues.jboss.org/browse/MODCLUSTER-527
> Project: mod_cluster
> Issue Type: Bug
> Components: Native (httpd modules)
> Environment:
> Reporter: Sean Cavanagh
> Assignee: Jean-Frederic Clere
>
> We suffered the same problem as MODCLUSTER-100 in our production environment.
> If any balance pool sees 0 requests during 2*LBstatusRecalTime, the next request will go to the first worker in the pool. If an application consistently sends requests less often than once per 2*LBstatusRecalTime, then all requests will be sent to the same node, and _balancing will be completely broken_.
> Less worrisome, but still not ideal: if one worker is weighted more lightly than the others (e.g. its load factor has been manually set to '1' in order to offload it; a somewhat common practice), that worker will still get a session once every 2*LBstatusRecalTime.
> I know it seems odd to hear about an application which can load a server while sending fewer than 1 request every 10 seconds; however, if one is using sticky sessions, one is not balancing _requests_, one is balancing _logins_. The usage pattern for our enterprise application sees our users log in relatively slowly, and once a user has their JSESSIONID, those HTTP requests never see the mod_proxy_cluster balancing algorithm again.
> It's nice that there is a config parameter now which will allow us to balance properly again, but I would offer that, since the behaviour of external users can completely break the algorithm, this bug requires a more sophisticated fix.
> If authoritative parties agree it's a good idea, I'm happy to write a patch that does the following:
> 1) Improves the documentation, more clearly explaining how the algorithm functions and alerting users to this parameter
> 2) Cleans up a couple of straight-up documentation errors
> 3) Modifies the internal_find_best_byrequest formula from:
> {code:c}
> status = lbstatus + (elected - oldelected) * 1000)/lbfactor;
> {code}
> to:
> {code:c}
> status = lbstatus + (elected - oldelected + 1) * 1000)/lbfactor
> {code}
> It seems to me that the current formula assumes that each node would be winning at least some elections during the LBstatusRecalTime window, so by adding 1 we can ensure that assumption is always true, and no worker ever gets a score of 0.
> I did a bit of algebra and I'm pretty confident that this change won't influence the outcome of any elections. That could only happen if:
> {noformat}
> lfc - lfw > lfc*lsw - lfw*lsc
> {noformat}
> where lfc is the client load factor, lsc is the client's lbstatus, and similarly for lfw and lsw.
> Since the election numbers are multiplied by 1000, and the max difference between two load factors is 99, I conclude it is impossible for the addition of a single election to both workers to change the outcome of the election.
> So yeah, let me know.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (MODCLUSTER-527) Load Balancing logic can fail production environments
by Sean Cavanagh (JIRA)
Sean Cavanagh created MODCLUSTER-527:
----------------------------------------
Summary: Load Balancing logic can fail production environments
Key: MODCLUSTER-527
URL: https://issues.jboss.org/browse/MODCLUSTER-527
Project: mod_cluster
Issue Type: Bug
Components: Native (httpd modules)
Environment:
Reporter: Sean Cavanagh
Assignee: Jean-Frederic Clere
We suffered the same problem as MODCLUSTER-100 in our production environment.
If any balance pool sees 0 requests during 2*LBstatusRecalTime, the next request will go to the first worker in the pool. If an application consistently sends requests less often than once per 2*LBstatusRecalTime, then all requests will be sent to the same node, and _balancing will be completely broken_.
Less worrisome, but still not ideal: if one worker is weighted more lightly than the others (e.g. its load factor has been manually set to '1' in order to offload it; a somewhat common practice), that worker will still get a session once every 2*LBstatusRecalTime.
I know it seems odd to hear about an application which can load a server while sending fewer than 1 request every 10 seconds; however, if one is using sticky sessions, one is not balancing _requests_, one is balancing _logins_. The usage pattern for our enterprise application sees our users log in relatively slowly, and once a user has their JSESSIONID, those HTTP requests never see the mod_proxy_cluster balancing algorithm again.
It's nice that there is a config parameter now which will allow us to balance properly again, but I would offer that, since the behaviour of external users can completely break the algorithm, this bug requires a more sophisticated fix.
If authoritative parties agree it's a good idea, I'm happy to write a patch that does the following:
1) Improves the documentation, more clearly explaining how the algorithm functions and alerting users to this parameter
2) Cleans up a couple of straight-up documentation errors
3) Modifies the internal_find_best_byrequest formula from:
{code:c}
status = lbstatus + (elected - oldelected) * 1000)/lbfactor;
{code}
to:
{code:c}
status = lbstatus + (elected - oldelected + 1) * 1000)/lbfactor
{code}
It seems to me that the current formula assumes that each node would be winning at least some elections during the LBstatusRecalTime window, so by adding 1 we can ensure that assumption is always true, and no worker ever gets a score of 0.
I did a bit of algebra and I'm pretty confident that this change won't influence the outcome of any elections. That could only happen if:
{noformat}
lfc - lfw > lfc*lsw - lfw*lsc
{noformat}
where lfc is the client load factor, lsc is the client's lbstatus, and similarly for lfw and lsw.
Since the election numbers are multiplied by 1000, and the max difference between two load factors is 99, I conclude it is impossible for the addition of a single election to both workers to change the outcome of the election.
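If anyone wants to double-check that algebra by brute force, a small harness along these lines would do it. To be clear, this is my own sketch rather than project code: the status formula below is the one above with the parentheses balanced, and the ranges for load factors and election deltas (plus the simplifying choice of a zero carried-over lbstatus) are just assumptions I picked:
{code:c}
#include <stdio.h>

/* The per-worker score from internal_find_best_byrequest,
 * transcribed with the parentheses balanced. */
static long status_of(long lbstatus, long elected, long oldelected, long lbfactor)
{
    return lbstatus + ((elected - oldelected) * 1000) / lbfactor;
}

int main(void)
{
    long flips = 0;

    /* Assumed ranges: load factors 1..100 and election deltas 0..20,
     * with the carried-over lbstatus fixed at 0 for simplicity. */
    for (long lfc = 1; lfc <= 100; lfc++)
        for (long lfw = 1; lfw <= 100; lfw++)
            for (long dc = 0; dc <= 20; dc++)
                for (long dw = 0; dw <= 20; dw++) {
                    long sc = status_of(0, dc, 0, lfc);
                    long sw = status_of(0, dw, 0, lfw);
                    long sc1 = status_of(0, dc + 1, 0, lfc);
                    long sw1 = status_of(0, dw + 1, 0, lfw);

                    /* Did adding one election to both workers change
                     * which of the two has the lower (winning) score? */
                    if ((sc < sw) != (sc1 < sw1))
                        flips++;
                }

    printf("elections flipped: %ld\n", flips);
    return 0;
}
{code}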
So yeah, let me know.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)