[JBoss JIRA] (MODCLUSTER-449) Implement ramp-up when starting new nodes
by Bogdan Sikora (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-449?page=com.atlassian.jira.pl... ]
Bogdan Sikora updated MODCLUSTER-449:
-------------------------------------
Attachment: httpdRamp-up.jpg
Httpd balancer with two nodes (Load 10) and a third node (Load 90) joining in the middle.
> Implement ramp-up when starting new nodes
> -----------------------------------------
>
> Key: MODCLUSTER-449
> URL: https://issues.jboss.org/browse/MODCLUSTER-449
> Project: mod_cluster
> Issue Type: Feature Request
> Components: Core & Container Integration (Java)
> Affects Versions: 1.2.0.Final, 1.3.0.Final
> Reporter: Radoslav Husar
> Assignee: Radoslav Husar
> Priority: Critical
> Fix For: 2.0.0.Alpha1
>
> Attachments: httpdRamp-up.jpg, undertowRamp-up.jpg
>
>
> IIUC this has been a problem since inception. The problem is that the initial load stays in effect for performing load-balancing decisions until a new stat interval kicks in.
> This effect is mitigated by load decay over time, but right after a new node joins, it can get overloaded during startup.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
8 years, 4 months
[JBoss JIRA] (MODCLUSTER-449) Implement ramp-up when starting new nodes
by Bogdan Sikora (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-449?page=com.atlassian.jira.pl... ]
Bogdan Sikora updated MODCLUSTER-449:
-------------------------------------
Attachment: undertowRamp-up.jpg
Undertow balancer with two nodes (Load 10) and a third node (Load 90) joining in the middle.
> Implement ramp-up when starting new nodes
> -----------------------------------------
>
> Key: MODCLUSTER-449
> URL: https://issues.jboss.org/browse/MODCLUSTER-449
> Project: mod_cluster
> Issue Type: Feature Request
> Components: Core & Container Integration (Java)
> Affects Versions: 1.2.0.Final, 1.3.0.Final
> Reporter: Radoslav Husar
> Assignee: Radoslav Husar
> Priority: Critical
> Fix For: 2.0.0.Alpha1
>
> Attachments: undertowRamp-up.jpg
>
>
> IIUC this has been a problem since inception. The problem is that the initial load stays in effect for performing load-balancing decisions until a new stat interval kicks in.
> This effect is mitigated by load decay over time, but right after a new node joins, it can get overloaded during startup.
[JBoss JIRA] (MODCLUSTER-503) Write balancer's address to Azure Blob Storage via its REST api, libcurl
by Michal Karm Babacek (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-503?page=com.atlassian.jira.pl... ]
Michal Karm Babacek commented on MODCLUSTER-503:
------------------------------------------------
Progress: I have the REST API bits and pieces put together, close to a running bash prototype. As soon as it works, I'll code it in C using libcurl and libcrypto, creating a mod_azure_storage module.
> Write balancer's address to Azure Blob Storage via its REST api, libcurl
> ------------------------------------------------------------------------
>
> Key: MODCLUSTER-503
> URL: https://issues.jboss.org/browse/MODCLUSTER-503
> Project: mod_cluster
> Issue Type: Feature Request
> Components: Native (httpd modules)
> Environment: Azure
> Reporter: Michal Karm Babacek
> Assignee: Michal Karm Babacek
>
> * the Azure Storage C++ lib depends on Casablanca, it's bloatware, too heavy
> * we can do with simple libcurl + openssl for generating proper REST requests
> * could be done in C without much overhead
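As a rough illustration of the openssl part of that plan: Azure Storage Shared Key authorization signs a request-specific string with HMAC-SHA256, using the Base64-decoded account key, and sends the Base64-encoded result. The sketch below covers only that signing step. The function name `hmac_sha256_b64` is hypothetical, and the layout of the string to sign (which depends on the REST operation and API version) is deliberately left out.

```shell
#!/bin/sh
# Sketch only: compute Base64(HMAC-SHA256(base64_decode(key), message)).
# hmac_sha256_b64 is a hypothetical helper name, not part of any Azure tool.
#   $1 = Base64-encoded key (for Azure, the storage account key)
#   $2 = message to sign (for Azure, the request's string-to-sign)
hmac_sha256_b64() {
  # Decode the key and convert it to hex so that openssl can take
  # arbitrary binary key bytes through -macopt hexkey:...
  key_hex=$(printf '%s' "$1" | base64 -d | od -An -v -tx1 | tr -d ' \n')
  # Sign the message and emit the raw digest as Base64.
  printf '%s' "$2" \
    | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$key_hex" -binary \
    | base64
}
```

The returned value is what would go into the signature part of the Authorization header. A C version would presumably do the same with libcrypto's HMAC routines and hand the finished header to libcurl.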
[JBoss JIRA] (MODCLUSTER-536) List of open files grows steadily during load test through mod_cluster
by Wayne Wang (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-536?page=com.atlassian.jira.pl... ]
Wayne Wang commented on MODCLUSTER-536:
---------------------------------------
The default value of MaxClients is 150.
The default value of MaxRequestWorkers is 400.
The default values are quite different, although the two parameters mean the same thing. I tested v1.2.6 with MaxClients set to 200 and 250 and did not see the server error, although the number of messages processed actually decreased. I found that setting MaxClients to 50 actually results in many more requests being processed successfully.
I am not sure whether changing MaxRequestWorkers would make the server error disappear. However, it looks like v1.2.6 actually performs better than v1.3.1.
> List of open files grows steadily during load test through mod_cluster
> ----------------------------------------------------------------------
>
> Key: MODCLUSTER-536
> URL: https://issues.jboss.org/browse/MODCLUSTER-536
> Project: mod_cluster
> Issue Type: Bug
> Components: Core & Container Integration (Java)
> Affects Versions: 1.3.1.Final
> Environment: Wildfly10.0.0.Final
> mod_cluster-1.3.1.Final-linux2-x64-ssl
> CentOS7 (virtualbox)
> Reporter: Wayne Wang
> Assignee: Michal Karm Babacek
> Attachments: error_log, httpd-mpm.conf, httpd.conf, server.log, standalone-full-ha-snippet.xml
>
>
> I was able to configure WildFly 10 modcluster to work with Apache mod_cluster (1.3.1). However, during a load test I found that sending the test traffic through the web server eventually caused an error in the WildFly instance, and I also saw errors in the Apache web server log.
> The obvious error in the WildFly instance is "java.net.SocketException: Too many open files". When I ran the command lsof -u <user> | grep TCP | wc -l, I could see the number grow steadily until the WildFly instance reported the error. This was when I sent requests through the web server.
> However, when I sent the requests to the WildFly instance (app server) directly, the number did not grow, and the app server could take a much heavier load without this issue.
> The issue did not happen until many rounds of load tests had been executed through the web server. If I restart the web server, everything works fine until I execute many rounds of load tests again.
[JBoss JIRA] (MODCLUSTER-536) List of open files grows steadily during load test through mod_cluster
by Wayne Wang (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-536?page=com.atlassian.jira.pl... ]
Wayne Wang commented on MODCLUSTER-536:
---------------------------------------
New information about using mod_cluster v1.2.6 for comparison.
I have executed 10 load tests now, and I found that v1.2.6 performs much better (it processed many more requests), and I have NOT seen the "Too many open files" issue. During the load tests I monitored the lsof count every 3 seconds; it could rise to a bit more than 300 (314 was the highest number I observed), but it fluctuated between 200 and 300 most of the time.
Summary of the differences between v1.3.1 and v1.2.6:
(1) With v1.3.1, the web server was out of service for a period of time during load tests; this did not happen with v1.2.6 (the /server-status page was always up).
(2) Many more requests were processed by v1.2.6. JMeter showed 70% of requests got an OK status with v1.2.6 vs. less than 50% with v1.3.1.
(3) The number of open files under the httpd processes stabilized at around 300 during peak load for v1.2.6, while with v1.3.1 it could shoot up to more than 3000, and the server printed errors after even one load test.
(4) Once the JMeter send button is back (becomes green), the lsof result for v1.2.6 decreased to around 75 in all 10 load tests. I also did not see any performance degradation in v1.2.6 during the 10 load tests.
It looks like v1.3.1 may have an issue managing open files/sockets during the shutdown of WildFly instances. In addition, it clearly performs worse than v1.2.6.
[JBoss JIRA] (MODCLUSTER-536) List of open files grows steadily during load test through mod_cluster
by Wayne Wang (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-536?page=com.atlassian.jira.pl... ]
Wayne Wang edited comment on MODCLUSTER-536 at 9/1/16 12:08 PM:
----------------------------------------------------------------
The lsof command used for the WildFly instance in one VirtualBox VM was
lsof -u wayne | grep TCP | wc -l
The WildFly instance was started under user wayne. I could also find the pid of the WildFly instance and change the command to lsof -p <pid> | grep TCP | wc -l. However, the result would be the same, so it was convenient for me to use the -u option.
The lsof command used for the Apache web server in another VirtualBox VM was
lsof -u daemon | grep TCP | wc -l
I also use the following to monitor the outcome of the command:
watch -n 5 'lsof -u wayne | grep TCP | wc -l'
(1) If no WildFly instance is shut down, the outcome of the command stays low (about 200) and does not show a pattern of steady increase.
(2) If I shut down the WildFly instance, let another standby instance become active, and do this multiple times, the outcome of the command increases steadily, and when it exceeds 3000 the server prints errors. The Apache web server needs to be restarted in order to support another load test. The WildFly instance also needs to be restarted because of the "Too many open files" error, and the outcome of the lsof command remains high and unchanged.
Note that the soft limit was set at 4096 and the hard limit at 8192.
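The monitoring described above could be wrapped in a small script along these lines. This is only a sketch: the helper names `count_tcp` and `fd_usage_percent` are made up for illustration, and the user name and limit in the usage comment are the values quoted in this comment, not defaults of any tool.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Count lines containing "TCP" in lsof-style output read from stdin,
# mirroring `lsof -u <user> | grep TCP | wc -l`.
count_tcp() {
  # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
  grep -c 'TCP' || true
}

# Express a count as a whole-number percentage of a descriptor limit.
#   $1 = current count, $2 = limit (e.g. the soft limit from `ulimit -n`)
fd_usage_percent() {
  echo $(( $1 * 100 / $2 ))
}

# Possible usage, assuming lsof is installed and the instance runs as "wayne":
#   n=$(lsof -u wayne | count_tcp)
#   echo "TCP entries: $n ($(fd_usage_percent "$n" "$(ulimit -n)")% of soft limit)"
```

Comparing the count with the soft limit makes it easier to see how close a run is to the "Too many open files" error than watching the raw number alone.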