[jboss-jira] [JBoss JIRA] (AS7-2903) Cluster suspected to perform 100x slower than unclustered

Fri Dec 2 14:04:40 EST 2011

    [ https://issues.jboss.org/browse/AS7-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647717#comment-12647717 ] 

Scott Marlow commented on AS7-2903:
-----------------------------------

This ain't my issue, but I'm curious as to why it is 100x slower. Do you have a similar comparison (unclustered versus clustered) of throughput numbers for a previous AS release?

Are there errors in server.log that aren't there when unclustered?

We did see the "pause" effect in earlier releases, as we made AS faster. AS7 is very fast, which can put even more load on session replication. So, it will be interesting to compare the unclustered AS5 throughput to the unclustered AS7 throughput.

The "pause" effect is related to the flow control (FC) protocol, which can pause an over-active node (such as one that is running at very high throughput). The FC settings can be jacked up to the sky if enough memory is available to buffer the higher flow. If we set FC too high, we will run out of memory.
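
To make the FC trade-off concrete, here is a minimal Python sketch of credit-based flow control, which is the general idea behind FC. It is a toy model under stated assumptions, not JGroups code: the names `CreditSender` and `simulate`, the single receiver, and all byte counts are made up for illustration.

```python
class CreditSender:
    """Toy credit-based sender: each message consumes credits equal to its size."""

    def __init__(self, max_credits: int) -> None:
        self.max_credits = max_credits  # assumed limit on bytes in flight
        self.credits = max_credits
        self.pauses = 0                 # how often the sender had to stop and wait

    def try_send(self, size: int) -> bool:
        """Send if enough credits remain; otherwise record a pause."""
        if size > self.credits:
            self.pauses += 1
            return False
        self.credits -= size
        return True

    def replenish(self) -> None:
        """The receiver has caught up and granted a full set of credits back."""
        self.credits = self.max_credits


def simulate(max_credits: int, message_size: int, messages: int) -> int:
    """Return how many times the sender pauses while sending `messages` messages."""
    if message_size > max_credits:
        raise ValueError("a single message must fit within max_credits")
    sender = CreditSender(max_credits)
    sent = 0
    while sent < messages:
        if sender.try_send(message_size):
            sent += 1
        else:
            # In a real cluster the sender would block here until the
            # receiver sends credits back over the network.
            sender.replenish()
    return sender.pauses


if __name__ == "__main__":
    print(simulate(max_credits=10_000, message_size=1_000, messages=100))
    print(simulate(max_credits=100_000, message_size=1_000, messages=100))
```

With these assumed numbers, the low-credit sender pauses 9 times and the high-credit sender never pauses, at the cost of up to ten times as many bytes in flight that have to be buffered somewhere.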

Like I said, this isn't my JIRA, but I know Paul is very busy and probably doesn't mind the help. ;)

> Cluster suspected to perform 100x slower than unclustered
> ---------------------------------------------------------
>
>                 Key: AS7-2903
>                 URL: https://issues.jboss.org/browse/AS7-2903
>             Project: Application Server 7
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: No Release
>         Environment: eaebc8a0041e2c7bd9b7de93ea2d2bf87701abff
>            Reporter: Radoslav Husar
>            Assignee: Paul Ferraro
>            Priority: Critical
>             Fix For: 7.1.0.CR1
>
>
> Cluster performs 100x slower than unclustered when running the HTTP session replication benchmark in our perf lab. Throughput with 200 clients drops from 49,851.0 samples/s to 425.3 samples/s when using clustering in the standalone-ha.xml server configuration. I cannot vouch for these runs; this might be the result of a hidden networking issue in the lab.
> Here are some sample runs:
> *Unclustered*
> {noformat}
> Nodes: 2, Sessions: 100, active: 100, samples: 680317, throughput 45,346.9 samples/s, 0.2 MB/s, mean response: 2 ms, sampling errors: 0, invalid samples: 0, valid samples: 680317 (100%)
> Nodes: 2, Sessions: 200, active: 200, samples: 747890, throughput 49,851.0 samples/s, 0.2 MB/s, mean response: 3 ms, sampling errors: 0, invalid samples: 0, valid samples: 747890 (100%)
> Nodes: 2, Sessions: 300, active: 300, samples: 722338, throughput 48,149.4 samples/s, 0.2 MB/s, mean response: 6 ms, sampling errors: 0, invalid samples: 0, valid samples: 722338 (100%)
> Nodes: 2, Sessions: 400, active: 400, samples: 702323, throughput 46,813.7 samples/s, 0.2 MB/s, mean response: 8 ms, sampling errors: 0, invalid samples: 0, valid samples: 702323 (100%)
> Nodes: 2, Sessions: 500, active: 500, samples: 690571, throughput 46,028.9 samples/s, 0.2 MB/s, mean response: 10 ms, sampling errors: 0, invalid samples: 0, valid samples: 690571 (100%)
> Nodes: 2, Sessions: 600, active: 600, samples: 689405, throughput 45,949.6 samples/s, 0.2 MB/s, mean response: 13 ms, sampling errors: 0, invalid samples: 0, valid samples: 689405 (100%)
> Nodes: 2, Sessions: 700, active: 700, samples: 681118, throughput 45,400.3 samples/s, 0.2 MB/s, mean response: 15 ms, sampling errors: 0, invalid samples: 0, valid samples: 681118 (100%)
> Nodes: 2, Sessions: 800, active: 800, samples: 685431, throughput 45,684.7 samples/s, 0.2 MB/s, mean response: 16 ms, sampling errors: 0, invalid samples: 0, valid samples: 685431 (100%)
> Nodes: 2, Sessions: 900, active: 900, samples: 668869, throughput 44,580.9 samples/s, 0.2 MB/s, mean response: 20 ms, sampling errors: 0, invalid samples: 0, valid samples: 668869 (100%)
> Nodes: 2, Sessions: 1000, active: 1000, samples: 675849, throughput 45,046.1 samples/s, 0.2 MB/s, mean response: 21 ms, sampling errors: 0, invalid samples: 0, valid samples: 675849 (100%)
> {noformat}
> *Clustered*
> {noformat}
> Nodes: 2, Sessions: 100, active: 100, samples: 7350, throughput 489.9 samples/s, 0.0 MB/s, mean response: 196 ms, sampling errors: 0, invalid samples: 49, valid samples: 7301 (99%)
> Nodes: 2, Sessions: 200, active: 200, samples: 6380, throughput 425.3 samples/s, 0.0 MB/s, mean response: 458 ms, sampling errors: 0, invalid samples: 362, valid samples: 6018 (94%)
> Nodes: 2, Sessions: 300, active: 300, samples: 5821, throughput 382.4 samples/s, 0.0 MB/s, mean response: 748 ms, sampling errors: 0, invalid samples: 484, valid samples: 5337 (91%)
> Nodes: 2, Sessions: 400, active: 262, samples: 3846, throughput 256.3 samples/s, 0.0 MB/s, mean response: 1056 ms, sampling errors: 0, invalid samples: 333, valid samples: 3513 (91%)
> Nodes: 2, Sessions: 500, active: 452, samples: 6389, throughput 425.6 samples/s, 0.0 MB/s, mean response: 1104 ms, sampling errors: 0, invalid samples: 565, valid samples: 5824 (91%)
> Nodes: 2, Sessions: 600, active: 55, samples: 985, throughput 65.7 samples/s, 0.0 MB/s, mean response: 1590 ms, sampling errors: 457, invalid samples: 55, valid samples: 473 (48%)
> {noformat}
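
As a quick check of the "100x" figure, the following Python sketch divides the unclustered throughput by the clustered throughput for each session count that appears in both runs quoted above. The throughput values are copied from those runs; the dictionary layout and the `slowdown` helper are just illustrative choices.

```python
# Throughput in samples/s, keyed by session count, copied from the quoted runs.
unclustered = {100: 45346.9, 200: 49851.0, 300: 48149.4,
               400: 46813.7, 500: 46028.9, 600: 45949.6}
clustered = {100: 489.9, 200: 425.3, 300: 382.4,
             400: 256.3, 500: 425.6, 600: 65.7}


def slowdown(baseline: float, measured: float) -> float:
    """How many times slower `measured` is compared with `baseline`."""
    return baseline / measured


# Only session counts present in both runs can be compared.
ratios = {n: slowdown(unclustered[n], clustered[n]) for n in clustered}

if __name__ == "__main__":
    for sessions, ratio in sorted(ratios.items()):
        print(f"{sessions:>4} sessions: {ratio:.1f}x slower")
```

For the 200-session row this gives roughly 117x, which matches the "100x" in the issue title. The later clustered rows also report fewer active sessions and some invalid samples, so their ratios should be read with that in mind.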
