Bela Ban commented on JGRP-1038:
--------------------------------
Gray withdrew this contribution: he tested with regular FC and got good enough performance.
FC2
---
Key: JGRP-1038
URL: https://jira.jboss.org/browse/JGRP-1038
Project: JGroups
Issue Type: Feature Request
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 2.11
Attachments: FC2.java, FCOverloadTest.java, Results.txt
[Email Gray Watson]
So we implemented our own FC2 because of problems with FC. I've been writing
tests and trying (with some success) to reproduce the issues we faced.
Initially, when we were doing performance tests in our cluster of ~20 big-ish
Linux servers with UDP/FC, we saw severe performance problems with
"decent"-sized messages. We often have large SQL responses in the 1+ MB
range, and they can be as large as 10+ MB. I decided to write the attached
FCOverloadTest to demonstrate some of the issues and to compare FC and FC2.
The default FC setting max_credits=500000 in udp.xml results in terrible
performance: 952 seconds to send 10x 5 MB messages to 10 hosts (500 MB).
Less than optimal.
The simple solution, on the surface, is to increase the credits by 1 or 2
orders of magnitude. max_credits=5000000 takes 160 secs, and 50000000 takes
32 secs. Fine. But this in effect removes most of the flow control: the
larger max_credits is, the more messages queue up.
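To put the three runs side by side, the short Java program below converts the quoted timings into effective throughput and compares each max_credits value (which FC counts in bytes per receiver) with the data a single receiver gets during the test. It assumes "5mb" means 5,000,000 bytes and that the 500 MB total counts every delivery to every host, as the figure in the email implies; the class and method names are made up for illustration.

```java
// Hypothetical helper: turns the timings quoted in the email into throughput
// and compares each max_credits setting with the bytes sent to one receiver.
final class FcRunStats {
    // max_credits values (bytes) and wall-clock seconds quoted in the email.
    static final long[] MAX_CREDITS = {500_000L, 5_000_000L, 50_000_000L};
    static final double[] SECONDS = {952.0, 160.0, 32.0};

    // Assumption: "5mb" means 5,000,000 bytes (decimal megabytes).
    static final long MESSAGE_BYTES = 5_000_000L;
    static final int MESSAGES = 10;
    static final int RECEIVERS = 10;

    /** Effective throughput in MB/s for the whole 10 x 5 MB x 10 hosts run. */
    public static double throughputMBps(double seconds) {
        double totalMB = (double) MESSAGE_BYTES * MESSAGES * RECEIVERS / 1_000_000.0;
        return totalMB / seconds;
    }

    /** Fraction of one receiver's total data that fits in the initial credits. */
    public static double creditsPerReceiverData(long maxCredits) {
        return (double) maxCredits / (MESSAGE_BYTES * MESSAGES);
    }

    public static void main(String[] args) {
        for (int i = 0; i < MAX_CREDITS.length; i++) {
            System.out.printf("max_credits=%d: %.3f MB/s, credits cover %.0f%% of per-receiver data%n",
                    MAX_CREDITS[i],
                    throughputMBps(SECONDS[i]),
                    100 * creditsPerReceiverData(MAX_CREDITS[i]));
        }
    }
}
```

Under these assumptions the runs work out to roughly 0.53, 3.1 and 15.6 MB/s, and the initial credits cover about 1%, 10% and 100% of the 50 MB each receiver gets. At max_credits=50000000 the sender can, in principle, push the whole test to a receiver without ever waiting for a credit replenishment, which is one way to read "removes most of the flow control".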
What I thought would be a better flow-control model was per-recipient, where
each sender would keep a "water level" of the outstanding packets still
waiting to be ack'd. When the water level got too high, it would pause the
sender until the receiver had ack'd enough to bring the water level back
down. This is what FC2 tried to accomplish. I've attached my test results,
the FCOverloadTest, and FC2.
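A minimal sketch of the per-recipient water-level idea follows, in Java like the attached FC2.java. It is not the attached implementation: the class name, the use of plain string recipient addresses, byte counts instead of packet counts, and the separate low-water mark (so the sender resumes only once acks bring the level well below the pause point) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-recipient "water level" flow control; not FC2.java.
// Assumption: a high/low water pair gives hysteresis so the sender does not flap.
final class WaterLevelFlowControl {
    private final long highWater; // pause a recipient once unacked bytes exceed this
    private final long lowWater;  // resume once acks bring the level to or below this
    private final Map<String, Long> outstanding = new HashMap<>();
    private final Map<String, Boolean> paused = new HashMap<>();

    WaterLevelFlowControl(long highWater, long lowWater) {
        if (lowWater > highWater) throw new IllegalArgumentException("lowWater > highWater");
        this.highWater = highWater;
        this.lowWater = lowWater;
    }

    /** Blocks while the recipient is paused, then records the bytes as unacked. */
    public synchronized void beforeSend(String recipient, long bytes) throws InterruptedException {
        while (paused.getOrDefault(recipient, false)) {
            wait(); // released by onAck() once this recipient drains below lowWater
        }
        long level = outstanding.getOrDefault(recipient, 0L) + bytes;
        outstanding.put(recipient, level);
        if (level > highWater) {
            paused.put(recipient, true); // this send proceeds; later sends block
        }
    }

    /** Called when the recipient acks bytes; may unpause senders to it. */
    public synchronized void onAck(String recipient, long bytes) {
        long level = Math.max(0L, outstanding.getOrDefault(recipient, 0L) - bytes);
        outstanding.put(recipient, level);
        if (level <= lowWater && paused.getOrDefault(recipient, false)) {
            paused.put(recipient, false);
            notifyAll(); // wakes every waiter; each rechecks its own recipient's flag
        }
    }

    public synchronized long level(String recipient) {
        return outstanding.getOrDefault(recipient, 0L);
    }

    public synchronized boolean isPaused(String recipient) {
        return paused.getOrDefault(recipient, false);
    }

    public static void main(String[] args) throws InterruptedException {
        WaterLevelFlowControl fc = new WaterLevelFlowControl(1_000_000, 250_000);
        fc.beforeSend("B", 1_200_000); // crosses the high-water mark, so B is paused
        Thread sender = new Thread(() -> {
            try {
                fc.beforeSend("B", 100_000); // waits until B acks enough
                System.out.println("second send released, level=" + fc.level("B"));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();
        Thread.sleep(100);
        fc.onAck("B", 1_000_000); // level drops to 200,000, below lowWater
        sender.join();
    }
}
```

In this sketch the send that crosses the high-water mark still goes through and only later sends to that recipient block. A single message larger than the high-water mark (like the 5 MB responses above) therefore cannot stall the sender forever, and a slow receiver only pauses traffic addressed to it.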
FC2 appears to do fewer UNICAST retransmits, sends fewer packets, and is
faster. It still needs some review, however.
I'd like to know other folks' experiences with flow control. Are people
really using the udp.xml settings in production? What message sizes do you
have?
The next step in my testing is to write some unit tests which shut down and
start up nodes while the system is loaded, in an attempt to reproduce the
cluster pauses we have seen with FC, which would be deadly to our
application.
--
This message is automatically generated by JIRA.