[jboss-jira] [JBoss JIRA] Commented: (JGRP-497) Message bundling seems to add latency well beyond max_bundle_timeout
Bela Ban (JIRA)
jira-events at lists.jboss.org
Wed May 30 02:50:08 EDT 2007
[ http://jira.jboss.com/jira/browse/JGRP-497?page=comments#action_12363505 ]
Bela Ban commented on JGRP-497:
-------------------------------
[Comment by Vladimir]
I wanted to get to the bottom of the bundler issue that was breaking my FLUSH tests :) After careful tracing I realized that the delay Brian also reported, which occurs when the last message in a sequence going down the stack gets delayed, is actually rather easy to explain. A message going down the stack gets added to the bundling queue, but at a moment when the running bundling task has not finished yet. So no new BundlingTimer task is created, yet the running one can no longer process the message. Our unlucky message just sits in the queue, waiting for another message going down the stack to trigger a new BundlingTimer. In a step-by-step protocol like FLUSH, this unlucky timing happens all too often.
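To make the timing above concrete, here is a minimal single-threaded model of it. This is only a sketch and not JGroups code: the class name, the taskActive flag, and the split of the task into a "drain" step and a "finish" step are all assumptions made for illustration. It replays the interleaving by hand instead of relying on real thread scheduling.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Simplified, single-threaded model of the stranded-message timing; not JGroups code. */
final class StrandedMessageModel {
    private final Queue<String> queue = new ArrayDeque<>();
    private final List<String> sent = new ArrayList<>();
    private boolean taskActive = false;   // hypothetical "a bundling task exists" flag
    private int tasksStarted = 0;

    /** A message travels down the stack: queue it, start a task only if none is active. */
    void send(String msg) {
        queue.add(msg);
        if (!taskActive) {
            taskActive = true;
            tasksStarted++;
        }
    }

    /** First half of the task: flush whatever is queued right now. */
    void taskDrains() {
        sent.addAll(queue);
        queue.clear();
    }

    /** Second half of the task: mark itself as done. */
    void taskFinishes() {
        taskActive = false;
    }

    List<String> sent() { return sent; }
    List<String> stranded() { return new ArrayList<>(queue); }
    int tasksStarted() { return tasksStarted; }

    /** Replays the unlucky interleaving described above. */
    static StrandedMessageModel replayUnluckyInterleaving() {
        StrandedMessageModel m = new StrandedMessageModel();
        m.send("m1");        // starts task #1
        m.taskDrains();      // task #1 flushes m1
        m.send("m2");        // task #1 still counts as active, so no new task is started
        m.taskFinishes();    // task #1 ends without ever seeing m2
        return m;
    }

    public static void main(String[] args) {
        StrandedMessageModel m = replayUnluckyInterleaving();
        System.out.println("sent=" + m.sent() + " stranded=" + m.stranded()
                + " tasksStarted=" + m.tasksStarted());
    }
}
```

In this model m2 stays in the queue with no task left to flush it, and it only moves once some later send() starts a new task.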
Consider this patch. It simplifies things: no futures to track and no cancellation. We just keep track of the number of bundling tasks running. We do not let every message going down the stack create a new BundlingTimer, but we *never* let the number of BundlingTimer tasks drop below a certain threshold, say 2 or 3, unless the cluster is completely quiet. This guarantees that we avoid the unlucky-message scenario. My FLUSH tests were rock solid after this change, and I have never seen the bundler behave weirdly again.
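A rough sketch of the counting idea, under stated assumptions: this is not the attached patch, and the names CountingBundlerSketch, MIN_TASKS, pendingTasks, and shutdownAndWait are hypothetical. It uses a plain java.util.concurrent scheduler in place of the JGroups timer, and timeoutMs stands in for max_bundle_timeout. In this sketch the queue drain and the counter decrement share one lock, so a message queued while the counter sits at the threshold is always picked up by a task that has not run yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch of a bundler that counts pending flush tasks; not the actual patch. */
final class CountingBundlerSketch {
    static final int MIN_TASKS = 2;          // assumed threshold ("say 2 or 3")
    private final long timeoutMs;            // stands in for max_bundle_timeout
    private final ScheduledExecutorService timer = Executors.newScheduledThreadPool(2);
    private final List<String> queue = new ArrayList<>();
    private final List<List<String>> bundles = new ArrayList<>();
    private int pendingTasks = 0;            // guarded by "this"

    CountingBundlerSketch(long timeoutMs) { this.timeoutMs = timeoutMs; }

    /** Queue a message; top up the task count if it is below the threshold. */
    synchronized void send(String msg) {
        queue.add(msg);
        if (pendingTasks < MIN_TASKS) {
            pendingTasks++;
            timer.schedule(this::flush, timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Task body: flush the queue as one bundle, then give up the slot. */
    private synchronized void flush() {
        try {
            if (!queue.isEmpty()) {
                bundles.add(new ArrayList<>(queue));
                queue.clear();
            }
        } finally {
            pendingTasks--;
        }
    }

    /** Lets already scheduled tasks run, then stops the timer. */
    boolean shutdownAndWait(long maxWaitMs) {
        timer.shutdown();
        try {
            return timer.awaitTermination(maxWaitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    synchronized int queued() { return queue.size(); }
    synchronized int pendingTasks() { return pendingTasks; }
    synchronized int delivered() {
        int n = 0;
        for (List<String> b : bundles) n += b.size();
        return n;
    }

    public static void main(String[] args) {
        CountingBundlerSketch b = new CountingBundlerSketch(30);
        for (int i = 0; i < 100; i++) b.send("msg-" + i);
        b.shutdownAndWait(5000);
        System.out.println("delivered=" + b.delivered() + " queued=" + b.queued());
    }
}
```

There is no future to hold on to and nothing to cancel: a task that finds the queue empty simply gives its slot back, and the next send() schedules a replacement.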
> Message bundling seems to add latency well beyond max_bundle_timeout
> --------------------------------------------------------------------
>
> Key: JGRP-497
> URL: http://jira.jboss.com/jira/browse/JGRP-497
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.4.1 SP3
> Reporter: Brian Stansberry
> Assigned To: Bela Ban
> Fix For: 2.5, 2.4.1 SP4
>
>
> Short synopsis: with bundling enabled and max_bundle_timeout=30 ms, I'm sometimes seeing 700 ms delay in receiver getting a message, leading to transient AS testsuite failures. Disabling bundling makes the transient failures go away.
> Long discussion:
> The JBoss AS testsuite has been seeing intermittent failures of the asynchronous web session replication tests. Particularly with FIELD granularity tests. Basically, test modifies a session on one node, waits 500 ms, then fails over to the other node, expecting consistent state. Test fails if the session state is not as expected.
> Whenever I investigate the intermittent failure, it's always a case of the asynchronous replication message arriving after the failover request. TRACE logging of JBoss Cache shows sometimes a 700 ms delay between the sender cache sending the replication and the receiver receiving it. That's just too long!
> Causes I could think of:
> 1) Some up_thread/down_thread set to true, leaving a message sitting in a queue for a while until the OS schedules the thread. We used to see this problem. Nope -- all threads are set to false.
> 2) Bad luck; full gc happens at the wrong time. Possible but IMO unlikely; the failures occur too often and it's not like these tests are generating a ton of garbage that's forcing a lot of full gc runs.
> 3) System is under some other load during the relevant period. Unlikely. The client is sleeping and the servers have nothing else going on.
> 4) Message bundling. It's turned on, but max_bundle_timeout is 30 ms, so the latency it adds to an async RPC should be minimal. But, I just disabled bundling and have now run the async FIELD tests about 10 times with no failures. With it enabled I'd get a failure in some test on average nearly once per run.
> Perhaps there is something that's preventing the Bundler task executing on the expected schedule?