[jboss-jira] [JBoss JIRA] Commented: (JGRP-497) Message bundling seems to add latency well beyond max_bundle_timeout
Bela Ban (JIRA)
jira-events at lists.jboss.org
Fri May 4 09:05:30 EDT 2007
[ http://jira.jboss.com/jira/browse/JGRP-497?page=comments#action_12361460 ]
Bela Ban commented on JGRP-497:
-------------------------------
In 2.5, JGroupsLatencyTest can be used as follows, to test any stack:
JGroupsLatencyTest -local -props /home/bela/udp.xml
(same address space)
or
JGroupsLatencyTest -props ./udp.xml
(receiver)
and
JGroupsLatencyTest -props ./udp.xml -sender
(sender)
> Message bundling seems to add latency well beyond max_bundle_timeout
> --------------------------------------------------------------------
>
> Key: JGRP-497
> URL: http://jira.jboss.com/jira/browse/JGRP-497
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.4.1 SP3
> Reporter: Brian Stansberry
> Assigned To: Bela Ban
> Fix For: 2.5
>
>
> Short synopsis: with bundling enabled and max_bundle_timeout=30 ms, I sometimes see a 700 ms delay before the receiver gets a message, leading to transient AS testsuite failures. Disabling bundling makes the transient failures go away.
> Long discussion:
> The JBoss AS testsuite has been seeing intermittent failures of the asynchronous web session replication tests, particularly the FIELD granularity tests. Basically, a test modifies a session on one node, waits 500 ms, then fails over to the other node, expecting consistent state. The test fails if the session state is not as expected.
> Whenever I investigate the intermittent failure, it's always a case of the asynchronous replication message arriving after the failover request. TRACE logging of JBoss Cache sometimes shows a 700 ms delay between the sender cache sending the replication and the receiver receiving it. That's just too long!
> Causes I could think of:
> 1) Some up_thread/down_thread set to true, leaving a message sitting in a queue for a while until the OS schedules the thread. We used to see this problem. Nope -- all up_thread/down_thread settings are false.
> 2) Bad luck; a full GC happens at the wrong time. Possible but IMO unlikely; the failures occur too often and it's not like these tests are generating a ton of garbage that's forcing a lot of full GC runs.
> 3) System is under some other load during the relevant period. Unlikely. The client is sleeping and the servers have nothing else going on.
> 4) Message bundling. It's turned on, but max_bundle_timeout is 30 ms, so the latency it adds to an async RPC should be minimal. But, I just disabled bundling and have now run the async FIELD tests about 10 times with no failures. With it enabled I'd get a failure in some test on average nearly once per run.
> Perhaps there is something that's preventing the Bundler task from executing on the expected schedule?
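To make the expectation in cause 4 concrete, here is a minimal sketch of a size-or-timeout bundler. It is not the JGroups implementation: the class BundlerSketch, its methods, and the explicit nowMs timestamps are hypothetical and exist only for illustration. Under these assumptions a queued message waits at most max_bundle_timeout as long as the timer callback fires on schedule; if the callback itself runs late, the message waits just as long, which is the kind of delay the report asks about.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified bundler; NOT the JGroups code.
 * Messages are queued and flushed when either maxBundleSize bytes have
 * accumulated or maxBundleTimeoutMs has passed since the oldest queued message.
 * Time is passed in explicitly so the behaviour is deterministic.
 */
class BundlerSketch {
    private final int maxBundleSize;
    private final long maxBundleTimeoutMs;
    private final List<byte[]> queue = new ArrayList<>();
    private int queuedBytes = 0;
    private long firstQueuedAtMs = -1;

    BundlerSketch(int maxBundleSize, long maxBundleTimeoutMs) {
        this.maxBundleSize = maxBundleSize;
        this.maxBundleTimeoutMs = maxBundleTimeoutMs;
    }

    /** Queues a message; returns the flushed bundle if the size limit is reached, else an empty list. */
    List<byte[]> send(byte[] msg, long nowMs) {
        if (queue.isEmpty()) {
            firstQueuedAtMs = nowMs; // remember when the oldest message was queued
        }
        queue.add(msg);
        queuedBytes += msg.length;
        return queuedBytes >= maxBundleSize ? flush() : new ArrayList<>();
    }

    /** Timer callback; flushes only if the oldest message has waited at least the timeout. */
    List<byte[]> onTimer(long nowMs) {
        boolean due = !queue.isEmpty() && nowMs - firstQueuedAtMs >= maxBundleTimeoutMs;
        return due ? flush() : new ArrayList<>();
    }

    private List<byte[]> flush() {
        List<byte[]> bundle = new ArrayList<>(queue);
        queue.clear();
        queuedBytes = 0;
        firstQueuedAtMs = -1;
        return bundle;
    }

    public static void main(String[] args) {
        BundlerSketch bundler = new BundlerSketch(64000, 30); // 30 ms timeout as in the report
        bundler.send(new byte[100], 0);
        System.out.println(bundler.onTimer(29).size()); // prints 0: not due yet
        System.out.println(bundler.onTimer(30).size()); // prints 1: flushed at the timeout
    }
}
```

In this sketch, a callback that first runs at 700 ms instead of 30 ms would deliver the message 700 ms after it was sent, even though the configured timeout is 30 ms.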