On 10/11/13 2:40 PM, Radim Vansa wrote:
Hi,
since Infinispan moved to JGroups 3.4, we're experiencing occasional
deadlocks in some tests: most of the threads that send anything over
JGroups are waiting in JGroups' FlowControl.decrementCredits.
Are those real deadlocks? Meaning, does the system never recover? If not,
then it's just flow control doing its job and preventing a fast sender
from overrunning a busy receiver.
For example, if the receiver is busy processing something or blocked on a
lock acquisition, it may not be able to send back credits, so the sender
blocks until the receiver is done. Also, if the receiver's thread pool
drops the message because the pool is full, no credits will be sent,
which again blocks the sender. This is a *good thing*, or else the pool
would be exhausted even more.
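To make the mechanism concrete, here is a minimal sketch of credit-based flow control. It is a simplified, hypothetical model, not JGroups' FlowControl code: the names CreditGate, acquire() and replenish() are made up for illustration. The sender holds a byte budget per receiver, every send consumes credits, and credits only come back when the receiver has processed the data. If the receiver is busy, the sender waits; that is a temporary stall, not a deadlock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of a sender's credit budget towards one receiver.
class CreditGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition creditsAvailable = lock.newCondition();
    private long credits; // bytes the sender may still send without hearing back

    CreditGate(long initialCredits) {
        this.credits = initialCredits;
    }

    // Waits until `bytes` credits are available; returns false on timeout or interrupt.
    boolean acquire(long bytes, long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (credits < bytes) {
                if (nanos <= 0L) {
                    return false; // receiver hasn't sent credits back yet
                }
                nanos = creditsAvailable.awaitNanos(nanos);
            }
            credits -= bytes;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }

    // Called when the receiver has processed data and returns credits.
    void replenish(long bytes) {
        lock.lock();
        try {
            credits += bytes;
            creditsAvailable.signalAll(); // wake up blocked senders
        } finally {
            lock.unlock();
        }
    }

    long available() {
        lock.lock();
        try {
            return credits;
        } finally {
            lock.unlock();
        }
    }
}

class CreditGateDemo {
    public static void main(String[] args) {
        CreditGate gate = new CreditGate(1_000);
        int sent = 0;
        // The receiver is "busy" and returns nothing, so sending stops after 1,000 bytes.
        while (gate.acquire(100, 50, TimeUnit.MILLISECONDS)) {
            sent++;
        }
        System.out.println("sent before blocking: " + sent);
        gate.replenish(500); // receiver caught up and sent credits back
        System.out.println("resumed: " + gate.acquire(100, 50, TimeUnit.MILLISECONDS));
    }
}
```

The timeout on acquire() is only there so the demo terminates; it also illustrates the distinction above: a sender that eventually gets credits back was throttled, whereas one that never does is genuinely stuck.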
We really need to make a clear distinction between these two modes.
Naturally, if a receiver performs some blocking (as is done in
Infinispan), the sender should stop sending at some point; otherwise the
receiver would simply drop all messages and cause a lot of retransmissions.
The problem sometimes goes away after several seconds, but it produces
some ugly spikes in our throughput/response time charts.
OK, good, so it's the latter: temporary blocking caused by flow control.
Flow control *can* cause some hiccups every now and then, especially if
the receiver can block while processing a message. 200K credits are pretty
low (unless you send very small messages), but the 10M mentioned here
might be too much. I'd suggest a middle ground, e.g. 2-4 MB (default).
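A quick way to compare these settings is to count how many messages fit into one credit window, i.e. how far a sender gets towards a single receiver before it needs credits back. The sketch assumes credits are counted in bytes (as JGroups' max_credits is), treats 200K as 200,000 bytes, and ignores headers; the 1,000-byte message size is a hypothetical example, and CreditWindow is an illustrative name only.

```java
// Back-of-the-envelope arithmetic, assuming credits are bytes and ignoring headers.
class CreditWindow {
    // Whole messages of msgSize bytes that fit into maxCredits before the sender must wait.
    static long messagesBeforeBlock(long maxCredits, long msgSize) {
        if (msgSize <= 0) {
            throw new IllegalArgumentException("msgSize must be positive");
        }
        return maxCredits / msgSize; // integer division: partial messages don't count
    }

    public static void main(String[] args) {
        long msgSize = 1_000L; // hypothetical 1,000-byte payload
        long[] settings = {200_000L, 2_000_000L, 4_000_000L, 10_000_000L};
        for (long maxCredits : settings) {
            System.out.println("max_credits=" + maxCredits + " -> "
                    + messagesBeforeBlock(maxCredits, msgSize) + " messages");
        }
    }
}
```

With 1,000-byte messages this gives 200, 2,000, 4,000 and 10,000 messages respectively. With 100-byte messages the 200K window already covers 2,000 messages, which is why a small window hurts less when messages are very small.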
With the new INTERNAL thread pool, this blocking will *not* go away,
because credits (even when sent as INTERNAL messages) won't be sent in
the first place until the receiver(s) have processed the messages...
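The point about the INTERNAL pool can be illustrated with plain java.util.concurrent primitives. This is a hypothetical simulation, not JGroups code: a Semaphore stands in for the sender's credits, one executor plays the regular pool that processes messages, and a second executor plays a separate path for credit messages. Because the credits are only handed to that second executor after processing finishes, a receiver stuck on a lock still stalls the sender, no matter how fast the credit path itself is.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical simulation: a separate pool for credit messages doesn't help while processing is blocked.
class BlockedReceiverDemo {
    // Returns {sender got credits while receiver was blocked, sender got credits afterwards}.
    static boolean[] run() {
        Semaphore credits = new Semaphore(1_000);   // sender's credit budget in bytes
        ReentrantLock appLock = new ReentrantLock(); // a lock the receiver's processing needs
        ExecutorService regularPool = Executors.newSingleThreadExecutor();
        ExecutorService internalPool = Executors.newSingleThreadExecutor(); // stands in for INTERNAL
        try {
            boolean duringBlock;
            appLock.lock(); // someone else holds the lock, so processing will block
            try {
                credits.acquire(1_000); // sender spends its whole budget on one message
                regularPool.execute(() -> {
                    appLock.lock(); // receiver thread blocks here
                    try {
                        // process the message
                    } finally {
                        appLock.unlock();
                    }
                    // Credits are returned only now, even though they use a separate pool.
                    internalPool.execute(() -> credits.release(1_000));
                });
                duringBlock = credits.tryAcquire(100, 200, TimeUnit.MILLISECONDS);
            } finally {
                appLock.unlock(); // the blocking operation completes
            }
            boolean afterBlock = credits.tryAcquire(100, 5, TimeUnit.SECONDS);
            return new boolean[] {duringBlock, afterBlock};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            regularPool.shutdown();
            internalPool.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("credits while receiver blocked: " + r[0]);
        System.out.println("credits after receiver finished: " + r[1]);
    }
}
```

The first tryAcquire always times out because the credits can only be released after the main thread gives up appLock; once it does, the sender proceeds.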
Originally this affected only some RadarGun tests, but it is now
appearing in some client-server tests as well (we recently investigated
an issue where this appeared in a regular soak test).
I've been looking into that [1] for some time but haven't really figured
out the cause. The workaround is to set the MFC and UFC credits high
enough (I use 10M), and then things work. I tried to reproduce it with
plain JGroups, but without success.
I am not asking anyone to dig into that, but I wanted to know whether QA
alone is experiencing this or whether there are more of us.
Radim
[1] https://issues.jboss.org/browse/JGRP-1675
--
Bela Ban, JGroups lead (http://www.jgroups.org)