On Fri, Oct 11, 2013 at 5:26 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:
On 11 October 2013 14:21, Dan Berindei <dan.berindei(a)gmail.com> wrote:
> I've seen StateTransferLargeObjectTest hang on my machine with all OOB
> threads waiting in FlowControl$Credit.decrementIfEnoughCredits, but I
> thought that was because the JGroups internal thread pool wasn't enabled in
> the test configuration. Now that I've seen it's enabled by default, I think
> it could be a JGroups problem after all.
It seems you think that the test suite setup could make some tests behave
differently than expected; that's quite a bad sign for the test suite's
quality.
The JGroups configuration, in particular the number of
OOB/INT/remote-executor threads, certainly makes a difference for a lot of
tests, because they rely on being able to run n commands in parallel in
order to achieve a certain sequence of events. I don't see that as a bad
thing; the only problem is that we don't know exactly how many threads each
test needs.
I would argue then that these thread pools should not be disabled, or
that these tests should not be end-to-end tests including a JGroups
stack, but should probably replace the channel with an in-memory copy of
the buffers.
To clarify, the internal thread pool was introduced in JGroups specifically
to address this deadlock: a node can't send any message because it doesn't
have enough credits, yet it can't receive more credits because all of its
OOB threads are blocked sending a message. It was never disabled; we just
never enabled it explicitly in the test suite, and I wasn't aware that it
was enabled by default. But it seems the deadlock can appear even when the
internal thread pool is enabled.
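
For anyone who wants to see the basic mechanism in isolation, here is a rough
toy model in Python. It is not JGroups code: the semaphore standing in for the
credits, the function names and the single-thread "internal" pool are all
assumptions made for illustration. It only shows why a credit message queued
behind blocked senders never gets processed, and it does not reproduce the hang
that seems to occur with the internal pool enabled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def credits_arrive(oob_threads, blocked_senders, internal_pool, patience=0.5):
    """Return True if the credit message is handled while senders are blocked.

    Toy model only: names and structure are hypothetical, not JGroups APIs.
    """
    credits = threading.Semaphore(0)  # the sender has run out of credits
    oob = ThreadPoolExecutor(max_workers=oob_threads)
    # Hypothetical stand-in for the internal pool: one dedicated extra thread.
    internal = ThreadPoolExecutor(max_workers=1) if internal_pool else oob

    def send():
        # Blocks until a credit is available, like a sender waiting for credits.
        credits.acquire()

    def handle_credit_message():
        # Incoming credits would unblock every waiting sender.
        for _ in range(blocked_senders):
            credits.release()

    for _ in range(blocked_senders):
        oob.submit(send)
    replenish = internal.submit(handle_credit_message)

    # If the credit message is not handled within `patience` seconds,
    # treat the situation as deadlocked.
    done, _pending = wait([replenish], timeout=patience)

    # Cleanup so the toy always terminates: hand out credits manually.
    for _ in range(blocked_senders):
        credits.release()
    oob.shutdown(wait=True)
    if internal is not oob:
        internal.shutdown(wait=True)
    return replenish in done


if __name__ == "__main__":
    print(credits_arrive(2, 2, internal_pool=False))
    print(credits_arrive(2, 2, internal_pool=True))
```

In this model the credit message is stuck whenever the blocked senders occupy
every OOB thread and there is no separate pool to process it; giving it a
dedicated thread, or simply having a spare OOB thread, lets the credits through.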
StateTransferLargeObjectTest is a stress test that checks how state
transfer and then remote reads are handled when the amount of data is very
large. I'm not sure how useful it would be if we didn't run it with a real
JGroups stack. Perhaps there are other tests that would benefit from being
isolated from the peculiarities of JGroups, but I don't think this is it.
True, the test suite's JGroups stack uses TCP, so it doesn't really need
flow control, but if a hanging test helps us solve a real issue then I see
it as a positive thing, not a negative one. So unless we find that dropping
UFC and/or TCP speeds up the test suite considerably, I'd keep it just as
it is.
Cheers
Dan