On Mon, Apr 15, 2013 at 1:30 PM, Sanne Grinovero <sanne@infinispan.org> wrote:
I've attached the logs to the JIRA.

Some replies inline:

On 15 April 2013 11:04, Dan Berindei <dan.berindei@gmail.com> wrote:
>
>
>
> On Sat, Apr 13, 2013 at 2:42 PM, Sanne Grinovero <sanne@infinispan.org>
> wrote:
>>
>> On 13 April 2013 11:20, Bela Ban <bban@redhat.com> wrote:
>> >
>> >
>> > On 4/13/13 2:02 AM, Sanne Grinovero wrote:
>> >
>> >> @All, the performance problem seemed to be caused by a problem in
>> >> JGroups, which I've logged here:
>> >> https://issues.jboss.org/browse/JGRP-1617
>> >
>> >
>> > Almost no information attached to the case :-( If it wasn't you, Sanne,
>> > I'd outright reject the case ...
>>
>> I wouldn't blame you, and I'm sorry for the lack of details: as I said
>> it was very late, but I still preferred to share the observations we
>> had made so far.
>>
>> From all the experiments we made - and some good logs I'll clean up
>> for sharing - it's clear that the thread is not woken up even though
>> the ACK was already received.
>> And of course I wouldn't expect this to fail in a simple test as it
>> wouldn't have escaped you ;-) or at least you would have had earlier
>> reports.
>>
>> There are lots of complex moving parts in this scenario: a Muxed
>> JGroups Channel, the Application Server responsible for initializing
>> the stack, and some added magic from CapeDwarf itself: for one, it's
>> not clear to me exactly what configuration is being used.
>>
>
> Does CD also change the JGroups configuration? I thought it only tweaks the
> Infinispan cache configuration on deployment, and the JGroups channel is
> already started by the time the CD application is deployed.

CD uses a custom AS build and a custom AS configuration, so anything
could be different.
On top of that, some things are reconfigured programmatically by it.


Ales already cleared this up: CD doesn't change the JGroups config at all.


>> Without a testcase we might not be 100% sure but it seems likely to be
>> an unexpected behaviour in JGroups, at least under some very specific
>> setup.
>>
>>
>> I'm glad to help track down more details of what could trigger
>> this, but I'm not too eager to write a full unit test for it as it
>> involves a lot of other components, and even with my own components
>> mocked out I could still reproduce it: it's not Hibernate Search, so
>> I'll need help from the field experts.
>>
>> Also, I suspect a test would need to depend on many more components:
>> does JGroups have an easy way to manage dependencies nowadays?
>>
>> some more inline:
>>
>> >
>> > The MessageDispatcher will *not* wait until the timeout kicks in, it'll
>> > return as soon as it has acks from all members of the target set. This
>> > works and is covered with a bunch of unit tests, so a regression would
>> > have been caught immediately.
>>
>> I don't doubt the "vanilla scenario", but this is what happens in the
>> more complex case of the CapeDwarf setup.
>>
>
> My first guess would be that the MuxRpcDispatcher on the second node hasn't
> started yet by the time you call castMessage on the first node. It could be
> that your workaround just delayed the message a little bit, until the
> MuxRpcDispatcher on the other node actually started (because the JChannel is
> already started on both nodes, but as long as the MuxRpcDispatcher isn't
> started on the 2nd node it won't send any responses back).

Before the point at which Search uses the dispatcher, many other
operations had already happened successfully and with reasonable timing:
in particular, some Infinispan transactions stored entries quickly and
without trouble.

Besides, if such a race condition were possible, I would consider
it a critical bug.


I looked at the muxer code and I think they actually take care of this already: MuxUpHandler returns a NoMuxHandler response when it can't find an appropriate MuxRpcDispatcher, and MessageDispatcher counts that response against the number of expected responses. So it must be something else...
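
For reference, this is roughly the setup I have in mind when talking about the muxer (a rough sketch against the plain JGroups 3.x mux API; the scope id, config file, cluster name and payload are just placeholders, not what the AS/CapeDwarf stack actually uses):

import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.blocks.RequestOptions;
import org.jgroups.blocks.ResponseMode;
import org.jgroups.blocks.mux.MuxRpcDispatcher;
import org.jgroups.blocks.mux.MuxUpHandler;
import org.jgroups.util.RspList;

public class MuxedDispatcherSketch {

    public static void main(String[] args) throws Exception {
        // One shared channel per node; in the AS this is created and connected
        // by the clustering subsystem, not by the application.
        JChannel channel = new JChannel("udp.xml");

        // The MuxUpHandler demultiplexes incoming requests by scope id; it is
        // also the component that answers with NoMuxHandler when no dispatcher
        // is registered yet for that id.
        channel.setUpHandler(new MuxUpHandler());

        // Each service registers its own dispatcher under a distinct scope id.
        Object serverObject = new Object(); // would expose the RPC methods
        MuxRpcDispatcher dispatcher =
                new MuxRpcDispatcher((short) 1, channel, null, null, serverObject);

        channel.connect("demo-cluster");

        // Synchronous broadcast: returns once every member has answered
        // (possibly with a NoMuxHandler marker) or the timeout expired.
        RspList<Object> responses = dispatcher.castMessage(
                null, // null destination list = all cluster members
                new Message(null, null, "ping"),
                new RequestOptions(ResponseMode.GET_ALL, 10000));
        System.out.println(responses);

        dispatcher.stop();
        channel.close();
    }
}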

 

>> > I attached a test program to JGRP-1617 which shows that this feature
>> > works correctly.
>> >
>> > Of course, if you lose an ack (e.g. due to a maxed out incoming / OOB
>> > thread pool), the unicast protocol will have to retransmit the ack until
>> > it has been received. Depending on the unicast protocol you use, this
>> > will be immediate (UNICAST, UNICAST3), or based on a stability interval
>> > (UNICAST2).
>>
>> Right, it's totally possible this is a stack configuration problem in
>> the AS.
>> I wouldn't be the best person to ask about that though; I don't even
>> understand the configuration format.
>>
>
> You can get the actual JGroups configuration with
> channel.getProtocolStack().printProtocolSpecAsXml(), but I wouldn't expect
> you to find any surprises there: they should use pretty much the JGroups
> defaults.

Nice tip. I'll add this as a logging option.
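
Something like this is what I have in mind for the logging option (just a sketch; the logger and the place it gets called from are made up, only the printProtocolSpecAsXml() call is the actual JGroups API mentioned above):

import org.jgroups.JChannel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class StackDumper {

    private static final Logger log = LoggerFactory.getLogger(StackDumper.class);

    /** Logs the effective JGroups protocol stack, with all defaults applied. */
    public static void dumpProtocolStack(JChannel channel) {
        if (log.isDebugEnabled()) {
            log.debug("Effective JGroups stack:\n{}",
                    channel.getProtocolStack().printProtocolSpecAsXml());
        }
    }
}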

>
> By default STABLE.desired_avg_gossip is 20s and STABLE.stability_delay is
> 6s, so even if the message was lost it should take < 30s for the message to
> be resent.

The delay is actually ~10 seconds per RPC, so still <30s.
The reason the overall test takes 60 seconds is that there are 6
operations being performed.


OK, this means my hunch definitely wasn't true: that would only have explained the first request timing out.

 

>> >> For the record, the first operation was indeed triggering some lazy
>> >> initialization of indexes, which in turn would trigger a Lucene
>> >> Directory being started, triggering 3 Cache starts which in turn would
>> >> trigger 6 state transfer processes: so indeed the first operation
>> >> would not be exactly "cheap" performance wise, still this would
>> >> complete in about 120 milliseconds.
>> >
>> > This sounds very low for the work you describe above. I don't think 6
>> > state transfers can be completed in 120ms, unless they're async (but
>> > then that means they're not done when you return). Also, cache starts
>> > (wrt JGroups) will definitely take more than a few seconds if you're the
>> > first cluster node...
>>
>> It's a unit test: the caches are initially empty and networking is
>> loopback; on the second round ~6 elements are in the cache, none
>> larger than ~10-character strings.
>> Should be reasonable?
>>
>
> Yes, I think it's reasonable, if the JChannel was already started before the
> CD application was deployed. Starting the first JChannel would take at least
> 3s, which is the default PING.timeout.
>
>
>>
>> >> Not being sure about the options of depending to a newer JGroups
>> >> release or the complexity of a fix, I'll implement a workaround in
>> >> HSearch in the scope of HSEARCH-1296.
>> >
>> >
>> > If you add more information to JGRP-1617, I'll take a look. This would
>> > be a critical bug in JGroups *if* you can prove that the
>> > MessageDispatcher always runs into the timeout (I don't think you can
>> > though !).
>>
>> Considering the easy workaround, and that this definitely needs
>> something special in the configuration, I wouldn't consider it too
>> critical? As far as we know now, it's entirely possible the
>> configuration being used is illegal. But this is exactly where I need
>> your help ;-)
>>
>
> I'm not sure that your workaround is 100% effective: even if it doesn't
> happen in this test, it's always possible to have the app deployed on some
> of the nodes in the cluster, but not all.

That's right. I would prefer not to apply anything like that, but for
the sake of the experiment it was useful for isolating the problem.


Looking at your workaround, I think you actually set the response mode to GET_NONE (because that's the default value in RequestOptions), so you're back to sending an asynchronous request.
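
To make the difference explicit, something along these lines (a sketch against the JGroups 3.x API; the timeout and helper names are arbitrary, and this is obviously not the actual Hibernate Search code):

import org.jgroups.Message;
import org.jgroups.blocks.MessageDispatcher;
import org.jgroups.blocks.RequestOptions;
import org.jgroups.blocks.ResponseMode;
import org.jgroups.util.RspList;

public class CastModes {

    // Fire-and-forget: GET_NONE (the RequestOptions default) does not wait
    // for any responses, so this is effectively an asynchronous send.
    static void asyncCast(MessageDispatcher dispatcher, Message msg) throws Exception {
        dispatcher.castMessage(null, msg, new RequestOptions(ResponseMode.GET_NONE, 0));
    }

    // Truly synchronous: GET_ALL blocks until every member has responded
    // or the timeout (in ms) expires.
    static RspList<Object> syncCast(MessageDispatcher dispatcher, Message msg) throws Exception {
        return dispatcher.castMessage(null, msg,
                new RequestOptions(ResponseMode.GET_ALL, 10000));
    }
}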