<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Apr 15, 2013 at 1:30 PM, Sanne Grinovero <span dir="ltr"><<a href="mailto:sanne@infinispan.org" target="_blank">sanne@infinispan.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I've attached the logs the the JIRA.<br>
<br>
Some replies inline:<br>
<div><div class="h5"><br>
On 15 April 2013 11:04, Dan Berindei <<a href="mailto:dan.berindei@gmail.com">dan.berindei@gmail.com</a>> wrote:<br>
><br>
><br>
><br>
> On Sat, Apr 13, 2013 at 2:42 PM, Sanne Grinovero <<a href="mailto:sanne@infinispan.org">sanne@infinispan.org</a>><br>
> wrote:<br>
>><br>
>> On 13 April 2013 11:20, Bela Ban <<a href="mailto:bban@redhat.com">bban@redhat.com</a>> wrote:<br>
>> ><br>
>> ><br>
>> > On 4/13/13 2:02 AM, Sanne Grinovero wrote:<br>
>> ><br>
>> >> @All, the performance problem seemed to be caused by a problem in<br>
>> >> JGroups, which I've logged here:<br>
>> >> <a href="https://issues.jboss.org/browse/JGRP-1617" target="_blank">https://issues.jboss.org/browse/JGRP-1617</a><br>
>> ><br>
>> ><br>
>> > Almost no information attached to the case :-( If it wasn't you, Sanne,<br>
>> > I'd outright reject the case ...<br>
>><br>
>> I wouldn't blame you, and am sorry for the lack of details: as I said,<br>
>> it was very late; still, I preferred to share the observations we had<br>
>> made so far.<br>
>><br>
>> From all the experiments we made - and some good logs I'll clean up<br>
>> for sharing - it's clear that the thread is not woken up even though<br>
>> the ACK was already received.<br>
>> And of course I wouldn't expect this to fail in a simple test as it<br>
>> wouldn't have escaped you ;-) or at least you would have had earlier<br>
>> reports.<br>
>><br>
>> There are lots of complex moving parts in this scenario: a Muxed<br>
>> JGroups Channel, the Application Server responsible for initializing<br>
>> the stack, and some added magic from CapeDwarf itself: for one, it's<br>
>> not clear to me exactly what configuration is being used.<br>
>><br>
><br>
> Does CD also change the JGroups configuration? I thought it only tweaks the<br>
> Infinispan cache configuration on deployment, and the JGroups channel is<br>
> already started by the time the CD application is deployed.<br>
<br>
</div></div>CD uses a custom AS build and a custom AS configuration, so anything<br>
could be different.<br>
On top of that, some things are reconfigured programmatically by it.<br>
<div class="im"><br></div></blockquote><div><br></div><div>Ales already cleared this out, CD doesn't change the JGroups config at all.<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">
<br>
>> Without a testcase we might not be 100% sure but it seems likely to be<br>
>> an unexpected behaviour in JGroups, at least under some very specific<br>
>> setup.<br>
>><br>
>><br>
>> I'm glad to help track down more details of what could trigger<br>
>> this, but I'm not too eager to write a full unit test for it, as it<br>
>> involves a lot of other components; even after mocking my own<br>
>> components out I could still reproduce it, so it's not Hibernate<br>
>> Search, and I'll need the help of the field experts.<br>
>><br>
>> Also, I suspect a test would need to depend on many more components:<br>
>> does JGroups have an easy way to manage dependencies nowadays?<br>
>><br>
>> some more inline:<br>
>><br>
>> ><br>
>> > The MessageDispatcher will *not* wait until the timeout kicks in, it'll<br>
>> > return as soon as it has acks from all members of the target set. This<br>
>> > works and is covered with a bunch of unit tests, so a regression would<br>
>> > have been caught immediately.<br>
>><br>
>> I don't doubt the "vanilla scenario", but this is what happens in the<br>
>> more complex case of the CapeDwarf setup.<br>
>><br>
><br>
> My first guess would be that the MuxRpcDispatcher on the second node hasn't<br>
> started yet by the time you call castMessage on the first node. It could be<br>
> that your workaround just delayed the message a little bit, until the<br>
> MuxRpcDispatcher on the other node actually started (because the JChannel is<br>
> already started on both nodes, but as long as the MuxRpcDispatcher isn't<br>
> started on the 2nd node it won't send any responses back).<br>
<br>
</div>Before the point at which Search uses the dispatcher, many more<br>
operations happened successfully and with reasonable timing:<br>
in particular, some transactions on Infinispan stored entries quickly<br>
and without trouble.<br>
<br>
Besides, if such a race condition were possible, I would consider<br>
it a critical bug.<br>
<div class="im"><br></div></blockquote><div><br></div>I looked at the muxer code and I think they actually take care of this already: MuxUpHandler returns a NoMuxHandler response when it can't find an appropriate MuxedRpcDispatcher, and MessageDispatcher counts that response against the number of expected responses. So it must be something else...<br>
<br><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
<br>
>> > I attached a test program to JGRP-1617 which shows that this feature<br>
>> > works correctly.<br>
>> ><br>
>> > Of course, if you lose an ack (e.g. due to a maxed out incoming / OOB<br>
>> > thread pool), the unicast protocol will have to retransmit the ack until<br>
>> > it has been received. Depending on the unicast protocol you use, this<br>
>> > will be immediate (UNICAST, UNICAST3), or based on a stability interval<br>
>> > (UNICAST2).<br>
>><br>
>> Right, it's totally possible this is a stack configuration problem in<br>
>> the AS.<br>
>> I wouldn't be the best person to ask about that, though: I don't even<br>
>> understand the configuration format.<br>
>><br>
><br>
> You can get the actual JGroups configuration with<br>
> channel.getProtocolStack().printProtocolSpecAsXml(), but I wouldn't expect<br>
> you to find any surprises there: they should use pretty much the JGroups<br>
> defaults.<br>
<br>
</div>Nice tip. I'll add this as a logging option.<br>
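<div><br>Something like this minimal, untested sketch (assuming the backend already holds a reference to the JChannel and a logger, and using the method name you mention above):<br></div>
<pre>
// Dump the effective JGroups stack once, at DEBUG level, when the backend starts.
if (log.isDebugEnabled()) {
    log.debug("Effective JGroups configuration:\n"
            + channel.getProtocolStack().printProtocolSpecAsXml());
}
</pre>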
<div class="im"><br>
><br>
> By default STABLE.desired_avg_gossip is 20s and STABLE.stability_delay is<br>
> 6s, so even if the message was lost it should take < 30s for the message to<br>
> be resent.<br>
<br>
</div>The delay is actually ~10 seconds per RPC, so still <30s.<br>
The reason the overall test takes 60 seconds is that there are 6<br>
operations being performed.<br>
<div><div class="h5"><br></div></div></blockquote><div><br></div><div>Ok, this means my hunch definitely wasn't true, that would have explained only the first request timing out.<br><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div class="h5">
<br>
>> >> For the record, the first operation was indeed triggering some lazy<br>
>> >> initialization of indexes, which in turn would trigger a Lucene<br>
>> >> Directory being started, triggering 3 Cache starts which in turn would<br>
>> >> trigger 6 state transfer processes: so indeed the first operation<br>
>> >> would not be exactly "cheap" performance wise, still this would<br>
>> >> complete in about 120 milliseconds.<br>
>> ><br>
>> > This sounds very low for the work you describe above. I don't think 6<br>
>> > state transfers can be completed in 120ms, unless they're async (but<br>
>> > then that means they're not done when you return). Also, cache starts<br>
>> > (wrt JGroups) will definitely take more than a few seconds if you're the<br>
>> > first cluster node...<br>
>><br>
>> It's a unit test: the caches are initially empty and networking is<br>
>> loopback;<br>
>> on the second round some ~6 elements are in the cache, none larger<br>
>> than ~10-character strings.<br>
>> Should be reasonable?<br>
>><br>
><br>
> Yes, I think it's reasonable, if the JChannel was already started before the<br>
> CD application was deployed. Starting the first JChannel would take at least<br>
> 3s, which is the default PING.timeout.<br>
><br>
><br>
>><br>
>> >> Not being sure about the options of depending to a newer JGroups<br>
>> >> release or the complexity of a fix, I'll implement a workaround in<br>
>> >> HSearch in the scope of HSEARCH-1296.<br>
>> ><br>
>> ><br>
>> > If you add more information to JGRP-1617, I'll take a look. This would<br>
>> > be a critical bug in JGroups *if* you can prove that the<br>
>> > MessageDispatcher always runs into the timeout (I don't think you can<br>
>> > though !).<br>
>><br>
>> Considering the easy workaround, and that this definitely needs<br>
>> something special in the configuration, I wouldn't consider it too<br>
>> critical. As far as we know now, it's entirely possible the<br>
>> configuration being used is illegal. But this is exactly where I need<br>
>> your help ;-)<br>
>><br>
><br>
> I'm not sure that your workaround is 100% effective: even if it doesn't<br>
> happen in this test, it's always possible to have the app deployed on some<br>
> of the nodes in the cluster, but not all.<br>
<br>
</div></div>That's right. I would prefer not to apply anything like that, but for<br>
the sake of the experiment it was useful to isolate the problem.<br>
<br></blockquote><div><br></div><div>Looking at your workaround, I think you actually set the response mode to GET_NONE (because that's the default value in RequestOptions), so you're back to sending an asynchronous request.<br>
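<div><br>For comparison, an untested sketch of the difference (JGroups 3.x API; 'dispatcher', 'targets', 'message' and the 10s timeout are just placeholders for whatever the Search backend actually uses, and the GET_NONE default is as noted above):<br></div>
<pre>
// Fire-and-forget: a default-constructed RequestOptions ends up as GET_NONE,
// so castMessage() does not wait for any acks.
RequestOptions asyncOpts = new RequestOptions();

// Truly synchronous: explicitly ask for a response from every target,
// with a 10s timeout, and inspect the RspList that comes back.
RequestOptions syncOpts = new RequestOptions(ResponseMode.GET_ALL, 10000);
RspList rsps = dispatcher.castMessage(targets, message, syncOpts);
</pre>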
</div></div></div></div>