Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

Saturday, 13 April 2013

On 4/13/13 2:02 AM, Sanne Grinovero wrote:

...
 @All, the performance problem seemed to be caused by a problem in
 JGroups, which I've logged here:
 https://issues.jboss.org/browse/JGRP-1617 

Almost no information attached to the case :-( If it wasn't you, Sanne, 
I'd outright reject the case ...

The MessageDispatcher will *not* wait until the timeout kicks in, it'll 
return as soon as it has acks from all members of the target set. This 
works and is covered with a bunch of unit tests, so a regression would 
have been caught immediately.
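
To illustrate the return-on-last-ack behavior described above, here is a minimal sketch that uses only java.util.concurrent. It is not the JGroups API and not the test program attached to JGRP-1617. The class name AckWaitSketch, the method callAndWait and the simulated ack delays are all hypothetical. Each simulated member acks after its own delay, and the caller blocks until either every ack has arrived or the timeout expires, whichever comes first.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AckWaitSketch {

    /**
     * Simulates a synchronous group call (hypothetical helper, not JGroups code).
     * Each entry in ackDelaysMs is the time after which one member acks.
     * Returns the elapsed milliseconds if all acks arrived before the timeout,
     * or -1 if the timeout expired first.
     */
    public static long callAndWait(List<Long> ackDelaysMs, long timeoutMs)
            throws InterruptedException {
        CountDownLatch pending = new CountDownLatch(ackDelaysMs.size());
        ScheduledExecutorService members =
                Executors.newScheduledThreadPool(Math.max(1, ackDelaysMs.size()));
        Runnable ack = pending::countDown; // one ack from one member
        long start = System.nanoTime();
        try {
            for (long delay : ackDelaysMs) {
                members.schedule(ack, delay, TimeUnit.MILLISECONDS);
            }
            // await() returns as soon as the count reaches zero;
            // the timeout is only an upper bound on the wait.
            boolean allAcked = pending.await(timeoutMs, TimeUnit.MILLISECONDS);
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            return allAcked ? elapsed : -1;
        } finally {
            members.shutdownNow(); // cancel acks that never arrived in time
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long fast = callAndWait(List.of(10L, 20L, 30L), 5_000);
        System.out.println("all acks received, elapsed ms: " + fast);

        long lost = callAndWait(List.of(10L, 10_000L), 200);
        System.out.println(lost < 0 ? "one ack missing: timed out" : "elapsed ms: " + lost);
    }
}
```

In the first call the wait ends shortly after the slowest ack, far below the 5 second timeout. Only the second call, where one ack does not arrive in time, runs into the timeout.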

I attached a test program to JGRP-1617 which shows that this feature 
works correctly.

Of course, if you lose an ack (e.g. due to a maxed out incoming / OOB 
thread pool), the unicast protocol will have to retransmit the ack until 
it has been received. Depending on the unicast protocol you use, this 
will be immediate (UNICAST, UNICAST3), or based on a stability interval 
(UNICAST2).

...
 For the record, the first operation was indeed triggering some lazy
 initialization of indexes, which in turn would trigger a Lucene
 Directory being started, triggering 3 Cache starts which in turn would
 trigger 6 state transfer processes: so indeed the first operation
 would not be exactly "cheap" performance wise, still this would
 complete in about 120 milliseconds.

This sounds very low for the work you describe above. I don't think 6 
state transfers can be completed in 120ms, unless they're async (but 
then that means they're not done when you return). Also, cache starts 
(wrt JGroups) will definitely take more than a few seconds if you're the 
first cluster node...

...
 Not being sure about the options of depending on a newer JGroups
 release or the complexity of a fix, I'll implement a workaround in
 HSearch in the scope of HSEARCH-1296.

If you add more information to JGRP-1617, I'll take a look. This would 
be a critical bug in JGroups *if* you can prove that the 
MessageDispatcher always runs into the timeout (I don't think you can 
though !).

...
 As a lesson learned, I think we need to polish some of our TRACE level
 messages to include the cache name: to resolve this we had not just
 many threads and components but also 4 of them were using JGroups
 (interleaving messages of all sorts) and 9 different caches were
 involved for each simple write operation in CD: made it interesting to
 figure out what was going on!

Yes, that would help. In JGroups, I usually log the cluster address of 
the thread that's writing to the log, so I can differentiate between 
different clusters on the same host.

...
 Also I'm wondering how hard it would be to
 have a log parser which converts my 10GB of text log from today in a
 graphical sequence diagram. 

Yes, something like Wireshark's "Follow TCP Stream" feature would be very helpful !

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)
