Different types of flush WAS: [jgroups-dev] Start flush timeout with MuxChannel connect

Tuesday, 18 September 2007

The discussion on a javagroups-development thread made me realize that 
for sure JBC wants to take advantage of the 2.6 connect+statetransfer 
feature.

The JBCACHE-315 solution is only really applicable to a FLUSH associated 
with a state transfer.  That's when the cache needs to ensure the state 
it transfers doesn't have any changes in it from uncommitted 
transactions. Since the JBCACHE-315 fix is potentially disruptive to 
transactions, we want to use it as judiciously as possible. Combining 
connect+statetransfer is a good step in that direction.

Beyond that, there are block() calls that have nothing to do with 
transferring this cache's state:

1) A block() call associated with a simple connect.  JBC has no need to 
deal with ongoing transactions; just prevent messages going out.

2) A block() call associated with a state transfer request for a 
different state-id. For example, two different caches sharing a 
multiplexed channel; cacheA gets a block call when a new instance of 
cacheB deploys somewhere. Again, cacheA has no need to deal with ongoing 
transactions since it won't be transferring state; just prevent messages 
going out.

The problem is that JBC has no idea what the context is when its block() 
callback is invoked.
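
To make the distinction concrete, here is a rough sketch of how a block() 
impl could behave if it were told why the flush was started. Everything in 
it is hypothetical: FlushReason, CacheSketch and the block(FlushReason) 
signature are made up for illustration and are not part of JGroups or JBC, 
whose block() callback takes no such argument. The "resolve-transactions" 
step only stands in for the JBCACHE-315 handling; it does not implement it.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockContextSketch {

    /** Hypothetical: why the flush was started. JGroups does not pass this to block(). */
    public enum FlushReason {
        CONNECT,                     // case 1: a simple connect
        STATE_TRANSFER_OTHER_STATE,  // case 2: state transfer for a different state-id
        STATE_TRANSFER_OWN_STATE     // this cache's own state is about to be transferred
    }

    /** Hypothetical stand-in for the cache; it only records what it would do. */
    public static class CacheSketch {
        private boolean outboundBlocked;
        private final List<String> actions = new ArrayList<>();

        public void block(FlushReason reason) {
            if (reason == FlushReason.STATE_TRANSFER_OWN_STATE) {
                // Only in this case is the potentially disruptive
                // transaction handling (the JBCACHE-315 work) needed.
                actions.add("resolve-transactions");
            }
            // In every case, stop messages from going out.
            outboundBlocked = true;
            actions.add("block-outbound");
        }

        public void unblock() {
            outboundBlocked = false;
        }

        public boolean isOutboundBlocked() {
            return outboundBlocked;
        }

        public List<String> actions() {
            return actions;
        }
    }

    public static void main(String[] args) {
        // Show what the sketch would do for each hypothetical reason.
        for (FlushReason reason : FlushReason.values()) {
            CacheSketch cache = new CacheSketch();
            cache.block(reason);
            System.out.println(reason + " -> " + cache.actions());
        }
    }
}
```

With something like this, cases 1 and 2 above would skip the transaction 
work entirely and only the state-transfer case would pay for it.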

Brian Stansberry wrote:
...
 AIUI, it's the time for all the services' block() impls to return, plus 
 all the other work FLUSH does sending messages around.

 It's the JBC block() impl that is going to take time. JBCACHE-315 means 
 JBC block() analyzing transactions that have written to the cache state, 
 giving them a chance to clear, rolling back those that don't etc.  Takes 
 time.

 Vladimir Blagojevic wrote:
> Hey Brian,
>
> This is a different timeout. It gives flush a time window of 3 seconds to 
> quiet the cluster, i.e. to complete the first phase of flush. It does not 
> mean that the cluster will be quiet for 3 seconds.  The timeout you are 
> thinking about is the one in configuration file.
>
> Cheers,
> Vladimir
>
> Brian Stansberry wrote:
>> Looking at org.jgroups.JChannelFactory.connect it's calling startFlush 
>> on the MuxChannel with a hardcoded timeout of 3 secs for the flush to 
>> complete. I think such a short timeout will for sure be a problem on 
>> production systems when nodes join active clusters. E.g. a good 
>> solution to http://jira.jboss.com/jira/browse/JBCACHE-315 likely 
>> requires spending some time to allow active transactions to clear.
>>
>>   

-- 
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry(a)redhat.com
