[jbosscache-dev] Different types of flush WAS: [jgroups-dev] Start flush timeout with MuxChannel connect

Tue Sep 18 18:12:47 EDT 2007

The discussion on a javagroups-development thread made me realize that 
JBC definitely wants to take advantage of the JGroups 2.6 
connect+statetransfer feature.

The JBCACHE-315 solution is only really applicable to a FLUSH associated 
with a state transfer.  That's when the cache needs to ensure the state 
it transfers doesn't have any changes in it from uncommitted 
transactions. Since the JBCACHE-315 fix is potentially disruptive to 
transactions, we want to use it as judiciously as possible. Combining 
connect+statetransfer is a good step in that direction.

Beyond that:

1) A block() call associated with a simple connect.  JBC has no need to 
deal with ongoing transactions; just prevent messages going out.

2) A block() call associated with a state transfer request for a 
different state-id. For example, two different caches sharing a 
multiplexed channel; cacheA gets a block call when a new instance of 
cacheB deploys somewhere. Again, cacheA has no need to deal with ongoing 
transactions since it won't be transferring state; just prevent messages 
going out.

Problem is JBC has no idea what the context is when its block() callback 
is invoked.
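
To make the three cases concrete, here is a rough sketch of what block() 
handling could look like if the callback were told why the flush was 
started. FlushReason, FlushContext and onBlock are hypothetical names made 
up for illustration; the real block() callback carries no such argument, 
which is exactly the problem.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockContextSketch {

    // Hypothetical: the reason a flush was started. Not part of JGroups.
    enum FlushReason { CONNECT, STATE_TRANSFER }

    // Hypothetical context a block() callback could receive. Not part of JGroups.
    static final class FlushContext {
        final FlushReason reason;
        final String stateId; // null for a plain connect

        FlushContext(FlushReason reason, String stateId) {
            this.reason = reason;
            this.stateId = stateId;
        }
    }

    // Returns the steps a cache owning ownStateId would take for this flush.
    static List<String> onBlock(String ownStateId, FlushContext ctx) {
        List<String> steps = new ArrayList<String>();
        boolean transfersOwnState = ctx.reason == FlushReason.STATE_TRANSFER
                && ownStateId.equals(ctx.stateId);
        if (transfersOwnState) {
            // Only this case needs the potentially disruptive JBCACHE-315 handling.
            steps.add("let writing transactions clear, roll back the rest");
        }
        // Every case must stop outgoing messages until the flush ends.
        steps.add("hold outgoing messages");
        return steps;
    }

    public static void main(String[] args) {
        // Simple connect: no transaction handling needed.
        System.out.println(onBlock("cacheA",
                new FlushContext(FlushReason.CONNECT, null)));
        // State transfer for a different state-id (cacheB) seen by cacheA.
        System.out.println(onBlock("cacheA",
                new FlushContext(FlushReason.STATE_TRANSFER, "cacheB")));
        // State transfer for cacheA's own state: transactions must be dealt with.
        System.out.println(onBlock("cacheA",
                new FlushContext(FlushReason.STATE_TRANSFER, "cacheA")));
    }
}
```

Only the last call triggers the transaction handling; the first two just 
hold outgoing messages.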

Brian Stansberry wrote:
> AIUI, it's the time for all the services' block() impls to return, plus 
> all the other work FLUSH does sending messages around.
> 
> It's the JBC block() impl that is going to take time. JBCACHE-315 means 
> JBC block() analyzing transactions that have written to the cache state, 
> giving them a chance to clear, rolling back those that don't, etc.  Takes 
> time.
> 
> Vladimir Blagojevic wrote:
>> Hey Brian,
>>
>> This is a different timeout. It gives flush a time window of 3 seconds 
>> to quiet the cluster, i.e. to complete the first phase of flush. It does 
>> not mean that the cluster will be quiet for 3 seconds.  The timeout you 
>> are thinking about is the one in the configuration file.
>>
>> Cheers,
>> Vladimir
>>
>> Brian Stansberry wrote:
>>> Looking at org.jgroups.JChannelFactory.connect it's calling startFlush 
>>> on the MuxChannel with a hardcoded timeout of 3 secs for the flush to 
>>> complete. I think such a short timeout will for sure be a problem on 
>>> production systems when nodes join active clusters. E.g. a good 
>>> solution to http://jira.jboss.com/jira/browse/JBCACHE-315 likely 
>>> requires spending some time to allow active transactions to clear.
>>>
>>>   
> 
> 

-- 
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry at redhat.com