The discussion on a javagroups-development thread made me realize that
for sure JBC wants to take advantage of the 2.6 connect+statetransfer
feature.
The JBCACHE-315 solution is only really applicable to a FLUSH associated
with a state transfer. That's when the cache needs to ensure the state
it transfers doesn't have any changes in it from uncommitted
transactions. Since the JBCACHE-315 fix is potentially disruptive to
transactions, we want to use it as judiciously as possible. Combining
connect+statetransfer is a good step in that direction.
Beyond that, there are other cases where the JBCACHE-315 handling isn't needed:
1) A block() call associated with a simple connect. JBC has no need to
deal with ongoing transactions; just prevent messages going out.
2) A block() call associated with a state transfer request for a
different state-id. For example, two different caches sharing a
multiplexed channel; cacheA gets a block call when a new instance of
cacheB deploys somewhere. Again, cacheA has no need to deal with ongoing
transactions since it won't be transferring state; just prevent messages
going out.
The problem is that JBC has no idea what the context is when its block()
callback is invoked.
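
To make the cases above concrete, here is a rough Java sketch of the decision JBC could make if block() did carry context. Everything in it is hypothetical: BlockReason, BlockAction and decide() are names invented for illustration, and the real block() callback passes no such information, which is exactly the problem.

```java
import java.util.Objects;

/** Hypothetical sketch; none of these types exist in JGroups or JBoss Cache. */
public class BlockContextSketch {

    /** Why block() was invoked. Assumption: the real callback carries no such hint. */
    enum BlockReason { CONNECT, STATE_TRANSFER }

    /** What the cache has to do before it returns from block(). */
    enum BlockAction { STOP_OUTGOING_MESSAGES, STOP_MESSAGES_AND_CLEAR_TRANSACTIONS }

    /**
     * Only a state transfer of this cache's own state needs the potentially
     * disruptive JBCACHE-315 transaction handling. A simple connect, or a
     * transfer for a different state-id, only needs outgoing messages stopped.
     */
    static BlockAction decide(BlockReason reason, String requestedStateId, String ownStateId) {
        boolean transferringOwnState = reason == BlockReason.STATE_TRANSFER
                && Objects.equals(requestedStateId, ownStateId);
        return transferringOwnState
                ? BlockAction.STOP_MESSAGES_AND_CLEAR_TRANSACTIONS
                : BlockAction.STOP_OUTGOING_MESSAGES;
    }

    public static void main(String[] args) {
        // Case 1: simple connect.
        System.out.println(decide(BlockReason.CONNECT, null, "cacheA"));
        // Case 2: cacheA is blocked because a new cacheB instance wants cacheB's state.
        System.out.println(decide(BlockReason.STATE_TRANSFER, "cacheB", "cacheA"));
        // The JBCACHE-315 case: cacheA's own state is about to be transferred.
        System.out.println(decide(BlockReason.STATE_TRANSFER, "cacheA", "cacheA"));
    }
}
```

Without some hint like this, JBC would presumably have to assume the worst case on every block() call.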
Brian Stansberry wrote:
AIUI, it's the time for all the services' block() impls to return, plus
all the other work FLUSH does sending messages around.
It's the JBC block() impl that is going to take time. JBCACHE-315 means
JBC block() analyzing transactions that have written to the cache state,
giving them a chance to clear, rolling back those that don't, etc. That
takes time.
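
As a rough illustration of why that takes time, here is a minimal Java sketch of the "give them a chance to clear, then roll back" step. It is not the actual JBCACHE-315 fix: PendingTx, FakeTx, drain() and the grace period are all assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not JBoss Cache code. */
public class TransactionDrainSketch {

    /** Stand-in for a transaction that has written to the cache state. */
    interface PendingTx {
        boolean isComplete();
        void rollback();
    }

    /** Trivial implementation used only to exercise the sketch. */
    static class FakeTx implements PendingTx {
        final boolean complete;
        boolean rolledBack;
        FakeTx(boolean complete) { this.complete = complete; }
        public boolean isComplete() { return complete; }
        public void rollback() { rolledBack = true; }
    }

    /**
     * Polls until every transaction has completed or the grace period ends,
     * then rolls back whatever is left. Returns the rolled-back transactions.
     */
    static List<PendingTx> drain(List<PendingTx> pending, long graceMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(graceMillis);
        List<PendingTx> remaining = new ArrayList<>(pending);
        while (true) {
            remaining.removeIf(PendingTx::isComplete);
            if (remaining.isEmpty() || System.nanoTime() - deadline >= 0) {
                break;
            }
            try {
                Thread.sleep(10); // poll interval, chosen arbitrarily
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop waiting, keep the flag set
                break;
            }
        }
        for (PendingTx tx : remaining) {
            tx.rollback();
        }
        return remaining;
    }
}
```

In the worst case block() does not return until the whole grace period has elapsed, so any timeout on the first phase of the flush has to allow for it.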
Vladimir Blagojevic wrote:
> Hey Brian,
>
> This is a different timeout. It gives flush a time window of 3 seconds
> to quiet the cluster, i.e. to complete the first phase of flush. It does
> not mean that the cluster will be quiet for 3 seconds. The timeout you
> are thinking about is the one in the configuration file.
>
> Cheers,
> Vladimir
>
> Brian Stansberry wrote:
>> Looking at org.jgroups.JChannelFactory.connect it's calling startFlush
>> on the MuxChannel with a hardcoded timeout of 3 secs for the flush to
>> complete. I think such a short timeout will for sure be a problem on
>> production systems when nodes join active clusters. E.g. a good
>> solution to http://jira.jboss.com/jira/browse/JBCACHE-315 likely
>> requires spending some time to allow active transactions to clear.
>>
>>
--
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry(a)redhat.com