[jboss-dev-forums] [Design of JBossCache] - Re: Locking tree before setting initial state

Fri Sep 8 23:35:57 EDT 2006

Continuing the recreation of the e-mail conversation... 

"bela at jboss.com" wrote : 
  | anonymous wrote : 
  |   | #2. Won't any PREPARE received on the state provider side after the state transfer request just be waiting in the channel for the getState() to complete?  However, these kinds of PREPARE calls are presumably being handled on the state transfer *recipient* side as well, and can thus lead to inconsistent state.  So, having the *recipient* return FAIL from PREPARE is needed.
  | 
  | True. So, if the state transfer takes longer than the PREPARE timeout, the initiator will return with a timeout exception and roll back the TX anyway. So we don't need to do anything here. If the state transfer takes less time, the PREPAREs will succeed *after* the state transfer has completed.
  |  
  | anonymous wrote : #3A5 -- Breaking locks for crashed members is triggered by receipt of a view change.  But what if the view change is stuck in the channel behind the blocked getState() call?  Again, we may need a "priority queue" concept to deal with this.  This is tricky though, because here the "priority queue" is in JGroups, and basically means letting view change messages jump ahead of others.  I certainly wouldn't think that could be the default behavior in JGroups, and even if it was a configurable option I imagine it could cause problems even in the JBC use case.
  | 
  | I don't feel very good about view changes jumping ahead of regular messages; this could badly screw up the QoS provided by JGroups. For example: Virtual Synchrony mandates that all messages sent in view V1 are *delivered* in V1 *before* V2 is installed (this is done through the FLUSH protocol), so having V2 jump ahead of regular messages violates this property.
  |  
  | anonymous wrote : Actually, once FLUSH is in place, if the getState() call discovers a lock held by a remote gtx, the existence of the lock is actually proof of a dead (or at least very unhealthy) member, isn't it?
  | 
  | Yes, absolutely! The FLUSH will probably have to have a timeout associated with it; all locks still held after this phase can be broken.
  |  
  | anonymous wrote : #3B1.  Even with FLUSH, it is possible the getState() call won't be able to acquire a lock.  This would be because a tx initiated on the state provider itself is holding the lock and has not yet started the 2PC phase.  This kind of tx can safely be rolled back though.  Other kinds of long lasting locally-initiated locks are possible too, though unlikely.  E.g. a simple non-transactional put where the calling thread got hung up in a badly written TreeCacheListener. So, the lock-breaking code is still partly useful.
  | 
  | We *MUST* absolutely change this behavior and throw an exception if the state cannot be retrieved!
  | 
  | anonymous wrote : 
  |   | This is a problem that has existed in JBC all along.  I don't see a good fix for it in 1.3.  Adding a lot of changed behaviors that cause rolled back transactions but which don't really ensure a valid state transfer seems like making trouble.  Maybe better to just leave the behavior from 1.2.4 and wait for the real fix in 1.4. 
  | 
  | Yes, I agree. With the change, of course, that we fail if the state 
  | cannot be retrieved. Maybe we should also increase the state transfer 
  | timeout in the default XML configuration files.
  | 
  | 
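
As a rough illustration of the lock-breaking idea discussed above: once FLUSH has completed (or timed out), any lock still held on behalf of a remote member's transaction can be treated as stale and broken, while locks held by locally initiated work need separate handling (for example, rolling back the local tx). The sketch below is only a toy model under that assumption. The class name, method name, and the map of node name to owning member are all hypothetical and are not JBossCache or JGroups APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical model; none of these names come from JBossCache or JGroups.
final class PostFlushLockBreaker {

    /**
     * Breaks every lock whose owning member is not the local member.
     *
     * @param lockOwners  node name mapped to the member whose tx holds its lock
     *                    (assumed representation, modified in place)
     * @param localMember identifier of the state provider itself
     * @return the node names whose locks were broken
     */
    static List<String> breakRemoteLocks(Map<String, String> lockOwners,
                                         String localMember) {
        List<String> broken = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = lockOwners.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> lock = it.next();
            // After FLUSH, a lock held for a remote tx suggests a dead or
            // unhealthy member, so it is removed here.
            if (!lock.getValue().equals(localMember)) {
                broken.add(lock.getKey());
                it.remove();
            }
        }
        return broken;
    }
}
```

Locks owned by the local member are deliberately left in place; deciding whether to roll back the local transaction that holds them is a separate step that this sketch does not cover.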

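The agreed change, failing when the initial state cannot be retrieved rather than carrying on without it, could look roughly like the sketch below. It is not the JBossCache implementation: `InitialStateFetcher`, `fetchOrFail`, and `StateTransferException` are hypothetical names, and the state provider is modelled as a plain `Callable<byte[]>`. Only `java.util.concurrent` classes are used.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical names throughout; this is not the JBossCache API.
final class InitialStateFetcher {

    /** Hypothetical exception signalling that no initial state was obtained. */
    static final class StateTransferException extends Exception {
        StateTransferException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Waits up to timeoutMillis for the provider to return the state.
     * Throws on timeout, provider failure, or a null result instead of
     * letting the caller start with missing state.
     */
    static byte[] fetchOrFail(Callable<byte[]> provider, long timeoutMillis)
            throws StateTransferException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<byte[]> pending = executor.submit(provider);
        try {
            byte[] state = pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            if (state == null) {
                throw new StateTransferException("provider returned no state", null);
            }
            return state;
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupt the stalled transfer
            throw new StateTransferException(
                    "state not received within " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            throw new StateTransferException("state provider failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new StateTransferException("interrupted while waiting for state", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The `timeoutMillis` argument plays the role of the state transfer timeout mentioned above. Raising it makes a spurious failure less likely, at the cost of a longer wait when the provider really is stuck.
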
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=3970500#3970500
