[jboss-dev-forums] [Design of JBossCache] - Re: Locking tree before setting initial state

bstansberry@jboss.com do-not-reply at jboss.com
Fri Sep 8 23:27:49 EDT 2006


Continuing the recreation of the e-mail conversation...

"bstansberry at jboss.com" wrote : 
  | Some comments on your analysis:
  |  
  | #1A.  The way it is coded now, if a lock is held by a non-local gtx that is STATUS_PREPARED, the lock is just broken and the tx is not affected.  As you noted, once a non-local tx is STATUS_PREPARED it is too late to notify the initiator.  This basically means the state provider is sending uncommitted state.  If the gtx later commits, no harm; but if the initiator of the gtx later issues a rollback(), the state transfer recipient will have invalid state.  Doing it this way is less likely to lead to inconsistent state than rolling back the tx on the state provider, but it is still not good.
  |  
  | #1B.  The PREPARE locks won't ever be released (at least without adding something like a "priority queue" to JBC -- JBCACHE-326).  The COMMIT message will be stuck in the JGroups channel behind the blocked getState() call.
  |  
  | #2. Won't any PREPARE received on the state provider side after the state transfer request just be waiting in the channel for the getState() to complete?  However, these kinds of PREPARE calls are presumably being handled on the state transfer *recipient* side as well, and can thus lead to inconsistent state.  So, having the *recipient* return FAIL from PREPARE is needed.
  |  
  | #3A5 -- Breaking locks for crashed members is triggered by receipt of a view change.  But what if the view change is stuck in the channel behind the blocked getState() call?  Again, we may need a "priority queue" concept to deal with this.  This is tricky though, because here the "priority queue" is in JGroups, and basically means letting view change messages jump ahead of others.  I certainly wouldn't think that could be the default behavior in JGroups, and even if it was a configurable option I imagine it could cause problems even in the JBC use case.
  |  
  | Actually, once FLUSH is in place, if the getState() call discovers a lock held by a remote gtx, the existence of the lock is actually proof of a dead (or at least very unhealthy) member, isn't it?
  |  
  | #3B1.  Even with FLUSH, it is possible the getState() call won't be able to acquire a lock.  This would be because a tx initiated on the state provider itself is holding the lock and has not yet started the 2PC phase.  This kind of tx can safely be rolled back though.  Other kinds of long lasting locally-initiated locks are possible too, though unlikely.  E.g. a simple non-transactional put where the calling thread got hung up in a badly written TreeCacheListener. So, the lock-breaking code is still partly useful.
  |  
  |  
  | The killer issue is #1B.  Until we do FLUSH or JBCACHE-326, the possibility exists that getState() will get stuck.  Seems like in that situation we have a choice -- let the getState() call fail, or break the locks and accept the risk of inconsistent state.  If we let the getState() call fail, then we have the choice of what to do on the recipient side -- abort the startup of the cache, or let the cache start without transferred state.  Up to 1.3.0, JBC has been written to allow getState() to fail and let the recipient cache start without the transferred state.
  |  
  | My current (not very strong) inclination, Bela, is to agree with your daughter when she wrote last week that "The proper and *only* way IMO is to wait until FLUSH has been implemented in JGroups 2.4".
  | This is a problem that has existed in JBC all along, and I don't see a good fix for it in 1.3.  Adding a lot of changed behaviors that cause rolled-back transactions but don't really ensure a valid state transfer seems like asking for trouble.  Maybe it is better to just leave the behavior from 1.2.4 and wait for the real fix in 1.4. 
  |  
  | One thing I definitely think is that if we are going to allow the getState() call to fail, we shouldn't have the recipient return FAIL to prepare() calls while it waits for the state.  That's just a nightmare -- rolled back transactions all over the place, and then the new cache ends up without a state transfer anyway :(
  | 
  | 
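The lock-handling choices discussed in #1A and #3B1 above boil down to a small decision table: roll back locally-initiated transactions that have not yet entered 2PC, and (in the current code) break locks held by remote PREPARED transactions at the risk of shipping uncommitted state. A minimal sketch of that decision -- all enum and class names here are hypothetical illustrations, not actual JBossCache APIs:

```java
// Hypothetical sketch of the state provider's per-lock decision during getState().
enum TxStatus { ACTIVE, PREPARED, COMMITTED }

enum LockAction { ROLL_BACK_OWNER, BREAK_LOCK_RISK_STALE_STATE, LEAVE_ALONE }

final class StateTransferLockPolicy {

    /**
     * Decide what to do with a lock that is blocking the state transfer.
     *
     * @param locallyInitiated true if the owning tx was started on this node
     * @param status           current status of the owning tx
     */
    static LockAction decide(boolean locallyInitiated, TxStatus status) {
        if (locallyInitiated && status == TxStatus.ACTIVE) {
            // #3B1: a local tx that has not yet started 2PC can safely be rolled back.
            return LockAction.ROLL_BACK_OWNER;
        }
        if (!locallyInitiated && status == TxStatus.PREPARED) {
            // #1A: too late to notify the remote initiator, so the current code just
            // breaks the lock -- risking uncommitted state in the transferred snapshot
            // if the initiator later rolls back.
            return LockAction.BREAK_LOCK_RISK_STALE_STATE;
        }
        // Committed owners will release their locks on their own; leave them be.
        return LockAction.LEAVE_ALONE;
    }
}
```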

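The closing point of the post -- that while waiting for state the recipient should queue prepare() calls rather than return FAIL, so the cluster is not flooded with rolled-back transactions only for the new cache to end up without state anyway -- might look roughly like this. Again a sketch; the class and method names are illustrative, not real JBossCache code:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical recipient-side buffer: queue remote prepares while the state
// transfer is in flight, replay them once state (or an empty fallback) is installed.
final class RecipientPrepareBuffer {
    private final Queue<String> pending = new ArrayDeque<>();
    private boolean stateInstalled = false;

    /** Called for each remote prepare() arriving during the state transfer. */
    synchronized String onPrepare(String gtx) {
        if (stateInstalled) {
            return "PROCESS"; // normal path: handle the prepare immediately
        }
        pending.add(gtx);     // do NOT return FAIL; just hold it until state arrives
        return "QUEUED";
    }

    /** Called once state is installed; returns how many prepares were replayed. */
    synchronized int markStateInstalled() {
        stateInstalled = true;
        int replayed = pending.size();
        pending.clear(); // in a real cache these would now be applied in order
        return replayed;
    }
}
```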
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=3970499#3970499

Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=3970499



More information about the jboss-dev-forums mailing list