Continuing the recreation of the e-mail conversation...
"bela(a)jboss.com" wrote :
| anonymous wrote :
| | #2. Won't any PREPARE received on the state provider side after the state
transfer request just be waiting in the channel for the getState() to complete? However,
these kinds of PREPARE calls are presumably being handled on the state transfer
*recipient* side as well, and can thus lead to inconsistent state. So, having the
*recipient* return FAIL from PREPARE is needed.
|
| True. So, if the state transfer takes longer than the PREPARE timeout, the initiator
will get a timeout exception and roll back the TX anyway. So we don't need to
do anything here. If the state transfer takes less time, the PREPAREs will succeed *after* the
state transfer has completed.
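
The timing argument above can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, a single-threaded executor stands in for the channel in which the PREPARE waits behind the state transfer, and none of it is JBossCache or JGroups code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PrepareTimeoutSketch {
    enum Outcome { COMMIT, ROLLBACK }

    // Hypothetical helper: sleeps, restoring the interrupt flag if interrupted.
    static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The single worker thread models a member that handles the state transfer
    // first; the PREPARE is queued behind it. The initiator waits at most
    // prepareTimeoutMs for the PREPARE vote.
    static Outcome runPrepare(long stateTransferMs, long prepareTimeoutMs) {
        ExecutorService member = Executors.newSingleThreadExecutor();
        try {
            Runnable stateTransfer = () -> pause(stateTransferMs);
            Callable<Boolean> prepare = () -> Boolean.TRUE;
            member.submit(stateTransfer);
            Future<Boolean> vote = member.submit(prepare);
            try {
                return vote.get(prepareTimeoutMs, TimeUnit.MILLISECONDS)
                        ? Outcome.COMMIT : Outcome.ROLLBACK;
            } catch (TimeoutException | ExecutionException e) {
                // State transfer outlasted the PREPARE timeout: roll back.
                return Outcome.ROLLBACK;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Outcome.ROLLBACK;
            }
        } finally {
            member.shutdownNow();
        }
    }
}
```

With a short state transfer the PREPARE is answered after the transfer completes; with a transfer longer than the timeout the initiator rolls back on its own, which is why no extra handling is proposed here.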
|
| anonymous wrote : #3A5 -- Breaking locks for crashed members is triggered by receipt
of a view change. But what if the view change is stuck in the channel behind the blocked
getState() call? Again, we may need a "priority queue" concept to deal with
this. This is tricky though, because here the "priority queue" is in JGroups,
and basically means letting view change messages jump ahead of others. I certainly
wouldn't think that could be the default behavior in JGroups, and even if it was a
configurable option I imagine it could cause problems even in the JBC use case.
|
| I don't feel very good about view changes jumping ahead of regular messages; this
could badly screw up the QoS provided by JGroups. For example, Virtual Synchrony mandates
that all messages sent in view V1 are *delivered* in V1 *before* V2 is installed (this is
done through the FLUSH protocol), so having V2 jump ahead of regular messages violates
this property.
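
A toy model makes the violation concrete. The sketch below is not JGroups code; the "VIEW:n" event encoding and all names are assumptions made for illustration. It records which view is installed when each message is delivered, once with plain FIFO delivery and once with view changes allowed to jump the queue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ViewOrderSketch {
    // channel: events in send order; "VIEW:n" installs view n, anything else
    // is a regular message. Returns message -> view installed at delivery time.
    static Map<String, Integer> deliver(List<String> channel, boolean viewsJumpAhead) {
        List<String> order = new ArrayList<>();
        if (viewsJumpAhead) {
            // Hypothetical "priority queue": all view changes are delivered first.
            List<String> regular = new ArrayList<>();
            for (String event : channel) {
                if (event.startsWith("VIEW:")) {
                    order.add(event);
                } else {
                    regular.add(event);
                }
            }
            order.addAll(regular);
        } else {
            order.addAll(channel); // plain FIFO delivery
        }

        int currentView = 1; // assume delivery starts in view V1
        Map<String, Integer> deliveredIn = new LinkedHashMap<>();
        for (String event : order) {
            if (event.startsWith("VIEW:")) {
                currentView = Integer.parseInt(event.substring("VIEW:".length()));
            } else {
                deliveredIn.put(event, currentView);
            }
        }
        return deliveredIn;
    }
}
```

For the channel contents `m1, m2, VIEW:2, m3`, FIFO delivery hands over m1 and m2 in V1 and m3 in V2. With view changes jumping ahead, m1 and m2 were sent in V1 but are delivered in V2, which is exactly the property Virtual Synchrony forbids.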
|
| anonymous wrote : Actually, once FLUSH is in place, if the getState() call discovers a
lock held by a remote gtx, the existence of the lock is actually proof of a dead (or at
least very unhealthy) member, isn't it?
|
| Yes, absolutely! The FLUSH will probably have to have a timeout associated with it;
all locks still held after this phase can be broken.
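
As a rough sketch of that rule, assume a lock table that maps a node path to the address of the member whose gtx holds the lock. The table layout and every name below are hypothetical, not the JBossCache lock API. Once the FLUSH phase has finished or timed out, any lock whose owner is a remote member is treated as stale and broken.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class FlushLockSketch {
    // Intended to be called only after the FLUSH phase has completed or timed out.
    // lockOwners: node path -> address of the member whose gtx holds the lock.
    // Removes every lock held by a remote member and returns the affected paths.
    static List<String> breakStaleRemoteLocks(Map<String, String> lockOwners,
                                              String localAddress) {
        List<String> broken = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = lockOwners.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            if (!entry.getValue().equals(localAddress)) {
                broken.add(entry.getKey());
                it.remove(); // break the lock
            }
        }
        return broken;
    }
}
```

Locks held by locally initiated work are left alone in this sketch; they need separate handling.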
|
| anonymous wrote : #3B1. Even with FLUSH, it is possible the getState() call won't
be able to acquire a lock. This would be because a tx initiated on the state provider
itself is holding the lock and has not yet started the 2PC phase. This kind of tx can
safely be rolled back though. Other kinds of long lasting locally-initiated locks are
possible too, though unlikely. E.g. a simple non-transactional put where the calling
thread got hung up in a badly written TreeCacheListener. So, the lock-breaking code is
still partly useful.
|
| *We MUST absolutely change this behavior and throw an exception if the state cannot
be retrieved!*
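
One possible shape for the fail-fast behavior, sketched in plain Java. The exception type, the method name and the `Supplier` standing in for the state provider are all assumptions for illustration, not the actual JBossCache API. The point is only that a timeout or an empty result surfaces as an exception instead of being silently ignored.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class StateRetrievalSketch {
    // Hypothetical exception type signalling a failed state transfer.
    static class StateTransferException extends RuntimeException {
        StateTransferException(String message) {
            super(message);
        }
    }

    // Asks the provider for state and fails loudly if nothing usable arrives
    // within timeoutMs.
    static byte[] fetchStateOrFail(Supplier<byte[]> provider, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<byte[]> task = provider::get;
            Future<byte[]> pending = executor.submit(task);
            byte[] state = pending.get(timeoutMs, TimeUnit.MILLISECONDS);
            if (state == null) {
                throw new StateTransferException("state provider returned no state");
            }
            return state;
        } catch (TimeoutException e) {
            throw new StateTransferException(
                    "state not retrieved within " + timeoutMs + " ms");
        } catch (ExecutionException e) {
            throw new StateTransferException("state provider failed: " + e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new StateTransferException("interrupted while waiting for state");
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller starting a cache instance would let this exception propagate, so the instance never comes up with missing or partial state.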
|
| anonymous wrote :
| | This is a problem that has existed in JBC all along. I don't see a good fix
for it in 1.3. Adding a lot of changed behaviors that cause rolled back transactions but
which don't really ensure a valid state transfer seems like making trouble. Maybe
better to just leave the behavior from 1.2.4 and wait for the real fix in 1.4.
|
| Yes, I agree. With the change, of course, that we fail if the state
| cannot be retrieved. Maybe we should also increase the state transfer
| timeout in the default XML files.
|
|
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=3970500#...