On 24 Nov 2010, at 20:10, Vladimir Blagojevic wrote:
Thanks
I like option 1 as well. I am interested in the details of how this can
be done. How do we set the lock acquisition timeout for InvalidateL1Command, issue a
command to abort a specific tx, and so on?
Propose a design, we can review it here.
Cheers.
On 10-11-24 12:36 PM, Manik Surtani wrote:
> Right, I've spotted it. The test failure itself is intermittent due to the way
> addresses are organised in the hash wheel, so you are correct that it is a timing issue.
> Anyway, it still is a very real problem. Just to re-iterate and to make sure we are
> talking about the same thing:
>
> 1. View is {A, B, C}
> 2. K is mapped to {A, B}
> 3. A tx starts to update K, and is prepared. Locks now held for K on {A, B}
> 4. D joins. D is placed on the hash wheel between A and B. So the new view is
> {A, D, B, C}
> 5. As per the test (artificial, I know, but could still happen), the tx waits for a
> long time before committing. In the case of the test, it artificially waits until D has
> finished joining before committing, by use of a latch.
> 6. D never joins: even though it receives the prepare for the tx and could
> potentially commit itself (as a new owner), it fails as it is unable to invalidate K
> on B.
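To make steps 1 to 4 concrete, here is a minimal sketch of how a join can change the owners of K. It is not Infinispan's consistent hash implementation: the class `HashWheelSketch`, the wheel positions (100, 150, 200, 300), the key position (50) and the "first two nodes clockwise" ownership rule are all assumptions chosen so that the result matches the views above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative only: a toy hash wheel, not Infinispan's consistent hash.
class HashWheelSketch {
    // Hypothetical wheel: position -> node name. Positions are invented.
    private final TreeMap<Integer, String> wheel = new TreeMap<>();

    public void join(String node, int position) {
        wheel.put(position, node);
    }

    // Assumed rule: owners are the first numOwners nodes found walking
    // clockwise from the key's position, wrapping around at the end.
    public List<String> owners(int keyPosition, int numOwners) {
        List<String> result = new ArrayList<>();
        for (String node : wheel.tailMap(keyPosition, true).values()) {
            if (result.size() < numOwners) result.add(node);
        }
        for (String node : wheel.headMap(keyPosition, false).values()) {
            if (result.size() < numOwners) result.add(node);
        }
        return result;
    }

    public static void main(String[] args) {
        HashWheelSketch w = new HashWheelSketch();
        w.join("A", 100);
        w.join("B", 200);
        w.join("C", 300);
        int k = 50; // assumed position of key K on the wheel

        System.out.println(w.owners(k, 2)); // [A, B]

        w.join("D", 150); // D lands between A and B
        System.out.println(w.owners(k, 2)); // [A, D]
    }
}
```

After D joins, B is no longer an owner of K but still holds the lock taken at prepare time, which is why the invalidation of K on B cannot proceed until the tx finishes.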
>
> There are a few solutions here:
>
> 1) This is pretty easy to detect. Attempt to acquire the lock with a smaller lock
> acquisition timeout and if the transaction is still stuck, abort the transaction and
> proceed with the join.
> 2) If the blocking node is *not* the transaction originator (as in this case: the tx
> was started on A), then just force lock removal and tx rollback on B *only*. Let the tx
> complete on A, since the new joiner will receive the transactional event and will be able
> to apply it as a new owner.
>
> My vote is to go for solution 1. It is a bit more crude, but 2 would be very complex to
> implement. And even then, it would only solve the case of the invalidation being blocked
> on a node that did not originate the transaction. E.g., the tx originated on A but the
> lock issue was on B. If, however, the tx originated on B, *and* B no longer owns the entry
> in question, then 2 is no longer a solution and the only solution would be 1.
>
> Thoughts?
>
> Cheers
> Manik
>
> PS: Do we have a JIRA for this?
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org