On 24 Nov 2010, at 15:36, Manik Surtani wrote:
Right, I've spotted it. The test failure itself is intermittent
due to the way addresses are organised in the hash wheel, so you are correct that it is a
timing issue. Anyway, it still is a very real problem. Just to re-iterate and to make
sure we are talking about the same thing:
1. View is {A, B, C}
2. K is mapped to {A, B}
3. A tx starts to update K, and is prepared. Locks now held for K on {A, B}
4. D joins. D is placed on the hash wheel between A and B. So the new view is {A, D,
B, C}
5. As per the test (artificial, I know, but could still happen), the tx waits for a long
time before committing. In the case of the test, artificially waits until D has finished
joining before committing, by use of a latch.
6. D never joins: even though it receives the prepare for the tx and could potentially
commit it itself (as a new owner), it fails as it is unable to invalidate K on B, where
the lock on K is still held by the prepared tx.
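
To make steps 1 to 4 concrete, here is a minimal sketch of how the owners of K shift when
D lands between A and B on the wheel. This is not Infinispan's actual consistent hash
code: the class name, the wheel positions and the "walk clockwise from the key" rule are
all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch, not Infinispan's real consistent hash implementation.
class HashWheelSketch {
    // Wheel position -> node name. The positions are made up for illustration.
    private final TreeMap<Integer, String> wheel = new TreeMap<>();

    void join(String node, int position) {
        wheel.put(position, node);
    }

    // Owners are the first numOwners nodes found walking clockwise from the key's position.
    List<String> owners(int keyPosition, int numOwners) {
        List<String> result = new ArrayList<>();
        // Nodes at or after the key's position come first...
        for (String node : wheel.tailMap(keyPosition, true).values()) {
            if (result.size() < numOwners) result.add(node);
        }
        // ...then wrap around to the start of the wheel.
        for (String node : wheel.headMap(keyPosition, false).values()) {
            if (result.size() < numOwners) result.add(node);
        }
        return result;
    }

    public static void main(String[] args) {
        HashWheelSketch w = new HashWheelSketch();
        w.join("A", 10);
        w.join("B", 30);
        w.join("C", 50);
        int k = 5; // assumed wheel position of key K
        System.out.println(w.owners(k, 2)); // [A, B]
        w.join("D", 20); // D lands between A and B
        System.out.println(w.owners(k, 2)); // [A, D]
    }
}
```

After the join B is no longer an owner of K, which is why K has to be invalidated on B
while the prepared tx still holds its lock there.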
There are a few solutions here:
1) This is pretty easy to detect. Attempt to acquire the lock with a smaller lock
acquisition timeout and if the transaction is still stuck, abort the transaction and
proceed with the join.
2) If the blocking node is *not* the transaction originator (as in this case: the tx was
started on A), then just force lock removal and tx rollback on B *only*. Let the tx
complete on A, since the new joiner will receive the transactional event and will be able
to apply it as a new owner.
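
A rough sketch of what solution 1 could look like for a single key on a single node. All
names here are hypothetical, and a plain java.util.concurrent.Semaphore stands in for the
real lock manager, only because it can be released from a thread other than the one that
acquired it, which models forcing the rollback. A real implementation would also have to
roll the tx back cluster-wide, which this does not attempt.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of solution 1 for one key K on one node.
class JoinInvalidationSketch {
    // One permit models the lock on K.
    private final Semaphore lockOnK = new Semaphore(1);
    private String lockHolder; // id of the tx holding the lock, or null

    // A prepare takes the lock on K on behalf of the given tx.
    synchronized boolean prepare(String txId) {
        if (lockOnK.tryAcquire()) {
            lockHolder = txId;
            return true;
        }
        return false;
    }

    // Rolling back a tx frees the lock it holds, if it still holds it.
    synchronized void rollback(String txId) {
        if (txId != null && txId.equals(lockHolder)) {
            lockHolder = null;
            lockOnK.release();
        }
    }

    private synchronized String currentHolder() {
        return lockHolder;
    }

    // Invalidation on behalf of a joiner: wait only briefly for the lock, then abort
    // the blocking tx and carry on. Returns the id of the aborted tx, or null.
    String invalidateForJoin(long shortTimeoutMillis) {
        String aborted = null;
        try {
            if (!lockOnK.tryAcquire(shortTimeoutMillis, TimeUnit.MILLISECONDS)) {
                aborted = currentHolder();
                rollback(aborted);
                if (!lockOnK.tryAcquire(shortTimeoutMillis, TimeUnit.MILLISECONDS)) {
                    throw new IllegalStateException("K is still locked after rollback");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the lock on K", e);
        }
        try {
            // Here the node would drop its local copy of K.
        } finally {
            lockOnK.release();
        }
        return aborted;
    }
}
```

The short timeout is the whole trick: the join is held up for at most that long before
the stuck tx is sacrificed.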
What I'm saying might be very wrong, but trying :)
Isn't it possible to make the invalidation of K on B part of the transaction commit?
I.e. the invalidation on B sees that K is locked by a tx that is not committed and it
skips the invalidation. When the tx commits (second phase) it multicasts the commit
message to all the lock owners _including_ B (all the owners == {A, B, D}). When the
CommitCommand is received, B checks whether or not it is still a data owner for each key:
if it is, then it applies the change, otherwise it removes the key.
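
The commit-time check suggested above could look roughly like this. It is only a sketch
under assumptions: the class, the onCommit method and the way the set of locally owned
keys is passed in are hypothetical, and this is not how CommitCommand is actually
implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a node handling a commit after the view has changed.
class CommitOwnershipSketch {
    final Map<String, String> localStore = new HashMap<>();

    // modifications: the tx's writes. keysOwnedLocally: the keys this node owns
    // under the *new* view (assumed to be computed elsewhere from the hash wheel).
    void onCommit(Map<String, String> modifications, Set<String> keysOwnedLocally) {
        for (Map.Entry<String, String> e : modifications.entrySet()) {
            if (keysOwnedLocally.contains(e.getKey())) {
                // Still an owner: apply the write.
                localStore.put(e.getKey(), e.getValue());
            } else {
                // No longer an owner: this is the deferred invalidation.
                localStore.remove(e.getKey());
            }
        }
    }
}
```

In the scenario above, B would take the else branch for K, while A and D would apply the
write.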
My vote is to go for solution 1 - a bit more crude, but 2 would be very complex to
implement. And even then, 2 would only solve the case where the invalidation is blocked
on a node that did not originate the transaction. E.g., the tx originated on A but the
lock issue was on B. If, however, the tx originated on B, *and* B no longer owns the
entry in question, then 2 is no longer a solution and the only solution would be 1.
Thoughts?
Cheers
Manik
PS: Do we have a JIRA for this?
On 23 Nov 2010, at 13:13, Vladimir Blagojevic wrote:
> On 10-11-23 8:11 AM, Manik Surtani wrote:
>> Is this related?
>>
>> https://jira.jboss.org/browse/ISPN-595
>>
>> Let's have a chat when you're online later today...
>>
>
> No, this is different I believe. I do not see from his stack trace that
> InvalidateCommand is involved.
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org