[infinispan-issues] [JBoss JIRA] Commented: (ISPN-425) Stale data read when L1 invalidation happens while UnionConsistentHash is in use

Manik Surtani (JIRA) jira-events at lists.jboss.org
Mon May 10 06:13:06 EDT 2010


    [ https://jira.jboss.org/jira/browse/ISPN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12529853#action_12529853 ] 

Manik Surtani commented on ISPN-425:
------------------------------------

Right, it looks like you have two distinct problems here, conflated by the fact that you sometimes have L1-on-rehash set to true.

I think the issue with a PUT on a joiner before rehash completes is a pretty odd bug (how did you get to do this PUT in the first place - cache.start() shouldn't return until the join - and rehash - is complete!).

The second issue is with invalidation while an entry is being rehashed/moved. E.g., the "old" owner C2 should accept an invalidation event rather than rejecting it on the grounds that it still owns the key.

Again, if you could separate your test to isolate these two cases, that would make things clearer.
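For reference, a rough sketch of how such a split might look. This is not existing test-suite code: it assumes the Infinispan 4.x programmatic configuration API (GlobalConfiguration.getClusteredDefault(), Configuration.setCacheMode(), setL1CacheEnabled() - the setter names should be double-checked against 4.1), and it elides the hook needed to hold a rehash open, which is discussed further down in this thread.

// Sketch only: splits the two suspected problems into separate tests.
import org.infinispan.Cache;
import org.infinispan.config.Configuration;
import org.infinispan.config.GlobalConfiguration;
import org.infinispan.manager.DefaultCacheManager;

public class SplitRehashTestsSketch {

   private static DefaultCacheManager newClusteredManager() {
      Configuration c = new Configuration();
      c.setCacheMode(Configuration.CacheMode.DIST_SYNC);
      c.setL1CacheEnabled(true);   // L1 enabled, as in the failing Hot Rod scenario
      return new DefaultCacheManager(GlobalConfiguration.getClusteredDefault(), c);
   }

   /** Case 1: a put issued against a joiner before its initial rehash has finished. */
   public void testPutOnJoinerDuringJoin() {
      DefaultCacheManager cm1 = newClusteredManager();
      Cache<String, String> c1 = cm1.getCache();
      c1.put("k", "v5");

      // A hook/latch would be needed here to hold the joiner's rehash open (not shown).
      DefaultCacheManager cm2 = newClusteredManager();
      Cache<String, String> c2 = cm2.getCache();
      c2.put("k", "v6");   // put on the joiner while it is (supposedly) still joining
      assert "v6".equals(c1.get("k")) && "v6".equals(c2.get("k"));
   }

   /** Case 2: an L1 invalidation arriving at an "old" owner while the entry is rehashed away. */
   public void testInvalidationWhileRehashing() {
      // 1. start C1 and C2, put K so both own it;
      // 2. start C3 so K now maps to C1 and C3, holding the rehash open;
      // 3. modify K on C1 before the rehash completes (this sends the L1 invalidation);
      // 4. read K on C2 and assert the stale value is gone.
   }
}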

> Stale data read when L1 invalidation happens while UnionConsistentHash is in use
> --------------------------------------------------------------------------------
>
>                 Key: ISPN-425
>                 URL: https://jira.jboss.org/jira/browse/ISPN-425
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Distributed Cache
>    Affects Versions: 4.1.0.BETA1
>            Reporter: Galder Zamarreno
>            Assignee: Galder Zamarreno
>            Priority: Blocker
>             Fix For: 4.1.0.CR1
>
>         Attachments: infinispan_isL1OnRehash_false.log, infinispan_isL1OnRehash_true.log.zip
>
>
> See below: 
> ----- "Manik Surtani" <manik at jboss.org> wrote:
> > On 3 May 2010, at 08:51, Galder Zamarreno wrote:
> > 
> > > Resending without log until the message is approved.
> > > 
> > > --
> > > Galder Zamarreño
> > > Sr. Software Engineer
> > > Infinispan, JBoss Cache
> > > 
> > > ----- Forwarded Message -----
> > > From: galder at redhat.com
> > > To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
> > > Sent: Friday, April 30, 2010 6:30:05 PM GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
> > > Subject: Stale data read when L1 invalidation happens while UnionConsistentHash is in use
> > > 
> > > Hi,
> > > 
> > > I've spent all day chasing down a random Hot Rod testsuite failure related to distribution. This is the last hurdle to close https://jira.jboss.org/jira/browse/ISPN-411. In HotRodDistributionTest, which is still to be committed, I test adding a new node, doing a put on this node, and then doing a get on a different node and making sure that I get what was put. The test randomly fails because the get returns the old value. The failure has nothing to do with Hot Rod itself but rather with a race condition where union consistent hash is used. Let me explain:
> > > 
> > > 1. An earlier operation had set the "k-testDistributedPutWithTopologyChanges" key to "v5-testDistributedPutWithTopologyChanges".
> > > 2. Start a new Hot Rod server, eq-7969.
> > > 3. The eq-7969 node calls a put on that key with "v6-testDistributedPutWithTopologyChanges". Recipients for the put are: eq-7969 and eq-61332.
> > > 4. eq-7969 sends an L1 invalidation to all, including eq-13415.
> > > 5. eq-13415 should invalidate "k-testDistributedPutWithTopologyChanges" but it doesn't, since it considers that "k-testDistributedPutWithTopologyChanges" is local to eq-13415:
> > > 
> > > 2010-04-30 18:02:19,907 6046  TRACE [org.infinispan.distribution.DefaultConsistentHash] (OOB-2,Infinispan-Cluster,eq-13415:) Hash code for key CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45, 116, 101, 115, 116, 68, 105, 115, 116, ..]}} is 344897059
> > > 2010-04-30 18:02:19,907 6046  TRACE [org.infinispan.distribution.DefaultConsistentHash] (OOB-2,Infinispan-Cluster,eq-13415:) Candidates for key CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45, 116, 101, 115, 116, 68, 105, 115, 116, ..]}} are {5458=eq-7969, 6831=eq-61332}
> > > 2010-04-30 18:02:19,907 6046  TRACE [org.infinispan.distribution.DistributionManagerImpl] (OOB-2,Infinispan-Cluster,eq-13415:) Is local CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45, 116, 101, 115, 116, 68, 105, 115, 116, ..]}} to eq-13415 query returns true and consistentHash is org.infinispan.distribution.UnionConsistentHash@10747b4
> > > 
> > > This is a log with extra log messages that I added while debugging. The key factor here is that UnionConsistentHash is in use, probably because rehashing has not yet fully finished.
> > > 
> > > 6. The end result is that a read of "k-testDistributedPutWithTopologyChanges" in eq-13415 returns the stale "v5-testDistributedPutWithTopologyChanges".
> > > 
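(For context, here is a purely illustrative sketch of why the "is local?" check can answer true on the old owner while rehashing is in flight. The SimpleHash interface is a hypothetical stand-in, not the actual ConsistentHash API; the point is only that the union of old and new owners presumably still contains eq-13415, so the L1 invalidation is skipped and the stale copy survives.)

import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UnionLocateSketch {

   // Hypothetical stand-in for ConsistentHash.locate(key, replCount).
   interface SimpleHash {
      List<String> locate(Object key, int replCount);
   }

   static Set<String> unionLocate(SimpleHash oldCH, SimpleHash newCH, Object key, int replCount) {
      Set<String> owners = new LinkedHashSet<String>();
      owners.addAll(oldCH.locate(key, replCount));   // owners before eq-7969 joined
      owners.addAll(newCH.locate(key, replCount));   // owners after eq-7969 joined
      return owners;
   }

   public static void main(String[] args) {
      SimpleHash oldCH = (k, n) -> Arrays.asList("eq-13415", "eq-61332");   // assumed pre-join owners
      SimpleHash newCH = (k, n) -> Arrays.asList("eq-7969", "eq-61332");    // owners seen in the trace above
      Set<String> owners = unionLocate(oldCH, newCH, "k-testDistributedPutWithTopologyChanges", 2);
      // owners = [eq-13415, eq-61332, eq-7969]: eq-13415 still looks like an owner, so it
      // treats the invalidation as not applying to itself and keeps v5.
      System.out.println(owners.contains("eq-13415"));   // true
   }
}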
> > > I thought that maybe we could be more conservative here and, if rehashing is in progress (or UnionConsistentHash is in use), invalidate regardless. Assuming that in distribution a put always follows an invalidation, and not vice versa, that would be fine. The only downside is that you'd be invalidating too much, but the put would replace the data in the node where the invalidation should not have happened but did, so it's not a problem.
> > > 
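As an aside, a minimal sketch of that conservative rule, using a hypothetical DistInfo stand-in rather than the real DistributionManager API:

// Illustrative only: while a rehash is in flight, apply incoming L1 invalidations even if
// the node still believes it owns the key; the put that triggered them repopulates the
// real owners afterwards.
final class ConservativeInvalidationSketch {

   // Hypothetical stand-in for the distribution metadata available to an interceptor.
   interface DistInfo {
      boolean isRehashInProgress();
      boolean isLocal(Object key);
   }

   static boolean shouldApplyL1Invalidation(DistInfo dist, Object key) {
      if (dist.isRehashInProgress()) {
         // Ownership is ambiguous while old and new hashes are unioned, so invalidate anyway.
         return true;
      }
      return !dist.isLocal(key);   // steady state: only non-owners hold L1 copies to drop
   }
}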
> > > Thoughts? Alternatively, maybe I need to shape my test so that I wait for rehashing to finish, but the problem would still be there.
> > 
> > Yes, this seems to be a bug with concurrent rehashing and invalidation rather than Hot Rod.
> > 
> > Could you modify your test to do the following:
> > 
> > 1.  Start 2 caches, C1 and C2.
> > 2.  Put a key K such that K maps onto C1 and C2.
> > 3.  Add a new node, C3.  K should now map to C1 and C3.
> > 4.  Modify the value on C1 *before* rehashing completes.
> > 5.  See if we see the stale value on C2.
> > 
> > To do this you would need a custom object for K that hashes the way you would expect (this could be hardcoded) and a value which blocks when serializing so we can control how long rehashing takes.
> Since logical addresses are used underneath and these change from one run to the other, I'm not sure how I can generate such a key programmatically. It's even more complicated to figure out a key that will later, when C3 starts, map to it. Without having these addresses locked somehow, or their hash codes, I can't see how this is doable. IOW, to be able to do this, I need to mock these addresses into giving fixed hash codes. I'll dig further into this.
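For what it's worth, a rough sketch of the two helper objects described above, assuming the test runs in a single JVM and that the marshaller honours the standard writeObject hook. Class names are made up, and, as noted above, a fixed key hash alone does not pin which nodes own the key while the node addresses keep changing between runs.

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.CountDownLatch;

public class RehashTestHelpers {

   /** Key with a fixed hash code so consistent-hash placement is deterministic per run. */
   public static final class FixedHashKey implements Serializable {
      private final String name;
      private final int hash;
      public FixedHashKey(String name, int hash) { this.name = name; this.hash = hash; }
      @Override public int hashCode() { return hash; }
      @Override public boolean equals(Object o) {
         if (!(o instanceof FixedHashKey)) return false;
         FixedHashKey other = (FixedHashKey) o;
         return other.name.equals(name) && other.hash == hash;
      }
      @Override public String toString() { return "FixedHashKey(" + name + ")"; }
   }

   /** Value whose serialization blocks until the test releases it, stretching out the rehash. */
   public static final class BlockingValue implements Serializable {
      public static final CountDownLatch RELEASE = new CountDownLatch(1);   // shared, single-JVM test only
      private final String payload;
      public BlockingValue(String payload) { this.payload = payload; }
      private void writeObject(ObjectOutputStream out) throws IOException {
         try {
            RELEASE.await();   // hold serialization here until the test says "go"
         } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
         }
         out.defaultWriteObject();
      }
   }
}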
> > 
> > I never promised the test would be simple!  :)
> > 
> > Cheers
> > Manik
> > --
> > Manik Surtani
> > manik at jboss.org
> > Lead, Infinispan
> > Lead, JBoss Cache
> > http://www.infinispan.org
> > http://www.jbosscache.org
> > 
> > 
> > 
> > 
> > 
> > _______________________________________________
> > infinispan-dev mailing list
> > infinispan-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira