[infinispan-issues] [JBoss JIRA] Commented: (ISPN-425) Stale data read when L1 invalidation happens while UnionConsistentHash is in use
Manik Surtani (JIRA)
jira-events at lists.jboss.org
Mon May 10 06:13:06 EDT 2010
[ https://jira.jboss.org/jira/browse/ISPN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12529853#action_12529853 ]
Manik Surtani commented on ISPN-425:
------------------------------------
Right, it looks like you have two distinct problems here, conflated by the fact that you sometimes have L1-on-rehash set to true.
I think the issue with a PUT on a joiner before rehash completes is a pretty odd bug. (How did you get to do this PUT in the first place? cache.start() shouldn't return until the join - and rehash - is complete!)
The second issue is invalidation while an entry is being rehashed/moved. E.g., the "old" owner C2 should accept an invalidation event rather than reject it by claiming ownership.
Again, if you could separate your test to isolate these two cases, that would make things clearer.
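Galder's proposal further down the thread (invalidate unconditionally while a rehash is in flight, since the put that follows will repopulate the correct owner) can be sketched as below. This is a minimal illustration only: `ConsistentHash`, `L1InvalidationPolicy`, and the ownership check are hypothetical stand-ins, not the actual Infinispan 4.1 API.

```java
import java.util.Set;

// Hypothetical sketch of the conservative L1-invalidation rule discussed
// in this thread: while a rehash is in progress (i.e. a union of old and
// new consistent hashes is in effect), ownership answers are unreliable,
// so incoming invalidations are applied unconditionally.
final class L1InvalidationPolicy {

    // Stand-in for the real consistent hash: maps a key to its owners.
    interface ConsistentHash {
        Set<String> locate(Object key);
    }

    private final ConsistentHash hash;
    private final String localAddress;
    private volatile boolean rehashInProgress;

    L1InvalidationPolicy(ConsistentHash hash, String localAddress) {
        this.hash = hash;
        this.localAddress = localAddress;
    }

    void setRehashInProgress(boolean inProgress) {
        this.rehashInProgress = inProgress;
    }

    /**
     * Returns true if an incoming invalidation for {@code key} should be
     * applied locally. Conservative rule: during a rehash, always
     * invalidate, even if the (possibly stale) hash claims local
     * ownership; the subsequent put repopulates the correct owner.
     */
    boolean shouldInvalidate(Object key) {
        if (rehashInProgress) {
            return true; // over-invalidate rather than risk a stale read
        }
        return !hash.locate(key).contains(localAddress);
    }
}
```

The downside, as noted in the thread, is only some unnecessary invalidation; correctness is preserved because the put always follows the invalidation.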
> Stale data read when L1 invalidation happens while UnionConsistentHash is in use
> --------------------------------------------------------------------------------
>
> Key: ISPN-425
> URL: https://jira.jboss.org/jira/browse/ISPN-425
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 4.1.0.BETA1
> Reporter: Galder Zamarreno
> Assignee: Galder Zamarreno
> Priority: Blocker
> Fix For: 4.1.0.CR1
>
> Attachments: infinispan_isL1OnRehash_false.log, infinispan_isL1OnRehash_true.log.zip
>
>
> See below:
> ----- "Manik Surtani" <manik at jboss.org> wrote:
> > On 3 May 2010, at 08:51, Galder Zamarreno wrote:
> >
> > > Resending without log until the message is approved.
> > >
> > > --
> > > Galder Zamarreño
> > > Sr. Software Engineer
> > > Infinispan, JBoss Cache
> > >
> > > ----- Forwarded Message -----
> > > From: galder at redhat.com
> > > To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
> > > Sent: Friday, April 30, 2010 6:30:05 PM GMT +01:00 Amsterdam /
> > Berlin / Bern / Rome / Stockholm / Vienna
> > > Subject: Stale data read when L1 invalidation happens while
> > UnionConsistentHash is in use
> > >
> > > Hi,
> > >
> > > I've spent all day chasing down a random Hot Rod testsuite failure
> > > related to distribution. This is the last hurdle to close
> > > https://jira.jboss.org/jira/browse/ISPN-411. In HotRodDistributionTest,
> > > which is still to be committed, I test adding a new node, doing a put
> > > on this node, and then doing a get on a different node and making sure
> > > that I get what was put. The test randomly fails, with the get
> > > returning the old value. The failure has nothing to do with Hot Rod
> > > itself; it is rather a race condition that occurs while union
> > > consistent hash is in use. Let me explain:
> > >
> > > 1. An earlier operation had set the
> > > "k-testDistributedPutWithTopologyChanges" key to
> > > "v5-testDistributedPutWithTopologyChanges".
> > > 2. Start a new Hot Rod server in eq-7969.
> > > 3. The eq-7969 node calls a put on that key with
> > > "v6-testDistributedPutWithTopologyChanges". Recipients for the put
> > > are: eq-7969 and eq-61332.
> > > 4. eq-7969 sends an L1 invalidation to all, including eq-13415.
> > > 5. eq-13415 should invalidate
> > > "k-testDistributedPutWithTopologyChanges" but it doesn't, since it
> > > considers that "k-testDistributedPutWithTopologyChanges" is local to
> > > eq-13415:
> > >
> > > 2010-04-30 18:02:19,907 6046 TRACE
> > > [org.infinispan.distribution.DefaultConsistentHash]
> > > (OOB-2,Infinispan-Cluster,eq-13415:) Hash code for key
> > > CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45,
> > > 116, 101, 115, 116, 68, 105, 115, 116, ..]}} is 344897059
> > > 2010-04-30 18:02:19,907 6046 TRACE
> > > [org.infinispan.distribution.DefaultConsistentHash]
> > > (OOB-2,Infinispan-Cluster,eq-13415:) Candidates for key
> > > CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45,
> > > 116, 101, 115, 116, 68, 105, 115, 116, ..]}} are {5458=eq-7969,
> > > 6831=eq-61332}
> > > 2010-04-30 18:02:19,907 6046 TRACE
> > > [org.infinispan.distribution.DistributionManagerImpl]
> > > (OOB-2,Infinispan-Cluster,eq-13415:) Is local
> > > CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45,
> > > 116, 101, 115, 116, 68, 105, 115, 116, ..]}} to eq-13415 query returns
> > > true and consistentHash is
> > > org.infinispan.distribution.UnionConsistentHash at 10747b4
> > >
> > > This log includes trace messages that I added while debugging. The key
> > > factor here is that UnionConsistentHash is in use, probably because
> > > rehashing has not fully finished.
> > >
> > > 6. The end result is that a read of
> > > "k-testDistributedPutWithTopologyChanges" in eq-13415 returns
> > > "v5-testDistributedPutWithTopologyChanges".
> > >
> > > I thought that maybe we could be more conservative here: if rehashing
> > > is in progress (or UnionConsistentHash is in use), invalidate
> > > regardless. Assuming that in distribution a put always follows an
> > > invalidation, and not vice versa, that would be fine. The only
> > > downside is that you'd be invalidating too much, but the put would
> > > replace the data in the node where the invalidation should not have
> > > happened but did, so it's not a problem.
> > >
> > > Thoughts? Alternatively, maybe I need to shape my test so that it
> > > waits for rehashing to finish, but the problem would still be there.
> >
> > Yes, this seems to be a bug with concurrent rehashing and invalidation
> > rather than Hot Rod.
> >
> > Could you modify your test to do the following:
> >
> > 1. Start 2 caches, C1 and C2.
> > 2. Put a key K such that K maps on to C1 and C2.
> > 3. Add a new node, C3. K should now map to C1 and C3.
> > 4. Modify the value on C1 *before* rehashing completes.
> > 5. See if we see the stale value on C2.
> >
> > To do this you would need a custom object for K that hashes the way
> > you would expect (this could be hardcoded) and a value which blocks
> > when serializing, so we can control how long rehashing takes.
> Since logical addresses are used underneath, and these change from one run to the next, I'm not sure how I can generate such a key programmatically. It's even more complicated to figure out a key that will later, when C3 starts, map to it. Without locking down these addresses somehow, or their hash codes, I can't see how this is doable. IOW, to be able to do this, I need to mock these addresses into returning fixed hash codes. I'll dig further into this.
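The two test helpers Manik describes above (a key with a hardcoded hash and a value that blocks during serialization) could be sketched roughly as follows. Both classes are illustrative stand-ins, not part of the Infinispan test suite, and as Galder notes, a fixed key hash alone does not pin placement while node addresses themselves hash randomly.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.CountDownLatch;

/** A key whose hash is hardcoded, so consistent-hash placement is
 *  deterministic across runs (the hash value would be chosen to map
 *  onto the desired owners). Hypothetical test helper. */
final class FixedHashKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;   // human-readable label for logs
    private final int fixedHash; // hardcoded hash controlling placement

    FixedHashKey(String name, int fixedHash) {
        this.name = name;
        this.fixedHash = fixedHash;
    }

    @Override
    public int hashCode() {
        return fixedHash; // ignore contents: placement is fixed
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof FixedHashKey
              && ((FixedHashKey) o).name.equals(name);
    }

    @Override
    public String toString() {
        return "FixedHashKey{" + name + ", hash=" + fixedHash + "}";
    }
}

/** A value whose serialization blocks until released, so the test can
 *  control how long the rehash/state transfer takes. Hypothetical. */
final class SlowValue implements Serializable {
    private static final long serialVersionUID = 1L;
    static final CountDownLatch RELEASE = new CountDownLatch(1);

    private void writeObject(ObjectOutputStream out) throws IOException {
        try {
            RELEASE.await(); // hold the transfer until the test releases it
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
        out.defaultWriteObject();
    }
}
```

With these, the test would put a `FixedHashKey`/`SlowValue` pair, start C3, modify the value on C1 while `SlowValue` holds the rehash open, then call `SlowValue.RELEASE.countDown()` and read from C2 to check for the stale value.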
> >
> > I never promised the test would be simple! :)
> >
> > Cheers
> > Manik
> > --
> > Manik Surtani
> > manik at jboss.org
> > Lead, Infinispan
> > Lead, JBoss Cache
> > http://www.infinispan.org
> > http://www.jbosscache.org
> >
> >
> >
> >
> >
> > _______________________________________________
> > infinispan-dev mailing list
> > infinispan-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira