[infinispan-dev] Fwd: Stale data read when L1 invalidation happens while UnionConsistentHash is in use

galder at jboss.org
Wed May 5 11:13:03 EDT 2010


See below: 

----- "Manik Surtani" <manik at jboss.org> wrote:

> On 3 May 2010, at 08:51, Galder Zamarreno wrote:
> 
> > Resending without log until the message is approved.
> > 
> > --
> > Galder Zamarreño
> > Sr. Software Engineer
> > Infinispan, JBoss Cache
> > 
> > ----- Forwarded Message -----
> > From: galder at redhat.com
> > To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
> > Sent: Friday, April 30, 2010 6:30:05 PM GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
> > Subject: Stale data read when L1 invalidation happens while UnionConsistentHash is in use
> > 
> > Hi,
> > 
> > I've spent all day chasing down a random Hot Rod testsuite failure related to distribution. This is the last hurdle to close https://jira.jboss.org/jira/browse/ISPN-411. In HotRodDistributionTest, which is still to be committed, I test adding a new node, doing a put on that node, and then doing a get on a different node to make sure I get back what was put. The test randomly fails because the get returns the old value. The failure has nothing to do with Hot Rod itself; it's a race condition that shows up while the union consistent hash is in use. Let me explain:
> > 
> > 1. An earlier operation had set the "k-testDistributedPutWithTopologyChanges" key to "v5-testDistributedPutWithTopologyChanges".
> > 2. Start a new Hot Rod server in eq-7969.
> > 3. The eq-7969 node calls a put on that key with "v6-testDistributedPutWithTopologyChanges". Recipients for the put are: eq-7969 and eq-61332.
> > 4. eq-7969 sends an L1 invalidation to all, including eq-13415.
> > 5. eq-13415 should invalidate "k-testDistributedPutWithTopologyChanges" but it doesn't, since it considers "k-testDistributedPutWithTopologyChanges" to be local to eq-13415:
> > 
> > 2010-04-30 18:02:19,907 6046 TRACE [org.infinispan.distribution.DefaultConsistentHash] (OOB-2,Infinispan-Cluster,eq-13415:) Hash code for key CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45, 116, 101, 115, 116, 68, 105, 115, 116, ..]}} is 344897059
> > 2010-04-30 18:02:19,907 6046 TRACE [org.infinispan.distribution.DefaultConsistentHash] (OOB-2,Infinispan-Cluster,eq-13415:) Candidates for key CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45, 116, 101, 115, 116, 68, 105, 115, 116, ..]}} are {5458=eq-7969, 6831=eq-61332}
> > 2010-04-30 18:02:19,907 6046 TRACE [org.infinispan.distribution.DistributionManagerImpl] (OOB-2,Infinispan-Cluster,eq-13415:) Is local CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45, 116, 101, 115, 116, 68, 105, 115, 116, ..]}} to eq-13415 query returns true and consistentHash is org.infinispan.distribution.UnionConsistentHash@10747b4
> > 
> > This is the log, with extra trace messages that I added while debugging. The key factor here is that UnionConsistentHash is in use, probably because rehashing has not fully finished.
> > 
> > 6. The end result is that a read of "k-testDistributedPutWithTopologyChanges" in eq-13415 returns the stale "v5-testDistributedPutWithTopologyChanges".
> > 
> > I thought that maybe we could be more conservative here: if rehashing is in progress (or UnionConsistentHash is in use), invalidate regardless. Assuming that in distribution a put always follows the invalidation, and not vice versa, that would be fine. The only downside is that you'd sometimes invalidate too much, but the put would then put the data back on any node where the invalidation wasn't strictly needed, so it's not a problem.
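
Roughly the shape of the check I have in mind, reusing the stand-in types from the sketch above (again, not a patch against the real invalidation command):

    // Sketch of the "invalidate regardless while rehashing" idea.
    class InvalidationCheckSketch {
       static boolean shouldInvalidate(Object key, String self, HashSketch ch, int replCount) {
          if (ch instanceof UnionHashSketch) {
             // Rehash still in progress: the locality answer can't be trusted, so be
             // conservative and invalidate anyway; the put that follows will restore
             // the value on whichever nodes really own it.
             return true;
          }
          // Steady state: only nodes that don't own the key need to drop their L1 copy.
          return !ch.locate(key, replCount).contains(self);
       }
    }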
> > 
> > Thoughts? Alternatively, maybe I need to shape my test so that it waits for rehashing to finish, but the underlying problem would still be there.
> 
> Yes, this seems to be a bug with concurrent rehashing and invalidation
> rather than HotRod.
> 
> Could you modify your test to do the following:
> 
> 1.  start 2 caches C1 and C2.
> 2.  put a key K such that K maps on to C1 and C2
> 3.  add a new node, C3.  K should now map to C1 and C3.
> 4.  Modify the value on C1 *before* rehashing completes.
> 5.  See if we see the stale value on C2.
> 
> To do this you would need a custom object for K that hashes the way
> you would expect (this could be hardcoded) and a value which blocks
> when serializing so we can control how long rehashing takes.

Since logical addresses are used underneath and these change from one run to the next, I'm not sure how I can generate such a key programmatically. It's even more complicated to figure out a key that will later, once C3 starts, map to C1 and C3. Without having these addresses pinned somehow, or their hash codes, I can't see how this is doable. IOW, to be able to do this, I need to mock these addresses into returning fixed hash codes. I'll dig further into this.
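
For the value side of your suggestion, I think something along these lines would do (just a sketch, nothing like this exists in the test yet; the latch would be released from the test once we've modified the value on C1 and read it on C2):

    // Sketch of the "value which blocks when serializing" part of the test idea.
    import java.io.Externalizable;
    import java.io.IOException;
    import java.io.ObjectInput;
    import java.io.ObjectOutput;
    import java.util.concurrent.CountDownLatch;

    public class SlowToSerializeValue implements Externalizable {
       // The test counts this down when it wants rehashing to be allowed to complete.
       public static final CountDownLatch RELEASE = new CountDownLatch(1);

       private String payload;

       public SlowToSerializeValue() { }                       // required by Externalizable
       public SlowToSerializeValue(String payload) { this.payload = payload; }

       public void writeExternal(ObjectOutput out) throws IOException {
          try {
             RELEASE.await();   // hold up state transfer / rehashing until the test says go
          } catch (InterruptedException e) {
             Thread.currentThread().interrupt();
          }
          out.writeObject(payload);
       }

       public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
          payload = (String) in.readObject();
       }
    }

The key side is still the part I can't see how to do without mocking the addresses as described above.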

> 
> I never promised the test would be simple!  :)
> 
> Cheers
> Manik
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
> 


