[infinispan-issues] [JBoss JIRA] Created: (ISPN-425) Stale data read when L1 invalidation happens while UnionConsistentHash is in use

Thursday, 6 May 2010

Stale data read when L1 invalidation happens while UnionConsistentHash is in use
--------------------------------------------------------------------------------

                 Key: ISPN-425
                 URL: https://jira.jboss.org/jira/browse/ISPN-425
             Project: Infinispan
          Issue Type: Bug
          Components: Distributed Cache
    Affects Versions: 4.1.0.BETA1
            Reporter: Galder Zamarreno
            Assignee: Galder Zamarreno
             Fix For: 4.1.0.CR1

See below: 

----- "Manik Surtani" <manik(a)jboss.org&gt; wrote:

...
 On 3 May 2010, at 08:51, Galder Zamarreno wrote:

 > Resending without log until the message is approved.
 > 
 > --
 > Galder Zamarreño
 > Sr. Software Engineer
 > Infinispan, JBoss Cache
 > 
 > ----- Forwarded Message -----
 > From: galder(a)redhat.com
 > To: "infinispan -Dev List" <infinispan-dev(a)lists.jboss.org&gt;
 > Sent: Friday, April 30, 2010 6:30:05 PM GMT +01:00 Amsterdam /
 Berlin / Bern / Rome / Stockholm / Vienna
 > Subject: Stale data read when L1 invalidation happens while
 UnionConsistentHash is in use
 > 
 > Hi,
 > 
 > I've spent all day chasing down a random Hot Rod testsuite failure
 related to distribution. This is the last hurdle to close
 https://jira.jboss.org/jira/browse/ISPN-411. In
 HotRodDistributionTest, which is still to be committed, I test adding
 a new node, doing a put on this node, and then doing a get in a
 different node and making sure that I get what was put. The test
 randomly fails saying that the get returns the old value. The failure
 is nothing to do with Hot Rod itself but rather a race condition where
 union consistent hash is used. Let me explain:
 > 
 > 1. An earlier operation had set
 "k-testDistributedPutWithTopologyChanges" key to
 "v5-testDistributedPutWithTopologyChanges".
 > 2. Start a new hot rod server in eq-7969.
 > 2. eq-7969 node calls a put on that key with
 "v6-testDistributedPutWithTopologyChanges". Recipients for the put
 are: eq-7969 and eq-61332.
 > 3. eq-7969 sends an invalidate L1 to all, including eq-13415
 > 4. eq-13415 should invalidate
 "k-testDistributedPutWithTopologyChanges" but it doesn't, since it
 considers that "k-testDistributedPutWithTopologyChanges" is local to
 eq-13415:
 > 
 > 2010-04-30 18:02:19,907 6046  TRACE
 [org.infinispan.distribution.DefaultConsistentHash]
 (OOB-2,Infinispan-Cluster,eq-13415:) Hash code for key
 CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45,
 116, 101, 115, 116, 68, 105, 115, 116, ..]}} is 344897059
 > 2010-04-30 18:02:19,907 6046  TRACE
 [org.infinispan.distribution.DefaultConsistentHash]
 (OOB-2,Infinispan-Cluster,eq-13415:) Candidates for key
 CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45,
 116, 101, 115, 116, 68, 105, 115, 116, ..]}} are {5458=eq-7969,
 6831=eq-61332}
 > 2010-04-30 18:02:19,907 6046  TRACE
 [org.infinispan.distribution.DistributionManagerImpl]
 (OOB-2,Infinispan-Cluster,eq-13415:) Is local
 CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45,
 116, 101, 115, 116, 68, 105, 115, 116, ..]}} to eq-13415 query returns
 true and consistentHash is
 org.infinispan.distribution.UnionConsistentHash@10747b4
 > 
 > This is a log with log messages that I added to debug it. The key
 factor here is that UnionConsistentHash is in use, probably due to
 rehashing not having fully finished.
 > 
 > 5. The end result is that a read of
 "k-testDistributedPutWithTopologyChanges" in eq-13415 returns
 "v5-testDistributedPutWithTopologyChanges".
 > 
 > I thought that maybe we could be more conservative here and if
 rehashing is in progress (or UnionConsistentHash is in use) invalidate
 regardless. Assuming that a put always follows an invalidation in
 distribution and not viceversa, that would be fine. The only downside
 is that you'd be invalidating too much but put would replace the data
 in the node where invalidation should not have happened but it did, so
 not a problem.
 > 
 > Thoughts? Alternatively, maybe I need to shape my test so that I
 wait for rehashing to finish, but the problem would still be there.

 Yes, this seems to be a bug with concurrent rehashing and invalidation
 rather than HotRod.

 Could you modify your test to so the following:

 1.  start 2 caches C1 and C2.
 2.  put a key K such that K maps on to C1 and C2
 3.  add a new node, C3.  K should now map to C1 and C3.
 4.  Modify the value on C1 *before* rehashing completes.
 5.  See if we see the stale value on C2.

 To do this you would need a custom object for K that hashes the way
 you would expect (this could be hardcoded) and a value which blocks
 when serializing so we can control how long rehashing takes. 
Since logical addresses are used underneath and these change from one run to the other,
I'm not sure how I can generate such key programatically. It's even more
complicated to figure out a key that will later, when C3 starts, map to it. Without having
these addresses locked somehow, or their hash codes, I can't see how this is doable.
IOW, to be able to do this, I need to mock these addresses into giving fixed as hash
codes. I'll dig further into this.

...

 I never promised the test would be simple!  :)

 Cheers
 Manik
 --
 Manik Surtani
 manik(a)jboss.org
 Lead, Infinispan
 Lead, JBoss Cache
 http://www.infinispan.org
 http://www.jbosscache.org

 _______________________________________________
 infinispan-dev mailing list
 infinispan-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev 
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

[infinispan-issues] [JBoss JIRA] Created: (ISPN-425) Stale data read when L1 invalidation happens while UnionConsistentHash is in use