[infinispan-dev] Fwd: Stale data read when L1 invalidation happens while UnionConsistentHash is in use

Galder Zamarreno galder at redhat.com
Mon May 3 03:51:17 EDT 2010


Resending without log until the message is approved.

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache

----- Forwarded Message -----
From: galder at redhat.com
To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
Sent: Friday, April 30, 2010 6:30:05 PM GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Stale data read when L1 invalidation happens while UnionConsistentHash is in use

Hi,

I've spent all day chasing down a random Hot Rod testsuite failure related to distribution. This is the last hurdle to close https://jira.jboss.org/jira/browse/ISPN-411. In HotRodDistributionTest, which is still to be committed, I test adding a new node, doing a put on this node, and then doing a get on a different node and making sure that I get what was put. The test randomly fails because the get returns the old value. The failure has nothing to do with Hot Rod itself; it is a race condition that shows up while the union consistent hash is in use. Let me explain:

1. An earlier operation had set "k-testDistributedPutWithTopologyChanges" key to "v5-testDistributedPutWithTopologyChanges".
2. Start a new Hot Rod server in eq-7969. The eq-7969 node then calls a put on that key with "v6-testDistributedPutWithTopologyChanges". Recipients for the put are: eq-7969 and eq-61332.
3. eq-7969 sends an L1 invalidation to all nodes, including eq-13415.
4. eq-13415 should invalidate "k-testDistributedPutWithTopologyChanges" but it doesn't, since it considers that "k-testDistributedPutWithTopologyChanges" is local to eq-13415:

2010-04-30 18:02:19,907 6046  TRACE [org.infinispan.distribution.DefaultConsistentHash] (OOB-2,Infinispan-Cluster,eq-13415:) Hash code for key CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45, 116, 101, 115, 116, 68, 105, 115, 116, ..]}} is 344897059
2010-04-30 18:02:19,907 6046  TRACE [org.infinispan.distribution.DefaultConsistentHash] (OOB-2,Infinispan-Cluster,eq-13415:) Candidates for key CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45, 116, 101, 115, 116, 68, 105, 115, 116, ..]}} are {5458=eq-7969, 6831=eq-61332}
2010-04-30 18:02:19,907 6046  TRACE [org.infinispan.distribution.DistributionManagerImpl] (OOB-2,Infinispan-Cluster,eq-13415:) Is local CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45, 116, 101, 115, 116, 68, 105, 115, 116, ..]}} to eq-13415 query returns true and consistentHash is org.infinispan.distribution.UnionConsistentHash at 10747b4

These are log messages that I added to debug the problem. The key factor here is that UnionConsistentHash is in use, probably because rehashing has not fully finished.

5. The end result is that a read of "k-testDistributedPutWithTopologyChanges" in eq-13415 returns "v5-testDistributedPutWithTopologyChanges".
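
The sequence above can be modelled with a small, self-contained Java sketch. It is not Infinispan code: the Hash interface, the union() helper and the Node class are hypothetical stand-ins, and the sketch assumes that eq-13415 was an owner of the key under the old hash, which would explain why the union hash reports the key as local. Key and values are shortened to "k", "v5" and "v6".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StaleL1ReadSketch {
    /** Hypothetical stand-in for a consistent hash: maps a key to its owner nodes. */
    interface Hash {
        List<String> locate(String key);
    }

    /** Owners under the union are the owners under either the old or the new hash. */
    static Hash union(Hash oldHash, Hash newHash) {
        return key -> {
            Set<String> owners = new LinkedHashSet<>(oldHash.locate(key));
            owners.addAll(newHash.locate(key));
            return new ArrayList<>(owners);
        };
    }

    /** Toy node that skips L1 invalidation for keys it believes it owns. */
    static class Node {
        final String name;
        final Hash hash;
        final Map<String, String> store = new HashMap<>();

        Node(String name, Hash hash) {
            this.name = name;
            this.hash = hash;
        }

        boolean isLocal(String key) {
            return hash.locate(key).contains(name);
        }

        void invalidateL1(String key) {
            if (!isLocal(key)) {
                store.remove(key);
            }
        }
    }

    /** Replays steps 1 to 5 and returns what eq-13415 reads at the end. */
    static String run() {
        String key = "k";
        // Assumption: eq-13415 owned the key before eq-7969 joined.
        Hash oldHash = k -> List.of("eq-13415", "eq-61332");
        Hash newHash = k -> List.of("eq-7969", "eq-61332");

        // eq-13415 is still on the union hash because rehashing has not finished.
        Node stale = new Node("eq-13415", union(oldHash, newHash));
        stale.store.put(key, "v5");

        // The put from eq-7969 only reaches the owners under the new hash.
        Node newNode = new Node("eq-7969", newHash);
        Node otherOwner = new Node("eq-61332", newHash);
        newNode.store.put(key, "v6");
        otherOwner.store.put(key, "v6");

        // The invalidation reaches eq-13415 but is ignored: the key looks local.
        stale.invalidateL1(key);
        return stale.store.get(key);
    }

    public static void main(String[] args) {
        System.out.println("eq-13415 reads: " + run());
    }
}
```

In this model eq-13415 keeps the old value because the ownership check and the put recipients are computed from different views of the cluster.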

I thought that maybe we could be more conservative here: if rehashing is in progress (or UnionConsistentHash is in use), invalidate regardless. Assuming that in distribution the put always follows the invalidation and not vice versa, that would be fine. The only downside is that you'd be invalidating too much, but on a node where the invalidation should not have happened, the put that follows would replace the data anyway, so it's not a problem.
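
A minimal sketch of that conservative rule follows. The names shouldInvalidate and apply, and the boolean flags, are hypothetical and not part of Infinispan. The sketch relies on the assumption stated above, that the put arrives after the invalidation.

```java
import java.util.HashMap;
import java.util.Map;

public class ConservativeInvalidationSketch {
    /** Hypothetical rule: while a rehash is in progress, do not trust the ownership check. */
    static boolean shouldInvalidate(boolean keyIsLocal, boolean rehashInProgress) {
        return rehashInProgress || !keyIsLocal;
    }

    /**
     * Applies the invalidation first and then, on put recipients only, the put
     * that follows it. Returns what a local read would see afterwards.
     */
    static String apply(Map<String, String> store, String key, String newValue,
                        boolean keyIsLocal, boolean rehashInProgress, boolean receivesPut) {
        if (shouldInvalidate(keyIsLocal, rehashInProgress)) {
            store.remove(key);
        }
        if (receivesPut) {
            store.put(key, newValue);
        }
        return store.get(key);
    }

    public static void main(String[] args) {
        // A node that wrongly believes it owns the key and is not a put recipient:
        // the stale entry is dropped, so a later read has to go to a real owner.
        Map<String, String> staleNode = new HashMap<>();
        staleNode.put("k", "v5");
        System.out.println(apply(staleNode, "k", "v6", true, true, false));

        // A real owner that gets invalidated unnecessarily: the put restores the entry.
        Map<String, String> owner = new HashMap<>();
        owner.put("k", "v5");
        System.out.println(apply(owner, "k", "v6", true, true, true));
    }
}
```

The cost of the rule in this model is one extra removal on real owners, which the following put immediately undoes.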

Thoughts? Alternatively, maybe I need to shape my test so that I wait for rehashing to finish, but the problem would still be there.

Cheers,
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache


