[infinispan-issues] [JBoss JIRA] (ISPN-4995) ClusteredGet served for non-member of CH

Fri Nov 21 06:14:39 EST 2014

    [ https://issues.jboss.org/browse/ISPN-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021870#comment-13021870 ] 

Radim Vansa commented on ISPN-4995:
-----------------------------------

I think that the example with detecting post-write and pre-write status is somewhat different: this is experienced *during* the write and therefore the write and those reads are not causally related. I require the second write to be happening *causally after* the first write, and therefore, if any read finds out that the second write is in progress, the first write must have finished successfully and further reads should be able to see that.

The second level of indirection does not change this: either the first write is completely confirmed and the second can begin, or the second must not be executed and therefore clients can't expect to read the first one.

I know I am forcing you into consistency guarantees *I want*, however, this is the closest I can figure out from the definition of 'not losing any data' which I should test :) So I want to enforce PRAM consistency. Eventual consistency may be more performant, but without any defined guarantees when the 'eventual' operation happens, I could read nulls from the cache all the time and it would be OK.

> ClusteredGet served for non-member of CH
> ----------------------------------------
>
>                 Key: ISPN-4995
>                 URL: https://issues.jboss.org/browse/ISPN-4995
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core, State Transfer
>            Reporter: Radim Vansa
>            Priority: Critical
>
> When nodes accept ClusteredGetCommand from node that is not member of CH, it can happen that when one thread does
> {code}
> put(K1, V1);
> put(K2, V2)
> {code}
> and another gets
> {code}
> get(K2) -> V2
> get(K1) -> V0 (some old value)
> {code}
> edg-perf01, 02 and 03 share this view and topology:
> {code}
> 04:40:08,714 TRACE [org.jgroups.protocols.FD_SOCK] (INT-8,edg-perf01-63779) edg-perf01-63779: i-have-sock: edg-perf02-45117 --> 172.18.1.3:37476 (cache is {edg-perf01-63779=172.18.1.1:40099, edg-perf02-45117=172.18.1.3:37476})
> 04:40:08,715 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t6) Received new cluster view: 8, isCoordinator = true, becameCoordinator = false
> 04:40:11,203 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl] (transport-thread--p2-t1) Updating local consistent hash(es) for cache testCache: new topology = CacheTopology{id=16, rebalanceId=4, currentC
> H=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf02-45117: 171+170, edg-perf03-6264: 171+171, edg-perf01-63779: 170+171]}, pendingCH=null, unionCH=null, actualMembers=[edg-perf02-45117, edg-perf03-6264, edg-perf01-63779]}
> {code}
> Later, edg-perf02 and edg-perf03 get new view and install a new topology, where edg-perf01 does not exist:
> {code}
> 04:41:13,681 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf03-6264) ISPN000093: Received new, MERGED cluster view for channel default: MergeView::[edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264, edg-perf04-10989], 1 subgroups: [edg-perf04-10989|7] (1) [edg-perf04-10989]
> 04:41:13,681 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t22) Received new cluster view: 9, isCoordinator = false, becameCoordinator = false
> 04:41:13,760 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (remote-thread--p3-t32) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=testCache, type=CH_UPDATE, sender=edg-perf02-45117, joinInfo=null, topologyId=18, rebalanceId=4, currentCH=DefaultConsistentHash{ns = 512, owners = (2)[edg-perf02-45117: 256+85, edg-perf03-6264: 256+86]}, pendingCH=null, availabilityMode=AVAILABLE, actualMembers=[edg-perf02-45117, edg-perf03-6264], throwable=null, viewId=9}[sender=edg-perf02-45117]
> {code}
> After that, edg-perf04 writes to {{key_00000000000020DB}} which is currently owned only by edg-perf03 - this key servers as K1 in example above. It is not backed up to edg-perf01, but edg-perf01 still thinks it's an owner of this key as it did not get any new view (this is a log from edg-perf03) :
> {code}
> 04:41:30,884 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (remote-thread--p3-t45) edg-perf03-6264 invoking PutKeyValueCommand{key=key_00000000000020DB, value=[33 #4: 0, 169, 284, 634, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false, valueMatcher=MATCH_ALWAYS,  metadata=EmbeddedMetadata{version=null}, successful=true} to recipient list [edg-perf03-6264] with options RpcOptions{timeout=60000, unit=MILLISECONDS, fifoOrder=true, totalOrder=false, responseFilter=null, responseMode=SYNCHRONOUS, skipReplicationQueue=false}
> {code}
> Later, edg-perf04 writes to another key {{stressor_33}} (K2 in the example) value with operationId=650 (previous value is 600) which is replicated to edg-perf02 and edg-perf03.
> Now a merge view with all 4 nodes is installed:
> {code}
> 04:41:31,258 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf01-63779) ISPN000093: Received new, MERGED cluster view for channel default: MergeView::[edg-perf01-63779|10] (4) [edg-perf01-63779, edg-perf03-6264, edg-perf02-45117, edg-perf04-10989], 6 subgroups: [edg-perf02-45117|7] (2) [edg-perf02-45117, edg-perf03-6264], [edg-perf01-63779|4] (2) [edg-perf01-63779, edg-perf02-45117], [edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264, edg-perf04-10989], [edg-perf03-6264|4] (2) [edg-perf03-6264, edg-perf04-10989], [edg-perf01-63779|8] (3) [edg-perf01-63779, edg-perf02-45117, edg-perf03-6264], [edg-perf01-63779|6] (1) [edg-perf01-63779]
> 04:41:31,258 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t2) Received new cluster view: 10, isCoordinator = true, becameCoordinator = false
> {code}
> edg-perf01 now issues a remote get to edg-perf02 for key stressor_33 and receives the correct answer (operationId=650):
> {code}
> 04:41:32,494 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (BackgroundOps-Checker-1) Response(s) to ClusteredGetCommand{key=stressor_33, flags=null} is {edg-perf02-45117=SuccessfulResponse{responseValue=ImmortalCacheValue {value=LastOperation{operationId=650, seed=0000A15A4C2DD25A}}} }
> {code}
> However, when edg-perf01 reads {{key_00000000000020DB}}, it loads the old value from local data container as no CH update/rebalance happened so far:
> {code}
> 04:41:32,496 TRACE [org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl] (BackgroundOps-Checker-1) Checking availability for key=key_00000000000020DB, status=AVAILABLE
> 04:41:32,497 ERROR [org.radargun.stages.cache.background.LogChecker] (BackgroundOps-Checker-1) Missing operation 634 for thread 33 on key 8411 (key_00000000000020DB) 
> 04:41:32,499 DEBUG [org.radargun.service.InfinispanDebugable] (BackgroundOps-Checker-1) Debug info for key testCache key_00000000000020DB: owners=edg-perf01-63779, edg-perf03-6264, local=true, uncertain=false, container.key_00000000000020DB=ImmortalCacheEntry[key=key_00000000000020DB, value=[33 #3: 0, 169, 284, ], created=-1, isCreated=false, lastUsed=-1, isChanged=false, expires=-1, isExpired=false, canExpire=false, isEvicted=true, isRemoved=false, isValid=false, lifespan=-1, maxIdle=-1], segmentId=173
> {code}
> Note that this was found on branch https://github.com/infinispan/infinispan/pull/3062/files trying to fix ISPN-4949.

--
This message was sent by Atlassian JIRA
(v6.3.8#6338)