Dan Berindei resolved ISPN-4995.
--------------------------------
Fix Version/s: 7.0.3.Final
Resolution: Won't Fix
I documented the current behaviour in
ClusteredGet served for non-member of CH
----------------------------------------
Key: ISPN-4995
URL:
https://issues.jboss.org/browse/ISPN-4995
Project: Infinispan
Issue Type: Bug
Components: Core, State Transfer
Reporter: Radim Vansa
Assignee: Dan Berindei
Priority: Critical
Fix For: 7.0.3.Final
When nodes accept a ClusteredGetCommand from a node that is not a member of the CH, it can happen that one thread does
{code}
put(K1, V1);
put(K2, V2)
{code}
while another thread observes
{code}
get(K2) -> V2
get(K1) -> V0 (some old value)
{code}
edg-perf01, 02 and 03 share this view and topology:
{code}
04:40:08,714 TRACE [org.jgroups.protocols.FD_SOCK] (INT-8,edg-perf01-63779)
edg-perf01-63779: i-have-sock: edg-perf02-45117 --> 172.18.1.3:37476 (cache is
{edg-perf01-63779=172.18.1.1:40099, edg-perf02-45117=172.18.1.3:37476})
04:40:08,715 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl]
(transport-thread--p2-t6) Received new cluster view: 8, isCoordinator = true,
becameCoordinator = false
04:40:11,203 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl]
(transport-thread--p2-t1) Updating local consistent hash(es) for cache testCache: new
topology = CacheTopology{id=16, rebalanceId=4, currentC
H=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf02-45117: 171+170,
edg-perf03-6264: 171+171, edg-perf01-63779: 170+171]}, pendingCH=null, unionCH=null,
actualMembers=[edg-perf02-45117, edg-perf03-6264, edg-perf01-63779]}
{code}
Later, edg-perf02 and edg-perf03 receive a new view and install a new topology in which edg-perf01 is no longer a member:
{code}
04:41:13,681 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-2,edg-perf03-6264) ISPN000093: Received new, MERGED cluster view for channel
default: MergeView::[edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264,
edg-perf04-10989], 1 subgroups: [edg-perf04-10989|7] (1) [edg-perf04-10989]
04:41:13,681 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl]
(transport-thread--p2-t22) Received new cluster view: 9, isCoordinator = false,
becameCoordinator = false
04:41:13,760 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
(remote-thread--p3-t32) Attempting to execute non-CacheRpcCommand command:
CacheTopologyControlCommand{cache=testCache, type=CH_UPDATE, sender=edg-perf02-45117,
joinInfo=null, topologyId=18, rebalanceId=4, currentCH=DefaultConsistentHash{ns = 512,
owners = (2)[edg-perf02-45117: 256+85, edg-perf03-6264: 256+86]}, pendingCH=null,
availabilityMode=AVAILABLE, actualMembers=[edg-perf02-45117, edg-perf03-6264],
throwable=null, viewId=9}[sender=edg-perf02-45117]
{code}
After that, edg-perf04 writes to {{key_00000000000020DB}}, which is currently owned only
by edg-perf03 - this key serves as K1 in the example above. The write is not backed up to
edg-perf01, but edg-perf01 still considers itself an owner of this key because it has not
received any new view (this is a log from edg-perf03):
{code}
04:41:30,884 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (remote-thread--p3-t45)
edg-perf03-6264 invoking PutKeyValueCommand{key=key_00000000000020DB, value=[33 #4: 0,
169, 284, 634, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false,
valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} to
recipient list [edg-perf03-6264] with options RpcOptions{timeout=60000, unit=MILLISECONDS,
fifoOrder=true, totalOrder=false, responseFilter=null, responseMode=SYNCHRONOUS,
skipReplicationQueue=false}
{code}
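To make the ownership mismatch concrete, here is a small hedged sketch (assuming the 7.x {{DistributionManager}} API; the helper class is hypothetical) of how each node answers the ownership question from the consistent hash it has installed locally, which is why the two sides of the merge disagree:
{code}
import java.util.List;
import org.infinispan.Cache;
import org.infinispan.distribution.DistributionManager;
import org.infinispan.remoting.transport.Address;

class OwnershipDebug {
    // Prints who this node believes owns the key, per its locally installed CH.
    static void printOwnership(Cache<?, ?> cache, Object key) {
        DistributionManager dm = cache.getAdvancedCache().getDistributionManager();
        List<Address> owners = dm.locate(key);            // owners per the local CH
        boolean localIsOwner = dm.getLocality(key).isLocal();
        // Run on edg-perf01 (still on topology 16) this would report the node as
        // an owner of key_00000000000020DB, although the write logged above was
        // replicated only to the owners of the newer topology (edg-perf03/02).
        System.out.println("owners=" + owners + ", localIsOwner=" + localIsOwner);
    }
}
{code}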
Later, edg-perf04 writes a value with operationId=650 to another key, {{stressor_33}} (K2 in
the example); the previous value has operationId=600, and this write is replicated to
edg-perf02 and edg-perf03. Now a merge view with all 4 nodes is installed:
{code}
04:41:31,258 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-2,edg-perf01-63779) ISPN000093: Received new, MERGED cluster view for channel
default: MergeView::[edg-perf01-63779|10] (4) [edg-perf01-63779, edg-perf03-6264,
edg-perf02-45117, edg-perf04-10989], 6 subgroups: [edg-perf02-45117|7] (2)
[edg-perf02-45117, edg-perf03-6264], [edg-perf01-63779|4] (2) [edg-perf01-63779,
edg-perf02-45117], [edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264,
edg-perf04-10989], [edg-perf03-6264|4] (2) [edg-perf03-6264, edg-perf04-10989],
[edg-perf01-63779|8] (3) [edg-perf01-63779, edg-perf02-45117, edg-perf03-6264],
[edg-perf01-63779|6] (1) [edg-perf01-63779]
04:41:31,258 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl]
(transport-thread--p2-t2) Received new cluster view: 10, isCoordinator = true,
becameCoordinator = false
{code}
edg-perf01 now issues a remote get to edg-perf02 for key {{stressor_33}} and receives the
correct answer (operationId=650):
{code}
04:41:32,494 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (BackgroundOps-Checker-1)
Response(s) to ClusteredGetCommand{key=stressor_33, flags=null} is
{edg-perf02-45117=SuccessfulResponse{responseValue=ImmortalCacheValue
{value=LastOperation{operationId=650, seed=0000A15A4C2DD25A}}} }
{code}
However, when edg-perf01 reads {{key_00000000000020DB}}, it loads the old value from its
local data container, as no CH update or rebalance has happened on that node so far:
{code}
04:41:32,496 TRACE [org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl]
(BackgroundOps-Checker-1) Checking availability for key=key_00000000000020DB,
status=AVAILABLE
04:41:32,497 ERROR [org.radargun.stages.cache.background.LogChecker]
(BackgroundOps-Checker-1) Missing operation 634 for thread 33 on key 8411
(key_00000000000020DB)
04:41:32,499 DEBUG [org.radargun.service.InfinispanDebugable] (BackgroundOps-Checker-1)
Debug info for key testCache key_00000000000020DB: owners=edg-perf01-63779,
edg-perf03-6264, local=true, uncertain=false,
container.key_00000000000020DB=ImmortalCacheEntry[key=key_00000000000020DB, value=[33 #3:
0, 169, 284, ], created=-1, isCreated=false, lastUsed=-1, isChanged=false, expires=-1,
isExpired=false, canExpire=false, isEvicted=true, isRemoved=false, isValid=false,
lifespan=-1, maxIdle=-1], segmentId=173
{code}
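The asymmetry between the two reads can be summarised with a simplified, hedged model of the read path (again assuming the 7.x {{DistributionManager}}/{{Flag}} API; this is not the actual interceptor code): the local-vs-remote decision is made against the locally installed, and here stale, consistent hash:
{code}
import org.infinispan.Cache;
import org.infinispan.context.Flag;
import org.infinispan.distribution.DistributionManager;

class ReadPathSketch {
    // Simplified model of the get decision, not the real interceptor chain.
    static Object read(Cache<Object, Object> cache, Object key) {
        DistributionManager dm = cache.getAdvancedCache().getDistributionManager();
        if (dm.getLocality(key).isLocal()) {
            // edg-perf01 takes this branch for key_00000000000020DB and returns
            // the stale entry from its own data container (segment 173 above).
            return cache.getAdvancedCache().withFlags(Flag.CACHE_MODE_LOCAL).get(key);
        }
        // stressor_33 is not local to edg-perf01, so the read goes remote
        // (a ClusteredGetCommand to edg-perf02) and sees the fresh value.
        return cache.get(key);
    }
}
{code}
Because edg-perf01 never installed topology 18, nothing pushes it onto the remote branch for {{key_00000000000020DB}}, so the stale local entry wins.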
Note that this was found on the branch from
https://github.com/infinispan/infinispan/pull/3062/files while trying to fix ISPN-4949.