Dan Berindei commented on ISPN-4995:
------------------------------------
[~rvansa] indeed, both in the configuration and in the documentation, the user only sees
the mechanisms we use and not the consistency guarantees we provide with a certain
configuration. As you can see, we're still debating what those guarantees should be
:)
I didn't mean to promote eventual consistency with my example. My point was that the
only way to fix that scenario is to allow reads from the primary owner only, never from
a backup or from L1. That may sound acceptable for HotRod users, but it also means that
if a node crashes, we can't respond to any get operations until JGroups suspects that
node and removes it from the view, and I'm not sure even HotRod users would like that.
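To make that concrete, here is a minimal sketch of what "read from the primary owner
only" would mean in terms of the (version-dependent) {{DistributionManager}} API; the
real change would live in the remote-get path, not in user code:
{code}
// Hedged sketch only: locating the single node that a ClusteredGetCommand
// would have to target if reads were restricted to the primary owner.
// The DistributionManager API shown here is version-dependent.
import org.infinispan.Cache;
import org.infinispan.distribution.DistributionManager;
import org.infinispan.remoting.transport.Address;

public class PrimaryOnlyReadSketch {
    static Address primaryOwner(Cache<?, ?> cache, Object key) {
        DistributionManager dm = cache.getAdvancedCache().getDistributionManager();
        // the single primary owner of the key in the current CH;
        // backup copies and L1 entries would never be consulted
        return dm.getPrimaryLocation(key);
    }
}
{code}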
Your request sounds very much like what I had in mind when I was working on retrying
operations after a node crash: writes are [...], but overlapping reads can see values
out of order. Applying this to your original example, the problem is not that perf01
can see the updated value of K2, but that get(K1) returns the old value of K1 after the
put(K1, V1) update has finished in the other partition. That should also be fixable by
requiring backup owners to read the value from the primary owner.
There may be another way to fix this: if we knew that it takes at most {{X}} seconds
to detect a network split, we could delay the installation of the cache topology without
the leaver by another {{X}} seconds, to make sure writes are blocked until the minority
partition enters degraded mode. But that would mean write operations are _always_ blocked
for at least {{X}} seconds (and potentially {{2X}} seconds) after a crash: e.g. with
{{X}} = 10 seconds, every crash would block writes for 10 to 20 seconds.
I would rather keep what we have now and document that after a network split with a
majority partition, nodes in the minority partition may return old values until they
detect the split. We should also recommend always using {{FD_ALL[2]}} instead of {{FD}},
because the time to detect a split with {{FD}} can be highly variable.
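For reference, a minimal JGroups stack fragment with that substitution (the timeout
values are illustrative, not a recommendation):
{code}
<!-- Hedged example: FD_ALL replaces FD in the protocol stack.
     With FD_ALL every member broadcasts heartbeats, so a missing node is
     suspected within roughly "timeout" on every member, while FD's
     ring-based pinging makes detection time depend on the node's position
     in the ring. The values below are illustrative only. -->
<FD_ALL timeout="10000" interval="2000"/>
<!-- instead of: <FD timeout="3000" max_tries="3"/> -->
{code}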
ClusteredGet served for non-member of CH
----------------------------------------
Key: ISPN-4995
URL: https://issues.jboss.org/browse/ISPN-4995
Project: Infinispan
Issue Type: Bug
Components: Core, State Transfer
Reporter: Radim Vansa
Priority: Critical
When nodes accept a ClusteredGetCommand from a node that is not a member of the CH, it
can happen that when one thread does
{code}
put(K1, V1);
put(K2, V2)
{code}
and another thread reads
{code}
get(K2) -> V2
get(K1) -> V0 (some old value)
{code}
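To make the interleaving concrete, here is a hedged sketch of the same anomaly against
the plain {{Cache}} API (V0 stands for the value K1 held before the update, as above):
{code}
// Hedged sketch of the interleaving above; "cache" is a distributed
// Cache<String, String>, and V0 is the value K1 held before the update.
import org.infinispan.Cache;

public class StaleReadSketch {
    static void writer(Cache<String, String> cache) {
        cache.put("K1", "V1"); // completes in the majority partition
        cache.put("K2", "V2"); // replicated to the surviving owners
    }

    static void reader(Cache<String, String> cache) {
        cache.get("K2"); // returns V2, fetched from a current owner
        cache.get("K1"); // returns V0: a stale copy served locally
    }
}
{code}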
edg-perf01, 02 and 03 share this view and topology:
{code}
04:40:08,714 TRACE [org.jgroups.protocols.FD_SOCK] (INT-8,edg-perf01-63779)
edg-perf01-63779: i-have-sock: edg-perf02-45117 --> 172.18.1.3:37476 (cache is
{edg-perf01-63779=172.18.1.1:40099, edg-perf02-45117=172.18.1.3:37476})
04:40:08,715 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl]
(transport-thread--p2-t6) Received new cluster view: 8, isCoordinator = true,
becameCoordinator = false
04:40:11,203 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl]
(transport-thread--p2-t1) Updating local consistent hash(es) for cache testCache: new
topology = CacheTopology{id=16, rebalanceId=4, currentC
H=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf02-45117: 171+170,
edg-perf03-6264: 171+171, edg-perf01-63779: 170+171]}, pendingCH=null, unionCH=null,
actualMembers=[edg-perf02-45117, edg-perf03-6264, edg-perf01-63779]}
{code}
Later, edg-perf02 and edg-perf03 get new view and install a new topology, where
edg-perf01 does not exist:
{code}
04:41:13,681 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-2,edg-perf03-6264) ISPN000093: Received new, MERGED cluster view for channel
default: MergeView::[edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264,
edg-perf04-10989], 1 subgroups: [edg-perf04-10989|7] (1) [edg-perf04-10989]
04:41:13,681 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl]
(transport-thread--p2-t22) Received new cluster view: 9, isCoordinator = false,
becameCoordinator = false
04:41:13,760 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
(remote-thread--p3-t32) Attempting to execute non-CacheRpcCommand command:
CacheTopologyControlCommand{cache=testCache, type=CH_UPDATE, sender=edg-perf02-45117,
joinInfo=null, topologyId=18, rebalanceId=4, currentCH=DefaultConsistentHash{ns = 512,
owners = (2)[edg-perf02-45117: 256+85, edg-perf03-6264: 256+86]}, pendingCH=null,
availabilityMode=AVAILABLE, actualMembers=[edg-perf02-45117, edg-perf03-6264],
throwable=null, viewId=9}[sender=edg-perf02-45117]
{code}
After that, edg-perf04 writes to {{key_00000000000020DB}}, which is currently owned only
by edg-perf03 - this key serves as K1 in the example above. It is not backed up to
edg-perf01, but edg-perf01 still thinks it is an owner of this key, as it did not get
any new view (this is a log from edg-perf03):
{code}
04:41:30,884 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (remote-thread--p3-t45)
edg-perf03-6264 invoking PutKeyValueCommand{key=key_00000000000020DB, value=[33 #4: 0,
169, 284, 634, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false,
valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} to
recipient list [edg-perf03-6264] with options RpcOptions{timeout=60000, unit=MILLISECONDS,
fifoOrder=true, totalOrder=false, responseFilter=null, responseMode=SYNCHRONOUS,
skipReplicationQueue=false}
{code}
Later, edg-perf04 writes a value with operationId=650 (the previous value has
operationId=600) to another key, {{stressor_33}} (K2 in the example); this write is
replicated to edg-perf02 and edg-perf03.
Now a merge view with all 4 nodes is installed:
{code}
04:41:31,258 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-2,edg-perf01-63779) ISPN000093: Received new, MERGED cluster view for channel
default: MergeView::[edg-perf01-63779|10] (4) [edg-perf01-63779, edg-perf03-6264,
edg-perf02-45117, edg-perf04-10989], 6 subgroups: [edg-perf02-45117|7] (2)
[edg-perf02-45117, edg-perf03-6264], [edg-perf01-63779|4] (2) [edg-perf01-63779,
edg-perf02-45117], [edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264,
edg-perf04-10989], [edg-perf03-6264|4] (2) [edg-perf03-6264, edg-perf04-10989],
[edg-perf01-63779|8] (3) [edg-perf01-63779, edg-perf02-45117, edg-perf03-6264],
[edg-perf01-63779|6] (1) [edg-perf01-63779]
04:41:31,258 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl]
(transport-thread--p2-t2) Received new cluster view: 10, isCoordinator = true,
becameCoordinator = false
{code}
edg-perf01 now issues a remote get to edg-perf02 for key {{stressor_33}} and receives
the correct answer (operationId=650):
{code}
04:41:32,494 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (BackgroundOps-Checker-1)
Response(s) to ClusteredGetCommand{key=stressor_33, flags=null} is
{edg-perf02-45117=SuccessfulResponse{responseValue=ImmortalCacheValue
{value=LastOperation{operationId=650, seed=0000A15A4C2DD25A}}} }
{code}
However, when edg-perf01 reads {{key_00000000000020DB}}, it loads the old value from the
local data container, as no CH update or rebalance has happened so far:
{code}
04:41:32,496 TRACE [org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl]
(BackgroundOps-Checker-1) Checking availability for key=key_00000000000020DB,
status=AVAILABLE
04:41:32,497 ERROR [org.radargun.stages.cache.background.LogChecker]
(BackgroundOps-Checker-1) Missing operation 634 for thread 33 on key 8411
(key_00000000000020DB)
04:41:32,499 DEBUG [org.radargun.service.InfinispanDebugable] (BackgroundOps-Checker-1)
Debug info for key testCache key_00000000000020DB: owners=edg-perf01-63779,
edg-perf03-6264, local=true, uncertain=false,
container.key_00000000000020DB=ImmortalCacheEntry[key=key_00000000000020DB, value=[33 #3:
0, 169, 284, ], created=-1, isCreated=false, lastUsed=-1, isChanged=false, expires=-1,
isExpired=false, canExpire=false, isEvicted=true, isRemoved=false, isValid=false,
lifespan=-1, maxIdle=-1], segmentId=173
{code}
Note that this was found on the branch from
https://github.com/infinispan/infinispan/pull/3062/files, while trying to fix ISPN-4949.