[infinispan-dev] L1 Consistency with Sync Caches

Dan Berindei dan.berindei at gmail.com
Tue Jul 9 10:33:38 EDT 2013


On Fri, Jul 5, 2013 at 5:31 PM, William Burns <mudokonman at gmail.com> wrote:

> On Tue, Jul 2, 2013 at 5:23 AM, Dan Berindei <dan.berindei at gmail.com>
> wrote:
> >
> >
> >
> > On Fri, Jun 28, 2013 at 4:39 PM, William Burns <mudokonman at gmail.com>
> wrote:
> >>
> >> On Fri, Jun 28, 2013 at 5:14 AM, Dan Berindei <dan.berindei at gmail.com>
> >> wrote:
> >> >
> >> > On Fri, Jun 28, 2013 at 12:17 AM, William Burns <mudokonman at gmail.com
> >
> >> > wrote:
> >> >>
> >> >> Trying to leave my points that would most likely have responses to
> >> >> second email so we can try to get back to a single thread :)
> >> >>
> >> >
> >> > No such luck :)
> >> >
> >> > Sorry for sending 2 replies in the first place, but it seemed more
> >> > natural -
> >> > I meant to comment on your proposal in one email and to describe my
> >> > alternative proposal in the second email.
> >> >
> >> >>
> >> >> On Thu, Jun 27, 2013 at 4:12 PM, Dan Berindei <
> dan.berindei at gmail.com>
> >> >> wrote:
> >> >> >
> >> >> > On Thu, Jun 27, 2013 at 4:18 PM, William Burns <
> mudokonman at gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> First off I apologize for the length.
> >> >> >>
> >> >> >> There have been a few Jiras recently that have identified L1
> >> >> >> consistency
> >> >> >> issues with both TX and non TX sync caches.  Async caches with L1
> >> >> >> have
> >> >> >> their
> >> >> >> own issues as well, but I only wanted to talk about sync caches.
> >> >> >>
> >> >> >> https://issues.jboss.org/browse/ISPN-3197
> >> >> >> https://issues.jboss.org/browse/ISPN-2965
> >> >> >> https://issues.jboss.org/browse/ISPN-2990
> >> >> >>
> >> >> >> I have proposed a solution in
> >> >> >> https://github.com/infinispan/infinispan/pull/1922 which should
> >> >> >> start
> >> >> >> L1
> >> >> >> consistency down the right track.  There are quite a few comments
> on
> >> >> >> it
> >> >> >> if
> >> >> >> you want to look into it more, but because of that I am moving
> this
> >> >> >> to
> >> >> >> the
> >> >> >> dev mailing list.
> >> >> >>
> >> >> >> The key changes in the PR are the following (non-tx):
> >> >> >>
> >> >> >> 1. Concurrent reads for a key that can retrieve a remote value are
> >> >> >> "corralled" into a single thread of execution for that given key.
> >> >> >> This
> >> >> >> would reduce network traffic with concurrent gets for the same
> key.
> >> >> >> Note
> >> >> >> the "corralling" only happens on a per key basis.
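
For anyone following along, here is a minimal sketch of what I understand by the per-key "corralling" (hypothetical names, not Will's actual L1ReadSynchronizer): concurrent gets for the same key share a single remote fetch instead of each sending its own remote get.

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.function.Function;

    // Minimal sketch of per-key "corralling"; names are hypothetical, this is
    // not the actual L1ReadSynchronizer implementation.
    class RemoteGetCorral<K, V> {
       private final ConcurrentMap<K, CompletableFuture<V>> pending = new ConcurrentHashMap<>();

       V get(K key, Function<K, V> remoteGet) {
          CompletableFuture<V> mine = new CompletableFuture<>();
          CompletableFuture<V> existing = pending.putIfAbsent(key, mine);
          if (existing != null) {
             return existing.join();          // another thread is already fetching this key
          }
          try {
             V value = remoteGet.apply(key);  // single remote get shared by all waiters
             mine.complete(value);
             return value;
          } catch (RuntimeException e) {
             mine.completeExceptionally(e);
             throw e;
          } finally {
             pending.remove(key, mine);       // clean up so the map doesn't "leak" entries
          }
       }
    }
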
> >> >> >
> >> >> >
> >> >> > Get commands on owners should not be serialized. Get commands on
> >> >> > non-owners
> >> >> > should not be serialized either, if the key already exists in L1.
> So
> >> >> > I'd
> >> >> > say
> >> >> > L1ReadSynchronizer should be L1WriteSynchronizer instead :)
> >> >>
> >> >> You are suggesting to check the context to see if the key is present
> >> >> before attempting the synchronizer, right?  Reading your second email,
> >> >> that seems to be the case :)
> >> >>
> >> >
> >> > Nope, I meant we should check the data container (aka the L1 cache).
> But
> >> > obviously we have to check the invocation context first in a tx cache,
> >> > if
> >> > the tx read the key before it should see the same value.
> >> >
> >> I was thinking that because the non-tx path wraps the value beforehand
> >> if it is in the data container.  But actually for non-tx it seems I
> >> should check the data container only, and for tx I should only check
> >> the ctx (to guarantee read consistency)
> >
> >
> > For tx you probably have to check the context first, and the data
> container
> > only if you don't find the entry in the context (non-existent entries
> that
> > were previously read in the tx should have a NullMarkerEntry in the
> > context).
> >
>
> Good to know.
>

I think Pedro is removing NullMarkerEntry in
https://github.com/infinispan/infinispan/pull/1937/files, though.


> >> >>
> >> >> >
> >> >> >>
> >> >> >> 2. The single thread that is doing the remote get would update the
> >> >> >> L1
> >> >> >> if
> >> >> >> able (without locking) and make available the value to all the
> >> >> >> requests
> >> >> >> waiting on the get.
> >> >> >
> >> >> >
> >> >> > Well, L1ReadSynchronizer does prevent other threads from modifying
> >> >> > the
> >> >> > same
> >> >> > key, so we are locking the key - just not using LockManager.
> >> >> > It would also require StateTransferLock.acquireSharedTopologyLock()
> >> >> > to
> >> >> > make
> >> >> > sure it doesn't write an L1 entry after the node became a proper
> >> >> > owner.
> >> >>
> >> >> Agree, when I was saying locking I meant through the use of the
> >> >> lock manager.
> >> >>
> >> >> >
> >> >> >>
> >> >> >> 3. Invalidations that are received would first check to see if
> there
> >> >> >> is
> >> >> >> a
> >> >> >> current remote get occurring for its keys.  If there is, it will
> >> >> >> attempt to
> >> >> >> cancel the L1 write(s) before it occurs.  If it cannot cancel the
> L1
> >> >> >> write,
> >> >> >> then it must also wait on the current remote get completion and
> >> >> >> subsequently
> >> >> >> run the invalidation.  Note the cancellation would fail when the
> >> >> >> remote
> >> >> >> get
> >> >> >> was done and it is in the middle of updating the L1, so this would
> >> >> >> be
> >> >> >> very
> >> >> >> small window.
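
A rough sketch of that cancel-or-wait interaction between an incoming invalidation and an in-flight corralled remote get (method names are made up for illustration, they are not the real interceptor API):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch, not the real L1 interceptor code.
    class L1InvalidationSketch {
       interface PendingRemoteGet {
          boolean cancelL1Write();    // true if the L1 write was cancelled in time
          void awaitCompletion();     // block until the remote get (and any L1 write) finished
       }

       final Map<Object, PendingRemoteGet> corral = new ConcurrentHashMap<>();
       final Map<Object, Object> l1Container = new ConcurrentHashMap<>();

       void invalidate(Object key) {
          PendingRemoteGet pending = corral.get(key);
          if (pending != null && !pending.cancelL1Write()) {
             // Cancellation only fails in the small window where the remote get
             // already returned and is writing to L1, so wait for it to finish
             // and then invalidate on top of whatever it wrote.
             pending.awaitCompletion();
          }
          l1Container.remove(key);    // run the actual invalidation
       }
    }
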
> >> >> >
> >> >> >
> >> >> > I think it would be clearer to describe this as the L1 invalidation
> >> >> > cancelling the remote get, not the L1 update, because the actual L1
> >> >> > update
> >> >> > can't be cancelled.
> >> >>
> >> >> When I say L1 update I meant the write to the data container after
> the
> >> >> remote get.  The invalidation can't stop the remote get, all it does
> >> >> is tell the caller that "Hey don't write the remote value you
> >> >> retrieved into the L1."
> >> >>
> >> >
> >> > Oh right, the get command will still use the value it got from the
> >> > remote
> >> > node, it just won't write it.
> >> > That makes me wonder, though, if something like this can happen:
> >> >
> >> > 1. A invokes get(k), starts a L1ReadSynchronizer and a remote get to B
> >> > 2. B invokes put(k, v) and sends an invalidation command to A
> >> > 3. The invalidation command cancels the L1 put on A
> >> > 4. A invokes get(k) again, finds the L1ReadSynchronizer from step 1)
> and
> >> > queues on it
> >> > 5. Both get(k) commands return the same value, even though the value
> has
> >> > changed on the owner(s).
> >> >
> >>
> >> Yeah that was going to be covered in ISPN-2965.  This would be a
> >> problem even with locking since it could be a different node doing the
> >> get.
> >>
> >>
> >>
> https://github.com/wburns/infinispan/blob/ISPN-3197-singlethread/core/src/main/java/org/infinispan/interceptors/distribution/L1NonTxInterceptor.java#L341
> >>
> >> I also have a test currently that is disabled in the test class that
> >> reproduces this.
> >>
> >>
> >>
> https://github.com/wburns/infinispan/blob/ISPN-3197-singlethread/core/src/test/java/org/infinispan/distribution/DistSyncL1FuncTest.java#L351
> >>
> >
> > I think we're talking about slightly different things here. Even if you
> > wanted to send another invalidation command after the entry was
> committed to
> > the data container on the owner, you won't get a second get request on
> the
> > owner because that get request has been corralled on the requestor:
> >
> > 1st get initiated -> remote get sent -> invalidation -> remote get
> returns
> > -> 2nd get initiated -> both get ops return with initial value
>
> The 1st get removes its value from the map when it is done, to
> prevent memory "leaks".  So in this case the second get would actually
> require a new remote "corral".
>

Not if the 2nd get is initiated before we receive the response for the 1st
get...
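
To make the timing concrete, a tiny self-contained illustration (hypothetical names again): because the pending future is only removed from the map once the response arrives, a 2nd get that starts before that joins the same future and returns the same value.

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class CorralRaceDemo {
       static final ConcurrentMap<String, CompletableFuture<String>> corral = new ConcurrentHashMap<>();

       static CompletableFuture<String> get(String key) {
          return corral.computeIfAbsent(key, k -> {
             CompletableFuture<String> f = new CompletableFuture<>();
             f.whenComplete((v, t) -> corral.remove(k, f));   // cleanup only after completion
             return f;
          });
       }

       public static void main(String[] args) {
          CompletableFuture<String> first = get("k");   // 1st get, remote fetch in flight
          // ... the owner writes v2 and its invalidation cancels the L1 write here ...
          CompletableFuture<String> second = get("k");  // 2nd get joins the same in-flight fetch
          System.out.println(first == second);          // true: both will return the 1st value
          first.complete("v1");                         // the remote response finally arrives
       }
    }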


> >
> > I think you need to either remove the L1ReadSynchronizer from the map
> during
> > invalidation, or to re-fetch the value if there was an invalidation, in
> > order to avoid this.
>
> I originally had it remove from the map during an invalidation, but it
> shouldn't be needed.  It would just possibly cause an additional get to
> be sent remotely.  Also, consistency-wise, returning either the new or
> the old value while an update is still in progress is still consistent
> and follows the semantics described in the ConcurrentMap interface.
> Gets in ConcurrentHashMap, for example, don't acquire locks unless
> there is no value in the map, so it is entirely possible for 2 threads
> to see different values if the value wasn't flushed to memory yet
> (stored in a CPU register).
>

How about this scenario?

A: tx1@A: get(k)
A: tx1@A: invoke get(k) on B
B: tx1@A: execute get(k), reply with v1
B: tx2@B: put(k, v2)
B: tx2@B: invoke invalidate(k) on A
A: tx2@B: execute invalidate(k)
A: tx2@B: skip update for get(k)
B: tx2@B: report that tx2 was committed
A: tx3@A: get(k)
A: tx3@A: queue on tx1's remote call
A: tx1@A: receive v1 from B
A: tx1@A: return v1
A: tx3@A: return v1 as well

I'd argue that it's not ok for tx3 to return v1, because it only started
the get operation after tx2 was committed. E.g. in a web app without
session affinity, we could handle a first request from user X on node B,
writing k = v2, and a second request on node A, reading k = v1.

Of course, this only applies to tx caches in DIST_SYNC mode, with
syncCommitPhase = true. With other cache configurations, we have plenty of
other opportunities for inconsistency to worry about before this one...
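
A sketch of the "remove the synchronizer from the map during invalidation" option I mentioned earlier (hypothetical names): gets that started before the invalidation still share the old in-flight fetch, but a get that starts afterwards, like tx3, misses the map and re-fetches from the owners, so it would see v2 rather than v1.

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class InvalidationAwareCorral {
       final ConcurrentMap<Object, CompletableFuture<Object>> corral = new ConcurrentHashMap<>();
       final ConcurrentMap<Object, Object> l1 = new ConcurrentHashMap<>();

       void invalidate(Object key) {
          corral.remove(key);   // detach the in-flight remote get, if any, so new gets re-fetch
          l1.remove(key);       // and drop whatever is currently in L1 for this key
       }
    }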


>
> >
> > Of course, you still have to deal with ISPN-2965 and the fact that a get
> > command can reach the owner after the invalidations were sent but before
> the
> > entry is committed to the data container. Maybe it would be better if
> we'd
> > send L1 invalidations only after the entry was committed?
>
> This was my thought before as well.  I talked briefly with mmarkus
> regarding this.  His thought was we wanted it to participate in the
> prepare step.  However looking at tx caches it only flushes the L1 on
> a commit, so I guess that isn't true.  For non-tx caches I am guessing
> we do this for performance reasons since we can have the L1
> invalidations and the other owner nodes update at the same time and
> then block on them all later.
>
> Doing the call after the data is committed is actually the fix I had
> for ISPN-2965.  My plan was that we would have both invalidation
> points.  One before the data was committed and one after the data
> container was updated and the latter would always be asynchronous:
> kind of best effort to remove any stragglers.  Given your point, though,
> I am debating whether we still want the initial invalidation.  I think
> the performance benefit would be nice, as I would hope we would very,
> very rarely require the later invalidation.  WDYT?
>

You have a point about non-tx caches: sending the invalidation to
non-owners in parallel with sending the write to backup owners is clearly
faster than sending them sequentially. The chance of a remote get arriving
at that precise time should be very small, so the additional invalidation
command shouldn't be a problem.

OTOH, I think sending the second invalidation asynchronously would allow
the same temporary inconsistency as not removing the get "corral"
immediately when the invalidation command is received. And for tx caches,
invoking the invalidation command only after the entry was committed
wouldn't add any cost.
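
For reference, a sketch of the two invalidation points being discussed for a non-tx write on the primary owner (hypothetical helpers, not the real distribution interceptor):

    import java.util.concurrent.CompletableFuture;

    class NonTxWriteSketch {
       Object put(Object key, Object value) {
          // 1st invalidation: sent in parallel with the backup write, before commit
          CompletableFuture<Void> backups = writeToBackupOwners(key, value);
          CompletableFuture<Void> firstInvalidation = invalidateL1OnNonOwners(key);
          CompletableFuture.allOf(backups, firstInvalidation).join();

          Object previous = commitToDataContainer(key, value);

          // 2nd invalidation: best effort after the commit, to catch gets that
          // slipped in between; making it async re-opens a small staleness window
          invalidateL1OnNonOwners(key);
          return previous;
       }

       // Stubs so the sketch is self-contained.
       CompletableFuture<Void> writeToBackupOwners(Object k, Object v) { return CompletableFuture.completedFuture(null); }
       CompletableFuture<Void> invalidateL1OnNonOwners(Object k) { return CompletableFuture.completedFuture(null); }
       Object commitToDataContainer(Object k, Object v) { return null; }
    }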


> >
> >>
> >> >
> >> >>
> >> >> >
> >> >> > We also have to remove the logic in AbstractLockingInterceptor that
> >> >> > skips L1
> >> >> > invalidation for a key if it can't acquire a lock with a 0 timeout.
> >> >> >
> >> >> >>
> >> >> >> 4. Local writes will also do the same thing as the invalidation
> with
> >> >> >> cancelling or waiting.  Note that non tx local writes only do L1
> >> >> >> invalidations and don't write the value to the data container.
> >> >> >> Reasons
> >> >> >> why
> >> >> >> I found at https://issues.jboss.org/browse/ISPN-3214
> >> >> >
> >> >> >
> >> >> > I didn't know about ISPN-3214 or that non-tx writes don't write to
> >> >> > L1,
> >> >> > but
> >> >> > it sounds fair.
> >> >>
> >> >> Yeah I really wanted that to work, but without some additional checks
> >> >> such as versioned data, I don't see a way to do this without locking
> >> >> at the primary node like tx caches.
> >> >>
> >> >
> >> > In theory, the primary owner could send a synchronous RPC back to the
> >> > originator while it is holding the lock, saying "ok, you can now write
> >> > the
> >> > value to L1". But I don't think the slowdown from an additional RPC
> >> > would be
> >> > acceptable.
> >> >
> >>
> >> I would agree ;(
> >>
> >> >>
> >> >> >
> >> >> >>
> >> >> >> 5. Writes that require the previous value and don't have it in the
> >> >> >> L1
> >> >> >> would also do their get operations using the same "corralling"
> >> >> >> method.
> >> >> >
> >> >> >
> >> >> > The remoteGetBeforeWrites are a bit different - they don't happen
> on
> >> >> > non-owners, they only happen on writeCH-owners that didn't receive
> >> >> > that
> >> >> > entry via state transfer yet. They put the value in the
> >> >> > InvocationContext,
> >> >> > but they don't write it to the data container - nor do they
> >> >> > invalidate
> >> >> > the
> >> >> > L1 entry, if it exists.
> >> >>
> >> >> Ah yes that is true, but only for non tx caches it seems.
> >> >>
> >> >
> >> > Right, I wasn't considering the fact that a conditional command may
> >> > fail...
> >> > I think if that happens, even in non-tx caches
> EntryWrappingInterceptor
> >> > may
> >> > write the entry to the data container as an L1 entry. If we move the
> L1
> >> > writes to the L1 interceptor, we must ensure that
> >> > EntryWrappingInterceptor
> >> > doesn't write anything to L1 any more.
> >> >
> >>
> >> We still need EntryWrappingInterceptor to write to the L1 cache for tx
> >> caches I would think.  We only want to write to the L1 after the tx is
> >> committed for write commands.  Read commands would be fine writing to
> >> the container immediately.  Also I would hope conditional writes
> >> wouldn't update the context with the new value until we knew the write
> >> was successful, which should prevent bad data being written there.
> >>
> >
> > The L1 interceptor could still only write to the data container during
> the
> > commit phase. I was thinking of separating the regular writes from the L1
> > writes because L1 writes would need additional synchronization, but maybe
> > that's not the case.
> >
>
> For non-tx caches I wouldn't think it would matter as it is just an
> invalidation.  At worst it would cause an additional get.
>
> For tx caches it should still be okay as well since the invalidation
> would be done in the scope of a cache holding the write lock, so the
> only updates that would occur would be a single invalidation with a
> get or a single write with a get.  If the get is done before, then
> either will overwrite.  If the get is done after it will be
> invalidated again with ISPN-2965.
>
> Is there something here you can think of in addition?
>

Well, the additional invalidation for ISPN-2965 won't necessarily be done
while holding a lock, but I don't see a problem with it either.


> >>
> >> >
> >> >>
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> 4/5 are not currently implemented in PR.
> >> >> >>
> >> >> >> This approach would use no locking for non tx caches for all L1
> >> >> >> operations.  The synchronization point would be done through the
> >> >> >> "corralling" method and invalidations/writes communicating to it.
> >> >> >>
> >> >> >> Transactional caches would do almost the same thing as non-tx.
>  Note
> >> >> >> these
> >> >> >> changes are not done in any way yet.
> >> >> >>
> >> >> >> 1. Gets would now update the L1 immediately after retrieving the
> >> >> >> value
> >> >> >> without locking, but still using the "corralling" technique that
> >> >> >> non-tx
> >> >> >> does.  Previously the L1 update from a get was transactional.
>  This
> >> >> >> actually
> >> >> >> would remedy issue [1]
> >> >> >>
> >> >> >> 2. Writes currently acquire the remote lock when committing, which
> >> >> >> is
> >> >> >> why
> >> >> >> tx caches are able to update the L1 with the value.  Writes would
> do
> >> >> >> the
> >> >> >> same cancellation/wait method as non-tx.
> >> >> >>
> >> >
> >> >
> >> > Hmm, I don't think your current approach for L1 invalidations would
> work
> >> > for
> >> > L1 writes, because the actual write to the data container is not
> >> > synchronized (well, technically you still have the 0-timeout locking
> for
> >> > invalidation commands, but I think you're planning to remove that). So
> >> > it's
> >> > possible for an L1 write and an L1 invalidation to wait for the same
> >> > remote
> >> > get and then to get executed in the wrong order.
> >> >
> >>
> >> For non-tx that doesn't matter as a write and L1 invalidation both do
> >> invalidation.
> >>
> >> For tx it should be fine as well since it will hold the primary owner
> >> lock for the write duration, which includes L1 writes (so only that tx
> >> is writing its value).  Tx caches don't process the L1 invalidation
> >> message when it comes back since they were the originator also.
> >>
> >
> > True, in most cases different writers would be synchronized via the lock
> on
> > the primary owner. There are 2 situations where that doesn't happen,
> though:
> > * When syncCommitPhase = false, the primary lock may be released before
> the
> > entry is actually committed to L1.
> > * During state transfer, the received state is applied without holding a
> > lock on the primary, or even a local lock. State transfer only uses
> > StateConsumerImpl.updatedKeys for synchronization, which turns out to be
> a
> > problem with regular entries as well:
> > https://issues.jboss.org/browse/ISPN-3287
> >
>
> Yeah to be honest I was going to log a Jira a while back for
> syncCommitPhase = false causing L1 inconsistencies.  I didn't since
> realistically there is no easy way to guarantee consistency with
> async, until we have some way of knowing in what order the updates were
> completed from the owner's perspective.
>

Agree, syncCommitPhase = false has its own issues.

I was talking with Mircea about that, and how we may be able to fix (part
of) the issues by making syncCommitPhase = false behave more like
syncCommitPhase = true. I.e. we could return immediately to the user, but
spin another thread to invoke the commit synchronously and the tx
completion command to release locks asynchronously. I think that would make
things easier with L1 as well.
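
Something like this, as a rough sketch (hypothetical names): the user call returns immediately, while a background thread runs the commit with sync semantics and only then fires the async tx completion that releases the locks.

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Executor;

    class AsyncCommitSketch {
       final Executor executor;

       AsyncCommitSketch(Executor executor) {
          this.executor = executor;
       }

       void commitAsync(Object gtx) {
          CompletableFuture.runAsync(() -> {
             invokeCommitSynchronously(gtx);   // same ordering guarantees as syncCommitPhase = true
             sendTxCompletionAsync(gtx);       // release remote locks, fire-and-forget
          }, executor);
          // the caller returns to the user right away
       }

       void invokeCommitSynchronously(Object gtx) { /* send the CommitCommand and wait for responses */ }
       void sendTxCompletionAsync(Object gtx) { /* send the tx completion notification asynchronously */ }
    }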


> I agree, ISPN-3287 is one we will want to look into more closely at some
> point, as to how it will work with L1.
>
> >>
> >> >
> >> >>
> >> >> >> 3. Writes that require the previous value and don't have it in the
> >> >> >> L1
> >> >> >> would also do their get operations using the same method.
> >> >> >
> >> >> >
> >> >> > Just like for non-tx caches, I don't think these remote gets have
> to
> >> >> > be
> >> >> > stored in L1.
> >> >>
> >> >> Tx caches do the remote get and could cache the L1 value immediately.
> >> >> This would help if the transaction is rolled back or a conditional
> >> >> operation failed, etc.  There are some locking concerns here, but I
> >> >> will leave that to the other post.
> >> >>
> >> >
> >> > Right, the L1 entry could be immediately written to the data
> container,
> >> > but
> >> > not if it complicates things too much: most writes should be
> successful
> >> > anyway.
> >> >
> >>
> >> I agree the writes should be successful, but there is no telling if
> >> the transaction will be committed.  Something else outside of
> >> Infinispan could error, causing a rollback, or in the worst case 2PC
> >> could fail the prepare (not touching heuristic failures).
> >>
> >> Also this would help plug up this hole:
> >>
> >>
> https://issues.jboss.org/browse/ISPN-2965?focusedCommentId=12779780&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12779780
> >>
> >
> > Sorry, I wasn't clear... I meant if 99.9% of conditional writes write the
> > initial value to L1 and then immediately write the new value, perhaps the
> > additional overhead of an extra write to the data container is not worth
> the
> > benefit of having the value in L1 in the 0.1% of cases where the write is
> > not successful or the tx fails.
>
> The overhead of writing to the data container should be relatively
> small though, as we already did the remote get anyway and everything
> is local.
>
> I agree Infinispan conditional writes could be successful 99.9% or
> more even.  My concern with failures would be someone explicitly
> rolling back the transaction due to conflict or processing error in
> say user code, which can be quite common, at least in the systems I
> have seen :P
>

Fair enough.


>
> >
> > Of course, we don't have proper statistics about the percentage of
> > conditional writes that succeed in real-life applications, so in the end
> > it's your decision.
> >
> >>
> >> >> >
> >> >> >>
> >> >> >> 4. For tx cache [2] would also have to be done.
> >> >> >>
> >> >> >> [1] -
> >> >> >>
> >> >> >>
> >> >> >>
> https://issues.jboss.org/browse/ISPN-2965?focusedCommentId=12779780&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12779780
> >> >> >> [2] - https://issues.jboss.org/browse/ISPN-1540
> >> >> >>
> >> >> >> Also rehashing is another issue, but we should be able to acquire
> >> >> >> the
> >> >> >> state transfer lock before updating the L1 on a get, just like
> when
> >> >> >> an
> >> >> >> entry
> >> >> >> is committed to the data container.
> >> >> >>
> >> >> >
> >> >> > The same for L1 invalidations - we don't want to remove real
> entries
> >> >> > from
> >> >> > the data container after the local node became an owner.
> >> >>
> >> >> Yeah, actually, now that you mention this, it sounds like a hole even
> >> >> currently.  I don't know if this case can happen, but what if you
> >> >> received an L1 invalidation and then got a rehash event right before it
> >> >> was committed to the container?  It seems the L1 commit to the
> >> >> container would block until it could get the shared topology lock, and
> >> >> after that it could then remove the value.  I probably need to dig into
> >> >> the state transfer stuff deeper to know myself.
> >> >>
> >> >
> >> > I think that's ok, because the value is stale, and any future
> >> > get/conditional write command will request the new value from the
> >> > previous
> >> > owners.
> >> >
> >> > I don't think the new value can arrive via a StateResponseCommand and
> be
> >> > applied to the data container before the invalidation command manages
> to
> >> > commit, because the put command sends the invalidations first
> >> > (synchronously) and only then commits on the owner.
> >> >
> >>
> >> I defer to your judgement here ;-)
> >>
> >
> > On a second read, that second paragraph doesn't make a lot of sense...
> >
> > To rephrase: the InvalidateL1Command could remove an entry after it was
> > written via state transfer. Since the invalidation is synchronous, that
> > means the commit command will be forwarded to the new owner, and it will
> > write the new value again. If any tx tries to read the key in the
> meantime,
> > it will re-read it from the old owners, so that shouldn't be a problem.
> >
> > But if there are 2 old owners (let's call them A and B), the commit
> command
> > forwarded from A could be executed on the new owner just before the
> > invalidation command from B. And once the forwarded commit from A is
> > executed, the forwarded commit from B for the same tx won't do anything,
> and
> > k will be lost. Even if state transfer receives k after that, it won't
> > overwrite it because it's in the StateConsumerImpl.updatedKeys set.
> >
> > There is another hitch with async commit - it's possible that the new
> owner
> > didn't receive the transaction data via state transfer (because it
> requested
> > transaction data from B, and the transaction had already been
> committed
> > there). Then any forwarded commit won't do anything and the entry will be
> > lost.
> >
> >
> > Going back to my proposal of adding an ownership check before actually
> > removing the L1 entry (and while holding the shared topology lock)...
> >
> > Say nodes [A, B] were the previous owners of key k, C has k in its L1
> cache,
> > and [B, C] are the new owners (so the owners in the write CH are [A, B,
> C]).
> > If there is a transaction modifying key k, the timing could be something
> > like this:
> >
> > 1. The commit command finishes on A and B before they install the new
> > topology. Because StateRequestCommands block until the state provider
> > installs the new topology, that means node C didn't request the state
> for k
> > yet, so the invalidateL1(k) command runs on C before the state transfer
> > put(k, v) command.
> >
> > However, there is a chance that C already installed the new topology
> before
> > executing the invalidate(k) command, so if we do just the ownership
> check in
> > the L1 interceptor, the key won't be deleted. That's not ok, because
> we'll
> > have a stale value until state transfer overwrites it. And even worse,
> if we
> > had a remove(k) instead of a put(k, v), C would only remove the stale
> value
> > for k when it expired.
> >
> > We may need to clear all the keys for which C is becoming an owner after
> > installing the new topology, but before requesting transaction data
> (which
> > means before processing transactions with the new topology id as well),
> when
> > L1 is enabled.
>
> Yeah, I think for any keys that are moving owners it would be best just
> to remove those from the L1 and the requestors.  Or maybe that is too
> heavy-handed?
>

Nope, what we already do is very similar, we just don't clear the entries
owned by the node in the new CH. See StateConsumerImpl.invalidateSegments().
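
A sketch of the variant we're discussing, clearing the L1 copies for segments the node is becoming an owner of, after installing the new topology but before requesting transaction data (hypothetical names, only loosely modelled on the StateConsumerImpl.invalidateSegments() idea):

    import java.util.Set;

    class L1SegmentCleanupSketch {
       interface DataContainerStub {
          Iterable<Object> keys();
          int segmentOf(Object key);
          boolean isL1Entry(Object key);
          void remove(Object key);
       }

       void onTopologyInstalled(Set<Integer> newlyOwnedSegments, DataContainerStub container) {
          for (Object key : container.keys()) {
             if (newlyOwnedSegments.contains(container.segmentOf(key)) && container.isL1Entry(key)) {
                container.remove(key);   // drop the stale L1 copy; state transfer brings the real value
             }
          }
       }
    }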

BTW, I was assuming here that we send the invalidation command from all the
owners, but like Pedro pointed out we're only sending it from the primary
owner. So we need to revisit these scenarios once we decide on how to treat

One thing I'm worrying about, though I haven't figured out if it can really
happen yet, is that DistributionLogic.commitEntry can write the same
entry as a regular entry or as an L1 entry, depending on whether the local
node is an owner. If we were to call it before installing a new topology to
write an L1 entry and it wrote a regular entry instead, after the new
topology was installed, we might end up with a stale value.
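
To make that concrete, a sketch of doing the regular-vs-L1 decision under the shared topology lock (names are hypothetical, apart from the shared topology lock idea quoted earlier in the thread):

    import java.util.function.Predicate;

    class OwnershipCheckSketch {
       interface TopologyLock { void acquireShared(); void releaseShared(); }
       interface Container { void putOwned(Object k, Object v); void putL1(Object k, Object v); }

       final TopologyLock topologyLock;
       final Container container;

       OwnershipCheckSketch(TopologyLock lock, Container container) {
          this.topologyLock = lock;
          this.container = container;
       }

       void commitEntry(Object key, Object value, Predicate<Object> isOwnerInCurrentWriteCH) {
          topologyLock.acquireShared();
          try {
             // The ownership check happens while holding the shared topology lock,
             // so it can't race with a rebalance installing a new CH in between.
             if (isOwnerInCurrentWriteCH.test(key)) {
                container.putOwned(key, value);   // regular entry: this node really owns it
             } else {
                container.putL1(key, value);      // L1 copy, subject to invalidation
             }
          } finally {
             topologyLock.releaseShared();
          }
       }
    }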


> >
> > 2. The commit starts on A and B before the new topology is installed,
> but by
> > the time it finishes on A, A installs the new topology.
> > StateTransferInterceptor then forwards the commit command to C, and on C
> it
> > writes the new value (eventually waiting for C to receive the transaction
> > data from A). With sync commit + tx completion notification, C is
> guaranteed
> > to receive the tx whether it requests it from A or B.
> >
> > If StateConsumer receives a value for k later, it won't overwrite it,
> except
> > for this bug: https://issues.jboss.org/browse/ISPN-3287
> >
> > There is a chance that the forwarded commit command from A reaches C
> around
> > the same time with the invalidateL1(k) command from B. But the commit
> > command is sent with the new topology id, so it will only execute after C
> > installed the new topology as well. If the invalidateL1(k) command
> executes
> > at the same time, the ownership check will prevent it from doing
> anything.
> > (Without the ownership check we could remove key k just after writing
> it.)
> >
> > With async commit it's more complicated: if C requested the transaction
> data
> > from B, and B had already finished committing by the time it sent
> > transaction data, the forwarded commit on C won't do anything. Instead, C
> > should receive the new value via state transfer, going back to case 1.
> >
> > 3. The commit is initiated after the new topology is installed. The
> commit
> > command is sent to C directly, it waits there for C to receive
> transaction
> > data from A or B, and it will write the new value. Since the commit was
> sent
> > with the new topology id, it will block on A and B until they install the
> > new topology, and won't trigger any invalidateL1(k) command to C.
> >
> > With async commit there is still a chance that A and/or B already finish
> the
> > commit by the time C requests the transaction data, and C will only
> receive
> > the new value via state transfer - just like in case 1.
> >
> >>
> >> >>
> >> >> >
> >> >> >>
> >> >> >> Any comments/concerns would be appreciated.
> >> >> >>
> >> >> >> Thanks,
> >> >> >>
> >> >> >>  - Will
> >> >> >>