[infinispan-dev] L1 consistency for transactional caches.

Dan Berindei dan.berindei at gmail.com
Wed Jul 3 05:26:26 EDT 2013


On Tue, Jul 2, 2013 at 8:41 PM, Sanne Grinovero <sanne at infinispan.org> wrote:

> On 2 July 2013 17:24, Dan Berindei <dan.berindei at gmail.com> wrote:
> > It's not wrong, sending the invalidation only from the primary owner is
> > wrong :)
>
> Agreed, sending a GET operation to multiple nodes might not be wrong
> per se, but it is the root cause of such race conditions, and of other
> subtle complexities we might not even be aware of yet.
>
> I don't know why it was slower, but since the result doesn't make
> sense we should look at it a second time rather than throwing the code
> away.
>
>
It does make sense: statistically, the backup owner will sometimes reply
faster than the primary owner.

http://markmail.org/message/qmpn7yueym4tbnve

http://www.bailis.org/blog/doing-redundant-work-to-speed-up-distributed-queries/
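
To make the intuition concrete, here's a minimal sketch of that "first reply
wins" read (Node and remoteGet are made-up names for illustration, not our
actual RpcManager API):

import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch of the "first reply wins" read: ask every owner and use whichever
// reply arrives first. On average this beats waiting for one specific node,
// at the cost of extra network traffic.
public class RedundantGetSketch {

   // Hypothetical async remote read; not our actual RpcManager API.
   interface Node {
      CompletableFuture<Object> remoteGet(String key);
   }

   static CompletableFuture<Object> getFromAnyOwner(List<Node> owners, String key) {
      CompletableFuture<?>[] requests = owners.stream()
            .map(owner -> owner.remoteGet(key))
            .toArray(CompletableFuture[]::new);
      // Completes as soon as the fastest owner (primary or backup) replies.
      return CompletableFuture.anyOf(requests);
   }
}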


> Sending invalidations from a non-primary owner is an interesting
> approach, but then we'd have each owner maintain an independent
> list of nodes that have read the value.
> For each write, the primary node would send an invalidation to each
> registered node, plus the copy to the secondary owners, which in turn
> would send more L1 invalidation messages to each of their registered
> nodes... what's the likelihood of duplication of invalidation messages
> here? Sounds like a big traffic amplifier, with a lot of network
> traffic triggered by each write.
>
>
Indeed, the likelihood of duplication is very close to 100%, and in non-tx
caches it would add another RPC to the critical path.

As always, it's a compromise: if we do something to speed up writes, it
will slow down reads. Perhaps we could send the request only to the primary
owner when L1 is enabled, since the number of remote gets should be
smaller, and send the request to all the owners when L1 is disabled and
the number of remote gets is higher.

Pedro's suggestion to send the request to all the owners, but only write
the value to L1 if the first reply was from the primary owner, sounds like
it should work just as well. It would make L1 slightly less efficient, but
it wouldn't have latency spikes caused by a delay on the primary owner.
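
Roughly how I picture Pedro's suggestion, as a sketch with made-up types
(Owner, Reply, L1Cache are illustrative, not the real classes):

import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch of the "only trust the primary for L1" idea: still ask every owner,
// but only cache the value in L1 when the fastest reply came from the primary
// owner, so a reply from a backup can never leave a stale L1 entry behind.
public class GuardedL1GetSketch {

   // All of these types are made up for illustration.
   interface Owner {
      Object address();
      CompletableFuture<Reply> remoteGet(String key);
   }

   interface L1Cache {
      void store(String key, Object value);
   }

   static final class Reply {
      final Object sender;
      final Object value;
      Reply(Object sender, Object value) { this.sender = sender; this.value = value; }
   }

   static CompletableFuture<Object> remoteGet(List<Owner> owners, Owner primary,
                                              String key, L1Cache l1) {
      CompletableFuture<?>[] requests = owners.stream()
            .map(owner -> owner.remoteGet(key))
            .toArray(CompletableFuture[]::new);
      return CompletableFuture.anyOf(requests).thenApply(first -> {
         Reply reply = (Reply) first;
         // Only a reply from the primary owner is allowed to populate L1.
         if (reply.sender.equals(primary.address())) {
            l1.store(key, reply.value);
         }
         return reply.value;
      });
   }
}

The trade-off is visible here: a fast backup reply still serves the read, it
just doesn't get cached in L1.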


> It also implies that the list of registered nodes isn't reliable,
> as each owner will be maintaining a different set.
> In that case we should also have each node invalidate its L1-cached
> entries when the node from which it got those entries has left the
> cluster.
>
>
Right now we invalidate from L1 all the keys for which the list of owners
changed, whether the old owners are still alive or not, because we don't keep
track of the node we got each entry from.

If we only sent remote get commands to the primary owner, we'd have to
invalidate from L1 all the keys for which the primary owner changed.

One thing we don't do at the moment, but should do regardless of whether we
send the invalidations from the primary owner or from all the owners, is
clean up the requestor lists for keys that a node no longer owns.
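
By cleanup I mean something like this sketch (the Topology interface and the
map are made up, not the actual L1Manager internals):

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the requestor-list cleanup: on a topology change, drop the
// requestor lists for keys this node no longer owns; the new owners will
// rebuild their own lists from future remote gets.
public class RequestorCleanupSketch {

   // key -> addresses of nodes that fetched the key and may have it in L1
   private final Map<Object, Set<Object>> requestors = new ConcurrentHashMap<>();

   interface Topology {
      boolean isOwner(Object key); // does the local node still own this key?
   }

   void addRequestor(Object key, Object requestorAddress) {
      requestors.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet())
                .add(requestorAddress);
   }

   void onTopologyChange(Topology newTopology) {
      // Stop tracking requestors for keys that moved to other owners.
      requestors.keySet().removeIf(key -> !newTopology.isOwner(key));
   }
}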


> Having it all dealt by the primary owner makes for a much simpler
> design and also makes it more likely that a single L1 invalidate
> message is sent via multicast, or at least with less duplication.
>
>
The simplest design would be to never keep track of requestors and always
send a multicast from the originator. In fact, the default configuration is
to always send multicasts (but we still keep track of requestors and we
send the invalidation from the primary owner).

Intuitively, unicasts would be preferable for keys with a low read:write
ratio, i.e. in a write-intensive scenario, but I wonder whether disabling L1
altogether wouldn't be even better in that case.
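
The decision I have in mind looks roughly like this sketch (the threshold and
Transport names are purely illustrative, not our actual configuration or RPC
API):

import java.util.Set;

// Sketch of the unicast-vs-multicast trade-off: with only a few registered
// requestors, targeted unicasts are cheaper; past some threshold a single
// multicast wins.
public class InvalidationDispatchSketch {

   interface Transport {
      void multicastInvalidate(Object key);
      void unicastInvalidate(Object requestor, Object key);
   }

   private final Transport transport;
   private final int multicastThreshold;

   InvalidationDispatchSketch(Transport transport, int multicastThreshold) {
      this.transport = transport;
      this.multicastThreshold = multicastThreshold;
   }

   void invalidate(Object key, Set<Object> requestors) {
      if (requestors.isEmpty()) {
         return; // nobody has the key in L1, nothing to invalidate
      }
      if (requestors.size() > multicastThreshold) {
         transport.multicastInvalidate(key);
      } else {
         for (Object requestor : requestors) {
            transport.unicastInvalidate(requestor, key);
         }
      }
   }
}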

Cheers
Dan



> Cheers,
> Sanne
>
>
>
>
> >
> >
> >
> > On Tue, Jul 2, 2013 at 7:14 PM, Sanne Grinovero <sanne at infinispan.org>
> > wrote:
> >>
> >> I see, so we keep the wrong implementation because it's faster?
> >>
> >> :D
> >>
> >> On 2 July 2013 16:38, Dan Berindei <dan.berindei at gmail.com> wrote:
> >> >
> >> >
> >> >
> >> > On Tue, Jul 2, 2013 at 6:36 PM, Pedro Ruivo <pedro at infinispan.org>
> >> > wrote:
> >> >>
> >> >>
> >> >>
> >> >> On 07/02/2013 04:21 PM, Sanne Grinovero wrote:
> >> >> > +1 for considering it a BUG
> >> >> >
> >> >> > Didn't we decide a year ago that GET operations should be sent to a
> >> >> > single node only (the primary)?
> >> >>
> >> >> +1 :)
> >> >>
> >> >
> >> > Manik had a patch for staggering remote GET calls, but it was slowing
> >> > down
> >> > reads by 25%: http://markmail.org/message/vsx46qbfzzxkkl4w
> >> >
> >> >>
> >> >> >
> >> >> > On 2 July 2013 15:59, Pedro Ruivo <pedro at infinispan.org> wrote:
> >> >> >> Hi all,
> >> >> >>
> >> >> >> simple question: what are the consistency guarantees that are
> >> >> >> supposed to be ensured?
> >> >> >>
> >> >> >> I have the following scenario (happened in a test case):
> >> >> >>
> >> >> >> NonOwner: sends a remote get for the key
> >> >> >> BackupOwner: receives the remote get and replies (with the correct
> >> >> >> value)
> >> >> >> NonOwner: puts the value in L1
> >> >> >> PrimaryOwner: [at the same time] is committing a transaction that
> >> >> >> will update the key.
> >> >> >> PrimaryOwner: receives the remote get after sending the commit. The
> >> >> >> L1 invalidation is not sent to NonOwner.
> >> >> >>
> >> >> >> The test finishes and I check the key's value in all the caches.
> >> >> >> The NonOwner returns the L1-cached value (== test failure).
> >> >> >>
> >> >> >> IMO, this is a bug (or not) depending on what guarantees we provide.
> >> >> >>
> >> >> >> wdyt?
> >> >> >>
> >> >> >> Pedro