On Tue, Jul 2, 2013 at 8:41 PM, Sanne Grinovero <sanne@infinispan.org> wrote:
On 2 July 2013 17:24, Dan Berindei <dan.berindei@gmail.com> wrote:
> It's not wrong, sending the invalidation only from the primary owner is
> wrong :)

Agreed, sending a GET operation to multiple nodes might not be wrong
per se, but it is the root cause of such race conditions, and of other
subtle complexities we might not even be aware of yet.

I don't know why it was slower, but since the result doesn't make
sense we should look at it a second time rather than throwing the code
away.


It does make sense: statistically, the backup owner will sometimes reply faster than the primary owner.

http://markmail.org/message/qmpn7yueym4tbnve

http://www.bailis.org/blog/doing-redundant-work-to-speed-up-distributed-queries/
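The approach in those links boils down to racing the same GET against
all the owners and taking the first reply. A minimal sketch of the
pattern (made-up Rpc interface and plain-string addresses, not our
actual RpcManager internals):

    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    final class RedundantGet {
        // Stand-in for the real transport layer.
        interface Rpc {
            CompletableFuture<Object> invokeGet(String owner, Object key);
        }

        static CompletableFuture<Object> get(List<String> owners, Object key, Rpc rpc) {
            CompletableFuture<Object> first = new CompletableFuture<>();
            for (String owner : owners) {
                // complete() is a no-op once a winner exists, so the slower
                // owners' replies are simply discarded.
                rpc.invokeGet(owner, key).thenAccept(first::complete);
            }
            return first;
        }
    }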
 
Sending invalidations from a non-primary owner is an interesting
approach, but then we'd have each owner maintain an independent list of
the nodes that have read the value.
For each write, the primary owner would send an invalidation to each
registered node, plus the copy to the backup owners, which in turn send
more L1 invalidation messages to each of their own registered nodes...
what's the likelihood of duplicate invalidation messages here?
It sounds like a big network traffic amplifier: lots of network traffic
triggered by each write.


The likelihood of duplication is very near 100%, indeed: with numOwners = 2, a requestor registered with both owners would receive two invalidations for every write where one would do. And in non-tx caches it would add another RPC to the critical path.

As always, it's a compromise: if we do something to speed up writes, it will slow down reads. Perhaps we could send the request only to the primary owner when L1 is enabled, since the number of remote gets should be smaller, and send it to all the owners when L1 is disabled and the number of remote gets is higher.
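Roughly, with a hypothetical helper (the real recipient selection lives
elsewhere; "primary owner first" is just the usual ordering convention):

    import java.util.List;

    final class GetTargets {
        static List<String> recipients(List<String> owners, boolean l1Enabled) {
            // With L1 on, target only the primary so its requestor list stays
            // authoritative; with L1 off, race all the owners for latency.
            return l1Enabled ? owners.subList(0, 1) : owners;
        }
    }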

Pedro's suggestion to send the request to all the owners, but only write the value to L1 if the first reply was from the primary owner, sounds like it should work just as well. It would make L1 slightly less efficient, but it wouldn't have latency spikes caused by a delay on the primary owner.
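Something like this sketch, again with made-up types:

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;

    final class GuardedL1Get {
        // Stand-in for the real transport layer.
        interface Rpc {
            CompletableFuture<Object> invokeGet(String owner, Object key);
        }

        final Map<Object, Object> l1 = new ConcurrentHashMap<>();

        CompletableFuture<Object> get(List<String> owners, Object key, Rpc rpc) {
            String primary = owners.get(0); // primary first, by convention
            CompletableFuture<Object> first = new CompletableFuture<>();
            for (String owner : owners) {
                rpc.invokeGet(owner, key).thenAccept(value -> {
                    // complete() returns true only for the winning reply.
                    if (first.complete(value) && owner.equals(primary)) {
                        // Cache in L1 only when the primary answered first:
                        // by then the primary has us in its requestor list,
                        // so any later write will invalidate this entry.
                        l1.put(key, value);
                    }
                });
            }
            return first;
        }
    }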
 
It also implies that the list of registered nodes is not reliable, as
each owner will be maintaining a different set. In that case we should
also have each node invalidate its L1-cached entries when the node from
which it got those entries has left the cluster.


Right now we invalidate from L1 all the keys for which the list of owners changed, whether the previous owners are still alive or not, because we don't keep track of which node we got each entry from.

If we only sent remote get commands to the primary owner, we'd have to invalidate from L1 all the keys for which the primary owner changed.

One thing that we don't do at the moment, but we should do whether we send the invalidations from the primary owner or from all the owners, is to clean up the requestor lists for the keys that a node no longer owns.
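The cleanup itself could be as simple as this sketch (hypothetical
Topology type, requestor lists held in a per-key map):

    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    final class RequestorCleanup {
        // Stand-in for the real cache topology.
        interface Topology {
            boolean isOwner(String node, Object key);
        }

        // key -> nodes that fetched it remotely
        final Map<Object, Set<String>> requestors = new ConcurrentHashMap<>();

        void onTopologyChange(Topology newTopology, String localNode) {
            // Drop the lists for keys this node no longer owns; the new
            // owners rebuild their own lists as remote gets come in.
            requestors.keySet().removeIf(key -> !newTopology.isOwner(localNode, key));
        }
    }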
 
Having it all dealt with by the primary owner makes for a much simpler
design, and also makes it more likely that a single L1 invalidation
message is sent via multicast, or at least with less duplication.


The simplest design would be to never keep track of requestors and always send a multicast from the originator. In fact, the default configuration is to always send multicasts (but we still keep track of requestors and we send the invalidation from the primary owner).

Intuitively, unicasts would be preferable for keys with a low read:write ratio, i.e. in a write-intensive scenario, but I wonder whether disabling L1 entirely wouldn't be even better there.
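For reference, this is roughly how the two setups would look with the
5.x programmatic configuration (from memory, so names and defaults may
differ between versions; IIRC invalidationThreshold = 0 means always
multicast):

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;

    // API names from memory; check against your Infinispan version.
    public class L1Tuning {
        // Write-intensive cache: skip L1 and its invalidation traffic.
        Configuration writeIntensive() {
            return new ConfigurationBuilder()
                  .clustering().cacheMode(CacheMode.DIST_SYNC)
                  .l1().disable()
                  .build();
        }

        // Read-intensive cache: keep L1; unicast invalidations to up to 10
        // requestors, fall back to multicast beyond that.
        Configuration readIntensive() {
            return new ConfigurationBuilder()
                  .clustering().cacheMode(CacheMode.DIST_SYNC)
                  .l1().enable().invalidationThreshold(10)
                  .build();
        }
    }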

Cheers
Dan

 
Cheers,
Sanne




>
>
>
> On Tue, Jul 2, 2013 at 7:14 PM, Sanne Grinovero <sanne@infinispan.org>
> wrote:
>>
>> I see, so we keep the wrong implementation because it's faster?
>>
>> :D
>>
>> On 2 July 2013 16:38, Dan Berindei <dan.berindei@gmail.com> wrote:
>> >
>> >
>> >
>> > On Tue, Jul 2, 2013 at 6:36 PM, Pedro Ruivo <pedro@infinispan.org>
>> > wrote:
>> >>
>> >>
>> >>
>> >> On 07/02/2013 04:21 PM, Sanne Grinovero wrote:
>> >> > +1 for considering it a BUG
>> >> >
>> >> > Didn't we decide a year ago that GET operations should be sent to a
>> >> > single node only (the primary)?
>> >>
>> >> +1 :)
>> >>
>> >
>> > Manik had a patch for staggering remote GET calls, but it was slowing
>> > down
>> > reads by 25%: http://markmail.org/message/vsx46qbfzzxkkl4w
>> >
>> >>
>> >> >
>> >> > On 2 July 2013 15:59, Pedro Ruivo <pedro@infinispan.org> wrote:
>> >> >> Hi all,
>> >> >>
>> >> >> simple question: what consistency guarantees are we supposed to
>> >> >> ensure?
>> >> >>
>> >> >> I have the following scenario (happened in a test case):
>> >> >>
>> >> >> NonOwner: sends a remote get for the key
>> >> >> BackupOwner: receives the remote get and replies (with the correct
>> >> >> value)
>> >> >> NonOwner: puts the value in L1
>> >> >> PrimaryOwner: [at the same time] is committing a transaction that
>> >> >> will update the key.
>> >> >> PrimaryOwner: receives the remote get after sending the commit, so
>> >> >> the L1 invalidation is not sent to NonOwner.
>> >> >>
>> >> >> The test finishes and I check the key's value in all the caches.
>> >> >> The NonOwner returns the value cached in L1 (== test fail).
>> >> >>
>> >> >> IMO, this is a bug (or not) depending on what guarantees we provide.
>> >> >>
>> >> >> wdyt?
>> >> >>
>> >> >> Pedro
>> >
>> >
>> >
>
>
>
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev