[infinispan-dev] Write-only commands

Radim Vansa rvansa at redhat.com
Thu Jun 29 09:51:22 EDT 2017


On 06/29/2017 02:36 PM, Dan Berindei wrote:
> On Thu, Jun 29, 2017 at 2:19 PM, Radim Vansa <rvansa at redhat.com> wrote:
>> On 06/29/2017 11:16 AM, Dan Berindei wrote:
>>> On Thu, Jun 29, 2017 at 11:53 AM, Radim Vansa <rvansa at redhat.com> wrote:
>>>> On 06/28/2017 04:20 PM, Dan Berindei wrote:
>>>>> On Wed, Jun 28, 2017 at 2:17 PM, Radim Vansa <rvansa at redhat.com> wrote:
>>>>>> On 06/28/2017 10:40 AM, Dan Berindei wrote:
>>>>>>> On Wed, Jun 28, 2017 at 10:17 AM, Radim Vansa <rvansa at redhat.com> wrote:
>>>>>>>> On 06/27/2017 03:54 PM, Dan Berindei wrote:
>>>>>>>>> On Tue, Jun 27, 2017 at 2:43 PM, Adrian Nistor <anistor at redhat.com> wrote:
>>>>>>>>>> I've said this in a previous thread on this same issue, I will repeat myself
>>>>>>>>>> as many times as needed.
>>>>>>>>>>
>>>>>>>>>> Continuous queries require the previous value itself, not just knowledge of
>>>>>>>>>> the type of the previous value. Strongly typed caches solve no problem here.
>>>>>>>>>>
>>>>>>>>>> So if we half-fix query but leave CQ broken I will be half-happy (ie. very
>>>>>>>>>> depressed) :)
>>>>>>>>>>
>>>>>>>>>> I'd remove these commands completely or possibly remove them just from
>>>>>>>>>> public API and keep them internal.
>>>>>>>>>>
>>>>>>>>> +1 to remove the flags from the public API. Most of them are not safe
>>>>>>>>> for applications to use, and ignoring them when they can lead to
>>>>>>>>> inconsistencies would make them useless.
>>>>>>>>>
>>>>>>>>> E.g. the whole point of SKIP_INDEX_CLEANUP is that the cache doesn't
>>>>>>>>> know when it is safe to skip the delete statement, and it relies on
>>>>>>>>> the application making a (possibly wrong) choice.
>>>>>>>>>
>>>>>>>>> IGNORE_RETURN_VALUES should be safe to use, and we actually recommend
>>>>>>>>> that applications use it right now. If query or listeners need the
>>>>>>>>> previous value, then we should load it internally, but hide it from
>>>>>>>>> the user.
>>>>>>>>>
>>>>>>>>> But removing it opens another discussion: should we replace it in the
>>>>>>>>> public API with a new method AdvancedCache.ignoreReturnValues(), or
>>>>>>>>> should we make it the default and add a method
>>>>>>>>> AdvancedCache.forceReturnPreviousValues()?
>>>>>>>> Please don't derail the thread.
>>>>>>>>
>>>>>>> I don't think I'm derailing the thread: IGNORE_PREVIOUS_VALUES also
>>>>>>> breaks the previous value for listeners, even if the QueryInterceptor
>>>>>>> removes it from write commands. And it is public (+recommended) API,
>>>>>>> in fact most if not all of our performance tests use it.
>>>>>> That's just a flawed implementation. IPV is documented to be a 'safe'
>>>>>> flag that should affect mostly primary -> origin replication, all the
>>>>>> other is implementation. And we can fix that. Users should *not* expect
>>>>>> that it e.g. skips loading from a cache store. We have already removed
>>>>>> the modes that would be broken-by-design.
>>>>>>
>>>>> I think you're confusing IGNORE_RETURN_VALUES with SKIP_REMOTE_LOOKUP
>>>>> here. The IVR javadoc doesn't say anything about remote lookups, only
>>>>> SRL does.
>>>> No, I am not; While IRV does not mention the replication, it's said to
>>>> be 'safe'. So omitting the primary -> origin replication is basically
>>>> all it can do when listeners are in place. You're right that I have
>>>> missed the second part in SRL talking about put()s; I took it as a flag
>>>> prohibiting any remote lookup (as the RPC operation in its whole) any
>>>> time the remote value is needed. Yes, the second part seems equal to my
>>>> understanding of IRV.
>>>>
>>>>> And I agree that the current status is far from ideal, but there is
>>>>> one more valid alternative: we can decide that the previous value is
>>>>> only reliable in clustered listeners, and local listeners don't always
>>>>> have it. Document that, make sure continuous query uses clustered
>>>>> listeners, and we're done :)
>>>> Unreliable return values are worse than none; I would rather remove them
>>>> if we can't guarantee that these are right. Though, clustered listeners
>>>> are based on regular listeners, so you'd need some means to make them
>>>> reliable.
>>> We could change the clustered listeners so that they're not based on
>>> the regular listeners... I've been pestering Will about this ever
>>> since the clustered listeners landed!
>>>
>>> But I should have been clearer: I didn't mean that the listeners on
>>> the backups should receive the previous value whenever we feel like
>>> it, I meant we should document and enforce that the previous value is
>>> only included in the event for listeners on the primary owner.
>>>>>> On the other hand, write-only commands are not about *returning* the
>>>>>> value but about (not) *reading* it, therefore (in my eyes) user could
>>>>>> make that assumption and would like to enforce it this way. Even some
>>>>>> docs explaining PersistenceMode.SKIP suggest that.
>>>>>>
>>>>> To me the purpose the same, there is no difference between returning
>>>>> the previous value to the application or providing the previous value
>>>>> via EntryView.
>>>> There is a difference between what's provided locally and what's send
>>>> over the network.
>>>>
>>>>> Applying this logic to the JCache API, it would mean
>>>>> put() should never read the previous value, because some users could
>>>>> assume that only getAndPut() reads it.
>>>> OK, this is a valid point.
>>>>
>>>>> In the old times we didn't have IGNORE_RETURN_VALUES, only
>>>>> SKIP_REMOTE_LOOKUP+SKIP_CACHE_LOAD, and they would sometimes be
>>>>> ignored (e.g. if the write was conditional). I think that's what
>>>>> Galder had in mind when he wrote the PersistenceMode api note, not the
>>>>> current behaviour of SKIP_CACHE_LOAD. I'll let Galder clarify this
>>>>> himself, but I'll be very disappointed if he says he designed the
>>>>> write-only operations so that they'll never work with query.
>>>>>
>>>>>
>>>>>> I don't want to talk about flags, because I see all flags but IPV as
>>>>>> 'effectively internal'. Let's discuss it more high-level. Some API
>>>>>> exposes non-reading operation - we can see that under some circumstances
>>>>>> this is not possible so we have options to 1) break stuff 2) break API
>>>>>> assumptions 3) sometimes break API assumptions 4) remove such API (to
>>>>>> not allow the user to make such assumptions). There's also an option 5)
>>>>>> to fail the operation if the API assumption would be broken. Though, I
>>>>>> don't fancy getting exception from a WriteOnlyMap.eval just because
>>>>>> someone has registered a listener.
>>>>>>
>>>>> I disagree with the premise: there's no good reason for the user to
>>>>> assume that write-only commands are *guaranteed* to never load the
>>>>> previous value from a store. We just need to add a clarification to
>>>>> the write-only operations' javadoc, no need to break anything.
>>>> OK then, though it diminishes the value of write-only commands a lot.
>>>>
>>>>>>> For that matter, ClusteredCacheLoaderInterceptor also doesn't load the
>>>>>>> previous value on backup owners for most write commands
>>>>>>> (LoadType.PRIMARY), we'd need to change that as well.
>>>>>> Yes, all commands will have to load current value on all owners.
>>>>>>
>>>>>>>>>> On 06/27/2017 01:28 PM, Sanne Grinovero wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 27 Jun 2017 10:13, "Radim Vansa" <rvansa at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am working on entry version history (again). In Como we've discussed
>>>>>>>>>> that previous values are needed for (continuous) query and reliable
>>>>>>>>>> listeners,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Index based queries also require the previous value on a write - unless we
>>>>>>>>>> can get "strongly typed caches" giving guarantees about the class to
>>>>>>>>>> represent the content of a cache to be unique.
>>>>>>>>>>
>>>>>>>>>> Essentially we only need to know the type of the previous object. It might
>>>>>>>>>> be worth having a way to load the type metadata if the previous value only.
>>>>>>>>>>
>>>>>>>>>> so I wonder what should we do with functional write-only
>>>>>>>>>> commands. These are different to commands with flags, because flags
>>>>>>>>>> (other than ignore return value) are expected to break something.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sorry I hope to not derail the thread but let's remind that we hope to
>>>>>>>>>> evolve beyond "flags are expected to break stuff" ; we never got to it but
>>>>>>>>>> search the mailing list.
>>>>>>>>>>
>>>>>>>>>> Since flags are exposed to the user I would rather they're not allowed to
>>>>>>>>>> break things.
>>>>>>>>>> Could they be treated as hints? Ignore the flag (and warn?) if the used
>>>>>>>>>> configuration/integrations veto them.
>>>>>>>>>>
>>>>>>>>>> Alternatively, let's remove them from API. Remember "The Jokre" POC was
>>>>>>>>>> intentionally designed to explore pushing the limits on performance w/o end
>>>>>>>>>> users having to solve puzzles, such as learning details about these flags
>>>>>>>>>> and their possible side effects.
>>>>>>>>>>
>>>>>>>>>> So assuming they become either "safe" or internal, maybe you can take
>>>>>>>>>> advantage of them?
>>>>>>>>>>
>>>>>>>>>> I see
>>>>>>>>>> the available options as:
>>>>>>>>>>
>>>>>>>>>> 1) run write-only commands 'optimized', ignoring any querying and such
>>>>>>>>>> (warn user that he will break it)
>>>>>>>>>>
>>>>>>>>>> 2) run write-only without any optimization, rendering them useless
>>>>>>>>>>
>>>>>>>>>> 3) detect when querying is set up (ignoring listeners and maybe other
>>>>>>>>>> stuff that could get broken)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Might be useful for making a POC work, but I believe query will be very
>>>>>>>>>> likely to be often enabled.
>>>>>>>>>> Having an either / or switch for different features in Infinispan will make
>>>>>>>>>> it harder to use and understand, so I'd rather see work on the right design
>>>>>>>>>> as taking temporary shortcuts risks baking into stone features which we
>>>>>>>>>> later struggle to fix or maintain.
>>>>>>>>>>
>>>>>>>>> I vote for this option.
>>>>>>>>>
>>>>>>>>> Query, listeners, and other components that need the previous value
>>>>>>>>> should not just assume that the application knows better, they should
>>>>>>>>> be able to change how operations works based on their needs. Of
>>>>>>>>> course, the reverse is also true: if the application uses write-only
>>>>>>>>> commands (or IGNORE_RETURN_VALUES) for performance reasons, it should
>>>>>>>>> be possible for the user to detect why the previous values are still
>>>>>>>>> loaded.
>>>>>>>> If it were just query (static configuration), I would be okay with this
>>>>>>>> idea. But as per listeners - besides tainting the design (event source
>>>>>>>> should not check if there's a listener) you'd need to check *before*
>>>>>>> The source wouldn't check for listeners explicitly, the notifier would
>>>>>>> have an isPreviousValueNeeded() method and precompute that before a
>>>>>>> listener is added or after a listener is removed. I was am assuming
>>>>>>> some listeners will not need the previous value, e.g. the listeners
>>>>>>> installed by streams.
>>>>>> You can cover your warts with a make-up but you'll still have warts :)
>>>>> Cutting them off doesn't necessarily work, either :)
>>>> Yep, some people tend to fix w/ hacks instead of designing :)
>>>>
>>>>>>>> (DistributionI, CacheLoaderI) you have to call notify (cmd.perform,
>>>>>>>> EWI). So this is a space for race conditions or weird handling (if
>>>>>>>> there's a listener when I am about to call notify and my flags are not
>>>>>>>> cleared, skip the notification and pretend that this code was invoked
>>>>>>>> before the listener was registered...). Or do you have another solution
>>>>>>>> in mind (config option to disable listeners && all features using those?).
>>>>>>>>
>>>>>>> I was definitely going for the weird handling...
>>>>>>>
>>>>>>> My plan was to set a HAS_PREVIOUS_VALUE flag on the context entry when
>>>>>>> it's loaded, and check that before invoking a listener that needs the
>>>>>>> previous value. It is missing one edge case: if one thread starts a
>>>>>>> write operation, then another thread installs a listener that requires
>>>>>>> the previous value and iterates over the cache, the second thread may
>>>>>>> not see the value written by the first thread.
>>>>>> If the operations overlap, you could pretend that the write has finished
>>>>>> before the listener was invoked and simply not notify the listener. If I
>>>>>> am missing it please write it down in code. But handling this in any way
>>>>>> is still clumsy.
>>>>> I hope pseudo-code is fine...
>>>>>
>>>>> 1. cache.put(k, v1) starts, doesn't load the previous value v0 in the context
>>>>> 2. cache.addListener(l) runs, doesn't block
>>>>> 3. cache.entrySet().forEach() runs, finds k->v0
>>>>> 4. cache.put(k, v1) commits k->v1, should notify the listener but
>>>>> doesn't have the previous value
>>>>> 5. cache.put(k, v0) returns, but the code that installed the listener
>>>>> thinks the value of k is still v0
>>>> Oh OK, I should have drawn that myself when considering the scenario.
>>>> You're right, here we'll have to retry.
>>>>
>>>> All in all, I think this discussion is done. We'll tell users to stick
>>>> their flags where the sun doesn't shine and remove any inconvenient
>>>> ones. Should we issue a warning any time we're removing the flag?
>>>>
>>> If you mean that we should remove the flags from the public API, I
>>> agree. If you mean we should just ignore them, then no, because most
>>> of the flags were added for internal components that really need their
>>> semantics.
>> We can't remove them from public API before Infinspan 10, and I think
>> that it will be a quite an unpopular step even after that. But until 10,
>> I think that the common agreement was to not break query, that is ignore
>> the flags. And make write-only reading.
>>
> So SKIP_INDEXING should not skip indexing because it can break query??

Ehm... Talking about all flags was wrong, and I think that I've also 
mixed your input on write-only command and on flags. Let's reiterate, 
until we hide the flags (in 10+):

A) how should we treat SKIP_CACHE_LOAD with respect to (clustered) 
listeners, query, and write skew check? (IIRC we ignore that for 
purposes of WSC)
B) for write-only, will we load the value if necessary 
(listeners/query/wsc)? (I guess that the answer is yes)
C) for write-only, will we treat PersistenceMode.SKIP differently?
D) how should we treat SKIP_REMOTE_LOOKUP when the current write-owner 
is not a read-owner?

R.

>
>> R.
>>
>>> Dan
>>>
>>>
>>>> Radim
>>>>
>>>>>>> So now I'm thinking we should retry the write commands when
>>>>>>> isPreviousValueNeeded() changes... Not very appealing, but I think the
>>>>>>> performance difference is worth it.
>>>>>>>
>>>>>>>> R.
>>>>>>>>
>>>>>>>>>> 4) remove write-only commands completely (and probably functional
>>>>>>>>>> listeners as well because these will lose their purpose)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +1 to remove "unconditional writes", at least an entry version check should
>>>>>>>>>> be applied.
>>>>>>>>>> I believe we had already pointed out this would eventually happen, pretty
>>>>>>>>>> much for the reasons you're hitting now.
>>>>>>>>>>
>>>>>>>>> IMO version checks should be done internally, we shouldn't force the
>>>>>>>>> users of the functional API to deal with versions themselves because
>>>>>>>>> we know how hard making write skew checks work is for us :)
>>>>>>>>>
>>>>>>>>> And I wouldn't go as far as to remove the functional listeners,
>>>>>>>>> instead I would change them so that read-write listeners are invoked
>>>>>>>>> on write-only operations and they force the loading of the previous
>>>>>>>>> value. I would also add a way for the regular listeners to say whether
>>>>>>>>> they need the previous value or not.
>>>>>>>>>
>>>>>>>>>> Right now I am inclined towards 4). There could be some internal use
>>>>>>>>>> (e.g. multimaps) that could use 1) which is ran without a fancy setup,
>>>>>>>>>> though, but it's asking for trouble.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I agree!
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> WDYT?
>>>>>>>>>>
>>>>>>>>>> Radim
>>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> Dan
>>>>>>> _______________________________________________
>>>>>>> infinispan-dev mailing list
>>>>>>> infinispan-dev at lists.jboss.org
>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>> --
>>>>>> Radim Vansa <rvansa at redhat.com>
>>>>>> JBoss Performance Team
>>>>>>
>>>>>> _______________________________________________
>>>>>> infinispan-dev mailing list
>>>>>> infinispan-dev at lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>> --
>>>> Radim Vansa <rvansa at redhat.com>
>>>> JBoss Performance Team
>>>>
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>> --
>> Radim Vansa <rvansa at redhat.com>
>> JBoss Performance Team
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


-- 
Radim Vansa <rvansa at redhat.com>
JBoss Performance Team



More information about the infinispan-dev mailing list