[infinispan-dev] Write-only commands

Dan Berindei dan.berindei at gmail.com
Thu Jun 29 04:23:48 EDT 2017


On Wed, Jun 28, 2017 at 5:25 PM, Radim Vansa <rvansa at redhat.com> wrote:
> On 06/28/2017 01:17 PM, Radim Vansa wrote:
>> On 06/28/2017 10:40 AM, Dan Berindei wrote:
>>> On Wed, Jun 28, 2017 at 10:17 AM, Radim Vansa <rvansa at redhat.com> wrote:
>>>> On 06/27/2017 03:54 PM, Dan Berindei wrote:
>>>>> On Tue, Jun 27, 2017 at 2:43 PM, Adrian Nistor <anistor at redhat.com> wrote:
>>>>>> I've said this in a previous thread on this same issue, I will repeat myself
>>>>>> as many times as needed.
>>>>>>
>>>>>> Continuous queries require the previous value itself, not just knowledge of
>>>>>> the type of the previous value. Strongly typed caches solve no problem here.
>>>>>>
>>>>>> So if we half-fix query but leave CQ broken I will be half-happy (ie. very
>>>>>> depressed) :)
>>>>>>
>>>>>> I'd remove these commands completely or possibly remove them just from
>>>>>> public API and keep them internal.
>>>>>>
>>>>> +1 to remove the flags from the public API. Most of them are not safe
>>>>> for applications to use, and ignoring them when they can lead to
>>>>> inconsistencies would make them useless.
>>>>>
>>>>> E.g. the whole point of SKIP_INDEX_CLEANUP is that the cache doesn't
>>>>> know when it is safe to skip the delete statement, and it relies on
>>>>> the application making a (possibly wrong) choice.
>>>>>
>>>>> IGNORE_RETURN_VALUES should be safe to use, and we actually recommend
>>>>> that applications use it right now. If query or listeners need the
>>>>> previous value, then we should load it internally, but hide it from
>>>>> the user.
>>>>>
>>>>> But removing it opens another discussion: should we replace it in the
>>>>> public API with a new method AdvancedCache.ignoreReturnValues(), or
>>>>> should we make it the default and add a method
>>>>> AdvancedCache.forceReturnPreviousValues()?
>>>> Please don't derail the thread.
>>>>
>>> I don't think I'm derailing the thread: IGNORE_PREVIOUS_VALUES also
>>> breaks the previous value for listeners, even if the QueryInterceptor
>>> removes it from write commands. And it is public (+recommended) API,
>>> in fact most if not all of our performance tests use it.
>> That's just a flawed implementation. IPV is documented to be a 'safe'
>> flag that should affect mostly primary -> origin replication, all the
>> other is implementation. And we can fix that. Users should *not* expect
>> that it e.g. skips loading from a cache store. We have already removed
>> the modes that would be broken-by-design.
>>
>> On the other hand, write-only commands are not about *returning* the
>> value but about (not) *reading* it, therefore (in my eyes) user could
>> make that assumption and would like to enforce it this way. Even some
>> docs explaining PersistenceMode.SKIP suggest that.
>>
>> I don't want to talk about flags, because I see all flags but IPV as
>> 'effectively internal'. Let's discuss it more high-level. Some API
>> exposes non-reading operation - we can see that under some circumstances
>> this is not possible so we have options to 1) break stuff 2) break API
>> assumptions 3) sometimes break API assumptions 4) remove such API (to
>> not allow the user to make such assumptions). There's also an option 5)
>> to fail the operation if the API assumption would be broken. Though, I
>> don't fancy getting exception from a WriteOnlyMap.eval just because
>> someone has registered a listener.
>>
>>> For that matter, ClusteredCacheLoaderInterceptor also doesn't load the
>>> previous value on backup owners for most write commands
>>> (LoadType.PRIMARY), we'd need to change that as well.
>> Yes, all commands will have to load current value on all owners.
>
> Btw., this has definitely some impact on shared cachestores. It's not
> just that you have to load the value from cachestore - you also need to
> store the previous value along with the current value (in metadata),
> because if the primary owner crashes and the backups don't have it in
> memory, these need to check the persistence to see if the command was
> already executed.
>

I don't think this would work with the triangle algorithm. The primary
owner updates the shared store without waiting for acks from the
backups, so a backup trying to load the previous value is very likely
to find the updated value instead.

I'll hazard a guess that none of the operations using LoadType.OWNER
currently works with triangle + shared store.

> This means that the metadata could grow up to several times the size of
> the value itself (if there's high concurrency). And even worse with
> async caches, as these don't know when the command was actually finished
> and therefore rely on time-based expiration of the previous values :-/
>
> Non-shared cachestores don't need to store previous values as long if we
> can guarantee (flag) the entry as not eligible for eviction.
>
>>
>>>>>> On 06/27/2017 01:28 PM, Sanne Grinovero wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 27 Jun 2017 10:13, "Radim Vansa" <rvansa at redhat.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am working on entry version history (again). In Como we've discussed
>>>>>> that previous values are needed for (continuous) query and reliable
>>>>>> listeners,
>>>>>>
>>>>>>
>>>>>> Index based queries also require the previous value on a write - unless we
>>>>>> can get "strongly typed caches" giving guarantees about the class to
>>>>>> represent the content of a cache to be unique.
>>>>>>
>>>>>> Essentially we only need to know the type of the previous object. It might
>>>>>> be worth having a way to load the type metadata if the previous value only.
>>>>>>
>>>>>> so I wonder what should we do with functional write-only
>>>>>> commands. These are different to commands with flags, because flags
>>>>>> (other than ignore return value) are expected to break something.
>>>>>>
>>>>>>
>>>>>> Sorry I hope to not derail the thread but let's remind that we hope to
>>>>>> evolve beyond "flags are expected to break stuff" ; we never got to it but
>>>>>> search the mailing list.
>>>>>>
>>>>>> Since flags are exposed to the user I would rather they're not allowed to
>>>>>> break things.
>>>>>> Could they be treated as hints? Ignore the flag (and warn?) if the used
>>>>>> configuration/integrations veto them.
>>>>>>
>>>>>> Alternatively, let's remove them from API. Remember "The Jokre" POC was
>>>>>> intentionally designed to explore pushing the limits on performance w/o end
>>>>>> users having to solve puzzles, such as learning details about these flags
>>>>>> and their possible side effects.
>>>>>>
>>>>>> So assuming they become either "safe" or internal, maybe you can take
>>>>>> advantage of them?
>>>>>>
>>>>>> I see
>>>>>> the available options as:
>>>>>>
>>>>>> 1) run write-only commands 'optimized', ignoring any querying and such
>>>>>> (warn user that he will break it)
>>>>>>
>>>>>> 2) run write-only without any optimization, rendering them useless
>>>>>>
>>>>>> 3) detect when querying is set up (ignoring listeners and maybe other
>>>>>> stuff that could get broken)
>>>>>>
>>>>>>
>>>>>> Might be useful for making a POC work, but I believe query will be very
>>>>>> likely to be often enabled.
>>>>>> Having an either / or switch for different features in Infinispan will make
>>>>>> it harder to use and understand, so I'd rather see work on the right design
>>>>>> as taking temporary shortcuts risks baking into stone features which we
>>>>>> later struggle to fix or maintain.
>>>>>>
>>>>> I vote for this option.
>>>>>
>>>>> Query, listeners, and other components that need the previous value
>>>>> should not just assume that the application knows better, they should
>>>>> be able to change how operations works based on their needs. Of
>>>>> course, the reverse is also true: if the application uses write-only
>>>>> commands (or IGNORE_RETURN_VALUES) for performance reasons, it should
>>>>> be possible for the user to detect why the previous values are still
>>>>> loaded.
>>>> If it were just query (static configuration), I would be okay with this
>>>> idea. But as per listeners - besides tainting the design (event source
>>>> should not check if there's a listener) you'd need to check *before*
>>> The source wouldn't check for listeners explicitly, the notifier would
>>> have an isPreviousValueNeeded() method and precompute that before a
>>> listener is added or after a listener is removed. I was am assuming
>>> some listeners will not need the previous value, e.g. the listeners
>>> installed by streams.
>> You can cover your warts with a make-up but you'll still have warts :)
>>>> (DistributionI, CacheLoaderI) you have to call notify (cmd.perform,
>>>> EWI). So this is a space for race conditions or weird handling (if
>>>> there's a listener when I am about to call notify and my flags are not
>>>> cleared, skip the notification and pretend that this code was invoked
>>>> before the listener was registered...). Or do you have another solution
>>>> in mind (config option to disable listeners && all features using those?).
>>>>
>>> I was definitely going for the weird handling...
>>>
>>> My plan was to set a HAS_PREVIOUS_VALUE flag on the context entry when
>>> it's loaded, and check that before invoking a listener that needs the
>>> previous value. It is missing one edge case: if one thread starts a
>>> write operation, then another thread installs a listener that requires
>>> the previous value and iterates over the cache, the second thread may
>>> not see the value written by the first thread.
>> If the operations overlap, you could pretend that the write has finished
>> before the listener was invoked and simply not notify the listener. If I
>> am missing it please write it down in code. But handling this in any way
>> is still clumsy.
>>> So now I'm thinking we should retry the write commands when
>>> isPreviousValueNeeded() changes... Not very appealing, but I think the
>>> performance difference is worth it.
>>>
>>>> R.
>>>>
>>>>>> 4) remove write-only commands completely (and probably functional
>>>>>> listeners as well because these will lose their purpose)
>>>>>>
>>>>>>
>>>>>> +1 to remove "unconditional writes", at least an entry version check should
>>>>>> be applied.
>>>>>> I believe we had already pointed out this would eventually happen, pretty
>>>>>> much for the reasons you're hitting now.
>>>>>>
>>>>> IMO version checks should be done internally, we shouldn't force the
>>>>> users of the functional API to deal with versions themselves because
>>>>> we know how hard making write skew checks work is for us :)
>>>>>
>>>>> And I wouldn't go as far as to remove the functional listeners,
>>>>> instead I would change them so that read-write listeners are invoked
>>>>> on write-only operations and they force the loading of the previous
>>>>> value. I would also add a way for the regular listeners to say whether
>>>>> they need the previous value or not.
>>>>>
>>>>>> Right now I am inclined towards 4). There could be some internal use
>>>>>> (e.g. multimaps) that could use 1) which is ran without a fancy setup,
>>>>>> though, but it's asking for trouble.
>>>>>>
>>>>>>
>>>>>> I agree!
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> Radim
>>>>>>
>>>>> Cheers
>>>>> Dan
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>
>
> --
> Radim Vansa <rvansa at redhat.com>
> JBoss Performance Team
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


More information about the infinispan-dev mailing list