[infinispan-dev] Strict Expiration
Dan Berindei
dan.berindei at gmail.com
Wed Jul 22 10:53:19 EDT 2015
Is it possible/feasible to skip the notification from the backups to
the primary (and back) when there is no clustered expiration listener?
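
Something like this is what I have in mind - just a sketch, and every
name in it is a hypothetical stand-in for the real internals:

    // Sketch: skip the backup->primary RPC when no one is listening.
    interface ExpirationHooks {
        boolean hasClusteredExpirationListeners();
        void removeLocally(Object key);       // just drop the local copy
        void notifyPrimaryOwner(Object key);  // the RPC we could skip
    }

    final class BackupExpiration {
        private final ExpirationHooks hooks;

        BackupExpiration(ExpirationHooks hooks) { this.hooks = hooks; }

        void onExpiredOnBackup(Object key) {
            if (hooks.hasClusteredExpirationListeners()) {
                hooks.notifyPrimaryOwner(key); // primary raises the event
            } else {
                hooks.removeLocally(key);      // nobody listens: no RPC needed
            }
        }
    }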
Dan
On Tue, Jul 21, 2015 at 5:25 PM, William Burns <mudokonman at gmail.com> wrote:
> So I wanted to sum up what the plan looks like for cluster expiration
> in ISPN 8.
>
> First off, to avoid any ambiguity: using maxIdle with a clustered
> cache will provide undefined and unsupported behavior. It can and will
> expire entries on a single node without notifying other cluster members
> (essentially it will operate as it does today, unchanged).
>
> This leaves me to talk solely about lifespan cluster expiration.
>
> Lifespan expiration events are fired by the primary owner of an expired
> key:
>
> - when accessing an expired entry.
>
> - by the reaper thread.
>
> If the expiration is detected by a node other than the primary owner, an
> expiration command is sent to the primary and null is returned
> immediately, without waiting for a response.
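>
> To illustrate the intended read path, here is a rough sketch (every
> name below is a hypothetical placeholder, not our actual internals):
>
>     // Sketch of the proposed read path; all names are hypothetical.
>     final class ReadPath {
>         interface Entry { Object getValue(); boolean isExpired(long now); }
>
>         Entry peek(Object key)             { return null; /* container lookup */ }
>         boolean isPrimaryOwner(Object key) { return true;  /* placeholder */ }
>         void expireAndNotify(Object key)   { /* remove + raise the single event */ }
>         void tellPrimaryAsync(Object key)  { /* fire-and-forget expiration command */ }
>
>         Object get(Object key) {
>             Entry e = peek(key);
>             if (e == null) return null;
>             if (e.isExpired(System.currentTimeMillis())) {
>                 if (isPrimaryOwner(key)) {
>                     expireAndNotify(key);  // primary raises the event itself
>                 } else {
>                     tellPrimaryAsync(key); // don't wait for a response
>                 }
>                 return null;               // never return an expired value
>             }
>             return e.getValue();
>         }
>     }
>
> The key point is that an expired value is never returned, but only the
> primary owner raises the event.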
>
> Expiration event listeners follow the usual rules for sync/async: in the
> case of a sync listener, the handler is invoked while holding the lock,
> whereas an async listener will not hold locks.
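>
> For illustration, a listener under this plan would look roughly like
> the following (a sketch, assuming the listener API lands with the
> @CacheEntryExpired annotation and CacheEntryExpiredEvent we are adding
> for 8.0):
>
>     import org.infinispan.notifications.Listener;
>     import org.infinispan.notifications.cachelistener.annotation.CacheEntryExpired;
>     import org.infinispan.notifications.cachelistener.event.CacheEntryExpiredEvent;
>
>     // sync = false: the handler runs without holding the key's lock.
>     // The default (sync = true) runs the handler while the lock is held.
>     @Listener(sync = false)
>     public class ExpirationLogger {
>         @CacheEntryExpired
>         public void onExpired(CacheEntryExpiredEvent<String, String> event) {
>             // getValue() may be null if the expiration came from a cache
>             // store (see the paragraph below).
>             System.out.printf("expired: %s -> %s%n", event.getKey(), event.getValue());
>         }
>     }
>     // Registered as usual: cache.addListener(new ExpirationLogger());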
>
> It is desirable for expiration events to contain both the key and value.
> However, cache stores currently do not provide the value when they
> expire entries. Thus we can only guarantee the value is present when an
> in-memory expiration event occurs. We could plan on adding this later.
>
> Also, as you may have guessed, this doesn't touch strict expiration,
> which I think we have concluded should only apply to maxIdle; as such it
> is not explored in this iteration.
>
> Let me know if you guys think this approach is okay.
>
> Cheers,
>
> - Will
>
> On Tue, Jul 14, 2015 at 1:51 PM Radim Vansa <rvansa at redhat.com> wrote:
>>
>> Yes, I know about [1]. I've worked around that by storing a timestamp
>> in the entry as well, so that when a new record is added, the 'expired'
>> invalidations are purged. But I can't purge them if I don't access
>> them - Infinispan needs to handle that internally.
>>
>> Radim
>>
>> [1] https://hibernate.atlassian.net/browse/HHH-6219
>>
>> On 07/14/2015 05:45 PM, Dennis Reed wrote:
>> > On 07/14/2015 11:08 AM, Radim Vansa wrote:
>> >> On 07/14/2015 04:19 PM, William Burns wrote:
>> >>>
>> >>> On Tue, Jul 14, 2015 at 9:37 AM William Burns <mudokonman at gmail.com> wrote:
>> >>>
>> >>> On Tue, Jul 14, 2015 at 4:41 AM Dan Berindei <dan.berindei at gmail.com> wrote:
>> >>>
>> >>> Processing expiration only on the reaper thread sounds nice, but I
>> >>> have one reservation: processing 1 million entries to see that 1 of
>> >>> them is expired is a lot of work, and in the general case we will
>> >>> not be able to ensure an expiration precision of less than 1 minute
>> >>> (maybe more, with a huge SingleFileStore attached).
>> >>>
>> >>>
>> >>> This isn't much different than before. The only difference is that
>> >>> if a user touched a value after it expired it wouldn't show up
>> >>> (which is unlikely with maxIdle especially).
>> >>>
>> >>>
>> >>> What happens to users who need better precision? In particular, I
>> >>> know some JCache tests were failing because HotRod was only
>> >>> supporting 1-second resolution instead of the 1-millisecond
>> >>> resolution they were expecting.
>> >>>
>> >>>
>> >>> JCache is an interesting piece. The thing about JCache is that the
>> >>> spec is only defined for local caches. However, I wouldn't want to
>> >>> muddy the waters in regards to it behaving differently for
>> >>> local/remote. In the JCache scenario we could add an interceptor to
>> >>> prevent it returning such values (we do something similar already
>> >>> for events). JCache behavior vs ISPN behavior seems a bit easier to
>> >>> differentiate. But like you are getting at, either way is not very
>> >>> appealing.
>> >>>
>> >>>
>> >>>
>> >>> I'm even less convinced about the need to guarantee that a
>> >>> clustered expiration listener will only be triggered once, and that
>> >>> the entry must be null everywhere after that listener was invoked.
>> >>> What's the use case?
>> >>>
>> >>>
>> >>> Maybe Tristan would know more to answer. To be honest this work
>> >>> seems fruitless unless we know what our end users want here.
>> >>> Spending time on something only for it to be thrown out is never
>> >>> fun :(
>> >>>
>> >>> And the more I thought about this, the more I question the validity
>> >>> of maxIdle even. It seems like a very poor way to prevent memory
>> >>> exhaustion, which eviction does in a much better way, with much
>> >>> more flexible algorithms. Does anyone know what maxIdle would be
>> >>> used for that wouldn't be covered by eviction? The only thing I can
>> >>> think of is cleaning up the cache store as well.
>> >>>
>> >>>
>> >>> Actually, I guess for session/authentication-related information
>> >>> this would be important. However, maxIdle isn't really as usable in
>> >>> that case, since most likely you would have a sticky session going
>> >>> back to the same node, which means you would never refresh the
>> >>> last-used date on the copies (in the current implementation).
>> >>> Without cluster expiration you could lose that session information
>> >>> on a failover very easily.
>> >> I would say that maxIdle can be used for memory management as a kind
>> >> of WeakHashMap - e.g. in 2LC, a record is stored for a short while (a
>> >> regular transaction lifespan, ~ seconds to minutes) and is then
>> >> removed explicitly. However, to make sure that we don't leak records
>> >> in this cache (if something goes wrong and the explicit remove does
>> >> not occur), maxIdle removes it eventually.
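>> >>
>> >> For concreteness, the safety net I mean is just a configuration
>> >> along these lines (a sketch; the 2-minute value is made up):
>> >>
>> >>     import java.util.concurrent.TimeUnit;
>> >>     import org.infinispan.configuration.cache.Configuration;
>> >>     import org.infinispan.configuration.cache.ConfigurationBuilder;
>> >>
>> >>     public class SafetyNetConfig {
>> >>         public static Configuration build() {
>> >>             // Records are normally removed explicitly at transaction
>> >>             // end; maxIdle only catches the removes that never happen.
>> >>             return new ConfigurationBuilder()
>> >>                   .expiration().maxIdle(2, TimeUnit.MINUTES)
>> >>                   .build();
>> >>         }
>> >>     }
>> >>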
>> > Note that just relying on maxIdle doesn't guarantee you won't leak
>> > records in this use case (specifically with the way the current
>> > hibernate-infinispan 2LC implementation uses it).
>> >
>> > Hibernate-infinispan adds entries to its own Map stored in Infinispan,
>> > and expects maxIdle to remove the map if it skips a remove. But in a
>> > current case we found that, due to frequent accesses to that same map,
>> > the entries never idle out and it ends up in an OOME.
>> >
>> > -Dennis
>> >
>> >> I can guess how long the transaction takes, but not how many
>> >> parallel transactions there are. With eviction algorithms (where I am
>> >> not sure about the exact guarantees) I can set the cache to not hold
>> >> more than N entries, but I can't know for sure that my record does
>> >> not suddenly get evicted after a shorter period, possibly causing
>> >> some inconsistency.
>> >> So this is similar to a WeakHashMap: the key is removed "when it
>> >> can't be used anymore", because I know that the transaction will
>> >> finish before the deadline. I don't care about the exact size, I
>> >> don't want to tune that, I just don't want to leak.
>> >>
>> >> From my POV the non-strict maxIdle and strict expiration would be a
>> >> nice compromise.
>> >>
>> >> Radim
>> >>
>> >>> Note that this would make the reaper thread less efficient: with
>> >>> numOwners=2 (best case), half of the entries that the reaper
>> >>> touches cannot be expired, because the node isn't the primary node.
>> >>> And to make matters worse, the same reaper thread would have to
>> >>> perform a (synchronous?) RPC for each entry to ensure it expires
>> >>> everywhere.
>> >>>
>> >>>
>> >>> I have debated about this; it could be something like a sync
>> >>> removeAll with a special marker to indicate it is due to expiration
>> >>> (which would raise listeners there), while also sending a cluster
>> >>> expiration event to the other non-owners.
>> >>>
>> >>>
>> >>> For maxIdle I'd like to know more information about how exactly the
>> >>> owners would coordinate to expire an entry. I'm pretty sure we
>> >>> cannot avoid ignoring some reads (expiring an entry immediately
>> >>> after it was read), and ensuring that we don't accidentally extend
>> >>> an entry's life (like the current code does, when we transfer an
>> >>> entry to a new owner) also sounds problematic.
>> >>>
>> >>>
>> >>> For lifespan it is simple: the primary owner just expires the entry
>> >>> when it expires there. There is no coordination needed in this
>> >>> case; it just sends the expired remove to the other owners, etc.
>> >>>
>> >>> Max idle is more complicated, as we all know. The primary owner
>> >>> would send a request for the last-used time for a given key or set
>> >>> of keys. Then the primary owner would take those times and check
>> >>> for a newer access it isn't aware of. If there isn't one, it would
>> >>> send a remove command for the key(s). If there is a newer access,
>> >>> the primary owner would instead send the latest last-used time to
>> >>> all of the owners. Obviously there would be a window in which a
>> >>> read that occurs after an owner has sent its response could be
>> >>> ignored. This could be resolved by using some sort of 2PC and
>> >>> blocking reads during that period, but I would say it isn't worth
>> >>> it.
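>> >>>
>> >>> In pseudocode, the coordination would be roughly this (every name
>> >>> here is a hypothetical placeholder, just to show the message flow):
>> >>>
>> >>>     // Sketch of the max-idle coordination; names are hypothetical.
>> >>>     final class MaxIdleCoordination {
>> >>>         interface Owners {
>> >>>             long maxLastUsed(Object key);             // poll the other owners
>> >>>             void removeExpired(Object key);           // expired-remove command
>> >>>             void updateLastUsed(Object key, long t);  // refresh the other owners
>> >>>         }
>> >>>
>> >>>         void checkOnPrimary(Object key, long localLastUsed,
>> >>>                             long maxIdleMillis, Owners owners) {
>> >>>             long lastUsed = Math.max(localLastUsed, owners.maxLastUsed(key));
>> >>>             if (System.currentTimeMillis() - lastUsed >= maxIdleMillis) {
>> >>>                 owners.removeExpired(key);            // no newer access anywhere
>> >>>             } else {
>> >>>                 owners.updateLastUsed(key, lastUsed); // someone read it recently
>> >>>             }
>> >>>             // A read landing after an owner has replied can still be
>> >>>             // missed; closing that window needs 2PC + blocking reads.
>> >>>         }
>> >>>     }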
>> >>>
>> >>> The issue of a transfer to a new node refreshing the last-used
>> >>> time/lifespan seems like just a bug we need to fix irrespective of
>> >>> this issue, IMO.
>> >>>
>> >>>
>> >>> I'm not saying expiring entries on each node independently is
>> >>> perfect, far from it. But I wouldn't want us to provide new
>> >>> guarantees that could hurt performance without a really good use
>> >>> case.
>> >>>
>> >>>
>> >>> I would guess that user-perceived performance should be a little
>> >>> faster with this. But this also depends on which alternative we
>> >>> decide on :)
>> >>>
>> >>> Also, the expiration thread pool is set to min priority atm, so it
>> >>> may delay removal of said objects, but hopefully (if the JVM
>> >>> supports it) it wouldn't hog a CPU while processing unless there is
>> >>> spare capacity.
>> >>>
>> >>>
>> >>> Cheers
>> >>> Dan
>> >>>
>> >>>
>> >>> On Mon, Jul 13, 2015 at 9:25 PM, Tristan Tarrant <ttarrant at redhat.com> wrote:
>> >>> > After re-reading the whole original thread, I agree with the
>> >>> > proposal with two caveats:
>> >>> >
>> >>> > - ensure that we don't break JCache compatibility
>> >>> > - ensure that we document this properly
>> >>> >
>> >>> > Tristan
>> >>> >
>> >>> > On 13/07/2015 18:41, Sanne Grinovero wrote:
>> >>> >> +1
>> >>> >> You had me convinced at the first line, although "A lot of code
>> >>> >> can now be removed and made simpler" makes it look extremely
>> >>> >> nice.
>> >>> >>
>> >>> >> On 13 Jul 2015 18:14, "William Burns" <mudokonman at gmail.com> wrote:
>> >>> >>
>> >>> >> This is a necro of [1].
>> >>> >>
>> >>> >> With Infinispan 8.0 we are adding clustered expiration, which
>> >>> >> includes a clustered expiration event as well. Unfortunately,
>> >>> >> expiration events currently occur multiple times (if numOwners
>> >>> >> > 1), at different times, across the nodes in a cluster. This
>> >>> >> makes coordinating a single cluster expiration event quite
>> >>> >> difficult.
>> >>> >>
>> >>> >> To work around this I am proposing that the expiration event is
>> >>> >> raised solely by the owner of the given key that is now
>> >>> >> expired. This would fix the issue of having multiple events,
>> >>> >> and the event can be raised while holding the lock for the
>> >>> >> given key, so concurrent modifications would not be an issue.
>> >>> >>
>> >>> >> The problem arises when you have other nodes that have
>> >>> >> expiration set but expire at different times. Max idle is the
>> >>> >> biggest offender here, as a read on an owner only refreshes
>> >>> >> that owner's timestamp, meaning other owners would not be
>> >>> >> updated and would expire preemptively. To have expiration work
>> >>> >> properly in this case you would need coordination between the
>> >>> >> owners to see if anyone has a higher value. This requires
>> >>> >> blocking, and it would have to be done while accessing a key
>> >>> >> that is expired, to be sure whether expiration happened or not.
>> >>> >>
>> >>> >> The linked dev listing proposed instead to only expire an entry
>> >>> >> via the reaper thread and not on access. In this case a read
>> >>> >> will return a non-null value until the entry is fully expired,
>> >>> >> possibly increasing hit ratios.
>> >>> >>
>> >>> >> There are quite a few real benefits to this (a rough sketch of
>> >>> >> the reaper pass follows the list below):
>> >>> >>
>> >>> >> 1. Cluster cache reads would be much simpler and wouldn't have
>> >>> >> to block to verify whether the object exists or not, since this
>> >>> >> check would only be done by the reaper thread (note this
>> >>> >> blocking would only have happened if the entry was expired
>> >>> >> locally). An access would just return the value immediately.
>> >>> >> 2. Each node only expires entries it owns in the reaper thread,
>> >>> >> reducing how many entries it must check or remove. This also
>> >>> >> provides a single point where events would be raised, as we
>> >>> >> need.
>> >>> >> 3. A lot of code can now be removed and made simpler, as it no
>> >>> >> longer has to check for expiration. The expiration check would
>> >>> >> only be done in one place: the expiration reaper thread.
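>> >>> >>
>> >>> >> Here is that sketch of the reaper pass (hypothetical names, not
>> >>> >> the real container API):
>> >>> >>
>> >>> >>     // Sketch of the reaper-only model; all names hypothetical.
>> >>> >>     final class Reaper {
>> >>> >>         interface Entry { Object key(); boolean isExpired(long now); }
>> >>> >>
>> >>> >>         boolean isPrimaryOwner(Object key) { return true; /* placeholder */ }
>> >>> >>         void expireAndNotify(Object key)   { /* remove + single event */ }
>> >>> >>
>> >>> >>         void pass(Iterable<Entry> container) {
>> >>> >>             long now = System.currentTimeMillis();
>> >>> >>             for (Entry e : container) {
>> >>> >>                 // Only the primary owner expires an entry, so the
>> >>> >>                 // check and the event happen once cluster-wide.
>> >>> >>                 if (isPrimaryOwner(e.key()) && e.isExpired(now)) {
>> >>> >>                     expireAndNotify(e.key());
>> >>> >>                 }
>> >>> >>             }
>> >>> >>         }
>> >>> >>     }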
>> >>> >>
>> >>> >> The main issue with this proposal, as the other listing
>> >>> >> mentions, is if user code expects the value to be gone after
>> >>> >> expiration for correctness. I would say this use case is not as
>> >>> >> compelling for maxIdle, especially since we never supported it
>> >>> >> properly. And in the case of lifespan, the user could very
>> >>> >> easily store the expiration time in the object itself and check
>> >>> >> it after a get, as pointed out in the other thread.
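>> >>> >>
>> >>> >> E.g. something as simple as this (an illustrative sketch only):
>> >>> >>
>> >>> >>     // Illustrative: keep the logical deadline in the value itself.
>> >>> >>     final class TimestampedValue {
>> >>> >>         final Object value;
>> >>> >>         final long expiresAtMillis;
>> >>> >>
>> >>> >>         TimestampedValue(Object value, long lifespanMillis) {
>> >>> >>             this.value = value;
>> >>> >>             this.expiresAtMillis = System.currentTimeMillis() + lifespanMillis;
>> >>> >>         }
>> >>> >>
>> >>> >>         // The caller checks this right after cache.get(key).
>> >>> >>         boolean isLogicallyExpired() {
>> >>> >>             return System.currentTimeMillis() >= expiresAtMillis;
>> >>> >>         }
>> >>> >>     }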
>> >>> >>
>> >>> >> [1]
>> >>> >> http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-strictly-not-returning-expired-values-td3428763.html
>> >>> >>
>> >>> >
>> >>> > --
>> >>> > Tristan Tarrant
>> >>> > Infinispan Lead
>> >>> > JBoss, a division of Red Hat
>> >>>
>> >>>
>> >>>
>> >>
>>
>>
>> --
>> Radim Vansa <rvansa at redhat.com>
>> JBoss Performance Team
>>
>
>