[infinispan-dev] Strict Expiration
Dennis Reed
dereed at redhat.com
Tue Jul 14 11:45:16 EDT 2015
On 07/14/2015 11:08 AM, Radim Vansa wrote:
> On 07/14/2015 04:19 PM, William Burns wrote:
>>
>>
>> On Tue, Jul 14, 2015 at 9:37 AM William Burns <mudokonman at gmail.com> wrote:
>>
>> On Tue, Jul 14, 2015 at 4:41 AM Dan Berindei <dan.berindei at gmail.com> wrote:
>>
>> Processing expiration only on the reaper thread sounds nice, but I
>> have one reservation: processing 1 million entries to see that 1 of
>> them is expired is a lot of work, and in the general case we will
>> not be able to ensure an expiration precision of less than 1 minute
>> (maybe more, with a huge SingleFileStore attached).
>>
>>
>> This isn't much different than before. The only difference is
>> that if a user touched a value after it expired it wouldn't show
>> up (which is unlikely with maxIdle especially).
>>
>>
>> What happens to users who need better precision? In particular,
>> I know some JCache tests were failing because HotRod was only
>> supporting 1-second resolution instead of the 1-millisecond
>> resolution they were expecting.
>>
>>
>> JCache is an interesting piece. The thing about JCache is that
>> the spec is only defined for local caches. However I wouldn't
>> want to muddy the waters by having it behave differently for
>> local/remote. In the JCache scenario we could add an interceptor
>> to prevent it returning such values (we do something similar
>> already for events). JCache behavior vs ISPN behavior seems a bit
>> easier to differentiate. But like you are getting at, either way
>> is not very appealing.
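>>
>> A rough sketch of what such a read-side filter could check (this is
>> a hypothetical helper, not the actual interceptor API; it only
>> illustrates hiding a value once its lifespan has elapsed, before
>> the reaper gets to it):
>>
>>     // Treat an entry as missing once its lifespan has elapsed,
>>     // even if the reaper thread has not removed it yet.
>>     static <V> V filterExpired(V value, long createdMillis,
>>                                long lifespanMillis) {
>>         if (value == null || lifespanMillis < 0) {
>>             return value; // immortal entry or a genuine miss
>>         }
>>         long now = System.currentTimeMillis();
>>         return now >= createdMillis + lifespanMillis ? null : value;
>>     }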
>>
>>
>>
>> I'm even less convinced about the need to guarantee that a
>> clustered expiration listener will only be triggered once, and
>> that the entry must be null everywhere after that listener was
>> invoked. What's the use case?
>>
>>
>> Maybe Tristan would know more to answer. To be honest this work
>> seems fruitless unless we know what our end users want here.
>> Spending time on something for it to be thrown out is never fun :(
>>
>> And the more I've thought about this, the more I question the
>> validity of maxIdle even. It seems like a very poor way to prevent
>> memory exhaustion, which eviction does in a much better way, with
>> much more flexible algorithms. Does anyone know what maxIdle would
>> be used for that wouldn't be covered by eviction? The only thing I
>> can think of is cleaning up the cache store as well.
>>
>>
>> Actually I guess for session/authentication related information
>> this would be important. However maxIdle isn't really as usable in
>> that case, since most likely you would have a sticky session going
>> back to the same node, which means you would never refresh the
>> last-used date on the copies (current implementation). Without
>> cluster expiration you could lose that session information on a
>> failover very easily.
>
> I would say that maxIdle can be used for memory management as a kind
> of WeakHashMap - e.g. in 2LC, maxIdle is used to store some record
> for a short while (a regular transaction lifespan, ~ seconds to
> minutes), and normally the record is removed explicitly. However, to
> make sure that we don't leak records in this cache (if something goes
> wrong and the remove does not occur), maxIdle is there to expire them
> eventually.
Note that just relying on maxIdle doesn't guarantee you won't leak
records in this use case (specifically with the way the current
hibernate-infinispan 2LC implementation uses it).
Hibernate-infinispan adds entries to its own Map stored in Infinispan,
and expects maxIdle to remove the map if it skips a remove. But in a
recent case, we found that due to frequent accesses to that same map
the entries never idle out, and it ends in an OOME.
-Dennis
> I can guess how long the transaction takes, but not how many
> parallel transactions there are. With eviction algorithms (where I am
> not sure about the exact guarantees) I can set the cache to not hold
> more than N entries, but I can't know for sure that my record does
> not suddenly get evicted after a shorter period, possibly causing
> some inconsistency.
> So this is similar to WeakHashMap in that it removes the key "when it
> can't be used anymore": I know that the transaction will finish
> before the deadline. I don't care about the exact size, I don't want
> to tune that, I just don't want to leak.
>
> From my POV the non-strict maxIdle and strict expiration would be a
> nice compromise.
>
> Radim
>
>>
>> Note that this would make the reaper thread less efficient: with
>> numOwners=2 (best case), half of the entries that the reaper
>> touches cannot be expired, because the node isn't the primary node.
>> And to make matters worse, the same reaper thread would have to
>> perform a (synchronous?) RPC for each entry to ensure it expires
>> everywhere.
>>
>>
>> I have debated this; it could be something like a sync removeAll
>> with a special marker to signal that it is due to expiration (which
>> would raise listeners there), while also sending a cluster
>> expiration event to the other non-owners.
>>
>>
>> For maxIdle I'd like to know more information about how exactly
>> the owners would coordinate to expire an entry. I'm pretty sure we
>> cannot avoid ignoring some reads (expiring an entry immediately
>> after it was read), and ensuring that we don't accidentally extend
>> an entry's life (like the current code does, when we transfer an
>> entry to a new owner) also sounds problematic.
>>
>>
>> For lifespan it is simple: the primary owner just expires it when
>> it expires there. There is no coordination needed in this case; it
>> just sends the expired remove to the other owners etc.
>>
>> Max idle is more complicated, as we all know. The primary owner
>> would send a request for the last used time for a given key or set
>> of keys. Then it would take those times and check for any new
>> access it isn't aware of. If there isn't one, it would send a
>> remove command for the key(s). If there is a new access, it would
>> instead send the newest last used time to all of the owners. There
>> is obviously a window where a read that occurs after a node has
>> already sent its response could be ignored. This could be resolved
>> by using some sort of 2PC and blocking reads during that period,
>> but I would say it isn't worth it.
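>>
>> Very roughly, the maxIdle coordination on the primary owner could
>> look like this (a sketch only - the RPC helpers here are
>> hypothetical names, not existing Infinispan methods):
>>
>>     // Runs on the primary owner when its reaper finds an entry
>>     // whose local last-used time suggests it has idled out.
>>     void onPossiblyIdleEntry(Object key, long primaryLastUsed,
>>                              long maxIdleMillis) {
>>         // 1. Ask every backup owner for its last-used timestamp.
>>         long newestAccess = Math.max(primaryLastUsed,
>>                 maxLastUsedAcrossBackups(key)); // hypothetical RPC
>>         if (System.currentTimeMillis() - newestAccess >= maxIdleMillis) {
>>             // 2a. No newer access anywhere: remove on all owners,
>>             // raising a single clustered expiration event.
>>             sendExpirationRemove(key);            // hypothetical
>>         } else {
>>             // 2b. Someone read the entry more recently: propagate
>>             // the newest timestamp so backups don't expire early.
>>             broadcastLastUsed(key, newestAccess); // hypothetical
>>         }
>>         // A read landing after a backup replied can still be
>>         // missed; closing that window would need the 2PC-style
>>         // blocking mentioned above.
>>     }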
>>
>> The issue with transferring to a new node refreshing the last
>> update/lifespan seems like just a bug we need to fix, irrespective
>> of this issue, IMO.
>>
>>
>> I'm not saying expiring entries on each node independently is
>> perfect, far from it. But I wouldn't want us to provide new
>> guarantees that could hurt performance without a really good use
>> case.
>>
>>
>> I would guess that user-perceived performance should be a little
>> faster with this. But this also depends on which alternative we
>> decide on :)
>>
>> Also, the expiration thread pool is set to min priority atm, so it
>> may delay removal of said objects, but hopefully (if the JVM
>> supports priorities) it wouldn't monopolize a CPU while processing
>> unless there is spare capacity.
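>>
>> For reference, the min-priority behavior amounts to the reaper
>> threads being created roughly like this (a plain-Java sketch, not
>> the actual factory we use):
>>
>>     // Reaper threads at minimum priority: the priority is only a
>>     // hint to the scheduler, honored if the JVM/OS supports it.
>>     java.util.concurrent.ThreadFactory reaperFactory = r -> {
>>         Thread t = new Thread(r, "expiration-reaper");
>>         t.setDaemon(true);
>>         t.setPriority(Thread.MIN_PRIORITY);
>>         return t;
>>     };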
>>
>>
>> Cheers
>> Dan
>>
>>
>> On Mon, Jul 13, 2015 at 9:25 PM, Tristan Tarrant <ttarrant at redhat.com> wrote:
>> > After re-reading the whole original thread, I agree with the
>> > proposal, with two caveats:
>> >
>> > - ensure that we don't break JCache compatibility
>> > - ensure that we document this properly
>> >
>> > Tristan
>> >
>> > On 13/07/2015 18:41, Sanne Grinovero wrote:
>> >> +1
>> >> You had me convinced at the first line, although "A lot of code
>> >> can now be removed and made simpler" makes it look extremely nice.
>> >>
>> >> On 13 Jul 2015 18:14, "William Burns" <mudokonman at gmail.com> wrote:
>> >>
>> >> This is a necro of [1].
>> >>
>> >> With Infinispan 8.0 we are adding clustered expiration,
>> >> including an expiration event that is raised cluster-wide.
>> >> Unfortunately expiration events currently occur multiple times
>> >> (if numOwners > 1), at different times, across the nodes in a
>> >> cluster. This makes coordinating a single cluster expiration
>> >> event quite difficult.
>> >>
>> >> To work around this I am proposing that the expiration of an
>> >> entry is done solely by the owner of the given key that is now
>> >> expired. This would fix the issue of having multiple events, and
>> >> the event can be raised while holding the lock for the given
>> >> key, so concurrent modifications would not be an issue.
>> >>
>> >> The problem arises when you have other nodes that have
>> >> expiration set but expire at different times. Max idle is the
>> >> biggest offender here, as a read on one owner only refreshes
>> >> that owner's timestamp, meaning the other owners would not be
>> >> updated and could expire the entry prematurely. To have
>> >> expiration work properly in this case you would need
>> >> coordination between the owners to see if anyone has a higher
>> >> value. This requires blocking, and it would have to be done
>> >> while accessing a key that is expired, to be sure whether
>> >> expiration happened or not.
>> >>
>> >> The linked dev listing proposed instead to only expire an entry
>> >> via the reaper thread and not on access. In this case a read
>> >> will return a non-null value until the entry is fully expired,
>> >> possibly increasing hit ratios.
>> >>
>> >> There are quite a few real benefits to this:
>> >>
>> >> 1. Cluster cache reads would be much simpler and wouldn't have
>> >> to block to verify whether the object exists or not, since this
>> >> would only be done by the reaper thread (note this would only
>> >> have happened if the entry was expired locally). An access would
>> >> just return the value immediately.
>> >> 2. Each node only expires entries it owns in the reaper thread,
>> >> reducing how many entries it must check or remove. This also
>> >> provides the single point where events would be raised, as we
>> >> need.
>> >> 3. A lot of code can now be removed and made simpler, as it no
>> >> longer has to check for expiration. The expiration check would
>> >> only be done in one place: the expiration reaper thread.
>> >>
>> >> The main issue with this proposal, as the other listing
>> >> mentions, is if user code expects the value to be gone after
>> >> expiration, for correctness. I would say this use case is not as
>> >> compelling for maxIdle, especially since we never supported it
>> >> properly. And in the case of lifespan the user could very easily
>> >> store the expiration time in the object itself and check it
>> >> after a get, as pointed out in the other thread.
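>> >>
>> >> E.g. something as simple as this sketch (a hypothetical wrapper
>> >> class, just to illustrate the pattern):
>> >>
>> >>     // Embed the expiration deadline in the stored value, so
>> >>     // the caller can treat a stale hit as a miss after a get.
>> >>     class ExpiringValue<V> {
>> >>         final V value;
>> >>         final long expiresAtMillis;
>> >>         ExpiringValue(V value, long lifespanMillis) {
>> >>             this.value = value;
>> >>             this.expiresAtMillis =
>> >>                 System.currentTimeMillis() + lifespanMillis;
>> >>         }
>> >>         V getIfAlive() {
>> >>             return System.currentTimeMillis() < expiresAtMillis
>> >>                 ? value : null;
>> >>         }
>> >>     }
>> >>
>> >>     // Usage after a cache get:
>> >>     //   ExpiringValue<Session> e = cache.get(key);
>> >>     //   Session s = (e == null) ? null : e.getIfAlive();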
>> >>
>> >> [1]
>> >> http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-strictly-not-returning-expired-values-td3428763.html
>> >>
>> >
>> > --
>> > Tristan Tarrant
>> > Infinispan Lead
>> > JBoss, a division of Red Hat
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>