[infinispan-dev] Continuous Query Caching

Sanne Grinovero sanne.grinovero at gmail.com
Tue Sep 29 06:11:52 EDT 2009


When this is about Lucene searches that repeat the same query "tons of
times", it's not needed: we already cache the opened IndexReaders, and
you can optionally even cache the results as reusable filters.
The IndexReader reopening strategy is customizable, and just
yesterday we opened an issue to create a new strategy especially
suited for Infinispan:
check for fresh IndexReaders periodically instead of at each event,
to cap the update frequency at some constant value and limit
network traffic.

Generally it's preferable not to reopen the index for each insertion:
making sure it's not "too old" is enough.
When the cache reveals a new version of the index, the new index can be
"pre-fetched" in the background before it is made available to searchers,
so when an IndexReader is needed to perform a new query it pays no
reopening latency; even index warmup could be performed once Lucene 2.9
is released.
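
To make the periodic strategy concrete, here is a minimal sketch in plain
Java. It is only an illustration under assumptions: PeriodicReaderHolder,
latestVersion and opener are hypothetical names, not the Lucene or
Hibernate Search API, and the "reader" is any object opened for a given
index version. Closing the replaced reader is left out.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

/** Hypothetical holder that refreshes a reader-like resource at a capped rate. */
final class PeriodicReaderHolder<R> implements AutoCloseable {
    private final LongSupplier latestVersion; // assumption: reads the index version from the cache
    private final LongFunction<R> opener;     // assumption: opens a reader for a given version
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "reader-refresh");
                t.setDaemon(true);
                return t;
            });

    private volatile long currentVersion;
    private volatile R current;

    PeriodicReaderHolder(LongSupplier latestVersion, LongFunction<R> opener) {
        this.latestVersion = latestVersion;
        this.opener = opener;
        this.currentVersion = latestVersion.getAsLong();
        this.current = opener.apply(currentVersion);
    }

    /** Checks for a newer version once per period, not once per index event. */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::refreshIfStale, period, period, unit);
    }

    /** Opens the new reader first, then publishes it; returns true if it was replaced. */
    synchronized boolean refreshIfStale() {
        long latest = latestVersion.getAsLong();
        if (latest == currentVersion) {
            return false;
        }
        R fresh = opener.apply(latest); // pre-fetch (warmup could go here) before publishing
        current = fresh;
        currentVersion = latest;
        return true;
    }

    /** Searchers call this; it never opens anything, so it adds no reopening latency. */
    R get() {
        return current;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The period passed to start() is the cap on update frequency: however many
insertions happen in between, the version is checked at most once per
period, and searchers keep using the previous reader until the new one is
fully opened.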

2009/9/29 Manik Surtani <manik at jboss.org>:
>
> On 29 Sep 2009, at 10:19, Mircea Markus wrote:
>
>>
>> On Sep 29, 2009, at 12:08 PM, Manik Surtani wrote:
>>
>>>
>>> On 29 Sep 2009, at 09:57, Mircea Markus wrote:
>>>
>>>> Hi,
>>>>
>>>> Again, this is a feature from Coherence[1].
>>>>
>>>> Basic idea is to execute a query against the cache, and hold the
>>>> result object. This result object will always have up to date
>>>> query result; this means that whenever something is modified in
>>>> the cache the result itself is updated. Advantage: if one performs
>>>> the same query very often (e.g. several times every millisecond)
>>>> the response will be fast and the system will not be overloaded.
>>>
>>> Is it really faster?  Surely all you save is the construction of
>>> the various query objects, but the query itself would have to be re-
>>> run every time.  Or does it attach a listener to the cache and
>>> check whether any new additions/removals should be used to update
>>> the result set?
>> this is the way it works. It is a sort of a near-cache, just that
>> instead of being invalidated it is updated whenever the cache is
>> updated. The documentation also suggests that they are using
>> listeners.
>>>  I don't see how that could be much faster though.
>> I think it might be if you are running *the same query* tons of
>> times. Basically you don't do a map-reduce on all the nodes for
>> each query; rather, on every insertion (especially if the number of
>> insertions is relatively small compared to the number of runs of the
>> same query) you update (if necessary) the cached query result.
>
> Hmm.  It would be pretty use-case-specific.  It's hard to see how this
> _generally_ performs better, since you need to make sure you are aware
> of all changes happening all over the cluster to keep this result set
> up to date (REPL-style scalability bottleneck!)
>
> Cheers
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
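
For reference, the listener-based approach discussed in the quoted thread
can be sketched as below. This is a single-JVM illustration under
assumptions: EntryListener and ContinuousResult are hypothetical names, not
the Coherence or Infinispan API, and the "query" is just a predicate on
values.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical change callback; a real cache would define its own listener type. */
interface EntryListener<K, V> {
    void entryUpdated(K key, V newValue); // insert or overwrite
    void entryRemoved(K key);
}

/** Holds the entries matching a query and updates them on each change event. */
final class ContinuousResult<K, V> implements EntryListener<K, V> {
    private final Predicate<V> query;
    private final Map<K, V> matches = new ConcurrentHashMap<>();

    ContinuousResult(Predicate<V> query, Map<K, V> initialContents) {
        this.query = query;
        // Run the query once over the current contents...
        initialContents.forEach(this::entryUpdated);
    }

    // ...then only re-evaluate the single entry that changed.
    @Override
    public void entryUpdated(K key, V newValue) {
        if (query.test(newValue)) {
            matches.put(key, newValue);
        } else {
            matches.remove(key); // no longer matches, or never did
        }
    }

    @Override
    public void entryRemoved(K key) {
        matches.remove(key);
    }

    /** Reading the result does not touch the cache at all. */
    Map<K, V> snapshot() {
        return Collections.unmodifiableMap(new HashMap<>(matches));
    }
}
```

Reads become a local map lookup, but the result is only correct if this
listener receives every change from every node, which is the scalability
concern raised above.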
