On Sep 29, 2009, at 12:24 PM, Manik Surtani wrote:


On 29 Sep 2009, at 10:19, Mircea Markus wrote:


On Sep 29, 2009, at 12:08 PM, Manik Surtani wrote:


On 29 Sep 2009, at 09:57, Mircea Markus wrote:

Hi,

Again, this is a feature from Coherence[1].

The basic idea is to execute a query against the cache and hold the  
result object. This result object will always hold an up-to-date  
query result; this means that whenever something is modified in  
the cache the result itself is updated. Advantage: if one performs  
the same query very often (e.g. several times every millisecond)  
the response will be fast and the system will not be overloaded.

Is it really faster?  Surely all you save is the construction of  
the various query objects, but the query itself would have to be re-
run every time.  Or does it attach a listener to the cache and  
check whether any new additions/removals should be used to update  
the result set?
That is the way it works. It is a sort of near-cache, except that  
instead of being invalidated it is updated whenever the cache is  
updated. The documentation also suggests that they are using  
listeners.
I don't see how that could be much faster though.
I think it might be if you are running *the same query* tons of  
times. Basically you don't do a map-reduce on all the nodes, but  
rather on every insertion (especially if the number of insertions is  
relatively small compared to the number of times the same query is  
run) you update (if necessary) the cached query result.
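A minimal sketch of that listener idea in plain Java (this is illustrative only, not the Infinispan or Coherence API; the class and method names are my own): each write is tested against the query predicate and the cached result set is updated incrementally, so reads never re-run the query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

/**
 * Illustrative continuous-query result holder: instead of re-running the
 * query on every read, each cache write is tested against the predicate
 * and the maintained result set is updated in place.
 */
public class ContinuousQuery<K, V> {
    private final BiPredicate<K, V> predicate;                   // the "query"
    private final Map<K, V> result = new ConcurrentHashMap<>();  // live result set

    public ContinuousQuery(BiPredicate<K, V> predicate) {
        this.predicate = predicate;
    }

    // In a real implementation this would be driven by a cache listener.
    public void onPut(K key, V value) {
        if (predicate.test(key, value)) {
            result.put(key, value);   // entry now matches: add/refresh it
        } else {
            result.remove(key);       // entry no longer matches: drop it
        }
    }

    public void onRemove(K key) {
        result.remove(key);
    }

    // Reads are lookups into the maintained result, not a full query run.
    public Map<K, V> result() {
        return result;
    }
}
```

The cost of the query is paid once per write (one predicate test) rather than once per read, which is exactly the trade Mircea describes: it wins when the same query is read far more often than the cache is written.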

Hmm.  It would be pretty use-case-specific.
I think there are many use cases for this (from the Coherence docs):
  • It is an ideal building block for Complex Event Processing (CEP) systems and event correlation engines.

  • It is ideal for situations in which an application repeats a particular query, and would benefit from always having instant access to the up-to-date result of that query.

  • A Continuous Query Cache is analogous to a materialized view, and is useful for accessing and manipulating the results of a query using the standard NamedCache API, and receiving an ongoing stream of events related to that query.

It's hard to see how this _generally_ performs better, since you need  
to make sure you are aware of all changes happening all over the  
cluster to keep this result set up to date (REPL-style scalability  
bottleneck!)

Yes, the performance question would definitely arise in the case of DIST. Even for that, one way of handling it is to migrate the queries to each node: on put, the node would determine whether it should replicate to a certain node's (the query owner's) near cache. This would reduce the replication overhead to a minimum.
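That routing idea could be sketched as follows (again illustrative, not Infinispan API; `RegisteredQuery` and the node names are assumptions): on put, the writing node tests the entry against each registered query and forwards it only to the owners of the queries it matches, rather than replicating every write cluster-wide.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative sketch of query-aware replication: an entry is sent only
// to the near caches of nodes whose registered queries it matches.
public class QueryAwareRouter<K, V> {
    // A query registered by some node (the "query owner") in the cluster.
    public record RegisteredQuery<K, V>(String ownerNode, BiPredicate<K, V> predicate) {}

    private final List<RegisteredQuery<K, V>> queries;

    public QueryAwareRouter(List<RegisteredQuery<K, V>> queries) {
        this.queries = queries;
    }

    // Returns the nodes whose near caches should receive this write.
    public List<String> targetsFor(K key, V value) {
        return queries.stream()
                .filter(q -> q.predicate().test(key, value))
                .map(RegisteredQuery::ownerNode)
                .distinct()
                .toList();
    }
}
```

A write that matches no registered query is not forwarded anywhere, which is where the replication savings would come from; the open cost is keeping the set of registered queries consistent on every node.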
Cheers
--
Manik Surtani
manik@jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org




_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev