[infinispan-dev] [infinispan-internal] Continuous Queries

Sanne Grinovero sanne at infinispan.org
Thu Oct 17 18:29:16 EDT 2013


On 17 October 2013 20:19, Mircea Markus <mmarkus at redhat.com> wrote:
> let's keep this on -dev.

+1

> On Oct 17, 2013, at 6:24 PM, Sanne Grinovero <sanne at redhat.com> wrote:
>> ----- Original Message -----
>>>
>>> On Oct 17, 2013, at 2:28 PM, Sanne Grinovero <sanne at redhat.com> wrote:
>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> On Oct 17, 2013, at 1:31 PM, Sanne Grinovero <sanne at redhat.com> wrote:
>>>>>
>>>>>> With some custom coding it's certainly possible to define an event
>>>>>> listener
>>>>>> which triggers when an entry is inserted/removed which matches a certain
>>>>>> Query.
>>>>>
>>>>> where would you hold the query result? a cache perhaps?
>>>>
>>>> Why do you need to hold on to the query result?
>>>> I was thinking to just send an event "newly stored X matches query Q1".
>>>
>>> You don't have a single process receive all the notifications then, but
>>> multiple processes in the cluster. It's up to the user to aggregate these
>>> results (that's why I mentioned a cache) but without aggregation this
>>> feature is pretty limiting.
>>
>> I have no idea if it's limiting. For the use case I understood, that's pretty decent.
>
> Here's my understanding of CQ[1]: a user queries a cache 10000000 (you add the rest of the 0s) times per second.
> Instead of executing the query every time (very resource consuming) the system caches the query result, updates it when the underlying data gets modified, and returns it to the user on every invocation. Optionally you can register a listener on the query result, but that's just API sugar.

That's an implementation detail, I need a use case.

Assuming you store a good amount of entries, you know, maybe so many
that I actually need a data grid instead of a simple HashMap or a USB
stick, as a Query user I don't think I would always want to actually
fetch locally all data, when all I need is maybe sound an alarm bell.

A use case could be that I'm interested in some stock, specifically I
want to be notified ASAP of price changes for the stock traded on
market "Neverland", so I register a continuous query "from stock where
stock.market = 'Neverland' ".
Let's also assume that Neverland trades approximately 5,000 titles.

My application starts and fetches all current values with a one-off
full query (using that same query), so I fetch all 5,000 locally. Next
step, I want to be notified ASAP when one of these changes value, so
that I can react on it.
Then I get my first notification! Cool, my nice List API provides me
with the new value for 5,000 titles.. which one changed? Let me find
out: I can scan my previous results and compare..
(Note that I'm not even getting into the detail of how we got all
those titles locally: using deltas or not is irrelevant).

That's certainly doable, but what if you have more than 5,000 titles?
It degenerates. Of course you could wrap this "resultset" in some
more syntactic sugar, but essentially what you need to implement the
client side API is to receive the single events.
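
As a rough sketch of that per-event shape, here is a node-local notifier
in plain Java. It is not Infinispan API: MatchNotifier, MatchListener and
onEntryStored are hypothetical names, and a Predicate stands in for a real
query. The assumption is that something on each node calls onEntryStored
once per write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch in plain Java; none of these types are Infinispan API.
final class MatchNotifier<K, V> {

    /** Receives one callback per stored entry that matches at least one query. */
    interface MatchListener<K, V> {
        void onMatch(K key, V value, Set<String> matchedQueryIds);
    }

    private final Map<String, Predicate<V>> queries = new ConcurrentHashMap<>();
    private final MatchListener<K, V> listener;

    MatchNotifier(MatchListener<K, V> listener) {
        this.listener = listener;
    }

    /** Registers a query under an id such as "Q1"; the predicate stands in for a real query. */
    void register(String queryId, Predicate<V> predicate) {
        queries.put(queryId, predicate);
    }

    /** Meant to be called from a node-local "entry stored" hook, once per write. */
    void onEntryStored(K key, V value) {
        Set<String> matched = new TreeSet<>(); // sorted, so the reported set is stable
        for (Map.Entry<String, Predicate<V>> q : queries.entrySet()) {
            if (q.getValue().test(value)) {
                matched.add(q.getKey());
            }
        }
        if (!matched.isEmpty()) {
            listener.onMatch(key, value, matched); // a single event; no result list is built
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        MatchNotifier<String, String> notifier =
                new MatchNotifier<>((key, market, ids) -> events.add(key + " matches " + ids));
        notifier.register("Q1", market -> market.equals("Neverland"));
        notifier.onEntryStored("ACME", "Neverland"); // matches Q1
        notifier.onEntryStored("XYZ", "Elsewhere");  // no match, no event
        System.out.println(events);
    }
}
```

The receiver only ever sees the entry that changed and the ids of the
queries it matched, so there is nothing to scan or diff on the client.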

I'm not focusing on the client side sugar because of Divya's original question:
 "a feasible path to achieve this functionality via some custom
coding, even though it is not the most efficient path (because
Continuous Queries are not available out of the box)."


From a very different perspective, look at it in terms of a scalable
architecture: when dealing with large amounts of data, the List
interface is conceptually not cutting it; I would expect you to ban
it, not to encourage it.
Assuming the client is also designed as a properly scalable system,
if you were to provide it with a List, the client would likely need to
iterate over it to forward each single element as a task to some
parallel executor. It's much simpler if you push them one by one: it
could still wrap each in a task, but you cut the latency which you
would otherwise introduce to collect all single items and you can
allow users to insert a load balancer between your crazy scalable
event generator and the target of these notifications.

(Because really if you set up such a feature on a large grid, it will
become a crazy scalable event generator)
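
A minimal sketch of the push-one-by-one idea, again in plain Java with
hypothetical names (PushDispatcher, push, shutdownAndWait are not from any
Infinispan API). Each event becomes its own task the moment it arrives,
instead of being collected into a List that the receiver must iterate
later; the executor here is a stand-in for whatever parallel consumer or
load balancer sits behind the notifications.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch in plain Java; not part of any Infinispan API.
final class PushDispatcher<E> {
    private final ExecutorService executor;
    private final Consumer<E> handler;

    PushDispatcher(ExecutorService executor, Consumer<E> handler) {
        this.executor = executor;
        this.handler = handler;
    }

    /** Hands one event to the executor as soon as it arrives; nothing is collected first. */
    void push(E event) {
        executor.execute(() -> handler.accept(event));
    }

    /** Stops accepting events and waits for in-flight tasks; returns false on timeout or interrupt. */
    boolean shutdownAndWait(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Queue<String> handled = new ConcurrentLinkedQueue<>();
        PushDispatcher<String> dispatcher =
                new PushDispatcher<>(Executors.newFixedThreadPool(2), handled::add);
        dispatcher.push("ACME changed");
        dispatcher.push("XYZ changed");
        boolean finished = dispatcher.shutdownAndWait(5000);
        System.out.println(finished + " " + handled.size());
    }
}
```

With more than one worker thread, events may be handled out of arrival
order; a receiver that cares about ordering would have to deal with that
itself.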

>>>> You could register multiple such listeners, getting the effect of "newly
>>>> stored entry X matches Query set {Q1, Q3, Q7}"
>>>
>>> The listeners would not be collocated.
>>
>> I'm not going to implement distributed listeners, I indeed expect you to register such a listener on each node.
>
> If I run a query, continuous or not, I'd expect to be able to get all the result set of that query on the process on which I invoke it. Call me old-fashioned :-)
>
>>
>> I can show how to make Continuous Queries on the Query API to accomplish this.
>
> I wouldn't name the problem your solution solves Continuous Query :-)
>
>> Anything else is out of scope for me :-) Technically I think it's out of scope for Infinispan too, it should delegate to a message bus.
>
> -1, for the reasons mentioned above.
>
> [1] http://coherence.oracle.com/display/COH31UG/Continuous+Query

Do you realize this page is confirming a List is fundamentally wrong? :-)
It's listing a bunch of fallacies to explain common errors, which all
boil down to an attempt to iterate over the entries, and then states:

"The solution is to provide the listener during construction, and it
will receive one event for each item that is in the Continuous Query
Cache, whether it was there to begin with (because it was in the
query) or if it got added during or after the construction of the
cache"

Finally, a consistency consideration on how to create such a list: if
you get multiple events in a short time, you'll never know which one is
correct because of interleaving of the notifications. There is no way
to iterate (search) a list of results in Infinispan in a consistent
transactional view, unless you want me to lock all entries and repeat
the query to confirm. By NOT providing a List access, you avoid the
problem of consistency and don't introduce contention points like
"aggregating it all in one placeholder".

Also interesting from Coherence's wiki: they have their results
implement InvocableMap, essentially a representation of a conceptual
data partition on which you can then invoke operations, by moving
execution to the data. I think that's brilliant, and makes it quite
clear that no such list is sent to the client.

Cheers,
Sanne


