[infinispan-dev] ISPN 200
Sanne Grinovero
sanne.grinovero at gmail.com
Wed Sep 15 09:42:32 EDT 2010
I agree with Emmanuel's approach, you have to be lazy but in chunks
for it, so you're able to advance selected streams you're being sent
and keep back the others according to selected sorting: also remember
a sorted result stream might not fit in the client's memory.
Lucene users are hit often by the memory requirements of a sort
operation, this would be very useful.
Also sending the query in broadcast is nice for a first
implementation, but this means you can scale the number of searched
items but we can't scale the number of queries, this should be
designed in such a way to make it possible in a future improvement to
send the query only to a subset o the nodes.
Cheers,
Sanne
2010/9/15 Emmanuel Bernard <emmanuel at hibernate.org>:
> Ah yes sorry guys scratch what I said before. I forgot you were distributing
> the query load and were fully resolving the results before sending them back
> (ie not the two step process done by default - load sorted ids then load the
> necessary objects)
> Well it's possible to have an algorithm that does something like that:
> - load the first few results of each node in sorted order
> - do a comparison between them on the client and return the highest results
> as long as there are other results of lower value queued up coming from all
> the other nodes (or that the end of result for a given node is reached)
> - if you are not in this situation, wait for results
> One of the problem with this algo is that the server has to keep the next
> results ready (has to remember) unless you push everything on the client wo
> waiting the next() calls which makes it not so lazy anymore :).
> Note that in practice, it does not work for relevance unless the dataset is
> known to be evenly distributed (term and stats wise) and we can compare the
> score value (which is a good first approximation until Sanne think more
> about the alternative he has explored with some other guys).
> You also need to deduplicate data.
> But I would do early perf comparisons between a solution where we load keys
> to the client and then load objects vs this super-lazy approach. I'm not
> convince the later is better.
>
> On 15 sept. 2010, at 14:04, Israel Lacerra wrote:
>
>>I think if we allways block iterator.next() until we have at least a value
>> of each node , we can apply sort in this values (since every node already
>> made this).
> I mean, the idea is to make a "merge" of the results... and the next()
> blocks until we have some results already sorted.
>
> Emmanuel, do you think this can work?
>
> On Wed, Sep 15, 2010 at 8:54 AM, Israel Lacerra <israeldl at gmail.com> wrote:
>>
>> >You give up any hope of implementing sorting (explicit or using
>> > relevance).
>>
>> I think if we allways block iterator.next() until we have at least a value
>> of each node , we can apply sort in this values (since every node already
>> made this). That's what you said, Navin?
>>
>> On Wed, Sep 15, 2010 at 8:16 AM, Navin Surtani <nsurtani at redhat.com>
>> wrote:
>>>
>>> Isn't that an issue that comes with lots of hits in a lazyIterator
>>> anyway?
>>>
>>> I mean, if you're only going to load up a 'few hits at a time' then you
>>> can in theory only sort the few hits that you have loaded up anyway
>>> right? :S
>>>
>>> On 15/09/10 12:01, Emmanuel Bernard wrote:
>>> > You give up any hope of implementing sorting (explicit or using
>>> > relevance).
>>> >
>>> > On 15 sept. 2010, at 12:16, Manik Surtani wrote:
>>> >
>>> >> How about something like:
>>> >>
>>> >> - Broadcast the query
>>> >> - Every node creates the QueryHits inst, runs the query and collects
>>> >> results. Starts streaming the results back immediately.
>>> >> - The lazy iterator returns to the user immediately, and maintains an
>>> >> internal cache of results coming in from N remote QueryHits instances
>>> >> - iterator.next() blocks until this cache has available entries to
>>> >> return.
>>> >> - on an implementation level, the GetHItsCommand (or something like
>>> >> that) could return with a single hit, or N hits, with a flag of
>>> >> whether more hits are available or not.
>>> >>
>>> >> Does that help?
>>> >>
>>> >> Manik
>>> >>
>>> >>
>>> >> On 14 Sep 2010, at 20:46, Israel Lacerra wrote:
>>> >>
>>> >>> Hi guys,
>>> >>>
>>> >>> I'm still thinking about it, but I don't have a really good idea
>>> >>> about the lazy iterator yet. The only way (that I see) I could make
>>> >>> it more lazily is:
>>> >>>
>>> >>> - Broadcast the query.
>>> >>> - Every node creates a QueryHits instance with the query and keep it
>>> >>> in a simple little cache (array, hash, etc)
>>> >>> - A "state" of the query is created and every lazyIterator.next()
>>> >>> must send a command to a node and get the next hit (the next key).
>>> >>> - After a certain time, the instances of queryHits "dies".
>>> >>>
>>> >>> It seems to me that this is not too efficient. But I don't have any
>>> >>> other ideas.
>>> >>>
>>> >>> Do you have any suggestions about it?
>>> >>>
>>> >>> thanks!
>>> >>> Israel
>>> >>>
>>>
>>>
>>> --
>>> Navin Surtani
>>> Intern Infinispan
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
More information about the infinispan-dev
mailing list