Ah yes sorry guys scratch what I said before. I forgot you were distributing the query
load and were fully resolving the results before sending them back (ie not the two step
process done by default - load sorted ids then load the necessary objects)
Well it's possible to have an algorithm that does something like that:
- load the first few results of each node in sorted order
- do a comparison between them on the client and return the highest results as long as
there are other results of lower value queued up coming from all the other nodes (or that
the end of result for a given node is reached)
- if you are not in this situation, wait for results
One of the problem with this algo is that the server has to keep the next results ready
(has to remember) unless you push everything on the client wo waiting the next() calls
which makes it not so lazy anymore :).
Note that in practice, it does not work for relevance unless the dataset is known to be
evenly distributed (term and stats wise) and we can compare the score value (which is a
good first approximation until Sanne think more about the alternative he has explored with
some other guys).
You also need to deduplicate data.
But I would do early perf comparisons between a solution where we load keys to the client
and then load objects vs this super-lazy approach. I'm not convince the later is
better.
On 15 sept. 2010, at 14:04, Israel Lacerra wrote:
>I think if we allways block iterator.next() until we have at
least a value of each node , we can apply sort in this values (since every node already
made this).
I mean, the idea is to make a "merge" of the results... and the next() blocks
until we have some results already sorted.
Emmanuel, do you think this can work?
On Wed, Sep 15, 2010 at 8:54 AM, Israel Lacerra <israeldl(a)gmail.com> wrote:
>You give up any hope of implementing sorting (explicit or using relevance).
I think if we allways block iterator.next() until we have at least a value of each node ,
we can apply sort in this values (since every node already made this). That's what you
said, Navin?
On Wed, Sep 15, 2010 at 8:16 AM, Navin Surtani <nsurtani(a)redhat.com> wrote:
Isn't that an issue that comes with lots of hits in a lazyIterator anyway?
I mean, if you're only going to load up a 'few hits at a time' then you
can in theory only sort the few hits that you have loaded up anyway
right? :S
On 15/09/10 12:01, Emmanuel Bernard wrote:
> You give up any hope of implementing sorting (explicit or using relevance).
>
> On 15 sept. 2010, at 12:16, Manik Surtani wrote:
>
>> How about something like:
>>
>> - Broadcast the query
>> - Every node creates the QueryHits inst, runs the query and collects
>> results. Starts streaming the results back immediately.
>> - The lazy iterator returns to the user immediately, and maintains an
>> internal cache of results coming in from N remote QueryHits instances
>> - iterator.next() blocks until this cache has available entries to return.
>> - on an implementation level, the GetHItsCommand (or something like
>> that) could return with a single hit, or N hits, with a flag of
>> whether more hits are available or not.
>>
>> Does that help?
>>
>> Manik
>>
>>
>> On 14 Sep 2010, at 20:46, Israel Lacerra wrote:
>>
>>> Hi guys,
>>>
>>> I'm still thinking about it, but I don't have a really good idea
>>> about the lazy iterator yet. The only way (that I see) I could make
>>> it more lazily is:
>>>
>>> - Broadcast the query.
>>> - Every node creates a QueryHits instance with the query and keep it
>>> in a simple little cache (array, hash, etc)
>>> - A "state" of the query is created and every lazyIterator.next()
>>> must send a command to a node and get the next hit (the next key).
>>> - After a certain time, the instances of queryHits "dies".
>>>
>>> It seems to me that this is not too efficient. But I don't have any
>>> other ideas.
>>>
>>> Do you have any suggestions about it?
>>>
>>> thanks!
>>> Israel
>>>
--
Navin Surtani
Intern Infinispan
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev