[infinispan-dev] QueryIterator inconsistencies

Tue Sep 25 05:51:05 EDT 2012

On 25 September 2012 09:35, Marko Lukša <marko.luksa at gmail.com> wrote:
> On 24.9.2012 19:03, Sanne Grinovero wrote:
>>> 3. QueryIterator.afterFirst() & beforeLast()
>>> These two are OK. Calling next() after afterFirst() should return the
>>> second element, while calling previous() should return the first element.
>> Are you sure they are ok? I'm actually wondering why they exist at
>> all; they don't look like very useful to me.
>> Why should someone want a method to directly jump to the second element?
>> Users already have the option to "beforeFirst && next" or use the
>> "jumpTo" / seek operation.
>>
> Oh yes, I was wondering about that myself when I first saw
> QueryIterator, but then figured someone needed/wanted it. If we
> deprecate the whole QueryIterator and introduce a new interface, I'd
> leave these methods out.
>
>>> 4. isFirst(), isLast()
>>> ListIterator does not have the concept of "current element", but the
>>> javadoc of all the is*() methods talk about the current element, which
>>> is ambiguous. Therefore isFirst() and isLast() should probably be
>>> renamed to isBeforeFirst() and isAfterLast().
>> I would remove these too. Am I missing some use case?
>>
> I think you're right. Also, instead of calling isBeforeFirst() and
> isAfterLast(), users can simply call !hasPrevious() and !hasNext(),
> which already exist.
>
>>> 5. jumpToResult(int index)
>>> When jumping to index=1, what does next() return? What about previous()?
>>> We should probably replace the method with two others: beforeResult(int
>>> index) and afterResult(int index)
>> I would add only "beforeResult(int)", again I don't see why we should
>> have both versions.
>> Since you have to do "next()" to retrieve a value, I'd consider
>> "beforeResults" more user friendly than "afterResults"
> Hmm, what if you want to iterate backwards. If you only had
> beforeResult(int), you wouldn't be able to jump behind the last element.
> You'd have to use afterLast(). For usecases where you always jump to the
> end and iterate backwards, afterLast() would be ok. But what if you
> needed to implement a method like iterateBackwards(int fromIndex,
> Visitor visitor). Then the method would need to have something like:  if
> (fromIndex == col.size()) it.afterLast() else it.beforeResult(fromIndex).
>
> Of course, we could allow users to call beforeResult(col.size()), but
> that's not completely clean, since there's no element with index
> col.size(). But still, it may be cleaner than having an additional
> method (afterResult).

Right I hadn't thought about backwards scrolling. Don't you think that
being able to revert the sort order at query level is enough?
If someone needs A-Z, runs the scroll in forward mode and defines
lexicographical ordering, if you then need Z-A you revert the sort and
run the iterator again in forward only mode:

AFAIR supporting the reverse iterator in distributed lazy queries is a
nightmare, so if we agree they are pointless that's a good
simplification in the codebase.

I can think of a single use case in which one can not revert the Sort:
when the search result is defined by relevance in a fulltext query,
but in this case the revert sort doesn't make any sense as the last
results (which would get on top) are a "tail" of uninteresting
information.

>
>
>>> Any comments / other ideas?
>>>
>>> Marko
>> That's a lot of changes needed. One would say we could deprecate all
>> methods and rewrite the new ones, but wouldn't it be better to
>> deprecate the interface and introduce a brand new one?
>> What about "ResultsIterator" ? after all, we're not iterating on queries..
>
> I agree. The only thing that bothers me is that CacheQuery would then
> need new methods called (lazy)resultIterator() (instead of iterator(),
> which already exists).

Right. though I like "resultIterator()" more so we're having luck in
this case ;)

Any better suggestion for names? Anyone thinks it's worth to remove
the old ones without a deprecation step to reuse the old names?

>
> This reminds me. What is the difference between LazyIterator and
> EagerIterator? Both of them execute the Lucene query immediately. The
> only difference is that EagerIterator loads EntityInfos straight away,
> while LazyIterator (only) loads the DocumentExtractor and then at first
> invocation of next(), it loads the EntityInfo and cache value at the
> same time (when filling the buffer).
>
> IMO, the difference between them is negligible and we could either scrap
> LazyResultIterator or make EagerResultIterator preload everything?

You have to consider that Infinispan could hold many more values than
what the current node/JVM is able to keep in memory;
for example when running a distributed query, it's likely that the
user doesn't want to preload all data; obviously when running "list()"
we *could* load things lazily on access, but that gets tricky as we
don't have a close() method and I would expect people to use an
iterator if they know the amount of data retrieved is having some
critical mass.
Of course if you know you're going to hit a limited number of entries
(and these entries are of limited size - some people store hd movie
streams) then the eager loading could be more efficient.

So the lazy-loaders are needed; it could be nice to think of a
different way to specify the option; we could have something like

  .iterate(LoadMode);

rather than having to define the methods twice.

Sanne

>
>
> Marko
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev