On Thu 2013-01-03 18:19, Ales Justin wrote:
>>>> I think anything handled by not-Lucene is wrong.
>>>
>>> I'm afraid Lucene won't do it, so we have no option. It's definitely
>>> not designed to do this: even a custom Collector can't return more
>>> results than Documents in its segment, as all representations work by
>>> using int as relative ids.
>>
>> Even if we have Document(s) for every possible combination - all elements
>> of cartesian product?
>> Where current impl basically handles 1 x 1 x 1 x ... x 1.
>> And new HS impl just needs to take this multi Document approach into account.
>
> The problem with this approach is that it will make things hard for
> most use cases except the one involving associated collections.
> We would need to filter out documents that refer to the same entity
> unless the query ought to be a cartesian one.
> This seems like the wrong thing to do, as it would cost us in index size
> and speed as well as make the overall code more complex.
> I'd rather have a transformer cartesianize the end result. That would be
> much cheaper and code localized.
But this way you only know the real size once you expand the results into cartesian
results.
Which, as we discussed, completely breaks any counters and possible lazy handling.
Not to mention you do this in-memory, which brings additional problems.
I would argue that this version would be more optimized and faster than
your approach:
- less IO
- less bandwidth
- less memory overhead
In fact, if I were writing a relational driver, I would do exactly that:
create the cartesian denormalization at the very last moment to keep my
internal memory consumption low and make the network chattiness minimal.
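
To illustrate what I mean by cartesianizing at the very last moment, here
is a minimal sketch. It assumes the query has already returned one entity
together with its associated collections as plain lists. CartesianExpander
and expand() are hypothetical names, not part of Hibernate Search or
Lucene; the rows are produced one at a time, so the full product is never
held in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical helper: lazily expands collections into cartesian rows. */
final class CartesianExpander {

    private CartesianExpander() {
    }

    /**
     * Returns an iterator over every combination that takes one element
     * from each column. Rows are built on demand. In this sketch an empty
     * column list, or any empty column, yields no rows at all.
     */
    static <T> Iterator<List<T>> expand(final List<List<T>> columns) {
        boolean empty = columns.isEmpty();
        for (List<T> column : columns) {
            if (column.isEmpty()) {
                empty = true;
            }
        }
        final boolean nothingToReturn = empty;
        // One cursor per column, advanced like an odometer.
        final int[] cursor = new int[columns.size()];

        return new Iterator<List<T>>() {
            private boolean exhausted = nothingToReturn;

            @Override
            public boolean hasNext() {
                return !exhausted;
            }

            @Override
            public List<T> next() {
                if (exhausted) {
                    throw new NoSuchElementException();
                }
                // Build the current row from the cursor positions.
                List<T> row = new ArrayList<T>(cursor.length);
                for (int i = 0; i < cursor.length; i++) {
                    row.add(columns.get(i).get(cursor[i]));
                }
                // Advance the rightmost cursor, carrying over to the left.
                int i = cursor.length - 1;
                while (i >= 0 && ++cursor[i] == columns.get(i).size()) {
                    cursor[i] = 0;
                    i--;
                }
                if (i < 0) {
                    exhausted = true;
                }
                return row;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

With columns of size 1, 2 and 3 this yields 6 rows, but only the 6 source
elements ever cross the wire or sit in memory at once.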
But yes, counts would require reading all the data, and pagination could
work simply by returning too much data (in theory; in practice, I think
my explanation above invalidates that point).
But relying on the counts when you do a cartesian query is just nuts
unless you plan on querying results on the many side. In which case
reversing your query is much more efficient in Lucene terms.
Emmanuel