[infinispan-dev] Query.getResultSize() to be available on the simplified DSL?

Dennis Reed dereed at redhat.com
Tue Mar 11 13:16:35 EDT 2014


Providing methods that work sometimes and don't work other times is 
generally a bad idea.

No matter how much you document it, users *will* try to use it and
expect it to always work
(either because they didn't read the docs that say otherwise, or because
they think they'll stick to a configuration where it does work, etc.).

And then when it doesn't work (because they pushed something to
production which has a different configuration than dev, etc.),
it's a frustrating experience.

-Dennis

On 03/11/2014 09:37 AM, Randall Hauch wrote:
> I’m struggling with this same question in ModeShape. The JCR API exposes a method that returns the number of results, but at least the spec allows the implementation to return -1 if the size is not known (or very expensive to compute). Yet this still does not satisfy all cases.
>
> Depending upon the technology, computing the **exact size** ranges from very cheap to extremely expensive. For example, consider a system that has to take into account access control limitations of the user. My current opinion is that few applications actually need an exact size, and if they do there may be alternatives (like counting as they iterate over the results).
>
> An alternative is to expose an **approximate size**, which is likely to be sufficient for generating display or other pre-computed information such as links or paging details. I think that this is sufficient for most needs, and that even an order of magnitude is sufficient. When the results are known to be small, the system might want to determine the exact size (e.g., by iterating).
>
> So one option is to expose both methods, but allow the exact size method to return -1 if the system can’t determine the size or if doing so is very expensive. This allows the system a way out for large/complex queries and flexibility in the implementation technology. The approximate size method probably always needs to return at least some usable value.
>
> BTW, computing an exact size by iterating can be expensive unless you can keep all the results in memory. That’s not ideal - a query with large results could fill up available memory. If you don’t keep all results in memory, then if you’re going to allow clients to access the results more than once you have to provide a way to buffer the results.
>
>
> On Mar 10, 2014, at 7:23 AM, Sanne Grinovero <sanne at infinispan.org> wrote:
>
>> Hi all,
>> we are exposing a nice feature inherited from the Search engine via
>> the "simple" DSL version, the one which is also available via Hot Rod:
>>
>> org.infinispan.query.dsl.Query.getResultSize()
>>
>> To be fair I hadn't noticed we do expose this, I just noticed after a
>> recent PR review and I found it surprising.
>>
>> This method returns the size of the full resultset, disregarding
>> pagination options; you can imagine it fit for situations like:
>>
>>    "found 6 million matches, these are the top 20: "
>>
>> A peculiarity of Hibernate Search is that the total number of matches
>> is extremely cheap to figure out as it's generally a side effect of
>> finding the 20 results. Essentially we're just exposing an int value
>> which was already computed: very cheap, and happens to be useful in
>> practice.
>>
>> This is not the case with a SQL statement: there you'd have to
>> craft 2 different SQL statements, often incurring the cost of 2 round
>> trips to the database. So this getResultSize() is not available on the
>> Hibernate ORM Query, only on our FullTextQuery extension.
>>
>> Now my doubt is if it is indeed a wise move to expose this method on
>> the simplified DSL. Of course some people might find it useful, still
>> I'm wondering how much we'll be swearing at needing to maintain this
>> feature vs its usefulness when we'll implement alternative execution
>> engines to run queries, not least on Map/Reduce based filtering, and
>> ultimately hybrid strategies.
>>
>> In case of Map/Reduce I think we'll need to keep track of possible
>> de-duplication of results; in case of a Teiid integration it might
>> need a second expensive query, so in this case I'd expect this method
>> to be lazily evaluated.
>>
>> Should we rather remove this functionality?
>>
>> Sanne
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev

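The two-method option from the quoted ModeShape message (an exact size that may return -1, plus an approximate size that always returns something usable) could be sketched roughly as follows. Every name here (SizedResults, exactSize, approximateSize, and the two implementations) is hypothetical and chosen for illustration only; none of it is the JCR, ModeShape, or Infinispan API.

```java
import java.util.List;

public class ResultSizeSketch {

    /** Hypothetical result handle; not the JCR, ModeShape, or Infinispan API. */
    interface SizedResults {
        /** Exact number of results, or -1 when unknown or too expensive to compute. */
        long exactSize();

        /** A usable estimate; only needs to be roughly the right order of magnitude. */
        long approximateSize();
    }

    /** A small result set held in memory can answer both questions cheaply. */
    static final class InMemoryResults implements SizedResults {
        private final List<String> rows;
        InMemoryResults(List<String> rows) { this.rows = rows; }
        public long exactSize() { return rows.size(); }
        public long approximateSize() { return rows.size(); }
    }

    /** A large or access-controlled result set that declines to count exactly. */
    static final class EstimatedResults implements SizedResults {
        private final long estimate;
        EstimatedResults(long estimate) { this.estimate = estimate; }
        public long exactSize() { return -1; }
        public long approximateSize() { return estimate; }
    }

    /** Caller-side handling: prefer the exact size, fall back to the estimate. */
    static String describe(SizedResults results) {
        long exact = results.exactSize();
        if (exact >= 0) {
            return "found " + exact + " matches";
        }
        return "found about " + results.approximateSize() + " matches";
    }

    public static void main(String[] args) {
        System.out.println(describe(new InMemoryResults(List.of("a", "b", "c"))));
        System.out.println(describe(new EstimatedResults(6_000_000L)));
    }
}
```

Note that the caller has to handle the -1 case explicitly, which is exactly the "works sometimes" behaviour objected to at the top of this message; the sketch only makes that trade-off visible, it does not resolve it.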

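To illustrate why the original message calls the total match count "a side effect of finding the 20 results", here is a minimal in-memory stand-in. It is not how Hibernate Search or Infinispan compute the value; PagedCountSketch, Page, and search are hypothetical names, and the linear scan is only an assumption made to keep the example self-contained. The point it shows is that a single pass can fill the requested page while also counting every match, so the total ignores pagination.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PagedCountSketch {

    /** One page of hits plus the total number of matches seen while collecting it. */
    static final class Page<T> {
        final List<T> hits;
        final int totalMatches;
        Page(List<T> hits, int totalMatches) {
            this.hits = hits;
            this.totalMatches = totalMatches;
        }
    }

    /** Single pass: keep only the requested window, but count every match. */
    static <T> Page<T> search(List<T> data, Predicate<T> filter, int offset, int maxResults) {
        List<T> hits = new ArrayList<>();
        int total = 0;
        for (T item : data) {
            if (!filter.test(item)) {
                continue;
            }
            // The match belongs to the page only if it falls inside the window.
            if (total >= offset && hits.size() < maxResults) {
                hits.add(item);
            }
            total++;
        }
        return new Page<>(hits, total);
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            data.add(i);
        }
        Page<Integer> page = search(data, n -> n % 2 == 0, 0, 20);
        System.out.println("found " + page.totalMatches + " matches, these are the top "
                + page.hits.size() + ": " + page.hits);
    }
}
```

An engine that cannot count in the same pass (the SQL case described in the quoted message) would need a second query to fill in totalMatches, which is where the cost concern comes from.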
