[infinispan-dev] Query.getResultSize() to be available on the simplified DSL?

Mon Mar 10 13:09:29 EDT 2014

On Mar 10, 2014, at 15:12, Sanne Grinovero <sanne at infinispan.org> wrote:

> Ok you make some good points, and I've no doubts of it being useful.
> 
> My only concern is that this could slow us down significantly in
> providing other features which might be even more useful or pressing.
> You have to pick your battles and be wise on where to spend energy
> first.
> 
> Considering that it's easier to add methods than to remove them, what
> would you think of marking this as experimental for now?
> I'd prefer to see the non-indexed query engine delivered first; this
> sounds like a stumbling block on the critical path, so it might be
> wise to have the option to drop the requirement from a first
> implementation. You're definitely right that we should then implement
> "some" COUNT strategy; I'm just not comfortable committing to this
> one yet.

I can imagine a lot of users emulating this by simply iterating over the entries in the result set. Even if we do just that and document it as slow, I think it's still worth exposing this somewhere.
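
A minimal sketch of that emulation, in Java. The names (IterationCount, countByIteration) are hypothetical, and a plain in-memory list stands in for a query result; this is not the Infinispan API, just the shape of the workaround users would likely write:

```java
import java.util.Iterator;
import java.util.List;

public class IterationCount {

    // Hypothetical helper: counts matches by walking every entry of the
    // result set. It keeps no entries around, but it has to visit all of them.
    public static <T> long countByIteration(Iterator<T> results) {
        long count = 0;
        while (results.hasNext()) {
            results.next();   // the entry itself is discarded, only counted
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for the full (unpaginated) result set of a query.
        List<String> matches = List.of("a", "b", "c");
        System.out.println(countByIteration(matches.iterator())); // prints 3
    }
}
```

The cost grows linearly with the number of matches, which is why it would need to be documented as slow.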

> 
> Now on a general purpose COUNT: for sure we need one but it's a
> Pandora's box you're opening. In a sense there is a conceptual
> parallel with my concerns on the API contract we provide for the
> clear() method. To keep it short in this context, as we're changing
> subject: I don't think we'll ever be able to provide a solid guarantee
> of a fully reliable value. Indexes are not yet updated within the
> transaction, and M/R crosses the boundaries of nodes and
> datacontainer/cachestore without making a consistent read snapshot.
> We should document any such API as providing a best-effort estimate.

> 
> 
> 
> On 10 March 2014 13:16, Adrian Nistor <anistor at redhat.com> wrote:
>> I'd vote for keeping it, and executing it lazily in environments where it is
>> costly to compute it upfront.
>> 
>> And of course, document this properly so users will be aware it can incur a
>> second execution, with significant performance impact and also possibly a
>> data visibility/consistency impact. I'd do this because the api is meant to
>> be first of all user friendly and useful, not just machine friendly and
>> efficient.
>> 
>> There's another reason for having it. Say we remove it, how will users be
>> able to know the total number of matching results?  Our DSL does not
>> currently have a 'count' function. Maybe we should add such a thing first,
>> and then think about removing Query.getResultSize().
>> 
>> But, if we implement a proper 'count', getResultSize() could be trivially
>> implemented as some kind of syntactic sugar on top of it, so I would still
>> consider it worth being in the API.
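
A minimal sketch of the "lazy, sugar on top of count" idea, in Java. Everything here is an assumption for illustration: LazyResultSize is a hypothetical class, and the LongSupplier stands in for whatever second execution (a 'count' query) the engine would actually run:

```java
import java.util.function.LongSupplier;

public class LazyResultSize {

    // Hypothetical stand-in for the (possibly expensive) count execution.
    private final LongSupplier countQuery;

    // null until the size has been requested for the first time.
    private Long cachedSize;

    public LazyResultSize(LongSupplier countQuery) {
        this.countQuery = countQuery;
    }

    // Runs the count only on first use, then returns the remembered value.
    public synchronized long getResultSize() {
        if (cachedSize == null) {
            cachedSize = countQuery.getAsLong();
        }
        return cachedSize;
    }

    public static void main(String[] args) {
        LazyResultSize query = new LazyResultSize(() -> 42L);
        System.out.println(query.getResultSize()); // prints 42
    }
}
```

Callers that never ask for the size never pay for the second execution. The cached value can differ from what a fresh count would return, which is the visibility/consistency impact that would need documenting.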
>> 
>> And then it all boils down to the question: should the DSL provide a count
>> function? (+1 from me)
>> 
>> Cheers
>> 
>> 
>> On 03/10/2014 02:23 PM, Sanne Grinovero wrote:
>> 
>> Hi all,
>> we are exposing a nice feature inherited from the Search engine via
>> the "simple" DSL version, the one which is also available via Hot Rod:
>> 
>> org.infinispan.query.dsl.Query.getResultSize()
>> 
>> To be fair I hadn't noticed we do expose this, I just noticed after a
>> recent PR review and I found it surprising.
>> 
>> This method returns the size of the full resultset, disregarding
>> pagination options; you can imagine it fit for situations like:
>> 
>>   "found 6 million matches, these are the top 20: "
>> 
>> A peculiarity of Hibernate Search is that the total number of matches
>> is extremely cheap to figure out as it's generally a side effect of
>> finding the 20 results. Essentially we're just exposing an int value
>> which was already computed: very cheap, and happens to be useful in
>> practice.
>> 
>> This is not the case with a SQL statement: there you'd have to
>> craft 2 different SQL statements, often incurring the cost of 2 round
>> trips to the database. So this getResultSize() is not available on the
>> Hibernate ORM Query, only on our FullTextQuery extension.
>> 
>> Now my doubt is whether it is indeed a wise move to expose this method
>> on the simplified DSL. Of course some people might find it useful;
>> still, I'm wondering how much we'll be swearing at having to maintain
>> this feature, versus its usefulness, once we implement alternative
>> execution engines to run queries, not least Map/Reduce based
>> filtering, and ultimately hybrid strategies.
>> 
>> In case of Map/Reduce I think we'll need to keep track of possible
>> de-duplication of results, in case of a Teiid integration it might
>> need a second expensive query; so in this case I'd expect this method
>> to be lazily evaluated.
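
A minimal sketch of counting with de-duplication, in Java. It assumes (this is not stated in the thread) that each node reports the keys of its matching entries and that the same key may be reported by more than one node; DedupCount and countDistinct are hypothetical names:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupCount {

    // Merges the per-node key collections into one set, so a key reported
    // by several nodes is counted once. Summing the per-node sizes instead
    // would over-count such keys.
    public static <K> int countDistinct(Collection<? extends Collection<K>> perNodeKeys) {
        Set<K> seen = new HashSet<>();
        for (Collection<K> nodeKeys : perNodeKeys) {
            seen.addAll(nodeKeys);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Two hypothetical nodes; "k2" is reported by both.
        List<List<String>> perNode = List.of(
                List.of("k1", "k2"),
                List.of("k2", "k3"));
        System.out.println(countDistinct(perNode)); // prints 3
    }
}
```

Holding every matching key in memory is itself a cost, which is one more reason to compute the value lazily.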
>> 
>> Should we rather remove this functionality?
>> 
>> Sanne
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> 
>> 
>> 

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)