[infinispan-dev] inverted distributed query

Tue Jun 19 15:40:43 EDT 2012

Hi Ales,

There are several strategies; which one works best depends on several
factors, not least the number of queries, the index size, how much
memory we can dedicate to query caches, and the ratio of updates.

A Lucene Query produces a sparse BitSet, which you can think of as an
ordered list of matching ids. A common use case is to wrap this
BitSet as a Filter so that it can be cached, reused and applied as a
mask on other queries.
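
To illustrate the masking idea above, here is a minimal sketch that uses
plain java.util.BitSet rather than Lucene's own Filter classes. The class
and method names (FilterMaskSketch, applyMask) are made up for this
example and are not part of Lucene or Infinispan Query.

```java
import java.util.BitSet;

public class FilterMaskSketch {

    // Returns the ids that match both the query and the cached filter.
    // The cached bitset is cloned first so that it can be reused later.
    static BitSet applyMask(BitSet cachedFilter, BitSet queryMatches) {
        BitSet result = (BitSet) cachedFilter.clone();
        result.and(queryMatches);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical cached result of an earlier query: ids 1, 4 and 7 match.
        BitSet cachedFilter = new BitSet();
        cachedFilter.set(1);
        cachedFilter.set(4);
        cachedFilter.set(7);

        // Hypothetical result of another query: ids 4, 5 and 7 match.
        BitSet otherQuery = new BitSet();
        otherQuery.set(4);
        otherQuery.set(5);
        otherQuery.set(7);

        System.out.println(applyMask(cachedFilter, otherQuery)); // prints {4, 7}
    }
}
```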

Assuming your set of predefined queries is rather limited, you can
cache all these BitSets. When you deal with a specific document,
you search for it by "primary key" in the index (which is a very
efficient query) to get its identifier (its index in the bitset),
and then you just check which queries have a match.
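
The lookup described above could be sketched roughly as follows, again
with plain java.util.BitSet. The cache layout (a map from query name to
BitSet) and all names here are assumptions for the sake of the example;
in a real setup the docId would come from the primary-key search in the
index, whereas here it is simply passed in.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MatchingQueriesSketch {

    // Hypothetical cache: query name -> bitset of matching document ids.
    private final Map<String, BitSet> cachedMatches = new LinkedHashMap<>();

    void cache(String queryName, BitSet matches) {
        cachedMatches.put(queryName, matches);
    }

    // Lists the cached queries whose bitset has the bit for docId set.
    List<String> queriesMatching(int docId) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, BitSet> entry : cachedMatches.entrySet()) {
            if (entry.getValue().get(docId)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        MatchingQueriesSketch sketch = new MatchingQueriesSketch();

        BitSet first = new BitSet();
        first.set(3);
        first.set(8);
        sketch.cache("q1", first);

        BitSet second = new BitSet();
        second.set(8);
        sketch.cache("q2", second);

        System.out.println(sketch.queriesMatching(8)); // prints [q1, q2]
        System.out.println(sketch.queriesMatching(3)); // prints [q1]
    }
}
```

Each check is a single bit test per cached query, so the cost grows with
the number of predefined queries and not with the index size.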

The good news is that reusing those BitSets is very efficient; the bad
news is that you have to rebuild some part of each BitSet (10% on
average with default configurations) every time an index update is
applied. As a consequence, if what you need to do is list which queries
match for every document you *insert* - as opposed to documents you
only read - this is going to be an expensive approach.

Are you going to need this both for a Map/Reduce Query and a Lucene
Query, or are you just implying that both approaches would be fine for
you?

Do you have a practical example of such a Query? I'm wondering if
you're looking for features like MoreLikeThis or tagging suggestions,
which can be implemented more efficiently in different ways.

Sanne

On 19 June 2012 18:58, Ales Justin <ales.justin at gmail.com> wrote:
> @Sanne, Vladimir: a think-task for you two :)
>
> With CapeDwarf we need the following feature -- just the opposite from query results.
> A user has a document, and a set of pre-defined queries.
> Now we need to see which queries match the given document.
>
> A dummy impl is to iterate over queries and find the ones that match.
> But, this is of course not scalable.
>
> Any idea / suggestion on how to prepare Infinispan Query together with Distributed Execution framework to handle such feature?
>
> -Ales