[infinispan-dev] About ISPN-200 Distributed Queries

Thu May 6 07:23:31 EDT 2010

Thanks Manik!

Israel Lacerra

On Wed, May 5, 2010 at 6:27 AM, Manik Surtani <manik at jboss.org> wrote:

> Hi there
>
> On 4 May 2010, at 20:42, Israel Lacerra wrote:
>
> > I'm studying ISPN-200 because I'm thinking about resolving this issue as my
> > M.Sc. topic. I'd like to ask a couple of questions about it (and maybe they
> > don't make sense):
> >
> > - Currently, if we have "-Dinfinispan.query.indexLocalOnly=true", the
> > indexes are just local, right? And if
> > "-Dinfinispan.query.indexLocalOnly=false", the indexes are global and
> > shared. Am I right?
>
> Yes.  Basically Lucene handles and stores the indexes.  Now you could have
> 2 scenarios.  Scenario 1: each node has its own private, non-shared
> set of indexes.  Scenario 2: there is a shared, global index, which each
> node writes to and updates (perhaps stored on NFS, etc).
> The relevant scenario depends on how you configure Lucene.
>
> Now the switch in Infinispan controls which node(s) in the cluster actually
> do the indexing whenever there is a change in data in the cluster.  If you
> have configured Lucene to maintain non-shared indexes, then *every* node in
> the cluster needs to update its own private index whenever there is a change
> in any entry, anywhere in the cluster.
> -Dinfinispan.query.indexLocalOnly=false will force Infinispan nodes to
> index changes that happen anywhere in the cluster.
>
> If the indexes are global and shared, then there is no need for each node
> to update the indexes.  Only the node that initiated the change should
> update the indexes, and -Dinfinispan.query.indexLocalOnly=true will force
> this behaviour.
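
The rule described above can be sketched as a single predicate. This is only an illustration of the decision, not Infinispan's actual code: the class `IndexingPolicy`, the method `shouldIndex` and its parameters are hypothetical names introduced here.

```java
// Hypothetical helper illustrating the indexLocalOnly rule; not part of Infinispan's API.
class IndexingPolicy {

    /**
     * Decides whether this node should update its index for a given change.
     *
     * @param indexLocalOnly          value of the infinispan.query.indexLocalOnly switch
     * @param changeOriginatedLocally true if this node initiated the change
     */
    static boolean shouldIndex(boolean indexLocalOnly, boolean changeOriginatedLocally) {
        // indexLocalOnly=false: index every change, wherever it happened (private indexes).
        // indexLocalOnly=true:  index only changes this node initiated (shared index).
        return !indexLocalOnly || changeOriginatedLocally;
    }

    public static void main(String[] args) {
        System.out.println(shouldIndex(false, false)); // private indexes, remote change
        System.out.println(shouldIndex(true, false));  // shared index, remote change
        System.out.println(shouldIndex(true, true));   // shared index, local change
    }
}
```

With private indexes the remote change is indexed; with a shared index only the originating node indexes it.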
>
> > - So, how will ISPN-200 work in these two possibilities?
>
> As for ISPN-200, this is part of what we need to think about.  Ideally, the
> only approach that will truly scale is for each node to maintain neither a
> fully shared nor a fully private index, but a fragment of the global index:
> a fragment that pertains to just the data it owns.  So, assume we have this
> setup with 4 nodes:
>
> Caches: {A, B, C, D}
>
> Keys:
>
> K1 -> {A, B}
> K2 -> {B, C}
> K3 -> {C, D}
>
> A's index would have {K1}
> B's index would have {K1, K2}
> C's index would have {K2, K3}
> D's index would have {K3}
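
The mapping from key ownership to per-node index fragments in the example above can be reproduced with a few lines of plain Java. This is a sketch under stated assumptions: `IndexFragments` and `fragmentsByNode` are hypothetical names, and ownership is passed in as a simple map rather than obtained from the cluster.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration: derive each node's index fragment from key ownership.
class IndexFragments {

    /** Returns, for every node, the set of keys that node owns and therefore indexes. */
    static Map<String, Set<String>> fragmentsByNode(Map<String, List<String>> ownersByKey,
                                                    List<String> nodes) {
        Map<String, Set<String>> fragments = new TreeMap<>();
        for (String node : nodes) {
            fragments.put(node, new TreeSet<>()); // a node may own nothing
        }
        for (Map.Entry<String, List<String>> entry : ownersByKey.entrySet()) {
            for (String owner : entry.getValue()) {
                fragments.computeIfAbsent(owner, n -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return fragments;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ownersByKey = Map.of(
                "K1", List.of("A", "B"),
                "K2", List.of("B", "C"),
                "K3", List.of("C", "D"));
        // prints {A=[K1], B=[K1, K2], C=[K2, K3], D=[K3]}
        System.out.println(fragmentsByNode(ownersByKey, List.of("A", "B", "C", "D")));
    }
}
```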
>
> So if we were to write a query that matches K1, that query would be sent to
> every node in the cluster and the results returned would look like:
>
> A: {K1}
> B: {K1}
> C: {}
> D: {}
>
> Similarly, if we were to write a query that matches K1 and K2, that query
> would be sent to every node in the cluster and the results returned would
> look like:
>
> A: {K1}
> B: {K1, K2}
> C: {K2}
> D: {}
>
> Now the tricky part will be to efficiently collate these partial results
> into a proper resultset to pass back to the user, including removing
> duplicates, proper ranking and ordering, etc.
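
One naive way to collate the partial results is sketched below. It assumes each node returns a relevance score per matching key, which the discussion above does not specify; `ResultCollator` and `collate` are hypothetical names. Duplicates are removed by keeping the highest score seen for a key, and the merged hits are ordered by score (descending), then by key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merge per-node partial results into one ranked, duplicate-free list.
class ResultCollator {

    /** Each map is one node's partial result: matching key -> relevance score (assumed). */
    static List<String> collate(List<Map<String, Double>> partialResults) {
        // Remove duplicates: keep the best score reported for each key.
        Map<String, Double> best = new HashMap<>();
        for (Map<String, Double> partial : partialResults) {
            for (Map.Entry<String, Double> hit : partial.entrySet()) {
                best.merge(hit.getKey(), hit.getValue(), Double::max);
            }
        }
        // Rank: highest score first, ties broken by key for a stable order.
        List<Map.Entry<String, Double>> hits = new ArrayList<>(best.entrySet());
        hits.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> ordered = new ArrayList<>();
        for (Map.Entry<String, Double> hit : hits) {
            ordered.add(hit.getKey());
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> partials = new ArrayList<>();
        partials.add(Map.of("K1", 0.9));            // node A
        partials.add(Map.of("K1", 0.9, "K2", 0.4)); // node B
        partials.add(Map.of("K2", 0.4));            // node C
        partials.add(new HashMap<>());              // node D: no matches
        System.out.println(collate(partials));      // prints [K1, K2]
    }
}
```

This holds every hit in memory on the querying node; an efficient version would need to stream or page the partial results, which is the hard part mentioned above.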
>
> Hope this helps!
>
> Cheers
> Manik
>
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org