Re: [infinispan-dev] About ISPN-200 Distributed Queries

Thursday, 6 May 2010

Thanks Manik!

Israel Lacerra

On Wed, May 5, 2010 at 6:27 AM, Manik Surtani <manik(a)jboss.org> wrote:

...
 Hi there

 On 4 May 2010, at 20:42, Israel Lacerra wrote:

 > I'm studying ISPN-200 because I'm thinking about resolving this issue as my
 M.Sc. topic. I have a couple of questions about it (and maybe they don't
 make sense):
 >
 > - Currently, if we set "-Dinfinispan.query.indexLocalOnly=true", the
 indexes are just local, right? And if we set
 "-Dinfinispan.query.indexLocalOnly=false", the indexes are globally shared.
 Am I right?

 Yes.  Basically Lucene handles and stores the indexes.  Now you could have
 2 scenarios.  Scenario 1: where each node has its own private, non-shared
 set of indexes.  Scenario 2: there is a shared, global index, where each
 node writes to and updates this global index (perhaps stored on NFS, etc).
  The relevant scenario depends on how you configure Lucene.

 Now the switch in Infinispan controls which node(s) in the cluster actually
 do the indexing whenever data changes in the cluster.  If you have
 configured Lucene to maintain non-shared indexes, then *every* node in the
 cluster needs to update its own private index whenever any entry changes,
 anywhere in the cluster.
 -Dinfinispan.query.indexLocalOnly=false will force Infinispan nodes to
 index changes that happen anywhere in the cluster.

 If the indexes are global and shared, then there is no need for each node
 to update the indexes.  Only the node that initiated the change should
 update the indexes, and -Dinfinispan.query.indexLocalOnly=true will force
 this behaviour.
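
To make the two settings concrete, here is a minimal sketch of the decision
each node makes when an entry changes. The function and parameter names are
made up for illustration and are not Infinispan's API; only the meaning of
the indexLocalOnly switch comes from the explanation above.

```python
def should_index(index_local_only: bool, change_originated_locally: bool) -> bool:
    """Decide whether this node updates its index for a given change.

    Hypothetical helper: models the indexLocalOnly switch described above,
    not actual Infinispan code.
    """
    if index_local_only:
        # Shared, global index: only the node that initiated the change writes.
        return change_originated_locally
    # Private, non-shared indexes: every node indexes every change.
    return True
```

With indexLocalOnly=true, a node that merely receives a replicated change
skips indexing; with indexLocalOnly=false, it indexes the change as well.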

 > - So, how will ISPN-200 work with these two possibilities?

 As for ISPN-200, this is part of what we need to think about.  Ideally, the
 only approach that will truly scale is for each node to maintain neither a
 shared index nor a full private copy, but a fragment of the global index: a
 fragment that pertains to just the data it owns.  So, assume we have this
 setup with 4 nodes:

 Caches: {A, B, C, D}

 Keys:

 K1 -> {A, B}
 K2 -> {B, C}
 K3 -> {C, D}

 A's index would have {K1}
 B's index would have {K1, K2}
 C's index would have {K2, K3}
 D's index would have {K3}
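
The per-node fragments above follow mechanically from the key ownership
table. A small sketch, assuming the ownership map is available as a plain
dictionary (the names `owners` and `index_fragments` are illustrative only):

```python
from collections import defaultdict


def index_fragments(owners):
    """Invert a key -> owning-nodes map into a node -> indexed-keys map."""
    fragments = defaultdict(set)
    for key, nodes in owners.items():
        for node in nodes:
            # Each owner of a key indexes that key in its local fragment.
            fragments[node].add(key)
    return dict(fragments)


# Ownership table from the example above.
owners = {"K1": ["A", "B"], "K2": ["B", "C"], "K3": ["C", "D"]}
fragments = index_fragments(owners)
```

Because every key has two owners here, every key also appears in two
fragments, which is where the duplicate results discussed below come from.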

 So if we were to write a query that matches K1, that query would be sent to
 every node in the cluster and the results returned would look like:

 A: {K1}
 B: {K1}
 C: {}
 D: {}

 Similarly, if we were to write a query that matches K1 and K2, that query
 would be sent to every node in the cluster and the results returned would
 look like:

 A: {K1}
 B: {K1, K2}
 C: {K2}
 D: {}
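
The two result tables can be reproduced with a toy scatter step. This is
only a sketch: a real query would run against each node's Lucene index,
whereas here a "query" is reduced to the set of keys it matches, and the
names `scatter` and `fragments` are assumptions made for the example.

```python
def scatter(fragments, matching_keys):
    """Send the query to every node; each node answers from its own fragment."""
    return {
        node: fragments[node] & matching_keys  # keys this node both holds and matches
        for node in sorted(fragments)
    }


# Index fragments from the 4-node example above.
fragments = {
    "A": {"K1"},
    "B": {"K1", "K2"},
    "C": {"K2", "K3"},
    "D": {"K3"},
}

only_k1 = scatter(fragments, {"K1"})
k1_and_k2 = scatter(fragments, {"K1", "K2"})
```

Nodes that hold no matching key still receive the query and reply with an
empty set, as C and D do for the K1 query.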

 Now the tricky part will be to efficiently collate these partial results
 into a proper resultset to pass back to the user, including removing
 duplicates, proper ranking and ordering, etc.
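
One naive way to picture that collation step is sketched below. It assumes
each node returns (key, score) pairs and that scores from different nodes
are directly comparable, which a real implementation would have to ensure
since each node scores against its own index fragment. The scores and the
name `collate` are invented for the example.

```python
def collate(partial_results):
    """Merge per-node hits: drop duplicate keys, then rank by score."""
    best = {}
    for hits in partial_results.values():
        for key, score in hits:
            # Keep the highest score seen for a key returned by several owners.
            if key not in best or score > best[key]:
                best[key] = score
    # Highest score first; ties broken by key for a stable order.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scored replies for the "K1 and K2" query above.
partial_results = {
    "A": [("K1", 0.9)],
    "B": [("K1", 0.9), ("K2", 0.4)],
    "C": [("K2", 0.4)],
    "D": [],
}
merged = collate(partial_results)  # [('K1', 0.9), ('K2', 0.4)]
```

This collects everything on the querying node before sorting, so it shows
the deduplication and ordering problem but not an efficient solution to it.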

 Hope this helps!

 Cheers
 Manik

 --
 Manik Surtani
 manik(a)jboss.org
 Lead, Infinispan
 Lead, JBoss Cache
 http://www.infinispan.org
 http://www.jbosscache.org

 _______________________________________________
 infinispan-dev mailing list
 infinispan-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev
