Thanks Manik! <br><br>Israel Lacerra<br><br><div class="gmail_quote">On Wed, May 5, 2010 at 6:27 AM, Manik Surtani <span dir="ltr">&lt;<a href="mailto:manik@jboss.org">manik@jboss.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Hi there<br>
<div class="im"><br>
On 4 May 2010, at 20:42, Israel Lacerra wrote:<br>
<br>
> I'm studying ISPN-200 because I'm thinking about resolving this issue as my M.Sc. topic. I have a couple of questions about it (and maybe they don't make sense):<br>
><br>
> - Currently, if we have "-Dinfinispan.query.indexLocalOnly=true", the indexes are local only, right? And with "-Dinfinispan.query.indexLocalOnly=false", the indexes are globally shared. Am I right?<br>
<br>
</div>Yes. Basically Lucene handles and stores the indexes. There are two possible scenarios. Scenario 1: each node has its own private, non-shared set of indexes. Scenario 2: there is a shared, global index that every node writes to and updates (perhaps stored on NFS, etc). Which scenario applies depends on how you configure Lucene.<br>
<br>
Now the switch in Infinispan controls which node(s) in the cluster actually do the indexing whenever data changes in the cluster. If you have configured Lucene to maintain non-shared indexes, then *every* node in the cluster needs to update its own private index whenever there is a change to any entry, anywhere in the cluster. -Dinfinispan.query.indexLocalOnly=false will force Infinispan nodes to index changes that happen anywhere in the cluster.<br>
<br>
If the indexes are global and shared, then there is no need for each node to update the indexes. Only the node that initiated the change should update the indexes, and -Dinfinispan.query.indexLocalOnly=true will force this behaviour.<br>
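<br>
To make the switch concrete, here is a minimal sketch of the decision each node makes when an entry changes. The names IndexingPolicy and shouldIndex are hypothetical and are not Infinispan API; only the property name comes from this thread, and the "false when unset" default is an assumption of the sketch.<br>
<br>
```java
// Hypothetical sketch; IndexingPolicy is not an Infinispan class.
final class IndexingPolicy {
    private final boolean indexLocalOnly;

    IndexingPolicy(boolean indexLocalOnly) {
        this.indexLocalOnly = indexLocalOnly;
    }

    // Reads the switch from the -D system property named in this thread.
    // Boolean.getBoolean yields false when the property is unset; that
    // default is an assumption of this sketch.
    static IndexingPolicy fromSystemProperty() {
        return new IndexingPolicy(Boolean.getBoolean("infinispan.query.indexLocalOnly"));
    }

    // originLocal: true when this node initiated the change.
    boolean shouldIndex(boolean originLocal) {
        // Changes made on this node are always indexed here. Remote changes
        // are indexed only when every node keeps its own private index
        // (indexLocalOnly=false).
        return originLocal || !indexLocalOnly;
    }

    public static void main(String[] args) {
        IndexingPolicy policy = fromSystemProperty();
        System.out.println("local change indexed:  " + policy.shouldIndex(true));
        System.out.println("remote change indexed: " + policy.shouldIndex(false));
    }
}
```
<br>
With a shared index (indexLocalOnly=true) only the originating node returns true; with private indexes (indexLocalOnly=false) every node does.<br>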
<div class="im"><br>
> - So, how will ISPN-200 work with these two possibilities?<br>
<br>
</div>As for ISPN-200, this is part of what we need to think about. Ideally, the only approach that will truly scale is for each node to maintain neither a shared nor a fully replicated private index, but a fragment of the global index: the fragment that pertains to just the data it owns. So, assume we have this setup with 4 nodes:<br>
<br>
Caches: {A, B, C, D}<br>
<br>
Keys:<br>
<br>
K1 -> {A, B}<br>
K2 -> {B, C}<br>
K3 -> {C, D}<br>
<br>
A's index would have {K1}<br>
B's index would have {K1, K2}<br>
C's index would have {K2, K3}<br>
D's index would have {K3}<br>
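<br>
The per-node fragments above are just the key-to-owners mapping turned around. A rough sketch of that inversion follows; IndexFragments and its methods are made-up names for illustration, and a real implementation would get ownership from the cluster rather than from a hard-coded map.<br>
<br>
```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; not part of Infinispan.
final class IndexFragments {

    // Ownership from the example: K1 -> {A, B}, K2 -> {B, C}, K3 -> {C, D}.
    static Map<String, List<String>> exampleOwners() {
        Map<String, List<String>> owners = new LinkedHashMap<>();
        owners.put("K1", Arrays.asList("A", "B"));
        owners.put("K2", Arrays.asList("B", "C"));
        owners.put("K3", Arrays.asList("C", "D"));
        return owners;
    }

    // Inverts key -> owners into node -> keys that node should index.
    static Map<String, Set<String>> fragmentsByNode(List<String> nodes,
                                                    Map<String, List<String>> ownersByKey) {
        Map<String, Set<String>> fragments = new TreeMap<>();
        for (String node : nodes) {
            fragments.put(node, new TreeSet<String>()); // nodes owning nothing get an empty fragment
        }
        for (Map.Entry<String, List<String>> entry : ownersByKey.entrySet()) {
            for (String owner : entry.getValue()) {
                fragments.computeIfAbsent(owner, n -> new TreeSet<String>()).add(entry.getKey());
            }
        }
        return fragments;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("A", "B", "C", "D");
        // prints {A=[K1], B=[K1, K2], C=[K2, K3], D=[K3]}
        System.out.println(fragmentsByNode(nodes, exampleOwners()));
    }
}
```
<br>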
<br>
So if we were to write a query that matches K1, that query would be sent to every node in the cluster and the results returned would look like:<br>
<br>
A: {K1}<br>
B: {K1}<br>
C: {}<br>
D: {}<br>
<br>
Similarly, if we were to write a query that matches K1 and K2, that query would be sent to every node in the cluster and the results returned would look like:<br>
<br>
A: {K1}<br>
B: {K1, K2}<br>
C: {K2}<br>
D: {}<br>
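<br>
As a rough illustration of the two queries above: each node answers from its own fragment only, so a node's partial result is the intersection of its fragment with the keys the query matches. ScatterQuery is a hypothetical name, and the query is reduced to "the set of keys it matches", which is a simplification made for this sketch.<br>
<br>
```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; not part of Infinispan.
final class ScatterQuery {

    // Index fragments from the 4-node example.
    static Map<String, Set<String>> exampleFragments() {
        Map<String, Set<String>> fragments = new TreeMap<>();
        fragments.put("A", new TreeSet<String>(Arrays.asList("K1")));
        fragments.put("B", new TreeSet<String>(Arrays.asList("K1", "K2")));
        fragments.put("C", new TreeSet<String>(Arrays.asList("K2", "K3")));
        fragments.put("D", new TreeSet<String>(Arrays.asList("K3")));
        return fragments;
    }

    // Sends the "query" to every node and records what each one returns.
    static Map<String, Set<String>> queryAllNodes(Map<String, Set<String>> fragments,
                                                  Set<String> matchingKeys) {
        Map<String, Set<String>> partialResults = new TreeMap<>();
        for (Map.Entry<String, Set<String>> node : fragments.entrySet()) {
            Set<String> hits = new TreeSet<String>(node.getValue());
            hits.retainAll(matchingKeys); // a node can only report keys it has indexed
            partialResults.put(node.getKey(), hits);
        }
        return partialResults;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> fragments = exampleFragments();
        // prints {A=[K1], B=[K1], C=[], D=[]}
        System.out.println(queryAllNodes(fragments, new TreeSet<String>(Arrays.asList("K1"))));
        // prints {A=[K1], B=[K1, K2], C=[K2], D=[]}
        System.out.println(queryAllNodes(fragments, new TreeSet<String>(Arrays.asList("K1", "K2"))));
    }
}
```
<br>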
<br>
Now the tricky part will be to efficiently collate these partial results into a proper resultset to pass back to the user, including removing duplicates, proper ranking and ordering, etc.<br>
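<br>
One naive way to do the collation, purely as a sketch: keep the best score seen for each key, then sort by score. ResultCollator, Hit and the scores below are invented for illustration. The sketch also assumes scores from different nodes are directly comparable, which may not hold when each fragment computes its own term statistics, so treat it as a starting point rather than a design.<br>
<br>
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of Infinispan or Lucene.
final class ResultCollator {

    // One match reported by one node. The score is an assumed relevance value.
    static final class Hit {
        final String key;
        final double score;

        Hit(String key, double score) {
            this.key = key;
            this.score = score;
        }
    }

    // Partial results for the "K1 and K2" query, with made-up scores.
    static List<List<Hit>> examplePartials() {
        List<List<Hit>> partials = new ArrayList<>();
        partials.add(Arrays.asList(new Hit("K1", 0.9)));                     // A
        partials.add(Arrays.asList(new Hit("K1", 0.9), new Hit("K2", 0.7))); // B
        partials.add(Arrays.asList(new Hit("K2", 0.7)));                     // C
        partials.add(Collections.<Hit>emptyList());                          // D
        return partials;
    }

    // Removes duplicates (same key from several owners) and orders by score.
    static List<String> collate(List<List<Hit>> partialResults) {
        Map<String, Double> bestScore = new HashMap<>();
        for (List<Hit> partial : partialResults) {
            for (Hit hit : partial) {
                bestScore.merge(hit.key, hit.score, Double::max);
            }
        }
        List<Map.Entry<String, Double>> entries = new ArrayList<>(bestScore.entrySet());
        entries.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue()); // highest score first
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey()); // stable tie-break
        });
        List<String> ordered = new ArrayList<>();
        for (Map.Entry<String, Double> entry : entries) {
            ordered.add(entry.getKey());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // prints [K1, K2]
        System.out.println(collate(examplePartials()));
    }
}
```
<br>
Holding every partial result in memory like this will not scale to large result sets; a streaming merge of per-node sorted results would be the obvious next step.<br>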
<br>
Hope this helps!<br>
<div><div></div><div class="h5"><br>
Cheers<br>
Manik<br>
<br>
--<br>
Manik Surtani<br>
<a href="mailto:manik@jboss.org">manik@jboss.org</a><br>
Lead, Infinispan<br>
Lead, JBoss Cache<br>
<a href="http://www.infinispan.org" target="_blank">http://www.infinispan.org</a><br>
<a href="http://www.jbosscache.org" target="_blank">http://www.jbosscache.org</a><br>
<br>
_______________________________________________<br>
infinispan-dev mailing list<br>
<a href="mailto:infinispan-dev@lists.jboss.org">infinispan-dev@lists.jboss.org</a><br>
<a href="https://lists.jboss.org/mailman/listinfo/infinispan-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/infinispan-dev</a><br>
</div></div></blockquote></div><br>