Was the cache loader shared?  Which cache loader were you using?<br><br><div class="gmail_quote">On Fri, Mar 15, 2013 at 8:03 AM, James Aley <span dir="ltr">&lt;<a href="mailto:james.aley@swiftkey.net" target="_blank">james.aley@swiftkey.net</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hey all,<br>

<br>

&lt;OT&gt;<br>

Seeing as this is my first post, I wanted to just quickly thank you<br>

all for Infinispan. So far I&#39;m really enjoying working with it - great<br>

product!<br>

&lt;/OT&gt;<br>

<br>

I&#39;m using the InfinispanDirectory for a Lucene project at the moment.<br>

We use Lucene directly to build a search product, which has high read<br>

requirements and likely very large indexes. I&#39;m hoping to make use of<br>

a distribution mode cache to keep the whole index in memory across a<br>

cluster of machines (the index will be too big for one server).<br>

<br>

The problem I&#39;m having is that after loading a filesystem-based Lucene<br>

directory into InfinispanDirectory via LuceneCacheLoader, no nodes are<br>

retrieving data from the cluster - they instead look up keys in their<br>

local CacheLoaders, which involves lots of disk I/O and is very slow.<br>

I was hoping to just use the CacheLoader to initialize the caches, but<br>

from there on read only from RAM (and network, of course). Is this<br>

supported? Maybe I&#39;ve misunderstood the purpose of the CacheLoader?<br>

<br>

To explain my observations in a little more detail:<br>

* I start a cluster of two servers, using [1] as the cache config.<br>

Both have a local copy of the Lucene index that will be loaded into<br>

the InfinispanDirectory via the loader. This is a test configuration,<br>

where I&#39;ve set numOwners=1 so that I only need two servers for<br>

distribution to happen.<br>

* Upon startup, things look good. I see the memory usage of the JVM<br>

reflect a pretty near 50/50 split of the data across both servers.<br>

Logging indicates both servers are in the cluster view, all seems<br>

fine.<br>

* When I send a search query to either one of the nodes, I notice the following:<br>

  - iotop shows huge (~100MB/s) disk I/O on that node alone from the<br>

JVM process.<br>

  - no change in network activity between nodes (~300b/s, same as when idle)<br>

  - memory usage on the node running the query increases dramatically,<br>

and stays higher even after the query is finished.<br>

<br>

So it seemed to me like each node was favouring use of the CacheLoader<br>

to retrieve keys that are not in memory, instead of using the cluster.<br>

Does that seem reasonable? Is this the expected behaviour?<br>

<br>

I started to investigate this by turning on trace logging, and this<br>

made me think perhaps the cause was that the CacheLoader&#39;s interceptor<br>

has higher priority in the chain than the distribution interceptor?<br>

I&#39;m not at all familiar with the design at any level of detail - just<br>

what I picked up in the last 24 hours from browsing the code, so I<br>

could easily be way off. I&#39;ve attached the log snippets I thought<br>

relevant in [2].<br>

<br>

Any advice would be much appreciated.<br>

Thanks!<br>

<br>

James.<br>

<br>

<br>

[1] <a href="https://www.refheap.com/paste/12531" target="_blank">https://www.refheap.com/paste/12531</a><br>

[2] <a href="https://www.refheap.com/paste/12543" target="_blank">https://www.refheap.com/paste/12543</a><br>


</blockquote></div><br>
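<br>
To make the question concrete, here is a minimal sketch of the two lookup orders being discussed. It is a toy model only, not Infinispan code: the class, the source names (MEMORY, LOCAL_LOADER, REMOTE_OWNER) and the keys are all hypothetical, and it makes no claim about how the real interceptor chain is ordered. It assumes the setup from the quoted message: numOwners=1, each node holding a full on-disk copy of the index, and a key that is owned by the other node.<br>
<br>

```java
public class LookupOrderSketch {

    // Hypothetical contents of each source, as seen from the node running the query.
    // "chunk-1" is owned locally; "chunk-2" is owned by the other node (numOwners=1).
    static boolean holds(String source, String key) {
        switch (source) {
            case "MEMORY":
                return key.equals("chunk-1");
            case "LOCAL_LOADER":
                // Every node has a full copy of the index on disk.
                return key.equals("chunk-1") || key.equals("chunk-2");
            case "REMOTE_OWNER":
                return key.equals("chunk-2");
            default:
                return false;
        }
    }

    // Walks the sources in the given order and returns the first one that holds the key.
    static String resolve(String key, String... order) {
        for (String source : order) {
            if (holds(source, key)) {
                return source;
            }
        }
        return "MISS";
    }

    public static void main(String[] args) {
        // Order suspected in the quoted message: the local loader is consulted before the cluster.
        System.out.println(resolve("chunk-2", "MEMORY", "LOCAL_LOADER", "REMOTE_OWNER"));
        // Order the poster was hoping for: the cluster is consulted before the local loader.
        System.out.println(resolve("chunk-2", "MEMORY", "REMOTE_OWNER", "LOCAL_LOADER"));
    }
}
```

<br>
In this model the first order serves the remotely owned key from local disk and never touches the network, which would match the iotop and network observations above; the second order serves it from the owning node. Whether the loader is shared changes what the local loader lookup means, hence the question at the top.<br>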