Can you try adding a ClusterCacheLoader to see if that helps?
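
For example, something along these lines in the lucene-index cache's loaders
section (untested sketch from memory -- the element and class names may differ
between Infinispan versions, so check against the schema you're using):

   <loaders passivation="false" shared="false" preload="false">
      <!-- ask the other cluster members for a key before falling back to disk -->
      <loader class="org.infinispan.loaders.cluster.ClusterCacheLoader">
         <properties>
            <property name="remoteCallTimeout" value="20000"/>
         </properties>
      </loader>
   </loaders>

If you keep the LuceneCacheLoader as well, I *think* the order in which the
loaders are declared matters, but I'd double-check that.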
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Apologies - forgot to copy list.<br>
<br>
On 15 March 2013 15:48, James Aley <<a href="mailto:james.aley@swiftkey.net">james.aley@swiftkey.net</a>> wrote:<br>
> Hey Adrian,
>
> Thanks for the response. I was chatting to Sanne on IRC yesterday, and
> he suggested this to me. The logging I attached was actually from a
> cluster of 4 servers with numOwners=2. Sorry, I should have mentioned
> that, but since it didn't appear to make any difference I thought I'd
> keep things simple in my previous email.
>
> While it didn't seem to make a difference in this case, I can see why
> that would make sense. In future tests I'll stick with numOwners > 1.
>
>
> James.
<div class="HOEnZb"><div class="h5">><br>
> On 15 March 2013 15:44, Adrian Nistor <<a href="mailto:anistor@redhat.com">anistor@redhat.com</a>> wrote:<br>
>> Hi James,
>>
>> I'm not an expert on InfinispanDirectory, but I've noticed in [1] that the
>> lucene-index cache is distributed with numOwners = 1. That means each cache
>> entry is owned by just one cluster node, and there's nowhere else to go in
>> the cluster if the key is not available in local memory, so it has to be
>> fetched from the cache store. This can be solved with numOwners > 1.
>> Please let me know if this solves your problem.
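>>
>> For the lucene-index cache that would mean something roughly like this
>> (sketch only -- adjust the syntax to the configuration schema version
>> you're using):
>>
>>    <clustering mode="distribution">
>>       <!-- two copies of every entry, so a miss can be served by a remote owner -->
>>       <hash numOwners="2"/>
>>       <sync/>
>>    </clustering>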
>>
>> Cheers!
>>
>>
>> On 03/15/2013 05:03 PM, James Aley wrote:
>>>
>>> Hey all,
>>>
>>> <OT>
>>> Seeing as this is my first post, I wanted to just quickly thank you
>>> all for Infinispan. So far I'm really enjoying working with it - great
>>> product!
>>> </OT>
>>>
>>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>>> We use Lucene directly to build a search product, which has high read
>>> requirements and likely very large indexes. I'm hoping to make use of
>>> a distribution mode cache to keep the whole index in memory across a
>>> cluster of machines (the index will be too big for one server).
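>>>
>>> For context, the way I'm wiring it up is roughly the following
>>> (simplified sketch; the cache name and the InfinispanDirectory
>>> constructor are from memory, so treat it as illustrative only):
>>>
>>>    import org.apache.lucene.store.Directory;
>>>    import org.infinispan.Cache;
>>>    import org.infinispan.lucene.InfinispanDirectory;
>>>    import org.infinispan.manager.DefaultCacheManager;
>>>    import org.infinispan.manager.EmbeddedCacheManager;
>>>
>>>    public class IndexOnGrid {
>>>       public static Directory open() throws java.io.IOException {
>>>          // cache manager started from the XML config in [1]
>>>          EmbeddedCacheManager manager = new DefaultCacheManager("infinispan.xml");
>>>          Cache<?, ?> cache = manager.getCache("lucene-index");
>>>          // Lucene Directory backed by the distributed cache
>>>          return new InfinispanDirectory(cache, "my-index");
>>>       }
>>>    }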
>>>
>>> The problem I'm having is that after loading a filesystem-based Lucene
>>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>>> retrieving data from the cluster - they instead look up keys in their
>>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>>> I was hoping to just use the CacheLoader to initialize the caches, but
>>> from there on read only from RAM (and network, of course). Is this
>>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
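>>>
>>> To illustrate what I mean, the loader part of the config is conceptually
>>> along these lines (paraphrased, so the class name and properties here are
>>> from memory rather than copied from the real paste in [1]):
>>>
>>>    <loaders preload="true" passivation="false" shared="false">
>>>       <!-- reads an existing on-disk Lucene index into the grid at startup -->
>>>       <loader class="org.infinispan.lucene.cachestore.LuceneCacheLoader">
>>>          <properties>
>>>             <property name="location" value="/path/to/lucene/index"/>
>>>          </properties>
>>>       </loader>
>>>    </loaders>
>>>
>>> and the hope was that after preload, reads would be served from memory
>>> (locally or from another node) rather than from this loader.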
>>>
>>> To explain my observations in a little more detail:
>>> * I start a cluster of two servers, using [1] as the cache config.
>>> Both have a local copy of the Lucene index that will be loaded into
>>> the InfinispanDirectory via the loader. This is a test configuration,
>>> where I've set numOwners=1 so that I only need two servers for
>>> distribution to happen.
>>> * Upon startup, things look good. I see the memory usage of the JVM
>>> reflect a pretty near 50/50 split of the data across both servers.
>>> Logging indicates both servers are in the cluster view, all seems
>>> fine.
>>> * When I send a search query to either one of the nodes, I notice the
>>> following:
>>> - iotop shows huge (~100MB/s) disk I/O on that node alone from the
>>> JVM process.
>>> - no change in network activity between nodes (~300b/s, same as when
>>> idle)
>>> - memory usage on the node running the query increases dramatically,
>>> and stays higher even after the query is finished.
>>>
>>> So it seemed to me like each node was favouring use of the CacheLoader
>>> to retrieve keys that are not in memory, instead of using the cluster.
>>> Does that seem reasonable? Is this the expected behaviour?
>>>
>>> I started to investigate this by turning on trace logging, and this
>>> made me think perhaps the cause was that the CacheLoader's interceptor
>>> is higher priority in the chain than the distribution interceptor?
>>> I'm not at all familiar with the design at any level of detail - just
>>> what I picked up in the last 24 hours from browsing the code, so I
>>> could easily be way off. I've attached the log snippets I thought
>>> relevant in [2].
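>>>
>>> (For what it's worth, one quick way to check the actual ordering would
>>> be to dump the interceptor chain -- sketch from memory, so the method
>>> name may be slightly off:
>>>
>>>    // list the interceptors in invocation order
>>>    for (CommandInterceptor interceptor :
>>>          cache.getAdvancedCache().getInterceptorChain()) {
>>>       System.out.println(interceptor.getClass().getSimpleName());
>>>    }
>>>
>>> to see whether the CacheLoader interceptor really does sit before the
>>> distribution interceptor.)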
>>>
>>> Any advice offered much appreciated.
>>> Thanks!
>>>
>>> James.
>>>
>>>
>>> [1] https://www.refheap.com/paste/12531
>>> [2] https://www.refheap.com/paste/12543
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev