[infinispan-dev] CacheLoaders, Distribution mode and Interceptors

Fri Mar 15 11:31:47 EDT 2013

Was the cache loader shared?  Which cache loader were you using?

On Fri, Mar 15, 2013 at 8:03 AM, James Aley <james.aley at swiftkey.net> wrote:

> Hey all,
>
> <OT>
> Seeing as this is my first post, I wanted to just quickly thank you
> all for Infinispan. So far I'm really enjoying working with it - great
> product!
> </OT>
>
> I'm using the InfinispanDirectory for a Lucene project at the moment.
> We use Lucene directly to build a search product, which has high read
> requirements and likely very large indexes. I'm hoping to make use of
> a distribution mode cache to keep the whole index in memory across a
> cluster of machines (the index will be too big for one server).
>
> The problem I'm having is that after loading a filesystem-based Lucene
> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
> retrieving data from the cluster - they instead look up keys in their
> local CacheLoaders, which involves lots of disk I/O and is very slow.
> I was hoping to just use the CacheLoader to initialize the caches, but
> from there on read only from RAM (and network, of course). Is this
> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>
> To explain my observations in a little more detail:
> * I start a cluster of two servers, using [1] as the cache config.
> Both have a local copy of the Lucene index that will be loaded into
> the InfinispanDirectory via the loader. This is a test configuration,
> where I've set numOwners=1 so that I only need two servers for
> distribution to happen.
> * Upon startup, things look good. I see the memory usage of the JVM
> reflect a pretty near 50/50 split of the data across both servers.
> Logging indicates both servers are in the cluster view, all seems
> fine.
> * When I send a search query to either one of the nodes, I notice the
> following:
>   - iotop shows huge (~100MB/s) disk I/O on that node alone from the
> JVM process.
>   - no change in network activity between nodes (~300b/s, same as when
> idle)
>   - memory usage on the node running the query increases dramatically,
> and stays higher even after the query is finished.
>
> So it seemed to me like each node was favouring use of the CacheLoader
> to retrieve keys that are not in memory, instead of using the cluster.
> Does that seem reasonable? Is this the expected behaviour?
>
> I started to investigate this by turning on trace logging, which
> made me think perhaps the cause was that the CacheLoader's interceptor
> is higher priority in the chain than the distribution interceptor?
> I'm not at all familiar with the design at any level of detail - just
> what I've picked up in the last 24 hours from browsing the code, so I
> could easily be way off. I've attached the log snippets I thought
> relevant in [2].
>
> Any advice is much appreciated.
> Thanks!
>
> James.
>
>
> [1] https://www.refheap.com/paste/12531
> [2] https://www.refheap.com/paste/12543
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
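The interceptor-ordering hypothesis in the message above can be illustrated with a small toy model. This is a sketch under stated assumptions, not Infinispan's actual interceptor classes or API. The class ReadPathSketch, its methods, and the loaderBeforeDistribution flag are hypothetical. A plain modulo hash stands in for Infinispan's consistent hash. To match the setup described, each of two nodes has a non-shared local loader holding the whole index, and numOwners=1 keeps each key in memory on exactly one owner. If the local loader is consulted before the remote lookup, a read for a key owned by the other node is served from disk, and the entry then stays in the querying node's memory. That would be consistent with the disk I/O, flat network traffic, and memory growth reported. With the opposite order, the same read goes to the owning node instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a distributed read path; NOT Infinispan's real interceptor API.
// All names here are hypothetical and chosen only for illustration.
class ReadPathSketch {

    // Where a single read was ultimately served from.
    enum Source { LOCAL_MEMORY, LOCAL_LOADER, REMOTE_OWNER, MISSING }

    static final class Node {
        final Map<String, String> memory = new HashMap<>(); // in-RAM cache entries
        final Map<String, String> loader = new HashMap<>(); // local, non-shared store (disk)
    }

    final List<Node> nodes = new ArrayList<>();
    // true models the suspected ordering: loader step runs before the distribution step.
    final boolean loaderBeforeDistribution;

    ReadPathSketch(int numNodes, boolean loaderBeforeDistribution) {
        for (int i = 0; i < numNodes; i++) {
            nodes.add(new Node());
        }
        this.loaderBeforeDistribution = loaderBeforeDistribution;
    }

    // numOwners=1: exactly one owner per key. A modulo hash stands in for
    // the real consistent hash.
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    // Every node's loader holds the full index (each server had a local copy),
    // but only the owning node keeps an entry in memory after preloading.
    void preload(Map<String, String> index) {
        for (Node n : nodes) {
            n.loader.putAll(index);
        }
        for (Map.Entry<String, String> e : index.entrySet()) {
            nodes.get(ownerOf(e.getKey())).memory.put(e.getKey(), e.getValue());
        }
    }

    Source get(int nodeId, String key) {
        Node self = nodes.get(nodeId);
        if (self.memory.containsKey(key)) {
            return Source.LOCAL_MEMORY;
        }
        // Suspected ordering: local disk is tried before asking the owner.
        if (loaderBeforeDistribution && loadIntoMemory(self, key)) {
            return Source.LOCAL_LOADER;
        }
        Node owner = nodes.get(ownerOf(key));
        if (owner != self && owner.memory.containsKey(key)) {
            return Source.REMOTE_OWNER; // network fetch; not cached locally in this model
        }
        // Fall back to the local loader only when the cluster cannot supply the key.
        if (loadIntoMemory(self, key)) {
            return Source.LOCAL_LOADER;
        }
        return Source.MISSING;
    }

    // Copies an entry from the node's local loader into its memory, if present.
    private static boolean loadIntoMemory(Node n, String key) {
        String value = n.loader.get(key);
        if (value == null) {
            return false;
        }
        n.memory.put(key, value); // the loaded entry now also occupies this node's RAM
        return true;
    }
}
```

For example, after preload(index) on a two-node ReadPathSketch built with loaderBeforeDistribution set to true, get(0, key) for a key owned by node 1 returns LOCAL_LOADER, and a repeat read returns LOCAL_MEMORY. With the flag set to false, the same read returns REMOTE_OWNER.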