[infinispan-dev] CacheLoaders, Distribution mode and Interceptors

James Aley james.aley at swiftkey.net
Fri Mar 15 12:22:38 EDT 2013


Not sure if I've done exactly what you had in mind... here is my updated XML:
https://www.refheap.com/paste/12601

I added the loader to the lucene-index namedCache, which is the one
I'm using for distribution.
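
For the archives, here's the gist of that change sketched with the
programmatic configuration instead - a minimal sketch assuming
Infinispan 5.2's fluent builder API (method names are worth
double-checking against your version):

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;

    public class ClusterLoaderSketch {
        // Sketch: a distributed cache whose loader chain includes a
        // ClusterCacheLoader, so a local miss is looked up on other
        // cluster members before falling back to the local store.
        static Configuration distWithClusterLoader() {
            return new ConfigurationBuilder()
                .clustering()
                    .cacheMode(CacheMode.DIST_SYNC)
                .loaders()
                    .addClusterCacheLoader()
                        .remoteCallTimeout(500) // ms; illustrative value
                .build();
        }
    }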

This didn't appear to change anything, as far as I can see. Still
seeing a lot of disk I/O with every request.


James.


On 15 March 2013 15:54, Ray Tsang <saturnism at gmail.com> wrote:
> Can you try adding a ClusterCacheLoader to see if that helps?
>
> Thanks,
>
>
> On Fri, Mar 15, 2013 at 8:49 AM, James Aley <james.aley at swiftkey.net> wrote:
>>
>> Apologies - forgot to copy list.
>>
>> On 15 March 2013 15:48, James Aley <james.aley at swiftkey.net> wrote:
>> > Hey Adrian,
>> >
>> > Thanks for the response. I was chatting to Sanne on IRC yesterday,
>> > and he suggested this to me. Actually, the logging I attached was
>> > from a cluster of 4 servers with numOwners=2. Sorry, I should have
>> > mentioned that, but since it didn't appear to make any difference, I
>> > thought I'd keep things simple in my previous email.
>> >
>> > While it seemed not to make a difference in this case... I can see why
>> > that would make sense. In future tests I guess I should probably stick
>> > with numOwners > 1.
>> >
>> >
>> > James.
>> >
>> > On 15 March 2013 15:44, Adrian Nistor <anistor at redhat.com> wrote:
>> >> Hi James,
>> >>
>> >> I'm not an expert on InfinispanDirectory, but I've noticed in [1]
>> >> that the lucene-index cache is distributed with numOwners = 1. That
>> >> means each cache entry is owned by just one cluster node, and
>> >> there's nowhere else to go in the cluster if the key is not
>> >> available in local memory, thus it needs fetching from the cache
>> >> store. This can be solved with numOwners > 1. Please let me know if
>> >> this solves your problem.
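>> >>
>> >> For illustration, here is the same idea sketched with the
>> >> programmatic configuration - a sketch against the 5.2 fluent
>> >> builder API, not a tested configuration:
>> >>
>> >>     import org.infinispan.configuration.cache.CacheMode;
>> >>     import org.infinispan.configuration.cache.Configuration;
>> >>     import org.infinispan.configuration.cache.ConfigurationBuilder;
>> >>
>> >>     public class NumOwnersSketch {
>> >>         // Sketch: with numOwners = 2 every entry is held in memory
>> >>         // on two nodes, so a read that misses locally can be served
>> >>         // from the other owner instead of falling through to the
>> >>         // cache store.
>> >>         static Configuration distributedWithTwoOwners() {
>> >>             return new ConfigurationBuilder()
>> >>                 .clustering()
>> >>                     .cacheMode(CacheMode.DIST_SYNC)
>> >>                     .hash().numOwners(2)
>> >>                 .build();
>> >>         }
>> >>     }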
>> >>
>> >> Cheers!
>> >>
>> >>
>> >> On 03/15/2013 05:03 PM, James Aley wrote:
>> >>>
>> >>> Hey all,
>> >>>
>> >>> <OT>
>> >>> Seeing as this is my first post, I wanted to just quickly thank you
>> >>> all for Infinispan. So far I'm really enjoying working with it - great
>> >>> product!
>> >>> </OT>
>> >>>
>> >>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>> >>> We use Lucene directly to build a search product, which has high read
>> >>> requirements and likely very large indexes. I'm hoping to make use of
>> >>> a distribution mode cache to keep the whole index in memory across a
>> >>> cluster of machines (the index will be too big for one server).
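>> >>>
>> >>> To make that concrete, this is roughly how we hook Lucene up to the
>> >>> cache - a sketch assuming the 5.2-era org.infinispan.lucene API,
>> >>> with made-up cache and index names:
>> >>>
>> >>>     import org.apache.lucene.store.Directory;
>> >>>     import org.infinispan.Cache;
>> >>>     import org.infinispan.lucene.InfinispanDirectory;
>> >>>     import org.infinispan.manager.DefaultCacheManager;
>> >>>
>> >>>     public class IndexInMemorySketch {
>> >>>         public static void main(String[] args) throws Exception {
>> >>>             // Sketch: expose a distributed cache as a Lucene
>> >>>             // Directory so index chunks live in cluster memory.
>> >>>             Cache<Object, Object> cache =
>> >>>                 new DefaultCacheManager("infinispan.xml")
>> >>>                     .getCache("lucene-index");
>> >>>             Directory dir = new InfinispanDirectory(cache, "my-index");
>> >>>             // ... hand 'dir' to IndexReader/IndexSearcher as usual
>> >>>         }
>> >>>     }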
>> >>>
>> >>> The problem I'm having is that after loading a filesystem-based Lucene
>> >>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>> >>> retrieving data from the cluster - they instead look up keys in their
>> >>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>> >>> I was hoping to just use the CacheLoader to initialize the caches, but
>> >>> from there on read only from RAM (and network, of course). Is this
>> >>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
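>> >>>
>> >>> (What I had in mind is something like preloading - a sketch with
>> >>> the 5.2 fluent API; I'm not sure whether preload belongs at the
>> >>> loaders level or per-loader in every version, so treat this as an
>> >>> assumption:)
>> >>>
>> >>>     import org.infinispan.configuration.cache.Configuration;
>> >>>     import org.infinispan.configuration.cache.ConfigurationBuilder;
>> >>>
>> >>>     public class PreloadSketch {
>> >>>         // Sketch: preload asks Infinispan to pull the store's
>> >>>         // contents into memory at startup, after which reads should
>> >>>         // come from RAM (and the cluster) rather than from disk.
>> >>>         static Configuration preloadingLoaders() {
>> >>>             return new ConfigurationBuilder()
>> >>>                 .loaders()
>> >>>                     .preload(true)
>> >>>                     // ... add the LuceneCacheLoader here
>> >>>                 .build();
>> >>>         }
>> >>>     }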
>> >>>
>> >>> To explain my observations in a little more detail:
>> >>> * I start a cluster of two servers, using [1] as the cache config.
>> >>> Both have a local copy of the Lucene index that will be loaded into
>> >>> the InfinispanDirectory via the loader. This is a test configuration,
>> >>> where I've set numOwners=1 so that I only need two servers for
>> >>> distribution to happen.
>> >>> * Upon startup, things look good. I see the memory usage of the JVM
>> >>> reflect a pretty near 50/50 split of the data across both servers.
>> >>> Logging indicates both servers are in the cluster view, all seems
>> >>> fine.
>> >>> * When I send a search query to either one of the nodes, I notice the
>> >>> following:
>> >>>    - iotop shows huge (~100MB/s) disk I/O on that node alone from
>> >>>      the JVM process.
>> >>>    - no change in network activity between nodes (~300b/s, same as
>> >>>      when idle).
>> >>>    - memory usage on the node running the query increases
>> >>>      dramatically, and stays higher even after the query is
>> >>>      finished.
>> >>>
>> >>> So it seemed to me like each node was favouring use of the CacheLoader
>> >>> to retrieve keys that are not in memory, instead of using the cluster.
>> >>> Does that seem reasonable? Is this the expected behaviour?
>> >>>
>> >>> I started to investigate this by turning on trace logging, and this
>> >>> made me think that perhaps the cause is that the CacheLoader's
>> >>> interceptor sits higher in the chain than the distribution
>> >>> interceptor? I'm not at all familiar with the design at any level of
>> >>> detail - just what I picked up in the last 24 hours from browsing the
>> >>> code, so I could easily be way off. I've attached the log snippets I
>> >>> thought relevant in [2].
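>> >>>
>> >>> For anyone wanting to check the same thing, the chain order can be
>> >>> dumped at runtime - a sketch assuming the 5.x
>> >>> AdvancedCache#getInterceptorChain() method:
>> >>>
>> >>>     import org.infinispan.Cache;
>> >>>     import org.infinispan.interceptors.base.CommandInterceptor;
>> >>>
>> >>>     public class InterceptorChainSketch {
>> >>>         // Sketch: print the interceptor chain in invocation order,
>> >>>         // to see whether the cache-loader interceptor runs before
>> >>>         // the distribution interceptor.
>> >>>         static void dump(Cache<?, ?> cache) {
>> >>>             for (CommandInterceptor ci :
>> >>>                     cache.getAdvancedCache().getInterceptorChain()) {
>> >>>                 System.out.println(ci.getClass().getSimpleName());
>> >>>             }
>> >>>         }
>> >>>     }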
>> >>>
>> >>> Any advice offered much appreciated.
>> >>> Thanks!
>> >>>
>> >>> James.
>> >>>
>> >>>
>> >>> [1] https://www.refheap.com/paste/12531
>> >>> [2] https://www.refheap.com/paste/12543
>> >>
>>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

