[infinispan-dev] CacheLoaders, Distribution mode and Interceptors
Mircea Markus
mmarkus at redhat.com
Tue Mar 19 07:32:56 EDT 2013
Hi James,
By specifying the LuceneCacheLoader as a loader for the default cache, it will be added to the "lucene-index" cache (where it is needed) and also to the other two caches (lucene-metadata and lucene-locks), where I don't think it is needed. I think it should only be configured for the "lucene-index" cache and removed from the default config.
On top of that, you might want to add the cluster cache loader *before* the LuceneCacheLoader; otherwise it will always be the LuceneCacheLoader that is queried first. The config I have in mind is [1]. Would you mind giving it a try?
[1] https://gist.github.com/mmarkus/5195400
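To illustrate the ordering point, here is a minimal, self-contained Java sketch. It is a toy model and not Infinispan's API: the class LoaderChainSketch, the Loader type, the loader names and the key "segment_1" are all hypothetical. It assumes only that loaders are consulted in the order they are configured and that the first one holding the key answers the lookup.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LoaderChainSketch {

    /** Hypothetical stand-in for a cache loader: a name plus the entries it can serve. */
    static final class Loader {
        final String name;
        final Map<String, String> entries;

        Loader(String name, Map<String, String> entries) {
            this.name = name;
            this.entries = entries;
        }
    }

    /** Walks the chain in configured order; returns the name of the first loader holding the key, or null. */
    static String resolve(List<Loader> chain, String key) {
        for (Loader loader : chain) {
            if (loader.entries.containsKey(key)) {
                return loader.name;
            }
        }
        return null;
    }

    /** Builds a two-loader chain in the requested order and reports which loader serves the key. */
    static String demo(boolean clusterFirst) {
        Map<String, String> onPeers = new HashMap<>();
        onPeers.put("segment_1", "bytes-from-peer");
        Map<String, String> onDisk = new HashMap<>();
        onDisk.put("segment_1", "bytes-from-disk");

        Loader cluster = new Loader("cluster", onPeers);   // models fetching from other nodes
        Loader lucene = new Loader("lucene-disk", onDisk); // models reading the local index files

        List<Loader> chain = clusterFirst
                ? Arrays.asList(cluster, lucene)
                : Arrays.asList(lucene, cluster);
        return resolve(chain, "segment_1");
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // cluster
        System.out.println(demo(false)); // lucene-disk
    }
}
```

With the disk-backed loader listed first, every miss in local memory is served from disk even though a peer holds the same entry, which would match the disk I/O described in the original report.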
On 15 Mar 2013, at 16:22, James Aley wrote:
> Not sure if I've done exactly what you had in mind... here is my updated XML:
> https://www.refheap.com/paste/12601
>
> I added the loader to the lucene-index namedCache, which is the one
> I'm using for distribution.
>
> This didn't appear to change anything, as far as I can see. Still
> seeing a lot of disk IO with every request.
>
>
> James.
>
>
> On 15 March 2013 15:54, Ray Tsang <saturnism at gmail.com> wrote:
>> Can you try adding a ClusterCacheLoader to see if that helps?
>>
>> Thanks,
>>
>>
>> On Fri, Mar 15, 2013 at 8:49 AM, James Aley <james.aley at swiftkey.net> wrote:
>>>
>>> Apologies - forgot to copy list.
>>>
>>> On 15 March 2013 15:48, James Aley <james.aley at swiftkey.net> wrote:
>>>> Hey Adrian,
>>>>
>>>> Thanks for the response. I was chatting to Sanne on IRC yesterday, and
>>>> he suggested this to me. Actually the logging I attached was from a
>>>> cluster of 4 servers with numOwners=2. Sorry, I should have mentioned
>>>> this actually, but I thought seeing as it didn't appear to make any
>>>> difference that I'd just keep things simple in my previous email.
>>>>
>>>> While it seemed not to make a difference in this case... I can see why
>>>> that would make sense. In future tests I guess I should probably stick
>>>> with numOwners > 1.
>>>>
>>>>
>>>> James.
>>>>
>>>> On 15 March 2013 15:44, Adrian Nistor <anistor at redhat.com> wrote:
>>>>> Hi James,
>>>>>
>>>>> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
>>>>> the lucene-index cache is distributed with numOwners = 1. That means
>>>>> each cache entry is owned by just one cluster node and there's nowhere
>>>>> else to go in the cluster if the key is not available in local memory,
>>>>> thus it needs fetching from the cache store. This can be solved with
>>>>> numOwners > 1. Please let me know if this solves your problem.
>>>>>
>>>>> Cheers!
>>>>>
>>>>>
>>>>> On 03/15/2013 05:03 PM, James Aley wrote:
>>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> <OT>
>>>>>> Seeing as this is my first post, I wanted to just quickly thank you
>>>>>> all for Infinispan. So far I'm really enjoying working with it - great
>>>>>> product!
>>>>>> </OT>
>>>>>>
>>>>>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>>>>>> We use Lucene directly to build a search product, which has high read
>>>>>> requirements and likely very large indexes. I'm hoping to make use of
>>>>>> a distribution mode cache to keep the whole index in memory across a
>>>>>> cluster of machines (the index will be too big for one server).
>>>>>>
>>>>>> The problem I'm having is that after loading a filesystem-based Lucene
>>>>>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>>>>>> retrieving data from the cluster - they instead look up keys in their
>>>>>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>>>>>> I was hoping to just use the CacheLoader to initialize the caches, but
>>>>>> from there on read only from RAM (and network, of course). Is this
>>>>>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>>>>>>
>>>>>> To explain my observations in a little more detail:
>>>>>> * I start a cluster of two servers, using [1] as the cache config.
>>>>>> Both have a local copy of the Lucene index that will be loaded into
>>>>>> the InfinispanDirectory via the loader. This is a test configuration,
>>>>>> where I've set numOwners=1 so that I only need two servers for
>>>>>> distribution to happen.
>>>>>> * Upon startup, things look good. I see the memory usage of the JVM
>>>>>> reflect a pretty near 50/50 split of the data across both servers.
>>>>>> Logging indicates both servers are in the cluster view, all seems
>>>>>> fine.
>>>>>> * When I send a search query to either one of the nodes, I notice the
>>>>>> following:
>>>>>> - iotop shows huge (~100MB/s) disk I/O on that node alone from the
>>>>>>   JVM process.
>>>>>> - no change in network activity between nodes (~300b/s, same as
>>>>>>   when idle)
>>>>>> - memory usage on the node running the query increases dramatically,
>>>>>>   and stays higher even after the query is finished.
>>>>>>
>>>>>> So it seemed to me like each node was favouring use of the CacheLoader
>>>>>> to retrieve keys that are not in memory, instead of using the cluster.
>>>>>> Does that seem reasonable? Is this the expected behaviour?
>>>>>>
>>>>>> I started to investigate this by turning on trace logging, and this
>>>>>> made me think perhaps the cause was that the CacheLoader's interceptor
>>>>>> is higher priority in the chain than the distribution interceptor?
>>>>>> I'm not at all familiar with the design in any level of detail - just
>>>>>> what I picked up in the last 24 hours from browsing the code, so I
>>>>>> could easily be way off. I've attached the log snippets I thought
>>>>>> relevant in [2].
>>>>>>
>>>>>> Any advice offered much appreciated.
>>>>>> Thanks!
>>>>>>
>>>>>> James.
>>>>>>
>>>>>>
>>>>>> [1] https://www.refheap.com/paste/12531
>>>>>> [2] https://www.refheap.com/paste/12543
>>>>>> _______________________________________________
>>>>>> infinispan-dev mailing list
>>>>>> infinispan-dev at lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>
>>>>>
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)