[infinispan-dev] CacheLoaders, Distribution mode and Interceptors

Ray Tsang saturnism at gmail.com
Mon Mar 18 11:52:12 EDT 2013


> there should
> be no need to use a ClusterCacheLoader,

I agree. This looked consistent with what I saw a couple of weeks ago in a
different thread. Use of ClusterCacheLoader didn't make sense to me
either...

On Mar 18, 2013, at 5:55, Sanne Grinovero <sanne at infinispan.org> wrote:

> I'm glad you're finding a workaround for the disk IO, but there should
> be no need to use a ClusterCacheLoader:
> its intended use is to chain multiple grids.
> This is a critical problem IMHO.
>
> It seems there are multiple other issues at hand; let me comment per bullet:
>
> On 18 March 2013 12:20, James Aley <james.aley at swiftkey.net> wrote:
>> Update:
>>
>> I tried again - I think I misconfigured that ClusterCacheLoader on my
>> last attempt. With this configuration [1] it actually appears to be
>> loading keys over the network from the peer node. I'm seeing a lot of
>> network IO between the nodes when requesting from either one of them
>> (30-50 MB/s), and considerably less disk I/O than previously, though
>> still not negligible.
>>
>> I think, however, that both nodes are holding on to any data they
>> retrieve from the other node. Is this possible? The reason I think
>> this is the case:
>> * I have a fairly large test index on disk, which the Lucene
>> CacheLoader loads into memory as soon as the cache is created. It's
>> about a 12GB index, and after a flurry of disk activity when the
>> processes start, I see about 5-6GB of heap usage on each node -- all
>> seems good.
>
> Agree: that phase looks good.
>
>> * When I send requests now (with this ClusterCacheLoader
>> configuration linked below), I see network activity between nodes,
>> plus some disk I/O.
>
> I would not expect any more disk I/O at this point.
> The only case in which I think a disk read could be triggered is if the
> Lucene logic attempted to load a non-existent key, for example a function
> trying either:
> - to check whether a file exists (disregarding the directory listing)
> - to load a data range outside the expected boundaries
>
> The reason I think that is that Infinispan does not "cache" null entries:
> we plan tombstones for 6.0, but for now, if an entry doesn't exist, it
> won't "remember" that the entry is null and will try to look it up again
> in the usual places, including the CacheLoader.
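>
> A minimal sketch of what I mean, assuming a plain org.infinispan.Cache
> reference and a hypothetical key name (not taken from the real index):
>
>    // 'cache' is the running lucene-index Cache<Object, Object>.
>    void probeMissingKey(org.infinispan.Cache<Object, Object> cache) {
>       Object missingKey = "a-file-that-does-not-exist";  // hypothetical
>       for (int i = 0; i < 3; i++) {
>          // Every iteration misses in memory, and since the null result is
>          // never remembered (no tombstones before 6.0), every iteration
>          // consults the CacheLoader again.
>          Object v = cache.get(missingKey);
>       }
>    }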
>
> I've opened ISPN-2932 to inspect this and add tests to cover it.
>
>> * After each query, each node grows in heap usage considerably.
>> Eventually they'll both be using about 11GB of RAM.
>
> Permanently even after you close the IndexReader?
>
> I'm afraid this Directory may not be suited to your case, as it expects
> the whole index to fit in a single JVM: the IndexReader might request
> (and keep references to) all segments. In most cases it will work on a
> subset of segments, so it could still work for you, but if you need to
> iterate over the whole index you might need a custom Collector and some
> Infinispan custom commands. I can give you some pointers, as we have
> examples of an (experimental) distributed query execution in the
> infinispan-query module, or I think we could look into combining
> Map/Reduce with index analysis. (In other words: send the computation
> to the data rather than downloading half of the index to the local JVM;
> a rough sketch of that idea follows below.)
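>
> Just to give a feel for the shape of it (this is NOT the infinispan-query
> implementation, only a sketch assuming the org.infinispan.distexec
> DefaultExecutorService API and a task class you would write yourself):
>
>    import java.io.Serializable;
>    import java.util.ArrayList;
>    import java.util.List;
>    import java.util.concurrent.Callable;
>    import java.util.concurrent.Future;
>    import org.infinispan.Cache;
>    import org.infinispan.distexec.DefaultExecutorService;
>    import org.infinispan.distexec.DistributedExecutorService;
>
>    // Hypothetical task: each node runs the Lucene work against the data it
>    // already holds and ships back only a small result (e.g. a hit count).
>    class LocalSearchTask implements Callable<Integer>, Serializable {
>       public Integer call() throws Exception {
>          // ... run the per-node part of the query here ...
>          return 0;
>       }
>    }
>
>    List<Integer> runEverywhere(Cache<?, ?> indexCache) throws Exception {
>       DistributedExecutorService des = new DefaultExecutorService(indexCache);
>       List<Future<Integer>> futures = des.submitEverywhere(new LocalSearchTask());
>       List<Integer> results = new ArrayList<Integer>();
>       for (Future<Integer> f : futures) {
>          results.add(f.get());   // merge the small per-node results locally
>       }
>       return results;
>    }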
>
> But if this is not leaking because of the IndexReader usage, then it's a
> leak we need to fix.
>
>> * At the point where both nodes have lots of data in RAM, the network
>> I/O has dropped hugely to ~100k/s
>
> It almost looks like you have L1 enabled? Could you check that?
> Or the IndexReader is buffering.
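>
> For reference, a sketch of how to rule L1 out, assuming the programmatic
> ConfigurationBuilder / Configuration API (the same check applies to the
> XML in [1]):
>
>    import org.infinispan.configuration.cache.CacheMode;
>    import org.infinispan.configuration.cache.Configuration;
>    import org.infinispan.configuration.cache.ConfigurationBuilder;
>
>    Configuration cfg = new ConfigurationBuilder()
>          .clustering().cacheMode(CacheMode.DIST_SYNC)
>          .l1().disable()   // remote reads are not kept in a local L1 cache
>          .build();
>
>    // Or, on a running cache ('cache' being the lucene-index cache),
>    // verify what is actually in effect:
>    boolean l1Enabled = cache.getCacheConfiguration().clustering().l1().enabled();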
>
>> * If I repeat an identical query to either node, the response is
>> instant - O(10ms)
>
> Well, that would be good if only we knew why :)
> But this is not an option for you, right? I mean you can't load all the
> production data in a single JVM?
>
> Sanne
>
>
>>
>> I don't know if this is because they're lazily loading entries from
>> disk despite the preload=true setting (and the index just takes up far
>> more RAM when loaded as a Cache like this?), or if it's because
>> they're locally caching entries that should (by the consistent hash
>> and numOwners configuration, at least) only live on the remote node?
>>
>> Thanks!
>> James.
>>
>> [1] https://www.refheap.com/paste/12685
>>
>>
>> On 16 March 2013 01:19, Sanne Grinovero <sanne at infinispan.org> wrote:
>>> Hi Adrian,
>>> let's forget about Lucene details and focus on DIST.
>>> With numOwners=1 and two nodes, the entries should be stored
>>> roughly 50% on each node. I see nothing wrong with that,
>>> considering you don't need data failover in a read-only use case
>>> with the whole index available in the shared CacheLoader.
>>>
>>> In such a scenario, with both nodes having preloaded all the data, for
>>> a get() operation I would expect either:
>>> A) the node is the owner, so it retrieves the value from a local
>>>    in-JVM reference
>>> B) the node is not the owner, so it forwards the request to the other node
>>> with roughly a 50% chance per key of landing in case A or B (a sketch of
>>> how to verify this per key follows below).
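>>>
>>> A small diagnostic sketch, assuming the AdvancedCache /
>>> DistributionManager API and the Flag.SKIP_CACHE_LOAD flag:
>>>
>>>    import org.infinispan.AdvancedCache;
>>>    import org.infinispan.Cache;
>>>    import org.infinispan.context.Flag;
>>>
>>>    void checkKey(Cache<Object, Object> cache, Object key) {
>>>       AdvancedCache<Object, Object> ac = cache.getAdvancedCache();
>>>
>>>       // Case A or case B? Ask the consistent hash via the DistributionManager.
>>>       boolean localOwner = ac.getDistributionManager().getLocality(key).isLocal();
>>>
>>>       // A read with SKIP_CACHE_LOAD can never touch a CacheLoader, so for a
>>>       // non-owned key any non-null result here had to come over the network
>>>       // from the owner's memory rather than from the local loader.
>>>       Object viaClusterOnly = ac.withFlags(Flag.SKIP_CACHE_LOAD).get(key);
>>>
>>>       System.out.printf("owner=%s, found without loader=%s%n",
>>>             localOwner, viaClusterOnly != null);
>>>    }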
>>>
>>> But when hitting case B) it seems that instead of loading from the
>>> other node, it hits the CacheLoader to fetch the value.
>>>
>>> I had already asked James to verify with 4 nodes and numOwners=2; the
>>> result is the same, so I suggested he ask here.
>>> BTW I think numOwners=1 is perfectly valid and should work just as
>>> numOwners=2 does; the only reason I asked him to repeat
>>> the test is that we don't have many tests for the numOwners=1 case and
>>> I was assuming there might be some (wrong) assumptions
>>> affecting this.
>>>
>>> Note that this is not "just" a critical performance problem: I also
>>> suspect it could produce inconsistent reads, in two classes of
>>> problems:
>>>
>>> # non-shared CacheStore with stale entries
>>> For non-owned keys the node hits the local CacheStore first, where
>>> you would normally expect to find nothing, and only then forwards the
>>> request to the right node. But what if this node was the owner in the
>>> past? It might have an old entry stored locally, which would be returned
>>> instead of the correct value owned by a different node.
>>>
>>> # shared CacheStore using write-behind
>>> When using an async CacheStore, the content of the store is by
>>> definition not trustworthy unless you first check the owner for the
>>> in-memory entry.
>>>
>>> Both seem critical to me, but the performance impact is really bad too.
>>>
>>> I had hoped to run some more tests myself but haven't been able to look
>>> at this yet; any help from the core team would be appreciated.
>>>
>>> @Ray, thanks for mentioning the ClusterCacheLoader. Wasn't there
>>> someone else with a CacheLoader issue recently who had worked around
>>> the problem by using a ClusterCacheLoader?
>>> Do you remember what the scenario was?
>>>
>>> Cheers,
>>> Sanne
>>>
>>> On 15 March 2013 15:44, Adrian Nistor <anistor at redhat.com> wrote:
>>>> Hi James,
>>>>
>>>> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
>>>> the lucene-index cache is distributed with numOwners = 1. That means
>>>> each cache entry is owned by just one cluster node and there's nowhere
>>>> else to go in the cluster if the key is not available in local memory,
>>>> so it has to be fetched from the cache store. This can be solved with
>>>> numOwners > 1.
>>>> Please let me know if this solves your problem.
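>>>>
>>>> A minimal sketch of that change, assuming the programmatic
>>>> ConfigurationBuilder API is an option (the equivalent one-attribute
>>>> change applies to the XML in [1]):
>>>>
>>>>    import org.infinispan.configuration.cache.CacheMode;
>>>>    import org.infinispan.configuration.cache.Configuration;
>>>>    import org.infinispan.configuration.cache.ConfigurationBuilder;
>>>>
>>>>    Configuration cfg = new ConfigurationBuilder()
>>>>          .clustering().cacheMode(CacheMode.DIST_SYNC)
>>>>          .hash().numOwners(2)   // > 1: a second copy lives on another node
>>>>          .build();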
>>>>
>>>> Cheers!
>>>>
>>>> On 03/15/2013 05:03 PM, James Aley wrote:
>>>>> Hey all,
>>>>>
>>>>> <OT>
>>>>> Seeing as this is my first post, I wanted to just quickly thank you
>>>>> all for Infinispan. So far I'm really enjoying working with it - great
>>>>> product!
>>>>> </OT>
>>>>>
>>>>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>>>>> We use Lucene directly to build a search product, which has high read
>>>>> requirements and likely very large indexes. I'm hoping to make use of
>>>>> a distribution mode cache to keep the whole index in memory across a
>>>>> cluster of machines (the index will be too big for one server).
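>>>>>
>>>>> Roughly, the wiring on our side looks something like this (simplified,
>>>>> and assuming the single-cache InfinispanDirectory constructor; the
>>>>> index name is just a label namespacing the keys):
>>>>>
>>>>>    import java.io.IOException;
>>>>>    import org.apache.lucene.index.IndexReader;
>>>>>    import org.apache.lucene.search.IndexSearcher;
>>>>>    import org.infinispan.Cache;
>>>>>    import org.infinispan.lucene.InfinispanDirectory;
>>>>>
>>>>>    IndexSearcher openSearcher(Cache<?, ?> cache) throws IOException {
>>>>>       // 'cache' is the clustered "lucene-index" cache from [1].
>>>>>       InfinispanDirectory directory = new InfinispanDirectory(cache, "my-index");
>>>>>       return new IndexSearcher(IndexReader.open(directory));
>>>>>    }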
>>>>>
>>>>> The problem I'm having is that after loading a filesystem-based Lucene
>>>>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>>>>> retrieving data from the cluster - they instead look up keys in their
>>>>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>>>>> I was hoping to just use the CacheLoader to initialize the caches, but
>>>>> from there on read only from RAM (and network, of course). Is this
>>>>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>>>>>
>>>>> To explain my observations in a little more detail:
>>>>> * I start a cluster of two servers, using [1] as the cache config.
>>>>> Both have a local copy of the Lucene index that will be loaded into
>>>>> the InfinispanDirectory via the loader. This is a test configuration,
>>>>> where I've set numOwners=1 so that I only need two servers for
>>>>> distribution to happen.
>>>>> * Upon startup, things look good. I see the memory usage of the JVM
>>>>> reflect a pretty near 50/50 split of the data across both servers.
>>>>> Logging indicates both servers are in the cluster view, all seems
>>>>> fine.
>>>>> * When I send a search query to either one of the nodes, I notice the following:
>>>>>   - iotop shows huge (~100MB/s) disk I/O on that node alone from the
>>>>> JVM process.
>>>>>   - no change in network activity between nodes (~300b/s, same as when idle)
>>>>>   - memory usage on the node running the query increases dramatically,
>>>>> and stays higher even after the query is finished.
>>>>>
>>>>> So it seemed to me like each node was favouring use of the CacheLoader
>>>>> to retrieve keys that are not in memory, instead of using the cluster.
>>>>> Does that seem reasonable? Is this the expected behaviour?
>>>>>
>>>>> I started to investigate this by turning on trace logging, and this
>>>>> made me think perhaps the cause is that the CacheLoader's interceptor
>>>>> sits earlier in the chain than the distribution interceptor?
>>>>> I'm not at all familiar with the design in any level of detail - just
>>>>> what I picked up in the last 24 hours from browsing the code, so I
>>>>> could easily be way off. I've attached the log snippets I thought
>>>>> relevant in [2], and below is how I was inspecting the chain.
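>>>>>
>>>>> (A sketch of the check, assuming AdvancedCache.getInterceptorChain()
>>>>> is the right way to look at the ordering:)
>>>>>
>>>>>    import org.infinispan.Cache;
>>>>>    import org.infinispan.interceptors.base.CommandInterceptor;
>>>>>
>>>>>    void dumpInterceptorChain(Cache<?, ?> cache) {
>>>>>       // Prints the interceptors in invocation order, to see whether the
>>>>>       // cache-loader interceptor really comes before the distribution one.
>>>>>       for (CommandInterceptor ci : cache.getAdvancedCache().getInterceptorChain()) {
>>>>>          System.out.println(ci.getClass().getSimpleName());
>>>>>       }
>>>>>    }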
>>>>>
>>>>> Any advice offered much appreciated.
>>>>> Thanks!
>>>>>
>>>>> James.
>>>>>
>>>>>
>>>>> [1] https://www.refheap.com/paste/12531
>>>>> [2] https://www.refheap.com/paste/12543
>>>>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

