[infinispan-dev] CacheLoaders, Distribution mode and Interceptors

James Aley james.aley at swiftkey.net
Tue Mar 19 06:32:11 EDT 2013


Hi all,

So, in my previous update it seems I had numOwners=2, but was only
using two servers, meaning every entry was owned by both nodes and
what I was seeing actually made complete sense. After changing
numOwners to 1, distribution appears to work as expected with that
clusterLoader added to the config as suggested. Thanks for the help!
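
For anyone following along, the shape of the cache config is now roughly
the following, expressed with the programmatic API (just a sketch rather
than my exact XML; the cluster-loader builder method name varies between
Infinispan versions, so treat it as illustrative):

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;

    public class LuceneIndexCacheConfig {
        public static Configuration build() {
            ConfigurationBuilder builder = new ConfigurationBuilder();
            // Distribution with a single owner per entry, so roughly half
            // of the index chunks live on each of the two nodes.
            builder.clustering().cacheMode(CacheMode.DIST_SYNC);
            builder.clustering().hash().numOwners(1);
            // Fetch misses from the other node rather than from the local
            // disk-backed loader (assumed builder name, see note above).
            builder.persistence().addClusterLoader();
            return builder.build();
        }
    }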

I'm now having other issues, seeing way more network traffic than I
can really explain, but that's another topic, which I need to
investigate more. Just wanted to let you know that I think we got to
the bottom of this one!


Thanks!
James.

On 18 March 2013 15:52, Ray Tsang <saturnism at gmail.com> wrote:
>> there should
>> be no need to use a ClusterCacheLoader,
>
> I agree. This looked consistent with what I saw a couple of weeks ago in a
> different thread. Use of ClusterCacheLoader didn't make sense to me
> either...
>
> On Mar 18, 2013, at 5:55, Sanne Grinovero <sanne at infinispan.org> wrote:
>
>> I'm glad you're finding a workaround for the disk I/O, but there should
>> be no need to use a ClusterCacheLoader: its intended purpose is to chain
>> multiple grids. This is a critical problem IMHO.
>>
>> It seems there are multiple other issues at hand; let me comment on each bullet:
>>
>> On 18 March 2013 12:20, James Aley <james.aley at swiftkey.net> wrote:
>>> Update:
>>>
>>> I tried again - I think I misconfigured that ClusterCacheLoader on my
>>> last attempt. With this configuration [1] it actually appears to be
>>> loading keys over the network from the peer node. I'm seeing a lot of
>>> network I/O between the nodes when requesting from either one of them
>>> (30-50 MB/s), and considerably less disk I/O than previously, though
>>> still not negligible.
>>>
>>> I think, however, that both nodes are holding on to any data they
>>> retrieve from the other node. Is this possible? The reason I think
>>> this is the case:
>>> * I have a fairly large test index on disk, which the Lucene
>>> CacheLoader loads into memory as soon as the cache is created. It's
>>> about a 12GB index, and after a flurry of disk activity when the
>>> processes start, I see about 5-6GB of heap usage on each node -- all
>>> seems good.
>>
>> Agree: that phase looks good.
>>
>>> * When I send requests now (with this ClusterCacheLoader
>>> configuration linked below), I see network activity between nodes,
>>> plus some disk I/O.
>>
>> I would not expect any more disk I/O at this point.
>> The only case I can think of where a disk read could still be triggered is
>> if the Lucene logic attempts to load a non-existing key, for example code
>> attempting either:
>> - to check whether a file exists (disregarding the directory listing)
>> - to load a data range outside the expected boundaries
>>
>> The reason I think that is that Infinispan does not "cache" null entries:
>> we plan tombstones for 6.0, but for now, if an entry doesn't exist, it
>> won't "remember" that the entry is null and will look it up again from all
>> the usual places, including the CacheLoader.
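>>
>> To make that concrete, a quick sketch (the cache types and the key are
>> made up, purely to illustrate the behaviour):
>>
>>     import org.infinispan.Cache;
>>     import org.infinispan.context.Flag;
>>
>>     public class MissBehaviourSketch {
>>         static void demo(Cache<String, byte[]> chunksCache) {
>>             // Both calls consult the CacheLoader again: the null result of
>>             // the first get() is not remembered (no tombstones before 6.0).
>>             chunksCache.get("missing-chunk");
>>             chunksCache.get("missing-chunk");
>>
>>             // A read flagged with SKIP_CACHE_LOAD never touches the loader,
>>             // which helps to tell whether misses are causing the disk I/O.
>>             chunksCache.getAdvancedCache()
>>                        .withFlags(Flag.SKIP_CACHE_LOAD)
>>                        .get("missing-chunk");
>>         }
>>     }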
>>
>> I've opened ISPN-2932 to inspect this and add tests to cover it.
>>
>>> * After each query, each node grows in heap usage considerably.
>>> Eventually they'll both be using about 11GB of RAM.
>>
>> Permanently even after you close the IndexReader?
>>
>> I'm afraid this Directory might not be suited for your case, as it expects
>> that the whole index can fit in a single JVM: the IndexReader might request
>> (and keep references to) all segments. In most cases it works on a subset
>> of segments, so it could still work for you, but if you need to iterate
>> over the whole index you might need a custom Collector and to play with
>> Infinispan custom commands. I can give you some pointers, as we have
>> examples of an (experimental) distributed Query execution in the
>> infinispan query module, or I think we could look at combining Map/Reduce
>> with index analysis. (In other words: send the computation to the data
>> rather than downloading half of the index to the local JVM.)
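>>
>> As a rough illustration of the "send the computation to the data" idea,
>> something along these lines with the Map/Reduce API (a toy that only
>> counts entries where they live; real index analysis would inspect the
>> chunk and metadata values instead):
>>
>>     import java.util.Iterator;
>>     import org.infinispan.Cache;
>>     import org.infinispan.distexec.mapreduce.Collector;
>>     import org.infinispan.distexec.mapreduce.Mapper;
>>     import org.infinispan.distexec.mapreduce.MapReduceTask;
>>     import org.infinispan.distexec.mapreduce.Reducer;
>>
>>     public class ChunkCountSketch {
>>         public static long countEntries(Cache<Object, Object> cache) {
>>             return new MapReduceTask<Object, Object, String, Long>(cache)
>>                 .mappedWith(new Mapper<Object, Object, String, Long>() {
>>                     @Override
>>                     public void map(Object key, Object value,
>>                                     Collector<String, Long> collector) {
>>                         // Runs on the node owning each entry.
>>                         collector.emit("entries", 1L);
>>                     }
>>                 })
>>                 .reducedWith(new Reducer<String, Long>() {
>>                     @Override
>>                     public Long reduce(String reducedKey, Iterator<Long> iter) {
>>                         long total = 0;
>>                         while (iter.hasNext()) total += iter.next();
>>                         return total;
>>                     }
>>                 })
>>                 .execute()
>>                 .get("entries");
>>         }
>>     }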
>>
>> But if this is not leaking because of the IndexReader usage, then it's a
>> leak we need to fix.
>>
>>> * At the point where both nodes have lots of data in RAM, the network
>>> I/O has dropped hugely to ~100k/s
>>
>> It almost looks like you have L1 enabled? Could you check that?
>> Or maybe the IndexReader is buffering.
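>>
>> A quick way to check at runtime (sketch, with 'cache' being the
>> lucene-index Cache instance):
>>
>>     // With DIST and L1 enabled, values fetched from a remote owner are
>>     // kept locally for a while, which would explain the growing heap.
>>     boolean l1Enabled = cache.getCacheConfiguration()
>>                              .clustering().l1().enabled();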
>>
>>> * If I repeat an identical query to either node, the response is
>>> instant - O(10ms)
>>
>> Well, that would be good if only we knew why :)
>> But this is not an option for you, right? I mean you can't load all the
>> production data in a single JVM?
>>
>> Sanne
>>
>>
>>>
>>> I don't know if this is because they're lazily loading entries from
>>> disk despite the preload=true setting (and the index just takes up far
>>> more RAM when loaded as a Cache like this?), or if it's because
>>> they're locally caching entries that should (by the consistent hash
>>> and numOwners configuration, at least) only live in the remote node?
>>>
>>> Thanks!
>>> James.
>>>
>>> [1] https://www.refheap.com/paste/12685
>>>
>>>
>>> On 16 March 2013 01:19, Sanne Grinovero <sanne at infinispan.org> wrote:
>>>> Hi Adrian,
>>>> let's forget about Lucene details and focus on DIST.
>>>> With numOwners=1 and two nodes, the entries should be stored
>>>> roughly 50% on each node. I see nothing wrong with that,
>>>> considering you don't need data failover in a read-only use case
>>>> where the whole index is available in the shared CacheLoader.
>>>>
>>>> In such a scenario, with both nodes having preloaded all data, for a
>>>> get() operation I would expect either:
>>>> A) the node is the owner, so it returns the value from its local in-JVM reference
>>>> B) the node is not the owner, so it forwards the request to the other node
>>>> with roughly a 50% chance per key of being in case A or B.
>>>>
>>>> But when hitting case B, it seems that instead of loading the value from
>>>> the other node, it hits the CacheLoader to fetch it.
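>>>>
>>>> One way to see which path a given key takes (a diagnostic sketch, not
>>>> something from the actual test; 'cache' and 'key' stand for the
>>>> lucene-index cache and any non-owned chunk key):
>>>>
>>>>     import org.infinispan.context.Flag;
>>>>
>>>>     // Restricted to this node (no remote call): for a key this node
>>>>     // neither owns nor holds in memory, a non-null result here can only
>>>>     // have come from the local CacheLoader.
>>>>     Object local = cache.getAdvancedCache()
>>>>                         .withFlags(Flag.SKIP_REMOTE_LOOKUP)
>>>>                         .get(key);
>>>>
>>>>     // Not allowed to touch the store: a non-owned key should be fetched
>>>>     // from its owner over the network (case B above).
>>>>     Object viaOwner = cache.getAdvancedCache()
>>>>                            .withFlags(Flag.SKIP_CACHE_LOAD)
>>>>                            .get(key);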
>>>>
>>>> I had already asked James to verify with 4 nodes and numOwners=2; the
>>>> result is the same, so I suggested he ask here. BTW I think numOwners=1
>>>> is perfectly valid and should work just as well; the only reason I asked
>>>> him to repeat the test is that we don't have many tests on the
>>>> numOwners=1 case, and I was assuming there might be some (wrong)
>>>> assumptions affecting this.
>>>>
>>>> Note that this is not "just" a critical performance problem: I also
>>>> suspect it could produce inconsistent reads, in two classes of problems:
>>>>
>>>> # non-shared CacheStore with stale entries
>>>> For non-owned keys the node hits the local CacheStore first, where you
>>>> might expect to find nothing and then forward the request to the right
>>>> node. But what if this node was an owner in the past? It might still have
>>>> an old entry stored locally, which would be returned instead of the
>>>> correct value owned by a different node.
>>>>
>>>> # shared CacheStore using write-behind
>>>> When using an async CacheStore, by definition the content of the store is
>>>> not trustworthy unless you first check the owner for the in-memory entry.
>>>>
>>>> Both seem critical to me, but the performance impact is really bad too.
>>>>
>>>> I hoped to run some more tests myself but couldn't look into this yet;
>>>> any help from the core team would be appreciated.
>>>>
>>>> @Ray, thanks for mentioning the ClusterCacheLoader. Wasn't there
>>>> someone else with a CacheLoader issue recently who had worked around
>>>> the problem by using a ClusterCacheLoader?
>>>> Do you remember what the scenario was?
>>>>
>>>> Cheers,
>>>> Sanne
>>>>
>>>> On 15 March 2013 15:44, Adrian Nistor <anistor at redhat.com> wrote:
>>>>> Hi James,
>>>>>
>>>>> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
>>>>> the lucene-index cache is distributed with numOwners = 1. That means
>>>>> each cache entry is owned by just one cluster node and there's nowhere
>>>>> else to go in the cluster if the key is not available in local memory,
>>>>> thus it needs fetching from the cache store. This can be solved with
>>>>> numOwners > 1.
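>>>>>
>>>>> For example, the programmatic equivalent of that one change (with
>>>>> 'builder' being a ConfigurationBuilder for the lucene-index cache):
>>>>>
>>>>>     builder.clustering().hash().numOwners(2);  // any value > 1
>>>>>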
>>>>> Please let me know if this solves your problem.
>>>>>
>>>>> Cheers!
>>>>>
>>>>> On 03/15/2013 05:03 PM, James Aley wrote:
>>>>>> Hey all,
>>>>>>
>>>>>> <OT>
>>>>>> Seeing as this is my first post, I wanted to just quickly thank you
>>>>>> all for Infinispan. So far I'm really enjoying working with it - great
>>>>>> product!
>>>>>> </OT>
>>>>>>
>>>>>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>>>>>> We use Lucene directly to build a search product, which has high read
>>>>>> requirements and likely very large indexes. I'm hoping to make use of
>>>>>> a distribution mode cache to keep the whole index in memory across a
>>>>>> cluster of machines (the index will be too big for one server).
>>>>>>
>>>>>> The problem I'm having is that after loading a filesystem-based Lucene
>>>>>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>>>>>> retrieving data from the cluster - they instead look up keys in their
>>>>>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>>>>>> I was hoping to just use the CacheLoader to initialize the caches, but
>>>>>> from there on read only from RAM (and network, of course). Is this
>>>>>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
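>>>>>>
>>>>>> For reference, the wiring is roughly along these lines (a simplified
>>>>>> sketch rather than my exact code; the builder entry point differs
>>>>>> between versions of the Lucene Directory module, and the index name
>>>>>> here is made up):
>>>>>>
>>>>>>     import java.io.IOException;
>>>>>>     import org.apache.lucene.store.Directory;
>>>>>>     import org.infinispan.Cache;
>>>>>>     import org.infinispan.lucene.directory.DirectoryBuilder;
>>>>>>     import org.infinispan.manager.DefaultCacheManager;
>>>>>>
>>>>>>     public class IndexDirectoryFactory {
>>>>>>         public static Directory open(String configFile) throws IOException {
>>>>>>             DefaultCacheManager manager = new DefaultCacheManager(configFile);
>>>>>>             Cache<?, ?> indexCache = manager.getCache("lucene-index");
>>>>>>             // The same cache is reused for metadata, chunks and locks
>>>>>>             // only to keep the sketch short; they can be separate caches.
>>>>>>             return DirectoryBuilder
>>>>>>                     .newDirectoryInstance(indexCache, indexCache,
>>>>>>                                           indexCache, "my-index")
>>>>>>                     .create();
>>>>>>         }
>>>>>>     }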
>>>>>>
>>>>>> To explain my observations in a little more detail:
>>>>>> * I start a cluster of two servers, using [1] as the cache config.
>>>>>> Both have a local copy of the Lucene index that will be loaded into
>>>>>> the InfinispanDirectory via the loader. This is a test configuration,
>>>>>> where I've set numOwners=1 so that I only need two servers for
>>>>>> distribution to happen.
>>>>>> * Upon startup, things look good. I see the memory usage of the JVM
>>>>>> reflect a pretty near 50/50 split of the data across both servers.
>>>>>> Logging indicates both servers are in the cluster view, all seems
>>>>>> fine.
>>>>>> * When I send a search query to either one of the nodes, I notice the following:
>>>>>>   - iotop shows huge (~100MB/s) disk I/O on that node alone from the
>>>>>> JVM process.
>>>>>>   - no change in network activity between nodes (~300b/s, same as when idle)
>>>>>>   - memory usage on the node running the query increases dramatically,
>>>>>> and stays higher even after the query is finished.
>>>>>>
>>>>>> So it seemed to me like each node was favouring use of the CacheLoader
>>>>>> to retrieve keys that are not in memory, instead of using the cluster.
>>>>>> Does that seem reasonable? Is this the expected behaviour?
>>>>>>
>>>>>> I started to investigate this by turning on trace logging, and this
>>>>>> made me think perhaps the cause was that the CacheLoader's interceptor
>>>>>> is higher priority in the chain than the distribution interceptor?
>>>>>> I'm not at all familiar with the design at any level of detail - just
>>>>>> what I picked up in the last 24 hours from browsing the code, so I
>>>>>> could easily be way off. I've attached the log snippets I thought
>>>>>> relevant in [2].
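>>>>>>
>>>>>> In case it helps, this is roughly how I was looking at the ordering
>>>>>> (sketch, with 'cache' being the lucene-index Cache; getInterceptorChain()
>>>>>> is the accessor I found on AdvancedCache, so apologies if it has moved):
>>>>>>
>>>>>>     import org.infinispan.interceptors.base.CommandInterceptor;
>>>>>>
>>>>>>     // Print the interceptor chain top to bottom; if the CacheLoader
>>>>>>     // interceptor sits above the distribution interceptor, a miss would
>>>>>>     // consult the local loader before ever going remote.
>>>>>>     for (CommandInterceptor i : cache.getAdvancedCache().getInterceptorChain()) {
>>>>>>         System.out.println(i.getClass().getSimpleName());
>>>>>>     }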
>>>>>>
>>>>>> Any advice offered much appreciated.
>>>>>> Thanks!
>>>>>>
>>>>>> James.
>>>>>>
>>>>>>
>>>>>> [1] https://www.refheap.com/paste/12531
>>>>>> [2] https://www.refheap.com/paste/12543

