[infinispan-issues] [JBoss JIRA] (ISPN-761) Cache.keySet(), entrySet(), values(), size() ignore contents of cache loader

Sanne Grinovero (JIRA) jira-events at lists.jboss.org
Thu Jul 11 11:51:21 EDT 2013


    [ https://issues.jboss.org/browse/ISPN-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789175#comment-12789175 ] 

Sanne Grinovero commented on ISPN-761:
--------------------------------------

We discussed this on IRC:
{quote}
wburns: mmarkus, anistor, dberindei, galderz, pruivo, ttarant: sannegrinovero: was looking for input on ISPN-761
wburns: mmarkus brought up a point about non passivated caches as well - and this led me to find an issue with how size would work currently
wburns: basically you have a non passivated cache that has a user use the SKIP_CACHE_STORE flag etc. could cause inconsistencies with size method as the cache store is no longer a superset of the memory store
sannegrinovero: wburns: IMHO #size() should be deleted.
wburns: which would require reading all keys in memory and in the cache store to correctly determine size
sannegrinovero: wburns: that might not happen in the near future, but just saying that I wouldn't spend too much time on it.
wburns: sannegrinovero: 1 thought we had was not to worry about this edge case since it is rarely used Flag outside of internal infinispan
sannegrinovero: wburns: no, size() alreasy is documented that it only returns the size in the (local) data container, so I don't think it should load from the CacheStore at all.
wburns: sannegrinovero: the Jira 761 is to change that and as galderz pointed out I updated the javadoc on Cache
sannegrinovero: wburns: SKIP_CACHE_STORE? no it's super useful for client apps as well. Quite critical actually.
wburns: for write operations for a new key that is
sannegrinovero: wburns: what's the reason to change that? -1 from me.
wburns: sannegrinovero: it is what people want I guess :/
sannegrinovero: wburns: we can't listen to all, it's primarily important that the API stays consistent and clear.
mmarkus: I think size should also include the cache store
sannegrinovero: wburns: did you read the comments on the issue? It was moved to 6.0 so that it could be implemented because, as Manik states there, it requires changes in the CacheStore SPI to be implemented afficiently.
sannegrinovero: efficiently.
sannegrinovero: mmarkus: ^
mmarkus: if people want to ignore it they can always skip it for persistence
mmarkus: sannegrinovero: I'm not aware of any use cases that skip store *writes* with flags, any example?
sannegrinovero: mmarkus: these methods are unreliable anyway as they return just the view of the single node, people shouldn't use them at all or just in statistics (knowing the limitations)
sannegrinovero: mmarkus: there are dozens of examples of that in the Lucene Directory module.
sannegrinovero: mmarkus: I made the flag ;)
wburns: sannegrinovero: so you are the culprit :P jk
mmarkus: sannegrinovero: well directory module is kindof internal though :-)
sannegrinovero: mmarkus: true, but it also is an example of an application dealing with the Infinispan API.
wburns: sannegrinovero: but yes in regards to efficiency I would agree - I have added comments to the design doc to address those deficiencies as well
mmarkus: sannegrinovero: wburns: and agreed that involving cache stores makes more sense for local caches
sannegrinovero:wburns: mmarkus: my vote would go to throw an "UnsupportedOperationException" if we can't remove the ConcurrentMap API ;-)
mmarkus: sannegrinovero: on size() & co?
sannegrinovero: wburns: mmarkus: yes. unless you prefer to "throw OutOfMemoryException() " ... :D
mmarkus: sannegrinovero: −1. I think they make sense for local caches in particular
sannegrinovero: mmarkus: makes sense for local caches without CacheStore. IFF I need a CacheStore, and I am not loading it all in memory anyway, it likely is because there is not enough memory.
mmarkus: sannegrinovero: you might want to get all the keys in the system for certain scenarios and currently there's  no way to do that
mmarkus: sannegrinovero: the way the currently are these methods are broken as they won't return that. Users have been complaining about it for quite a while.
sannegrinovero: mmarkus: I know, and we discussed this many times on the mailing list. As I said to wburns, this issue was moved to 6.0 so that we would fix the CacheStore SPI to allow for better implementation of size() or iterate()
mmarkus: sannegrinovero: also if they won't the old functionality they can use the SKIP_STORE flag ;)
sannegrinovero: mmarkus: also Map/Reduce is WORTHLESS until you fix that SPI method.
sannegrinovero: mmarkus: as it has the same problem: needs to iterate (map) all entries, but they just don't fit in memory.
wburns: sannegrinovero: I wasn't around when that discussion took place... is that for a streaming SPI or something?
sannegrinovero: mmarkus: I'm not against implementing it, no doubt that it could be useful. It's the implementation which is simply not correct.
sannegrinovero: wburns: you are familiar with the Map/Reduce pattern?
mmarkus: sannegrinovero: +1 to implement them more efficiently, but and the required optimisations were tracked in ISPN-3290
wburns: sannegrinovero: yeah
sannegrinovero: mmarkus: fixed the dependency for you on JIRA.
mmarkus: sannegrinovero: you have a point but it's not a hard dependecy
sannegrinovero: mmarkus: it's not a hard dependency if you document those methods with "throws OOException most of the times"
sannegrinovero: mmarkus: BTW why would you want to fix the issue twice? this is doubling the work if wburns has to fix it now and then fix it again to use the new SPI
mmarkus sannegrinovero: the effort duplication should be small though
sannegrinovero: mmarkus: your call. I just hope nobody will use that method.
wburns: sannegrinovero: what was the part about Map/Reduce?  is it basically that cache store entries aren't used - and we want to add a streaming API to go through them by chunks instead of all the contents?
sannegrinovero: wburns: CacheStores are actually used, but as it is today to run a Map/Reduce job it is loading all entries in memory to run the first Map part. Which is pretty much defeating the purpose of the pattern.
wburns: sannegrinovero: guessing we have optimizations for shared at least :) - but basically is just to stream through the cache store contents
sannegrinovero: wburns: to take advantage of the pattern, it should pass the mapping definition to the store engine to have it narrow down execution (ideally) or (more practically) to run it on an iterator from the CacheStore itself to not run in OOM at least.
sannegrinovero: wburns: no optimisations whatsoever :(
mmarkus: sannegrinovero: I imagine it's the same way you're thinking to have keySet() and entries() implemented as well
sannegrinovero: wburns: and that's another point for the SPI needing improvements. At least the option to load only the entries owned by the current node should be added.
mmarkus: sannegrinovero: use keySet().iterator() to iterate over the set of keys
mmarkus: sannegrinovero: ^^?
wburns: sannegrinovero: ouch
sannegrinovero: mmarkus: exactly. But think about size() to keep it simple.. just add a size() method on the cacheStore first, then the JDBC one (for example) could run a simple COUNT instead of attempting to load 100GB in memory
mmarkus: sannegrinovero: yep, that's noted
{quote}
                
> Cache.keySet(),entrySet(),values(),size() ignore contents of cache loader
> -------------------------------------------------------------------------
>
>                 Key: ISPN-761
>                 URL: https://issues.jboss.org/browse/ISPN-761
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Loaders and Stores
>    Affects Versions: 4.2.0.BETA1
>            Reporter: Paul Ferraro
>            Assignee: William Burns
>            Priority: Critical
>              Labels: onboard, potential_performance_hit
>             Fix For: 6.0.0.Alpha1, 6.0.0.Final
>
>
> Passivated cache entries are not represented in values returned by the keySet(), entrySet(), values(), and size() Cache methods.  This results in inconsistent behavior, since it is possible that a given cache key may not be contained in keySet() even though Cache.get(...) would return a non-null value if the entry was previously passivated.
> I think CacheLoaderInterceptor needs to implement the following methods:
>    Object visitSizeCommand(InvocationContext ctx, SizeCommand command) throws Throwable
>    Object visitValuesCommand(InvocationContext ctx, ValuesCommand command) throws Throwable
>    Object visitEntrySetCommand(InvocationContext ctx, EntrySetCommand command) throws Throwable
>    Object visitKeySetCommand(InvocationContext ctx, KeySetCommand command) throws Throwable

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira



More information about the infinispan-issues mailing list