Adrian,
Thanks for the transcript between you and Philippe below. Here are my thoughts:
* loadAll() is generally overused and this can get expensive. I've changed
purgeExpired() in certain impls to not use loadAll().
* preloading the cache also calls loadAll(). I have a suggestion for this here -
https://jira.jboss.org/jira/browse/ISPN-310 - but this won't be in place till 4.1.0.
* rehashing isn't as bad as you think - the rehashing of entries in stores only takes
place when the cache store is *not* shared. Any use of an expensive, remote store (such
as S3, JDBC) would typically be shared between Infinispan nodes and as such these will not
be considered when rehashing.
That said, this can be improved, specifically by adding something like
loadKeys(Set<Object> excludes). This would allow the rehash code to load just the
necessary keys, excluding keys already considered from the data container directly, and
then inspect each key to test whether it needs to be rehashed elsewhere. If so, the value
could be loaded using load().
I have captured this in
https://jira.jboss.org/jira/browse/ISPN-311
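
To make the idea concrete, here is a rough sketch of how a rehash pass might use such a method. It is only an illustration: the KeyAwareStore interface, the MapStore toy implementation and the mustMove predicate are hypothetical names invented for this example, not existing Infinispan APIs, and loadKeys(Set<Object> excludes) is the proposed method, not one that exists today.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class RehashSketch {

    /** Hypothetical slice of a cache store exposing the proposed loadKeys(excludes). */
    interface KeyAwareStore {
        Set<Object> loadKeys(Set<Object> excludes); // keys only, no values
        Object load(Object key);                    // value for a single key
    }

    /** Toy in-memory store, used only to exercise the sketch. */
    static class MapStore implements KeyAwareStore {
        private final Map<Object, Object> data = new HashMap<>();
        int valueLoads = 0; // counts how many values were actually fetched

        void store(Object key, Object value) {
            data.put(key, value);
        }

        @Override
        public Set<Object> loadKeys(Set<Object> excludes) {
            Set<Object> keys = new HashSet<>(data.keySet());
            keys.removeAll(excludes);
            return keys;
        }

        @Override
        public Object load(Object key) {
            valueLoads++;
            return data.get(key);
        }
    }

    /**
     * Collects the store entries that must move to another node.
     * Keys already handled via the data container are excluded up front,
     * and a value is loaded only once its key is known to need rehashing.
     */
    static Map<Object, Object> entriesToRehash(KeyAwareStore store,
                                               Set<Object> inMemoryKeys,
                                               Predicate<Object> mustMove) {
        Map<Object, Object> result = new HashMap<>();
        for (Object key : store.loadKeys(inMemoryKeys)) {
            if (mustMove.test(key)) {
                result.put(key, store.load(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MapStore store = new MapStore();
        store.store("a", 1);
        store.store("b", 2);
        store.store("c", 3);
        Set<Object> inMemory = new HashSet<>(Arrays.asList("a"));
        // Pretend only "c" now hashes to a different owner.
        Map<Object, Object> moved = entriesToRehash(store, inMemory, k -> k.equals("c"));
        System.out.println(moved + " valueLoads=" + store.valueLoads);
    }
}
```

The point of the shape above is that the number of values fetched from the store is bounded by the number of keys that actually move, rather than by the size of the store as with loadAll().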
The problem, I think, with maintaining metadata separately is that it adds an additional
synchronization point when updating that metadata, whether this is expiration data per
key, or even just a list of keys in the store for a quick loadKeys() impl. But I am open to
ideas; after all, these are just CacheStore-specific implementation details.
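
As an illustration of that synchronization cost, here is a minimal sketch of a store that keeps a separate key index next to its payload. The IndexedStore class and its fields are hypothetical, made up for this example and not taken from any existing CacheStore; a real impl would also have to cope with the index and payload living in a remote system.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class KeyIndexSketch {

    /** Hypothetical store that tracks its keys separately from the stored values. */
    static class IndexedStore {
        private final Map<Object, Object> payload = new HashMap<>();
        private final Set<Object> keyIndex = new HashSet<>();
        private final Object lock = new Object();

        void store(Object key, Object value) {
            // Payload and index must change together, so every write takes the lock.
            synchronized (lock) {
                payload.put(key, value);
                keyIndex.add(key);
            }
        }

        boolean remove(Object key) {
            synchronized (lock) {
                keyIndex.remove(key);
                return payload.remove(key) != null;
            }
        }

        Object load(Object key) {
            synchronized (lock) {
                return payload.get(key);
            }
        }

        /** Cheap key listing: reads only the index, never the values. */
        Set<Object> loadKeys() {
            synchronized (lock) {
                return new HashSet<>(keyIndex);
            }
        }
    }

    public static void main(String[] args) {
        IndexedStore store = new IndexedStore();
        store.store("apple", "bear");
        store.store("pear", "wolf");
        store.remove("pear");
        System.out.println(store.loadKeys());
    }
}
```

loadKeys() becomes cheap, but every store() and remove() now contends on the same lock as the index, which is the trade-off described above.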
Cheers
Manik
On 3 Dec 2009, at 16:20, Adrian Cole wrote:
<adriancole> aloha all
<pvdyck> hi all
<adriancole> we are talking about the rehash concern wrt bucket-based
cachestores
<pvdyck> here is transcript
<pvdyck> it seems that it first loops on the set from the store to
compare the keys with the keys in memory
<pvdyck> [17:01] pvdyck: the set of keys present in memory will
always be smaller ... so maybe looping on this one and comparing with
the keys present in the store is a good optimization
<pvdyck> [17:01] pvdyck: I will give you the exact file and line in a moment
<pvdyck> [17:02] pvdyck: ok LeaveTask:74
<pvdyck> [17:03] pvdyck: actually
org.infinispan.distribution.LeaveTask line 74 from CR2
<adriancole> for context, the current design implies a big load, pvdyck, right?
<pvdyck> (is it sooo early ... is there an #infinispan irc channel
display next to the coffee machine ? ;-)
<adriancole> :)
<adriancole> pvdyck, I can see that changing the loop will reduce the
possibility for overloading a node
<pvdyck> the design implies calling loadAllLockSafe() ... loading all
the entries (K+V) from the cache -> very bad idea actually
<adriancole> seems that keys should be in a separate place
<adriancole> wdyt?
<adriancole> a lot of large systems have a separate area for metadata
and payload
<adriancole> one popular one is git ;)
<pvdyck> the simple idea of having this loadAll thing is a problem
<pvdyck> if it ever gets called ... I am quite sure the system will hang
<pvdyck> and indeed you are right, there is no reason to bring the
values with it ... keys are more than enough!
<adriancole> so, here's the thing
<adriancole> the whole bucket-based thing is supposed to help avoid
killing entries who share the same hashCode
<adriancole> and there's also another issue with encoding keys
<adriancole> since they might be typed and not strings
<pvdyck> is it still the case with the permanent hash ?
<pvdyck> oops sorry ... consistent hash
<adriancole> well, I'm talking about the hash of the thing you are putting in
<adriancole> not the consistent hash alg
<pvdyck> ok, understood...
<adriancole> ya, so I think that if we address this, we're ok
<adriancole> in the blobstore (jclouds) thing, we could address
<adriancole> by having a partition on hashCode
<adriancole> and encoding the object's key as the name of the thing in s3
<adriancole> or wherever
<adriancole> so like "apple" -> "bear"
<adriancole> "bear".hashCode/encode("apple")
<pvdyck> actually, I don't think the problem should end up in the
hands of the store itself
<adriancole> that would be convenient :)
<adriancole> in that case, I think that ispn may need a metadata
store and value store
<adriancole> since currently the typed bucket object contains both
keys and values
<adriancole> which makes it impossible to get to one without the other
<adriancole> I'm pretty sure
<pvdyck> looks like a lot of changes... but it is a path to explore!
<pvdyck> we obviously need to wait for them to appear ;-)
<adriancole> firstly, I think we'll need a patch for the s3
optimization so you can work :)
<adriancole> and also, this design needs to be reworked for sure
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org