On 04 Mar 2014, at 19:02, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
> </snip>
>>
>> To anecdotally answer your specific example, yes, having different configs for different entities is an interesting benefit, but it has to outweigh the drawbacks.
>
> Using a single cache for all the types isn't practical at all :-) Just to expand on my idea, people prefer using different caches for many reasons:
> - security: the Account cache has different security requirements than the News cache
> - data consistency: News is a non-transactional cache, Account requires pessimistic XA transactions
> - expiry: expire last year's news from the system. Not the same for Accounts
> - availability: I want the Accounts cache to be backed up to another site. I don't want that for the News cache
> - logical data grouping: mixing Accounts with News doesn't make sense. I might want to know which account appeared in the news, though.
This kind of reasoning reminds me of why, in the RDBMS world, people use different databases.
In fact, I have had experience where News literally was a different database than Accounts.
But again, in that model, one database holds many tables.
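For concreteness, here is a minimal sketch of what per-cache settings along those lines could look like with Infinispan's programmatic ConfigurationBuilder API. The cache names, the one-year lifespan and the backup site are made up for the example, and the transaction manager / XA wiring plus the clustered transport are left out:

    import java.util.concurrent.TimeUnit;

    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;
    import org.infinispan.transaction.LockingMode;
    import org.infinispan.transaction.TransactionMode;

    public class PerTypeCaches {
        public static void main(String[] args) {
            // One cache manager, one named cache per entity type, each tuned differently.
            DefaultCacheManager cm = new DefaultCacheManager();

            // News: non-transactional, entries expire after roughly a year.
            ConfigurationBuilder news = new ConfigurationBuilder();
            news.transaction().transactionMode(TransactionMode.NON_TRANSACTIONAL);
            news.expiration().lifespan(365, TimeUnit.DAYS);

            // Account: pessimistic transactions, backed up to another site.
            ConfigurationBuilder account = new ConfigurationBuilder();
            account.transaction()
                   .transactionMode(TransactionMode.TRANSACTIONAL)
                   .lockingMode(LockingMode.PESSIMISTIC);
            account.sites().addBackup().site("LON");   // backup site name is illustrative

            cm.defineConfiguration("news", news.build());
            cm.defineConfiguration("account", account.build());
            cm.stop();
        }
    }

The same split can of course be expressed in XML; the point is simply that each named cache carries its own expiry, transaction and backup settings.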
>
>> If you have to do a map/reduce for tasks as simple as age > 18, I think your system had better be prepared to run gazillions of M/R jobs.
>
> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazillions of times, and I don't need to index the age field just for that. Also, when it comes to Map/Reduce, the drawback of holding all the data in a single cache is twofold:
> - performance: you iterate over the data that is not related to your query.
@Mircea: when we talked about mixing up data in a cache, the idea was that you'd get a view of the cache, say for a particular type, and iterators, map/reduce functions etc. would only iterate over those entries. Hence, you'd avoid iterating over stuff not relevant to you.
If the data is never related (query-wise), then we are in the database-split category. Which is fine. But if some of your queries do relate the data, what do you do? Deny the user the ability to run them?
> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice.
Well, it's called type safety; some people find it good ;)
By the way, OGM does abstract a class from its representation in the datastore (including its name). But that's another story ;)
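To make both drawbacks concrete, here is roughly what that nightly job could look like with the MapReduceTask API Infinispan has today. The Person class and its fields are invented for the sketch, and the instanceof check in the Mapper is exactly the type coupling Mircea is pointing at:

    import java.io.Serializable;
    import java.time.LocalDate;
    import java.util.Iterator;
    import java.util.Map;

    import org.infinispan.Cache;
    import org.infinispan.distexec.mapreduce.Collector;
    import org.infinispan.distexec.mapreduce.MapReduceTask;
    import org.infinispan.distexec.mapreduce.Mapper;
    import org.infinispan.distexec.mapreduce.Reducer;

    public class BirthdayJob {

        // Hypothetical domain class; only the fields the job needs.
        static class Person implements Serializable {
            String name;
            LocalDate birthDate;
        }

        // Emits every Person who turns 18 tomorrow.
        static class TurnsEighteenMapper implements Mapper<String, Object, String, String> {
            @Override
            public void map(String key, Object value, Collector<String, String> c) {
                // The mixed cache forces a type check: Dog, Cat, ... are skipped here,
                // yet the job still crawls over their entries, and every new type added
                // to the cache is more data this Mapper has to wade through.
                if (value instanceof Person) {
                    Person p = (Person) value;
                    if (p.birthDate.plusYears(18).equals(LocalDate.now().plusDays(1))) {
                        c.emit(key, p.name);
                    }
                }
            }
        }

        // One person per key, so there is nothing to aggregate.
        static class PassThroughReducer implements Reducer<String, String> {
            @Override
            public String reduce(String key, Iterator<String> values) {
                return values.next();
            }
        }

        static Map<String, String> turning18Tomorrow(Cache<String, Object> cache) {
            return new MapReduceTask<String, Object, String, String>(cache)
                    .mappedWith(new TurnsEighteenMapper())
                    .reducedWith(new PassThroughReducer())
                    .execute();
        }
    }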
>
>> I think that Dogs, and any domestic animal, are fundamentally related to humans - Person in your case. So queries involving both will be required - a cross-cache M/R is not doable today AFAIK, and even if it were, it's still M/R with all its drawbacks.
>> To me, the Cache API and Hot Rod are well suited for what I call a self-contained object graph (i.e. where Dog would be an embedded object of Person and not a separate Entity). In that situation, there is a single cache.
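A rough sketch of that self-contained shape (class names invented for the example) - Dog only exists embedded inside its owner, so the whole graph is one cache value and moves atomically with a plain put/get:

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;

    // Self-contained object graph: Dog is not an entity of its own, it is just
    // part of the Person value, so one cache entry carries the whole graph.
    public class PersonGraph {

        static class Dog implements Serializable {
            String name;
        }

        static class Person implements Serializable {
            String name;
            int age;
            List<Dog> dogs = new ArrayList<Dog>();   // embedded, not a separate cache entry
        }

        // With a Cache<String, Person>, storing the graph is a single operation:
        //   cache.put("anna", anna);   // anna and her dogs travel together
    }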
>
> I see where you're coming from, but I don't think requiring people to use a single cache for all the entities is an option. Besides a natural logical separation, different data has different storage requirements: security, access patterns, consistency, durability, availability, etc. For most non-trivial use cases, using a single cache just won't do.
Let me rephrase and sum up my position.
If you are storing unrelated data, use different caches if you want, that’s fine.
If you are storing related data, store it as one root entity and embeddable objects (i.e. one cache entry for the whole graph); you can have one root entity per cache, that's fine.
If you are storing related entities and want to do queries on them: you are more or less screwed today with Infinispan and need a higher-level abstraction.
So _recommending_ one entity = one cache to me is wrong.
^ +100
It's more one entity graph = one cache, which is vastly different and has deep consequences (see my wiki page).
+1 - it opens up a lot of interesting possibilities, and with cache views you could drill down to subsets of the cache.
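Purely as a thought experiment, a type-scoped view along those lines might surface something like the interface below. Nothing here exists in Infinispan today; it only illustrates the idea that iteration (and, by extension, M/R or queries) would be confined to one type instead of the whole mixed cache:

    import java.util.Map;
    import java.util.function.Predicate;

    // Hypothetical sketch only - not an Infinispan API.
    public interface TypedCacheView<K, V> {

        // Only entries whose value belongs to the view's type are ever visited.
        Iterable<Map.Entry<K, V>> entries();

        // Optional further narrowing within the type, e.g. last year's news.
        TypedCacheView<K, V> filter(Predicate<V> predicate);
    }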
Cheers,
--
Galder Zamarreño
galder(a)redhat.com
twitter.com/galderz
Project Lead, Escalante
http://escalante.io
Engineer, Infinispan
http://infinispan.org