On 6 March 2014 09:21, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
On Wed 2014-03-05 17:16, Mircea Markus wrote:
> Sanne came with a good follow up to this email, just some small clarifications:
>
> On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>
> >>> If you have to do a map reduce for tasks as simple as age > 18, I think your system had better be prepared to run gazillions of M/R jobs.
> >>
> >> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazillions of times, and I don't need to index the age field just for that. Also, when it comes to Map/Reduce, the drawback of holding all the data in a single cache is twofold:
> >> - performance: you iterate over the data that is not related to your query.
> >
> > If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them?
>
> Here's where cross-site query would have been used. As Sanne suggested (next post), its limitations outweigh the advantages.
No. Cross-cache query if implemented will not support (efficiently
enough) that kind of query. Cf my wiki page.
>
> >
> >> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice.
> >
> > Well it’s called type safety, some people find it good ;)
>
> If anything, this model reduces type safety and reusability. E.g. say you want an M/R task to see how many Persons speak French. With the single cache model (both Dog and Person in the Cache<String, Mammal>) it would look something like:
>
> a)
> //pseudocode
> map (String k, Mammal value) {
>   if (value instanceof Person) { //this is the ugly part
>     if (((Person) value).speaks("French")) ...
>   } else {
>     //ignore it, it's a Dog
>   }
> }
>
> Same thing written for a Cache<String, Person>:
>
> b)
> map (String k, Person value) {
>   if (value.speaks("French")) ...
> }
>
> I don't think people would prefer writing a) instead of b) ;)
I concede that point. I would actually have stored
Person {
  name: emmanuel
  dogs: [
    Dog { name:django }
  ]
}
in the cache making it essentially a Cache<UUID,Person>.
I would not have two caches Cache<UUID,Person>, Cache<UUID,Dog> though
because it would prevent me from doing efficient data correlations
between Persons and Dogs.
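
To make the correlation argument concrete, here is a minimal Java sketch of the embedded layout. It assumes a plain java.util.Map standing in for Cache<UUID,Person>; the class EmbeddedModelSketch, its nested Person and Dog types, and the helper ownersOfDogNamed are hypothetical names for illustration only, not part of any Infinispan API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration: a plain Map plays the role of Cache<UUID,Person>.
final class EmbeddedModelSketch {

    static final class Dog {
        final String name;
        Dog(String name) { this.name = name; }
    }

    // Each Person carries its own dogs, so no second cache is needed.
    static final class Person {
        final String name;
        final List<Dog> dogs;
        Person(String name, Dog... dogs) {
            this.name = name;
            this.dogs = Arrays.asList(dogs);
        }
    }

    // Correlates persons and dogs in a single pass over a single map:
    // returns the names of all persons owning a dog with the given name.
    static List<String> ownersOfDogNamed(Map<UUID, Person> cache, String dogName) {
        List<String> owners = new ArrayList<>();
        for (Person person : cache.values()) {
            for (Dog dog : person.dogs) {
                if (dog.name.equals(dogName)) {
                    owners.add(person.name);
                    break; // one match per person is enough
                }
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        Map<UUID, Person> cache = new LinkedHashMap<>();
        cache.put(UUID.randomUUID(), new Person("emmanuel", new Dog("django")));
        cache.put(UUID.randomUUID(), new Person("someone-else")); // made-up entry, no dogs
        System.out.println(ownersOfDogNamed(cache, "django"));
    }
}
```

With two separate maps, the same question would need a lookup from one map into the other for every entry, which is the cost the embedded layout avoids.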
True. But even for Mircea's example of unrelated Mammals / Persons,
there is a more elegant solution: build on Paul's proposal of
Cache views. I'd obtain a typesafe Cache<K,Person> instance from the
root Cache<K,Mammal>, then run my M/R job on top of this.
As I suggested in Palma, I love the idea of cache views, but if
implemented they need to take care of all aspects, so a view would
implicitly filter on the "instanceof Person" clause.
So, pseudocode again:
cache.onType(Person.class)
  .map(String k, Person value) {
    if (value.speaks("French")) ...
  }
And I believe the API would be typesafe all the way to the Map implementation.
Sanne
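
For readers who want to experiment with the cache-view idea, the following is a minimal Java sketch and not an Infinispan API. It assumes a plain java.util.Map standing in for the root Cache<String, Mammal>; TypedView, its onType factory, forEach, and countSpeakers are hypothetical names, and the sample entries are made up. The view applies the "instanceof Person" filter itself, so the mapper only ever sees Person values.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical illustration of a typed view over a mixed-type map.
final class CacheViewSketch {

    interface Mammal {}

    static final class Dog implements Mammal {}

    static final class Person implements Mammal {
        private final Set<String> languages;
        Person(String... languages) {
            this.languages = new HashSet<>(Arrays.asList(languages));
        }
        boolean speaks(String language) { return languages.contains(language); }
    }

    // Wraps the root map and exposes only the values of one type.
    static final class TypedView<K, V> {
        private final Map<K, ?> root;
        private final Class<V> type;

        private TypedView(Map<K, ?> root, Class<V> type) {
            this.root = root;
            this.type = type;
        }

        static <K, V> TypedView<K, V> onType(Map<K, ?> root, Class<V> type) {
            return new TypedView<>(root, type);
        }

        // The type filter lives here, not in the caller's mapper.
        void forEach(BiConsumer<? super K, ? super V> mapper) {
            for (Map.Entry<K, ?> entry : root.entrySet()) {
                if (type.isInstance(entry.getValue())) {
                    mapper.accept(entry.getKey(), type.cast(entry.getValue()));
                }
            }
        }
    }

    // The "map" step from the thread: no instanceof and no cast in user code.
    static long countSpeakers(Map<String, Mammal> cache, String language) {
        long[] count = {0};
        TypedView.onType(cache, Person.class).forEach((key, person) -> {
            if (person.speaks(language)) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) {
        Map<String, Mammal> cache = new LinkedHashMap<>();
        cache.put("p1", new Person("French", "English")); // made-up sample data
        cache.put("d1", new Dog());
        cache.put("p2", new Person("English"));
        System.out.println(countSpeakers(cache, "French"));
    }
}
```

Adding a Cat type to the root map would not require touching countSpeakers, which is the reusability point raised earlier in the thread.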