Sanne came up with a good follow-up to this email; just some small clarifications:
On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>> If you have to do a map reduce for tasks so simple as age > 18, I think your
system had better be prepared to run gazillions of M/R jobs.
>
> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to
congratulate them. Once a day, not gazillions of times, and I don't need to index the
age field just for that. Also, when it comes to Map/Reduce, the drawback of holding all the
data in a single cache is twofold:
> - performance: you iterate over data that is not related to your query.
If the data are never related (query wise), then we are in the database split category.
Which is fine. But if some of your queries are related, what do you do? Deny the user the
ability to do them?
Here's where cross-site query would have been used. As Sanne suggested (next post),
these limitations outweigh the advantages.
> - programming model: the Map/Reduce implementation has a dependency on both Dog and
Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of
that as well. Same if I rename/remove Dog. Not nice.
Well it’s called type safety, some people find it good ;)
If anything, this model reduces type safety and reusability. E.g. say you want an M/R task
to see how many Persons speak French. With the single cache model (both Dog and Person in
the Cache<String, Mammal>) it would look something like:
a)
//pseudocode
map (String k, Mammal value) {
  if (value instanceof Person) { //this is the ugly part
    if (((Person)value).speaks("French")) ...
  } else {
    //ignore it, it's a Dog
  }
}
Same thing written for a Cache<String, Person>:
b)
map (String k, Person value) {
  if (value.speaks("French")) ...
}
I don't think people would prefer writing a) instead of b) ;)
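For anyone who wants to try the comparison, here is a minimal runnable Java sketch of a) and b). It is only an illustration under assumptions: plain java.util.Map instances stand in for the caches (no Infinispan API is used), the Mammal/Dog/Person types and the MapperSketch class are hypothetical, and the "..." in the pseudocode is filled in with a simple counter.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical domain types, named after the ones discussed in the thread.
interface Mammal {}

final class Dog implements Mammal {}

final class Person implements Mammal {
    private final Set<String> languages;

    Person(String... languages) {
        this.languages = new HashSet<>(Arrays.asList(languages));
    }

    boolean speaks(String language) {
        return languages.contains(language);
    }
}

class MapperSketch {
    // a) Single shared cache: every entry is visited, and the map step
    // needs an instanceof check plus a cast before it can call speaks().
    static int countFrenchSpeakersMixed(Map<String, Mammal> cache) {
        int count = 0;
        for (Mammal value : cache.values()) {
            if (value instanceof Person && ((Person) value).speaks("French")) {
                count++;
            }
            // anything else (e.g. a Dog) is skipped
        }
        return count;
    }

    // b) Dedicated cache: only Persons are visited and no cast is needed.
    static int countFrenchSpeakersTyped(Map<String, Person> cache) {
        int count = 0;
        for (Person value : cache.values()) {
            if (value.speaks("French")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Person alice = new Person("French", "English");
        Person bob = new Person("German");

        // Plain maps stand in for Cache<String, Mammal> and Cache<String, Person>.
        Map<String, Mammal> mixed = new HashMap<>();
        mixed.put("alice", alice);
        mixed.put("bob", bob);
        mixed.put("rex", new Dog());

        Map<String, Person> persons = new HashMap<>();
        persons.put("alice", alice);
        persons.put("bob", bob);

        System.out.println("mixed cache: " + countFrenchSpeakersMixed(mixed));
        System.out.println("typed cache: " + countFrenchSpeakersTyped(persons));
    }
}
```

Both variants return the same count for the same Persons; the difference is that a) has to walk the Dog entries too and must be revisited whenever a new Mammal subtype lands in the shared cache.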
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)