On Wed, Mar 12, 2014 at 3:12 PM, Paul Ferraro <paul.ferraro(a)redhat.com> wrote:
On Wed, 2014-03-12 at 18:45 +0100, Galder Zamarreño wrote:
> On 05 Mar 2014, at 18:16, Mircea Markus <mmarkus(a)redhat.com> wrote:
>
> > Sanne came up with a good follow-up to this email, just some small clarifications:
> >
> > On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
> >
> >>>> If you have to do a map/reduce for tasks as simple as age > 18, I think your system had better be prepared to run gazillions of M/R jobs.
> >>>
> >>> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazillions of times, and I don't need to index the age field just for that. Also, when it comes to Map/Reduce, the drawback of holding all the data in a single cache is twofold:
> >>> - performance: you iterate over the data that is not related to your query.
> >>
> >> If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them?
> >
> > Here's where cross-site query would have been used. As Sanne suggested (next post), these limitations outweigh the advantages.
> >
> >>
> >>> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice.
> >>
> >> Well it's called type safety, some people find it good ;)
> >
> > If anything, this model reduces type safety and reusability. E.g. say you want an M/R task to see how many Persons speak French. With the single cache model (both Dog and Person in the Cache<String, Mammal>) it would look something like:
> >
> > a)
> > //pseudocode
> > map (String k, Mammal value) {
> >   if (value instanceof Person) { //this is the ugly part
> >     if (((Person)value).speaks("French")) ...
> >   } else {
> >     //ignore it, it's a Dog
> >   }
> > }
> >
> > Same thing written for a Cache<String, Person>:
> >
> > b)
> > map (String k, Person value) {
> >   if (value.speaks("French")) ...
> > }
> >
> > I don't think people would prefer writing a) instead of b) ;)
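For concreteness, the two variants above can be written as plain Java. This is only a sketch under assumptions: Mammal, Person, Dog and speaks() are stand-ins inferred from the pseudocode, MapperSketch and its method names are hypothetical, and the loops stand in for the map phase of an M/R task rather than using any Infinispan API.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MapperSketch {
    interface Mammal {}

    static final class Dog implements Mammal {}

    static final class Person implements Mammal {
        private final Set<String> languages;

        Person(String... languages) {
            this.languages = new HashSet<>(Arrays.asList(languages));
        }

        boolean speaks(String language) {
            return languages.contains(language);
        }
    }

    // Variant a): values of mixed types force a runtime type check and a cast.
    static long countFrenchSpeakersMixed(Collection<? extends Mammal> values) {
        long count = 0;
        for (Mammal value : values) {
            if (value instanceof Person && ((Person) value).speaks("French")) {
                count++;
            }
            // anything that is not a Person (e.g. a Dog) is silently skipped
        }
        return count;
    }

    // Variant b): the element type already guarantees Person, so no cast is needed.
    static long countFrenchSpeakers(Collection<Person> values) {
        long count = 0;
        for (Person person : values) {
            if (person.speaks("French")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Mammal> mixed = Arrays.asList(new Person("French", "English"), new Dog(), new Person("Spanish"));
        List<Person> peopleOnly = Arrays.asList(new Person("French"), new Person("German"));
        System.out.println(countFrenchSpeakersMixed(mixed));
        System.out.println(countFrenchSpeakers(peopleOnly));
    }
}
```

Both methods compute the same thing; the difference is only where the type knowledge lives, in the code (a) or in the cache's type parameters (b).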
>
> It's a pity that, having discussed this in Mallorca and having referenced it a fair few times already, you could not think of an option based on what Paul suggested in https://issues.jboss.org/browse/ISPN-3640
>
> Here's my attempt:
>
> Cache<?, ?> cache = ...;
> ValueFilter filter = ... // filter would check those values that are persons...
> Cache<String, Person> view = cache.filter(filter);
> view.map(String k, Person value) {
> ...
> }
>
> Paul referred to per key type filters, but maybe per value type filters, such as the one I did here, might be useful.
We could also generalize KeyFilter to something like:
public interface Filter {
    boolean accepts(Object key, Object value);
}
Infinispan could ship some simple default implementations that filter
based on a specific key type, value type, or both.
e.g.
Cache<?, ?> cache = ...;
Cache<String, Integer> view = cache.filter(new KeyValueTypeFilter(String.class, Integer.class));
view.addListener(...); // Listens only to events accepted by filter
for (Map.Entry<String, Integer> entry: view.entrySet()) {
    // No casting necessary!
}
where KeyValueTypeFilter looks like:
public class KeyValueTypeFilter implements Filter {
    private Class<?> keyType;
    private Class<?> valueType;

    public KeyValueTypeFilter(Class<?> keyType, Class<?> valueType) {
        this.keyType = keyType;
        this.valueType = valueType;
    }

    @Override
    public boolean accepts(Object key, Object value) {
        return this.keyType.isInstance(key) && this.valueType.isInstance(value);
    }
}
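To make the idea concrete, here is a minimal, self-contained sketch of such a filter applied to a plain java.util.Map. The FilteredViewSketch class and its snapshot() helper are hypothetical names used only for illustration: they copy the matching entries instead of providing the live view that cache.filter(...) would imply, and nothing here is an existing Infinispan API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FilteredViewSketch {
    // Copies every entry accepted by the type filter into a typed map.
    // Assumption: a snapshot is enough for illustration; a real view would stay live.
    static <K, V> Map<K, V> snapshot(Map<?, ?> source, Class<K> keyType, Class<V> valueType) {
        Filter filter = new KeyValueTypeFilter(keyType, valueType);
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<?, ?> entry : source.entrySet()) {
            if (filter.accepts(entry.getKey(), entry.getValue())) {
                // The casts cannot fail: the filter has already checked both types.
                result.put(keyType.cast(entry.getKey()), valueType.cast(entry.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Object, Object> cache = new LinkedHashMap<>();
        cache.put("a", 1);
        cache.put("b", "not an integer");
        cache.put(42, 2);
        Map<String, Integer> view = snapshot(cache, String.class, Integer.class);
        System.out.println(view); // only the String -> Integer entry survives
    }
}

// Stand-in for the generalized filter interface proposed in this thread.
interface Filter {
    boolean accepts(Object key, Object value);
}

// Accepts only entries whose key and value are instances of the given types.
final class KeyValueTypeFilter implements Filter {
    private final Class<?> keyType;
    private final Class<?> valueType;

    KeyValueTypeFilter(Class<?> keyType, Class<?> valueType) {
        this.keyType = keyType;
        this.valueType = valueType;
    }

    @Override
    public boolean accepts(Object key, Object value) {
        return keyType.isInstance(key) && valueType.isInstance(value);
    }
}
```

Note that Class.isInstance returns false for null, so entries with a null key or value would be rejected by this filter.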
I am all for a feature like this; however, I want to caution against
making it this flexible. Let's say you have 2 views of the same cache,
<String, Person> and <String, Dog>.
Now let's say you do a put("foo", mircea) in the first and the second
does put("foo", dog1). Will these puts collide? How are they stored?
Do we need a Map<Type, V> for each Key in the DataContainer?
What would happen if I use the non view Cache<String, Mammal>
get("foo")? Would that return both values, neither, the last one
written?
What would happen if I use the non view Cache<String, Mammal>
put("foo", mircea)? I assume that also updates the <String, Person>
view but not the <String, Dog> ?
Maybe it would be simpler to have these additional views read only?
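The collision question can be illustrated with a toy model. The sketch below assumes the simplest possible semantics, a single value per key in one shared store, which is only one of the options raised above. TypedView and ViewCollisionSketch are hypothetical classes, not Infinispan APIs, and a plain HashMap stands in for the data container.

```java
import java.util.HashMap;
import java.util.Map;

final class ViewCollisionSketch {
    interface Mammal {}

    static final class Person implements Mammal {}

    static final class Dog implements Mammal {}

    // Hypothetical writable, typed view over a shared store.
    static final class TypedView<V extends Mammal> {
        private final Map<String, Mammal> store;
        private final Class<V> type;

        TypedView(Map<String, Mammal> store, Class<V> type) {
            this.store = store;
            this.type = type;
        }

        // Writes go straight to the shared store, so keys are shared across views.
        void put(String key, V value) {
            store.put(key, value);
        }

        // Reads only return values of this view's type; anything else looks absent.
        V get(String key) {
            Mammal value = store.get(key);
            return type.isInstance(value) ? type.cast(value) : null;
        }
    }

    public static void main(String[] args) {
        Map<String, Mammal> store = new HashMap<>();
        TypedView<Person> people = new TypedView<>(store, Person.class);
        TypedView<Dog> dogs = new TypedView<>(store, Dog.class);

        people.put("foo", new Person());
        dogs.put("foo", new Dog()); // overwrites the Person stored under "foo"

        System.out.println(people.get("foo"));             // null: the Person is gone
        System.out.println(dogs.get("foo") != null);        // true
        System.out.println(store.get("foo") instanceof Dog); // true: last write wins
    }
}
```

Under these assumed semantics, a write through one view silently removes the entry from the other, which is the kind of surprise that read-only views would avoid.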
> Cheers,
>
>
> >
> >
> > Cheers,
> > --
> > Mircea Markus
> > Infinispan lead (www.infinispan.org)
> >
> >
> >
> >
> >
> > _______________________________________________
> > infinispan-dev mailing list
> > infinispan-dev(a)lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
> --
> Galder Zamarreño
> galder(a)redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>