On 7 March 2014 14:54, Mircea Markus <mmarkus(a)redhat.com> wrote:
On Mar 6, 2014, at 9:21 AM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
> On Wed 2014-03-05 17:16, Mircea Markus wrote:
>> Sanne came up with a good follow-up to this email; just some small clarifications:
>>
>> On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>>
>>>>> If you have to do a map reduce for tasks as simple as age > 18, I
>>>>> think your system had better be prepared to run gazillions of M/R jobs.
>>>>
>>>> I want to run a simple M/R job in the evening to determine who turns 18
>>>> tomorrow, to congratulate them. Once a day, not gazillions of times, and
>>>> I don't need to index the age field just for that. Also, when it comes to
>>>> Map/Reduce, the drawback of holding all the data in a single cache is
>>>> twofold:
>>>> - performance: you iterate over data that is not related to your query.
>>>
>>> If the data are never related (query-wise), then we are in the database
>>> split category, which is fine. But if some of your queries are related,
>>> what do you do? Deny the user the ability to do them?
>>
>> Here's where cross-site query would have been used. As Sanne suggested
>> (next post), these limitations outweigh the advantages.
>
> No. Cross-cache query, if implemented, will not support that kind of
> query efficiently enough. Cf. my wiki page.
Yes, non-indexed joins would be exponential in the number of caches involved.

Technically, non-indexed joins would be exponential in the number of
caches (joins) involved *and* in the number of entries you have
stored: I know you weren't suggesting doing it, but to confirm, it's
even worse than a horrible idea ;-)
And that's not even considering the subtle design catch of "load it
all from all cachestores".. combined with "multiple times per join"..
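
To make the cost concrete, here is a minimal sketch of a join with no index at all. It assumes plain Python dicts standing in for caches (this is not the Infinispan API), and the `users`, `orders` and `payments` data is made up. With k caches of n entries each, the loop tests n^k combinations, so every extra cache in the join multiplies the work by n.

```python
from itertools import product


def nested_loop_join(caches, predicate):
    """Join without any index: test every combination of one entry per cache."""
    comparisons = 0
    matches = []
    for combo in product(*(cache.values() for cache in caches)):
        comparisons += 1
        if predicate(combo):
            matches.append(combo)
    return matches, comparisons


def make_caches(n):
    """Hypothetical data: three dicts of n entries each, joined on the user id."""
    users = {i: {"id": i} for i in range(n)}
    orders = {i: {"user_id": i} for i in range(n)}
    payments = {i: {"user_id": i} for i in range(n)}
    return [users, orders, payments]


def same_user(combo):
    user, order, payment = combo
    return user["id"] == order["user_id"] == payment["user_id"]


if __name__ == "__main__":
    for n in (10, 20, 40):
        matches, comparisons = nested_loop_join(make_caches(n), same_user)
        # Doubling n multiplies the number of comparisons by 8 for a 3-way join.
        print(n, len(matches), comparisons)
```

The sketch also ignores the cost of loading every entry from the cache stores, which comes on top of the comparisons counted here.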

Is it possible to use an index for x-cache joins with linear index
update time and query time?

Index update cost is not linear but O(log N), which approximates to a
constant cost. And we could cut this constant by 4 orders of magnitude
if only I could safely differentiate between a put of a new entry and
an update -> something which we'll need to brainstorm about.
Query time is also significantly sub-linear in practice, but specifics
will vary with the query type.
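
As a rough illustration of why O(log N) behaves almost like a constant, the sketch below counts the probes a binary search needs to locate a position in a sorted key space. It is only an assumption-laden stand-in (a sorted sequence, not the data structure the real index uses); the point is that going from a thousand to a million keys merely doubles the number of steps.

```python
import math


def binary_search_steps(sorted_keys, key):
    """Return (insert position, number of probes) for key in sorted_keys."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo, steps


if __name__ == "__main__":
    for n in (1_000, 1_000_000, 1_000_000_000):
        # range() supports len() and indexing, so no list has to be materialised.
        keys = range(n)
        pos, steps = binary_search_steps(keys, n - 1)
        print(n, steps, math.floor(math.log2(n)) + 1)
```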
Yes you could use indexes to improve x-cache joins, but you'll need an
additional engine to coordinate that correctly, not least to manage
data size buffers; essentially I think you'd need Teiid.
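
A minimal sketch of the index-assisted variant, again assuming plain dicts in place of caches and made-up `users`/`orders` data: build an index on the join field of one cache once, then probe it for each entry of the other cache. It deliberately leaves out what the paragraph above says is the hard part, namely coordinating the join across caches and bounding the size of intermediate buffers.

```python
from collections import defaultdict


def build_index(cache, field):
    """Map each value of `field` to the keys of the entries holding it."""
    index = defaultdict(list)
    for key, entry in cache.items():
        index[entry[field]].append(key)
    return index


def indexed_join(left, left_field, right, right_index):
    """Yield (left entry, right entry) pairs, using one index lookup per left entry."""
    for left_entry in left.values():
        for right_key in right_index.get(left_entry[left_field], ()):
            yield left_entry, right[right_key]


if __name__ == "__main__":
    # Hypothetical data, not taken from the thread.
    users = {1: {"id": 1, "name": "ann"}, 2: {"id": 2, "name": "bob"}}
    orders = {
        10: {"user_id": 1, "total": 5},
        11: {"user_id": 1, "total": 7},
        12: {"user_id": 3, "total": 9},
    }
    orders_by_user = build_index(orders, "user_id")
    for user, order in indexed_join(users, "id", orders, orders_by_user):
        print(user["name"], order["total"])
```

With the index in place, the work is proportional to the number of left-hand entries plus the number of matching pairs, rather than to the product of the cache sizes.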
Sanne
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev