On 7 March 2014 15:27, Mircea Markus <mmarkus(a)redhat.com> wrote:
On Mar 7, 2014, at 3:21 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:
> On 7 March 2014 14:54, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>
>> On Mar 6, 2014, at 9:21 AM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>>
>>> On Wed 2014-03-05 17:16, Mircea Markus wrote:
>>>> Sanne came up with a good follow-up to this email, just some small
>>>> clarifications:
>>>>
>>>> On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>>>>
>>>>>>> If you have to do a map reduce for tasks as simple as age > 18, I
>>>>>>> think your system had better be prepared to run gazillions of M/R jobs.
>>>>>>
>>>>>> I want to run a simple M/R job in the evening to determine who
>>>>>> turns 18 tomorrow, to congratulate them. Once a day, not gazillions
>>>>>> of times, and I don't need to index the age field just for that.
>>>>>> Also, when it comes to Map/Reduce, the drawback of holding all the
>>>>>> data in a single cache is twofold:
>>>>>> - performance: you iterate over the data that is not related to
>>>>>>   your query.
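The performance point above can be sketched in plain Java. This is only an illustration, not Infinispan's Map/Reduce API: the `Person` type, the `turning18On` helper and the use of a `HashMap` as a stand-in for a single shared cache are all assumptions made for the example. The scan has no index, so it visits every entry, including the ones that have nothing to do with people.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BirthdayScan {
    // Hypothetical value type; the shared cache may hold unrelated types too.
    static final class Person {
        final String name;
        final LocalDate birthDate;

        Person(String name, LocalDate birthDate) {
            this.name = name;
            this.birthDate = birthDate;
        }
    }

    // Full scan without an index: every value is visited, relevant or not.
    static List<String> turning18On(Map<String, Object> cache, LocalDate day) {
        List<String> names = new ArrayList<>();
        for (Object value : cache.values()) {
            if (value instanceof Person) {
                Person p = (Person) value;
                if (p.birthDate.plusYears(18).equals(day)) {
                    names.add(p.name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new HashMap<>();
        cache.put("p1", new Person("Ann", LocalDate.of(1996, 3, 8)));
        cache.put("p2", new Person("Bob", LocalDate.of(1990, 1, 1)));
        cache.put("order-1", "an unrelated entry that is still iterated over");
        System.out.println(turning18On(cache, LocalDate.of(2014, 3, 8)));
    }
}
```

Run once a day this is cheap enough, but the cost grows with everything stored in the cache rather than with the number of people.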
>>>>>
>>>>> If the data are never related (query wise), then we are in the
>>>>> database split category. Which is fine. But if some of your queries
>>>>> are related, what do you do? Deny the user the ability to do them?
>>>>
>>>> Here's where cross-site query would have been used. As Sanne
>>>> suggested (next post), these limitations outweigh the advantages.
>>>
>>> No. Cross-cache query if implemented will not support (efficiently
>>> enough) that kind of query. Cf my wiki page.
>>
>> yes, non-indexed joins would be exponential on the number of caches involved.
>
> Technically non-indexed joins would be exponential on the number of
> caches (joins) involved *and* on the amount of entries you have
> stored: I know you weren't suggesting doing it, but to confirm it's
> even worse than a horrible idea ;-)
> And that's not even considering the subtle design catch of "load it
> all from all cachestores".. combined with "multiple times per join"..
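To make the cost argument concrete, here is a minimal sketch of a non-indexed (nested-loop) join between two "caches", modelled as plain lists of `{joinKey, value}` rows. The class name, the row layout and the `comparisons` counter are assumptions for illustration only; nothing here is Infinispan code. With two caches of n entries each the loop performs n * n comparisons, and each additional joined cache multiplies that again by its size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NaiveJoin {
    // Number of comparisons performed by the last call to join().
    static long comparisons;

    // Nested-loop join on the first column, with no index:
    // every row on the left is compared against every row on the right.
    static List<String> join(List<String[]> left, List<String[]> right) {
        comparisons = 0;
        List<String> matches = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                comparisons++;
                if (l[0].equals(r[0])) {
                    matches.add(l[1] + "+" + r[1]);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String[]> people = Arrays.asList(
                new String[]{"c1", "Ann"},
                new String[]{"c2", "Bob"},
                new String[]{"c1", "Cid"});
        List<String[]> cities = Arrays.asList(
                new String[]{"c1", "Paris"},
                new String[]{"c2", "Rome"});
        List<String> joined = join(people, cities);
        System.out.println(joined + " after " + comparisons + " comparisons");
    }
}
```

The sketch keeps everything in memory; the "load it all from all cachestores" concern comes on top of the comparison count shown here.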
I wasn't suggesting doing it, not only for performance but also for the limitations
you mentioned in the previous emails.
>
>> Is it possible to use an index for x-cache joins with linear index
>> update time and query?
>
> Index update cost is not linear but O(log N): it approximates to a
> constant cost.
you're counting RPCs here or index seeks?
RPCs are constant, and independent of both the query type and the
data size. For a local (or distributed) index there are zero RPCs; for
DIST it depends on a factor of total index size, chunking and merging
options, numowners, etc., but these are fixed once defined ->
constant number of RPCs.
The count of index seeks depends on the query type only, not on the
size at all.
I'm referring to the approximate computation cost of each index seek.
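As a back-of-the-envelope illustration of why a log N cost "approximates to a constant": the sketch below counts how many times a sorted key space of N entries can be halved before a single entry remains. It assumes the seek behaves like a search in a balanced structure, which is a simplification and not a model of the actual index implementation; the class and method names are made up for the example.

```java
public class SeekCost {
    // Number of halvings needed to narrow n sorted keys down to one,
    // i.e. ceil(log2(n)) for n >= 1.
    static int halvings(long n) {
        int steps = 0;
        while (n > 1) {
            n = (n + 1) / 2; // keep the larger half, rounding up
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        long[] sizes = {1_000L, 1_000_000L, 1_000_000_000L};
        for (long size : sizes) {
            System.out.println(size + " entries -> " + halvings(size) + " steps");
        }
    }
}
```

Growing the data set a million-fold, from a thousand to a billion entries, only triples the step count (10 to 30), which is why the per-seek cost is treated as roughly constant in practice.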
> And we could cut this constant by 4 orders of magnitude if only
> I could safely differentiate between a put of a new entry vs. an
> update -> something which we'll need to brainstorm about.
>
> Query time is also significantly sub-linear in practice, but specifics
> will vary on the query type.
>
> Yes you could use indexes to improve x-cache joins, but you'll need an
> additional engine to coordinate that correctly, not least to manage
> data size buffers; essentially I think you'd need Teiid.
>
> Sanne
>
>
>>
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev(a)lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)