Re: [infinispan-dev] Design change in Infinispan Query

Friday, 28 February 2014

On Feb 26, 2014, at 5:14 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:

...

 On 25 Feb 2014, at 16:08, Mircea Markus <mmarkus(a)redhat.com> wrote:

> 
> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
> 
>>> On 24 Feb 2014, at 17:39, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>> 
>>> 
>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard
<emmanuel(a)hibernate.org> wrote:
>>>> 
>>>> By the way, Mircea, Sanne and I had quite a long discussion about this
one and the idea of one cache per entity. It turns out that the right (as in easy)
solution does involve a higher level programming model like OGM provides. You can simulate
it yourself using the Infinispan APIs but it is just cumbersome.
>>> 
>>> Curious to hear the whole story :-)
>>> We cannot mandate all the users to use OGM though, one of the reasons being
OGM is not platform independent (hotrod). 
>> 
>> Then solve all the issues I have raised with a magic wand and come back to me
when you have done it, I'm interested.
> 
> People are going to use infinispan with one cache per entity, because it makes
sense:
> - different config (repl/dist | persistent/non-persistent) for different data types
> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you
want to select (Person) where age > 18
> I don't see a reason to forbid this, on the contrary. The way I see it, the
relation is (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better
abstraction and should be recommended as such for the Java clients, but ultimately
we're a general purpose storage engine that is available to different platforms as
well.
> 

 I do disagree on your assessment.
 I did write a whole essay on why I think your view is problematic - I was getting tired
of repeating myself ;P

https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structu...

Thanks for writing this up, it is a good taxonomy of data storage schemes and querying.

...

 To anecdotally answer your specific example, yes different configs for different entities
is an interesting benefit but it has to outweigh the drawbacks. 
Using a single cache for all the types is not practical at all :-) Just to expand my idea,
people prefer using different caches for many reasons:
- security: the Account cache has different security requirements than the News cache
- data consistency: News is a non-transactional cache, Accounts require pessimistic XA
transactions
- expiry: expire last year's news from the system. Not the same for Accounts
- availability: I want the Accounts cache to be backed up to another site. I don't
want that for the News cache
- logical data grouping: mixing Accounts with News doesn't make sense. I might want to
know which account appeared in the news, though.

...
 If you have to do a map reduce for tasks as simple as age > 18, I
think your system had better be prepared to run gazillions of M/R jobs. 
I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to
congratulate them. Once a day, not gazillions of times, and I don't need to index the
age field just for that. Also when it comes to Map/Reduce, the drawback of holding all the
data in a single cache is twofold:
- performance: you iterate over the data that is not related to your query. 
- programming model: the Map/Reduce implementation has a dependency on both Dog and
Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of
that as well. Same if I rename/remove Dog. Not nice.
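A minimal sketch of the performance point above, using plain java.util.Map instances as stand-ins for caches. This is not the Infinispan Map/Reduce API; the class names (BirthdayJob, Person, Dog) and method names are hypothetical, chosen only for illustration. With a Person-only map the scan touches nothing but Person values, while with a shared map every value is visited and has to be type-checked before it can be used.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BirthdayJob {
    // Hypothetical entity types, for illustration only.
    static final class Person {
        final String name;
        final LocalDate birthDate;
        Person(String name, LocalDate birthDate) {
            this.name = name;
            this.birthDate = birthDate;
        }
    }

    static final class Dog {
        final String name;
        Dog(String name) { this.name = name; }
    }

    // True when the person's 18th birthday falls on the given day.
    static boolean turns18On(Person p, LocalDate day) {
        return p.birthDate.plusYears(18).equals(day);
    }

    // One cache per entity: the scan only ever sees Person values.
    static List<String> fromPersonCache(Map<String, Person> persons, LocalDate day) {
        List<String> names = new ArrayList<>();
        for (Person p : persons.values()) {
            if (turns18On(p, day)) {
                names.add(p.name);
            }
        }
        return names;
    }

    // Single shared cache: every value is visited, related to the query or not,
    // and must be type-checked before the real predicate can run.
    static List<String> fromSharedCache(Map<String, Object> everything, LocalDate day) {
        List<String> names = new ArrayList<>();
        for (Object value : everything.values()) {
            if (value instanceof Person && turns18On((Person) value, day)) {
                names.add(((Person) value).name);
            }
        }
        return names;
    }
}
```

Both methods return the same names; the difference is how much unrelated data the shared-cache variant has to walk through and how much it has to know about what else lives in the cache.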

...
 I think that Dogs and any domestic animals are fundamentally related to
humans - Person in your case. So queries involving both will be required - a cross cache
M/R is not doable today AFAIK and even if it were, it's still M/R with all its drawbacks.
 To me, the Cache API and Hot Rod are well suited for what I call self contained object
graph (i.e. where Dog would be an embedded object of Person and not a separate Entity). In
that situation, there is a single cache. 
I see where you come from but I don't think requiring people to use a single cache for
all the entities is an option. Besides a natural logical separation, different data has
different storage requirements: security, access patterns, consistency, durability,
availability etc. For most of the non-trivial use cases, using a single cache just won't
do. 

...
 One cache per entity does make sense for APIs that do support what I
call connected entities. Hibernate OGM specifically. 
OGM does a great job covering this, but it is very specific: java only and OOP - our C/S
mode, hotrod specifically, is language independent and not OOP. Also I would like to
comment on the following statements:
"I believe a cache API and Hot Rod are well suited to address up to the self
contained object graph use case with a couple of relations maintained manually by the
application but that cannot be queried. For the connected entities use case, only a high
level paradigm is suited like JPA."

I don't think storing object graphs should be under scrutiny here: Infinispan C/S mode
(and that's where most of the client focus is BTW) has a schema (protobuf) that does
not support object graphs. I also think expecting people to use multiple caches for
multiple data types is a solid assumption to start from. And here's me speculating:
these data types have logical relations between them so people will ask for querying. In
order to query multiple data types, you can either merge them together (your
suggestion) or support some sort of new cross-cache indexing/querying/api. x-cache
querying is more flexible and less restrictive than merging data, but from what I
understand from you it has certain implementation challenges. There's no pressure to take
a decision now around supporting queries spanning multiple caches - just something to
keep an eye on when dealing with use cases/users. ATM merging data is the only solution
available, let's wait and see if people ask for more.
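For illustration, a sketch of what a relation "maintained manually by the application" across two caches could look like, answering the earlier question of which accounts appeared in the news. Plain java.util.Map instances stand in for the Accounts and News caches; this is not an Infinispan query API, and every name here (CrossCacheJoin, Account, News, mentionedAccountIds) is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CrossCacheJoin {
    // Hypothetical entity types, for illustration only.
    static final class Account {
        final String id;
        final String owner;
        Account(String id, String owner) {
            this.id = id;
            this.owner = owner;
        }
    }

    static final class News {
        final String headline;
        // The relation is stored as plain keys into the accounts map.
        final List<String> mentionedAccountIds;
        News(String headline, List<String> mentionedAccountIds) {
            this.headline = headline;
            this.mentionedAccountIds = mentionedAccountIds;
        }
    }

    // Application-side join: scan one map and look up the related keys in the other.
    static Set<String> ownersInTheNews(Map<String, Account> accounts, Map<String, News> news) {
        Set<String> owners = new TreeSet<>();
        for (News item : news.values()) {
            for (String accountId : item.mentionedAccountIds) {
                Account account = accounts.get(accountId);
                if (account != null) { // skip dangling references
                    owners.add(account.owner);
                }
            }
        }
        return owners;
    }
}
```

The two maps can keep entirely different configurations, but the join logic lives in application code and nothing indexes it, which is the gap a cross-cache query facility would have to fill.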

...
 But please read the wiki page first before commenting. I did spend a
lot of time on it

https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structu...

I do read your comments and I really appreciate your feedback. We come from slightly
different worlds and look at things from different angles, but discussions like this raise
many good points.

...

 Emmanuel
 _______________________________________________
 infinispan-dev mailing list
 infinispan-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev 
Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)
