[infinispan-dev] Design change in Infinispan Query

Mon Feb 17 12:35:25 EST 2014

On 30 Jan 2014, at 20:51, Mircea Markus <mmarkus at redhat.com> wrote:

> 
> On Jan 30, 2014, at 9:42 AM, Galder Zamarreño <galder at redhat.com> wrote:
> 
>> 
>> On Jan 21, 2014, at 11:52 PM, Mircea Markus <mmarkus at redhat.com> wrote:
>> 
>>> 
>>> On Jan 15, 2014, at 1:42 PM, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>>> 
>>>> By the way, people looking for that feature are also asking for a unified Cache API accessing these several caches, right? Otherwise I don't fully understand why they ask for a unified query.
>>>> Do you have detailed use cases written down somewhere, so I can better understand what is really requested?
>>> 
>>> IMO, from a user perspective, being able to run queries spanning several caches simplifies the programming model: each cache corresponds to a single entity type, with potentially different configuration.
>> 
>> Not sure if it simplifies things TBH if the configuration is the same. IMO, it just adds clutter.
> 
> Not sure I follow: having a cache that contains both Cars and Persons sounds more cluttered to me. I think it's cumbersome to write any kind of query against a heterogeneous cache, e.g. Map/Reduce tasks that need to count all the green Cars would need to be aware of Persons and ignore them. Not only is it harder to write, it also discourages code reuse and makes the code hard to maintain (if you add Pets to the same cache in the future, you need to update the M/R code as well). And of course there are also per-cache configuration options whose relevance is not obvious at design time but will become so later (there are more Persons than Cars, they live longer and so need different expiry, etc.): mixing everything together in the same cache from the beginning is a design decision that might bite you in the future.
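To make the Map/Reduce point concrete, here is a minimal sketch in plain Java. It is not the Infinispan Map/Reduce API: a `HashMap` stands in for a cache, and `Car`/`Person` are hypothetical value types. The mixed-cache version has to filter by type before it can count, which is the coupling described above; the per-type version does not.

```java
import java.util.HashMap;
import java.util.Map;

public class HeterogeneousCacheCount {
    // Hypothetical value types; a HashMap stands in for an Infinispan cache.
    record Car(String plate, String colour) {}
    record Person(String name) {}

    // Mixed cache: the task must know about every other type stored and skip it.
    static long countGreenCarsMixed(Map<String, Object> mixedCache) {
        return mixedCache.values().stream()
                .filter(Car.class::isInstance)   // drop Persons (and any future Pets)
                .map(Car.class::cast)
                .filter(car -> "green".equals(car.colour()))
                .count();
    }

    // One cache per entity type: no type filtering is needed.
    static long countGreenCars(Map<String, Car> carCache) {
        return carCache.values().stream()
                .filter(car -> "green".equals(car.colour()))
                .count();
    }

    public static void main(String[] args) {
        Map<String, Object> mixed = new HashMap<>();
        mixed.put("car:1", new Car("AB-123", "green"));
        mixed.put("car:2", new Car("CD-456", "red"));
        mixed.put("person:1", new Person("Alice"));

        Map<String, Car> cars = new HashMap<>();
        cars.put("car:1", new Car("AB-123", "green"));
        cars.put("car:2", new Car("CD-456", "red"));

        System.out.println(countGreenCarsMixed(mixed)); // 1
        System.out.println(countGreenCars(cars));       // 1
    }
}
```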
> 
> The way I see it (and I'm very curious to hear your opinion on this), following a database analogy, the CacheManager corresponds to a Database and the Cache to a Table. Hence my thought that queries spanning multiple caches are both useful and needed (just as queries spanning multiple tables are).
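Following the table analogy, a query spanning two caches behaves like a join. The sketch below assumes plain `Map`s as stand-ins for a Person cache and a Car cache, and hypothetical `Person`/`Car` records linked by an `ownerId` field; it only shows the shape of such a query, not any existing Infinispan query API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class CrossCacheQuery {
    // Hypothetical entity types, each living in its own "table" (cache).
    record Person(String id, String name) {}
    record Car(String plate, String colour, String ownerId) {}

    // "Which persons own a green car?" touches both caches, like a join across two tables.
    static List<String> ownersOfGreenCars(Map<String, Person> persons, Map<String, Car> cars) {
        Set<String> ownerIds = cars.values().stream()
                .filter(car -> "green".equals(car.colour()))
                .map(Car::ownerId)
                .collect(Collectors.toSet());
        return ownerIds.stream()
                .map(persons::get)
                .filter(p -> p != null)       // skip dangling owner references
                .map(Person::name)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Person> persons = Map.of(
                "p1", new Person("p1", "Alice"),
                "p2", new Person("p2", "Bob"));
        Map<String, Car> cars = Map.of(
                "c1", new Car("AB-123", "green", "p1"),
                "c2", new Car("CD-456", "red", "p2"));
        System.out.println(ownersOfGreenCars(persons, cars)); // [Alice]
    }
}
```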

My opinion is that seeing it this way is limiting. A key/value store is schemaless. Your view is forcing a particular schema on how to structure things. 

I don’t expect everyone to store everything in a single cache, and of course there will be situations where it’s not ideal or not the best solution, such as the cases you mention above; but if you want to do it, for any of the reasons Paul or I mentioned in [1], it’d be nice to be able to do so. 

Cheers,

[1] https://issues.jboss.org/browse/ISPN-3640

> 
>> 
>> Just yesterday I discovered this gem in Scala's Shapeless extensions [1]. This is experimental stuff, but essentially it allows you to define which key/value type pairs a map will contain, and it type-checks them at compile time. I almost wet my pants when I saw that ;) :p. In the example, it defines a map as containing Int -> String and String -> Int key/value pairs. If you try to add an Int -> Int pair, compilation fails.
> 
> Agreed, the compile-time check is pretty awesome :-) Still, mixing and matching types in a Map doesn't look great to me for ISPN.
> 
>> 
>> Java's type checking is not powerful enough to do this, and its compilation logic is not extensible in the same way Scala macros are, but I think the fact that other languages are looking into this validates Paul's suggestion in [2], on top of all the benefits listed there.
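As a rough illustration of what Java can and cannot express, here is a hedged approximation of the Shapeless idea using explicit witness objects (`Relation`, `INT_TO_STRING` and `STRING_TO_INT` are names made up for this sketch). The compiler does reject an Int -> Int insertion, but only because the caller passes the witness by hand; Scala resolves the equivalent evidence implicitly, which is the part Java lacks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class RelationMap {
    // Witness that keys of type K may map to values of type V.
    // The private constructor means only the constants below can exist.
    static final class Relation<K, V> {
        private Relation() {}
    }

    static final Relation<Integer, String> INT_TO_STRING = new Relation<>();
    static final Relation<String, Integer> STRING_TO_INT = new Relation<>();

    private final Map<Object, Object> store = new HashMap<>();

    <K, V> void put(Relation<K, V> rel, K key, V value) {
        Objects.requireNonNull(rel, "relation"); // a null witness would bypass the check, so reject it at runtime
        store.put(key, value);
    }

    @SuppressWarnings("unchecked")
    <K, V> V get(Relation<K, V> rel, K key) {
        // Safe only because each key type has exactly one relation and all writes go through put(...).
        return (V) store.get(key);
    }

    public static void main(String[] args) {
        RelationMap map = new RelationMap();
        map.put(INT_TO_STRING, 23, "foo");
        map.put(STRING_TO_INT, "bar", 13);
        // map.put(INT_TO_STRING, 23, 13); // does not compile: Integer is not a String
        String s = map.get(INT_TO_STRING, 23);
        Integer i = map.get(STRING_TO_INT, "bar");
        System.out.println(s + " " + i); // foo 13
    }
}
```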
>> 
>> Cheers,
>> 
>> [1] https://github.com/milessabin/shapeless/wiki/Feature-overview:-shapeless-2.0.0#heterogenous-maps
>> [2] https://issues.jboss.org/browse/ISPN-3640
>> 
>>> Besides the query API, which would need to be extended to support accessing multiple caches, I'm not sure what other APIs would need to be extended to take advantage of this.
>>> 
>>>> 
>>>> Emmanuel
>>>> 
>>>> On 14 Jan 2014, at 12:59, Sanne Grinovero <sanne at infinispan.org> wrote:
>>>> 
>>>>> Bumping this: it was proposed again today at a face-to-face meeting.
>>>>> Apparently multiple parties have been asking to be able to run
>>>>> cross-cache queries.
>>>>> 
>>>>> Sanne
>>>>> 
>>>>> On 11 April 2012 12:47, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>>>>>> 
>>>>>> On 10 Apr 2012, at 19:10, Sanne Grinovero wrote:
>>>>>> 
>>>>>>> Hello all,
>>>>>>> currently Infinispan Query is an interceptor registering on the
>>>>>>> specific Cache instance which has indexing enabled; each such
>>>>>>> interceptor does all it needs to do solely within the scope of the
>>>>>>> cache it was registered in.
>>>>>>> 
>>>>>>> If you enable indexing - for example - on 3 different caches, there
>>>>>>> will be 3 different Hibernate Search engines started in background,
>>>>>>> and they are all unaware of each other.
>>>>>>> 
>>>>>>> After some design discussions with Ales for CapeDwarf, and also to
>>>>>>> draw attention to something that has bothered me for some time, I'd
>>>>>>> like to evaluate the option of having a single Hibernate Search Engine
>>>>>>> registered in the CacheManager and shared across indexed caches.
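A toy model of the two layouts may help. `SearchEngine` below is a hypothetical placeholder, not a Hibernate Search class; the only point is how many engine instances exist for three indexed caches under each design.

```java
import java.util.HashMap;
import java.util.Map;

public class EngineRegistry {
    // Hypothetical placeholder for a search engine; not the Hibernate Search API.
    static final class SearchEngine {
        final String owner;
        SearchEngine(String owner) { this.owner = owner; }
    }

    // Current design: each indexed cache's interceptor boots its own, isolated engine.
    static Map<String, SearchEngine> perCacheEngines(String... indexedCaches) {
        Map<String, SearchEngine> engines = new HashMap<>();
        for (String cache : indexedCaches) {
            engines.put(cache, new SearchEngine(cache));
        }
        return engines;
    }

    // Proposed design: one engine registered at CacheManager level, shared by every indexed cache.
    static Map<String, SearchEngine> sharedEngine(String... indexedCaches) {
        SearchEngine shared = new SearchEngine("cacheManager");
        Map<String, SearchEngine> engines = new HashMap<>();
        for (String cache : indexedCaches) {
            engines.put(cache, shared);
        }
        return engines;
    }

    // SearchEngine does not override equals, so distinct() counts distinct instances.
    static long distinctEngines(Map<String, SearchEngine> engines) {
        return engines.values().stream().distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(distinctEngines(perCacheEngines("a", "b", "c"))); // 3
        System.out.println(distinctEngines(sharedEngine("a", "b", "c")));    // 1
    }
}
```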
>>>>>>> 
>>>>>>> Current design limitations:
>>>>>>> 
>>>>>>> A- If they are all configured to use the same base directory to
>>>>>>> store indexes, and happen to have same-named indexes, they'll share
>>>>>>> the index without being aware of each other. This is going to break
>>>>>>> unless the user configures some tricky parameters, and even so
>>>>>>> performance won't be great: instances will lock each other out, or at
>>>>>>> best write in alternate turns.
>>>>>>> B- The search engine isn't particularly "heavy", still it would be
>>>>>>> nice to share some components and internal services.
>>>>>>> C- Configuration details which need some care, like injecting a
>>>>>>> JGroups channel for clustering, need to be done right, isolating each
>>>>>>> instance (so large parts of the configuration would be quite similar
>>>>>>> but not totally equal).
>>>>>>> D- Incoming messages into a JGroups Receiver need to be routed not
>>>>>>> only among indexes, but also among Engine instances. This prevents
>>>>>>> Query from reusing code from Hibernate Search.
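Limitation D can be sketched as a routing table. Everything here is hypothetical (`IndexMessage`, the handler maps, the `engineId` field): with one engine per cache, an incoming message needs an extra engine-level lookup before it reaches the right index, whereas with a single shared engine the index name alone is enough.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class IndexMessageRouter {
    // Hypothetical index-update message; engineId only matters with per-cache engines.
    record IndexMessage(String engineId, String indexName, String payload) {}

    // Per-cache engines: route by engine first, then by index.
    private final Map<String, Map<String, Consumer<String>>> byEngine = new HashMap<>();
    // One shared engine: the index name alone identifies the target.
    private final Map<String, Consumer<String>> byIndex = new HashMap<>();

    void registerPerEngine(String engineId, String indexName, Consumer<String> handler) {
        byEngine.computeIfAbsent(engineId, id -> new HashMap<>()).put(indexName, handler);
    }

    void registerShared(String indexName, Consumer<String> handler) {
        byIndex.put(indexName, handler);
    }

    // Two lookups; returns false if no matching engine/index pair is registered.
    boolean routePerEngine(IndexMessage msg) {
        Consumer<String> handler = byEngine.getOrDefault(msg.engineId(), Map.of()).get(msg.indexName());
        return deliver(handler, msg);
    }

    // A single lookup; the engine id is ignored.
    boolean routeShared(IndexMessage msg) {
        return deliver(byIndex.get(msg.indexName()), msg);
    }

    private static boolean deliver(Consumer<String> handler, IndexMessage msg) {
        if (handler == null) {
            return false;
        }
        handler.accept(msg.payload());
        return true;
    }

    public static void main(String[] args) {
        IndexMessageRouter router = new IndexMessageRouter();
        router.registerPerEngine("carsCache", "Car", p -> System.out.println("cars engine got " + p));
        router.registerShared("Car", p -> System.out.println("shared engine got " + p));
        router.routePerEngine(new IndexMessage("carsCache", "Car", "update-1"));
        router.routeShared(new IndexMessage(null, "Car", "update-2"));
    }
}
```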
>>>>>>> 
>>>>>>> Problems with a unified Hibernate Search Engine:
>>>>>>> 
>>>>>>> 1#- Isolation of types / indexes. If the same indexed class is
>>>>>>> stored in different (indexed) caches, they'll share the same index. Is
>>>>>>> it a problem? I'm tempted to consider this a good thing, but wonder if
>>>>>>> it would surprise some users. Would you expect that?
>>>>>> 
>>>>>> I would not expect that. Uniqueness in Hibernate Search is not defined per identity but per class + provided id.
>>>>>> I can see people reusing the same class as a partial DTO and wanting to index that. I can even see people
>>>>>> using the Hibernate Search programmatic API to index the "DTO" stored in cache 2 differently from the
>>>>>> domain class stored in cache 1.
>>>>>> I concede that I am pushing the use case a bit towards bad-ish design approaches.
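Emmanuel's point about identity can be sketched as follows, assuming a shared index keyed only by entity class plus provided id (`DocId` and `Person` are illustrative names, not Hibernate Search types): the same class and id arriving from two different caches collapse into one index entry, and the later write silently replaces the earlier one.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedIndexCollision {
    // Illustrative identity of a document in a shared index: entity class + provided id.
    record DocId(Class<?> type, Object id) {}
    record Person(String name) {}

    private final Map<DocId, String> entries = new HashMap<>();

    // The originating cache name is kept in the stored value but is NOT part of the identity.
    void index(String cacheName, Object id, Object value) {
        entries.put(new DocId(value.getClass(), id), cacheName + ":" + value);
    }

    int size() {
        return entries.size();
    }

    String lookup(Class<?> type, Object id) {
        return entries.get(new DocId(type, id));
    }

    public static void main(String[] args) {
        SharedIndexCollision shared = new SharedIndexCollision();
        shared.index("cache1", 42, new Person("Alice"));        // domain object
        shared.index("cache2", 42, new Person("Alice (DTO)"));  // same class, same id
        System.out.println(shared.size());                      // 1
        System.out.println(shared.lookup(Person.class, 42));    // only the cache2 entry survives
    }
}
```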
>>>>>> 
>>>>>>> 2#- Configuration format overhaul: indexing options won't be set in
>>>>>>> the cache section but in the global section. I'm looking forward to
>>>>>>> using the schema extensions anyway to provide a better configuration
>>>>>>> experience than the current <properties />.
>>>>>>> 3#- Assuming 1# is fine, when a search hit is found I'd need to be
>>>>>>> able to figure out from which cache the value should be loaded:
>>>>>>> 3#A: we could have the cache name encoded in the index, as part
>>>>>>> of the identifier: {PK,cacheName}
>>>>>>> 3#B: we could actually shard the index, keeping a physically separate
>>>>>>> index per cache. This would mean searching on the joint index view but
>>>>>>> extracting hits from specific indexes to keep track of "which index".
>>>>>>> I think we can do that, but it's definitely tricky.
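Option 3#A could look roughly like this. `IndexedId` is a hypothetical composite identifier holding `{PK, cacheName}`, and plain maps stand in for the caches; on a hit, the cache name inside the identifier selects where the value is loaded from.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CacheAwareHits {
    // Hypothetical composite identifier stored in the index: {PK, cacheName}.
    record IndexedId(Object pk, String cacheName) {}

    // Plain maps stand in for caches: cache name -> (key -> value).
    private final Map<String, Map<Object, Object>> caches = new HashMap<>();

    void put(String cacheName, Object pk, Object value) {
        caches.computeIfAbsent(cacheName, name -> new HashMap<>()).put(pk, value);
    }

    // For each hit, the cache name in the identifier decides where the value is loaded from.
    // Unknown caches or missing keys yield null.
    List<Object> load(List<IndexedId> hits) {
        return hits.stream()
                .map(hit -> caches.getOrDefault(hit.cacheName(), Map.of()).get(hit.pk()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CacheAwareHits grid = new CacheAwareHits();
        grid.put("cache1", 42, "Alice (domain)");
        grid.put("cache2", 42, "Alice (DTO)");
        List<IndexedId> hits = List.of(new IndexedId(42, "cache2"), new IndexedId(42, "cache1"));
        System.out.println(grid.load(hits)); // [Alice (DTO), Alice (domain)]
    }
}
```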
>>>>>>> 
>>>>>>> It's likely easier to keep indexed values from different caches in
>>>>>>> different indexes. That would mean rejecting #1 and messing with the
>>>>>>> user-defined index name, for example by adding the cache name to the
>>>>>>> user-defined string.
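The index-name alternative amounts to a naming function. The `effectiveIndexName` helper and the "." separator are assumptions for illustration; if cache or index names can themselves contain the separator, a real scheme would need a different separator or escaping so that two different pairs can never produce the same name.

```java
import java.util.HashSet;
import java.util.Set;

public class IndexNaming {
    // Hypothetical scheme: prefix the user-defined index name with the cache name.
    // The "." separator is an assumption and is only unambiguous if names cannot contain it.
    static String effectiveIndexName(String cacheName, String userIndexName) {
        return cacheName + "." + userIndexName;
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        names.add(effectiveIndexName("cache1", "Person"));
        names.add(effectiveIndexName("cache2", "Person"));
        System.out.println(names.size()); // 2: the two caches no longer share an index
    }
}
```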
>>>>>>> 
>>>>>>> Any comment?
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Sanne
>>>>>>> _______________________________________________
>>>>>>> infinispan-dev mailing list
>>>>>>> infinispan-dev at lists.jboss.org
>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>>> Cheers,
>>> -- 
>>> Mircea Markus
>>> Infinispan lead (www.infinispan.org)
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> --
>> Galder Zamarreño
>> galder at redhat.com
>> twitter.com/galderz
>> 
>> Project Lead, Escalante
>> http://escalante.io
>> 
>> Engineer, Infinispan
>> http://infinispan.org
>> 
>> 
> 
> Cheers,
> -- 
> Mircea Markus
> Infinispan lead (www.infinispan.org)
> 
> 
> 
> 
> 

--
Galder Zamarreño
galder at redhat.com
twitter.com/galderz

Project Lead, Escalante
http://escalante.io

Engineer, Infinispan
http://infinispan.org