On 02/26/2014 04:20 PM, Mircea Markus wrote:
On Feb 26, 2014, at 2:13 PM, Dan Berindei
<dan.berindei(a)gmail.com> wrote:
>
>
> On Wed, Feb 26, 2014 at 3:12 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
>
> On Feb 25, 2014, at 5:08 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:
>
>> There also is the opposite problem to be considered, as Emmanuel
>> suggested on 11/04/2012:
>> you can't forbid the user to store the same object (same type and same
>> id) in two different caches, where each Cache might be using different
>> indexing options.
>>
>> If the "search service" is a global concept, and you run a query which
>> matches object X, we'll return it to the user but he won't be able to
>> figure out from which cache it's being sourced: is that ok?
> Can't the user figure that out based on the way the query is built?
> I mean the problem is similar to databases: if address is both a table and a column in the USER table, then it's the query (select) that determines where the address is returned from.
>
> You mean the user should specify the cache name(s) when building the query?
yes
Let's say multiple caches are specified when building the query. How can
I tell (with the current result API) where the matching entity comes
from? I still think we should extend the result API in order to provide:
1. the key of the entity, 2. the name of the originating cache. The old
result api that just gives you an Iterator<Object> over the matches
should continue to exist because it's more efficient for the cases when
the user does not need #1 and #2.
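
A minimal sketch of what such an extended result API could look like, next to the old iterator-style one. Everything here is an assumption made for illustration: `MultiCacheQuerySketch`, `Hit`, `searchWithOrigin` and `search` are hypothetical names, not part of the Infinispan query API, and the caches are modeled as plain named maps rather than real `Cache` instances.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch, not the Infinispan API: caches are modeled as named maps.
final class MultiCacheQuerySketch {

    /** One match, carrying the entry key (#1) and the originating cache name (#2). */
    static final class Hit {
        final String cacheName;
        final Object key;
        final Object value;

        Hit(String cacheName, Object key, Object value) {
            this.cacheName = cacheName;
            this.key = key;
            this.value = value;
        }
    }

    /** Extended result form: every match knows its key and its source cache. */
    static List<Hit> searchWithOrigin(Map<String, Map<Object, Object>> caches,
                                      Predicate<Object> filter) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, Map<Object, Object>> cache : caches.entrySet()) {
            for (Map.Entry<Object, Object> e : cache.getValue().entrySet()) {
                if (filter.test(e.getValue())) {
                    hits.add(new Hit(cache.getKey(), e.getKey(), e.getValue()));
                }
            }
        }
        return hits;
    }

    /** Old-style result form: only the matching values, no key or cache name. */
    static Iterator<Object> search(Map<String, Map<Object, Object>> caches,
                                   Predicate<Object> filter) {
        List<Object> values = new ArrayList<>();
        for (Map<Object, Object> cache : caches.values()) {
            for (Object v : cache.values()) {
                if (filter.test(v)) {
                    values.add(v);
                }
            }
        }
        return values.iterator();
    }

    public static void main(String[] args) {
        // The same object "X" is stored in two caches under different keys.
        Map<Object, Object> people = new LinkedHashMap<>();
        people.put("k1", "X");
        Map<Object, Object> archive = new LinkedHashMap<>();
        archive.put("k9", "X");

        Map<String, Map<Object, Object>> caches = new LinkedHashMap<>();
        caches.put("people", people);
        caches.put("archive", archive);

        for (Hit h : searchWithOrigin(caches, "X"::equals)) {
            System.out.println(h.cacheName + " / " + h.key + " -> " + h.value);
        }
    }
}
```

With the extended form the caller can tell the two matches for `X` apart by cache name and key, while the iterator form stays available for callers that need neither.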
> With a database you have to go a bit out of your way to select from more than one
table at a time, normally you have just one primary table that you select from and the
others are just to help you filter and transform that table. You also have to add some
information about the source table yourself if you need it, otherwise the DB won't
tell you what table the results are coming from:
>
> SELECT 'table1' AS source, id FROM table1
> UNION ALL
> SELECT 'table2' AS source, id FROM table2
>
> Adrian tells me our current query API doesn't allow us to do projections with
synthetic columns. On the other hand, we need to extend the current API to give us the
entry key anyway, so it would be easy to extend it to give us the name of the cache as
well.
>
>
>> Ultimately this implies a query might return the same object X in
>> multiple positions in the result list of the query; for example it
>> might be the top result according to some criteria but also be the 5th
>> result because of how it was indexed in a different cache: maybe
>> someone will find good use for this "capability" but I see it
>> primarily as a source of confusion.
> Curious whether the source of the data can be specified within the query.
>
> Right, the user should be able to scope a search to a single cache, or maybe to
multiple caches, even if there is only one global index.
>
> But I think the same object can already be inserted twice in the same cache, only
with a different key, so returning duplicates from a query is something the user already
has to cope with.
>
>
>> Finally, if we move the search service as a global component, there
>> might be an impact in how we explain security: an ACL filter applied
>> on one cache - or the index metadata produced by that cache - might
>> not be applied in the same way by an entity being matched through a
>> second cache.
>> Not least a user's permission to access one cache (or not) will affect
>> his results in a rather complex way.
> I'll let Tristan comment more on this, but is this really different from an SQL database where you grant access on individual tables and run a query involving several of them?
>
> The difference would be that in a DB each table will have its own index(es), so they
only have to check the permissions once and not for every row.
>
> OTOH, if we plan to support key-level permissions, that would require checking the
permissions on each search result anyway, so this wouldn't cost us anything.
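
To make the per-result check concrete, here is a small sketch of filtering the hits of a global index through a permission predicate evaluated on (cache name, key). All names (`PerResultAclSketch`, `Hit`, `filterReadable`) and the example policy are hypothetical; nothing here reflects an existing Infinispan security API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch: none of these names come from Infinispan.
final class PerResultAclSketch {

    /** A match from a global index, tagged with its source cache and key. */
    static final class Hit {
        final String cacheName;
        final String key;

        Hit(String cacheName, String key) {
            this.cacheName = cacheName;
            this.key = key;
        }
    }

    /** Keeps only the hits the caller may read; the check runs once per result. */
    static List<Hit> filterReadable(List<Hit> hits,
                                    BiPredicate<String, String> canRead) {
        List<Hit> visible = new ArrayList<>();
        for (Hit h : hits) {
            if (canRead.test(h.cacheName, h.key)) {
                visible.add(h);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("people", "k1"),
                new Hit("salaries", "k1"),
                new Hit("people", "k2"));

        // Assumed policy: the caller may read the "people" cache, except key "k2".
        BiPredicate<String, String> canRead =
                (cache, key) -> cache.equals("people") && !key.equals("k2");

        for (Hit h : filterReadable(hits, canRead)) {
            System.out.println(h.cacheName + "/" + h.key);
        }
    }
}
```

A cache-level permission is just the special case where the predicate ignores the key, so the same per-result pass would cover both cache-level and key-level rules.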
>
>
>> I'm wondering if we need to prevent such situations.
>>
>> Sanne
>>
>> On 25 February 2014 16:24, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>> On Feb 25, 2014, at 3:46 PM, Adrian Nistor <anistor(a)gmail.com> wrote:
>>>
>>>> They can do what they please. Either put multiple types in one basket or
put them in separate caches (one type per cache). But allowing / recommending is one
thing, mandating it is a different story.
>>>>
>>>> There's no reason to forbid _any_ of these scenarios / mandate one
over the other! There was previously in this thread some suggestion of mandating the one
type per cache usage. -1 for it
>>> Agreed. I actually don't see how we can prevent people who declare Cache<Object,Object> from putting whatever they want in it. It also makes total sense for smaller caches, as it is easy to set up, etc.
>>> The debate in this email, the way I understood it, was: are people using, or should they use, multiple caches for storing data? If yes, we should consider query functionality that spans multiple caches.
>>>
>>>>
>>>>
>>>> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus <mmarkus(a)redhat.com>
wrote:
>>>>
>>>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard
<emmanuel(a)hibernate.org> wrote:
>>>>
>>>>>> On 24 févr. 2014, at 17:39, Mircea Markus
<mmarkus(a)redhat.com> wrote:
>>>>>>
>>>>>>
>>>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard
<emmanuel(a)hibernate.org> wrote:
>>>>>>>
>>>>>>> By the way, Mircea, Sanne and I had quite a long discussion
about this one and the idea of one cache per entity. It turns out that the right (as in
easy) solution does involve a higher level programming model like OGM provides. You can
simulate it yourself using the Infinispan APIs but it is just cumbersome.
>>>>>> Curious to hear the whole story :-)
>>>>>> We cannot require all users to use OGM though, one of the reasons being that OGM is not platform independent (hotrod).
>>>>> Then solve all the issues I have raised with a magic wand and come
back to me when you have done it, I'm interested.
>>>> People are going to use Infinispan with one cache per entity, because it makes sense:
>>>> - different config (repl/dist | persistent/non-persistent) for different
data types
>>>> - have map/reduce tasks run only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18
>>>> I don't see a reason to forbid this, on the contrary. The way I see it, the relation is (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
>>>>
>>>>
Cheers,