On Wed, Feb 26, 2014 at 3:12 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
On Feb 25, 2014, at 5:08 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:
> There also is the opposite problem to be considered, as Emmanuel
> suggested on 11/04/2012:
> you can't forbid the user to store the same object (same type and same
> id) in two different caches, where each Cache might be using different
> indexing options.
>
> If the "search service" is a global concept, and you run a query which
> matches object X, we'll return it to the user but he won't be able to
> figure out from which cache it's being sourced: is that ok?
Can't the user figure that out based on the way the query is built?
I mean the problem is similar with databases: if address is both a
table and a column in the USER table, then it's the query (the select) that
determines where the address is returned from.
You mean the user should specify the cache name(s) when building the query?
With a database you have to go a bit out of your way to select from more
than one table at a time, normally you have just one primary table that you
select from and the others are just to help you filter and transform that
table. You also have to add some information about the source table
yourself if you need it, otherwise the DB won't tell you what table the
results are coming from:
SELECT 'table1' AS source, id FROM table1
UNION ALL
SELECT 'table2' AS source, id FROM table2
Adrian tells me our current query API doesn't allow us to do projections
with synthetic columns. On the other hand, we need to extend the current
API to return the entry key anyway, so it would be easy to extend it to
return the name of the cache as well.
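As a rough illustration of that idea (plain Java Maps stand in for named caches; the class and method names are hypothetical, not the actual Infinispan query API), tagging each hit with its source cache mirrors the SQL UNION ALL trick above:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch, NOT the Infinispan API: aggregate hits from several
// named caches, labelling each hit with the cache it came from.
public class MultiCacheSearch {

    /** Returns sorted "cacheName:key" labels for every entry matching the predicate. */
    static <K, V> List<String> search(Map<String, Map<K, V>> caches, Predicate<V> match) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<K, V>> cache : caches.entrySet())
            for (Map.Entry<K, V> e : cache.getValue().entrySet())
                if (match.test(e.getValue()))
                    hits.add(cache.getKey() + ":" + e.getKey()); // tag with the source cache
        Collections.sort(hits); // deterministic order for the example
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> caches = new LinkedHashMap<>();
        caches.put("users", Map.of("u1", 42));
        caches.put("orders", Map.of("o1", 42, "o2", 7));
        System.out.println(search(caches, v -> v == 42)); // [orders:o1, users:u1]
    }
}
```

The point is only that the source cache, like the entry key, is metadata the engine already has at hand when it produces a hit.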
>
> Ultimately this implies a query might return the same object X in
> multiple positions in the result list of the query; for example it
> might be the top result according to some criteria but also be the 5th
> result because of how it was indexed in a different cache: maybe
> someone will find good use for this "capability" but I see it
> primarily as a source of confusion.
Curious whether the source of the data can be specified within
the query.
Right, the user should be able to scope a search to a single cache, or
maybe to multiple caches, even if there is only one global index.
But I think the same object can already be inserted twice in the same
cache, only with a different key, so returning duplicates from a query is
something the user already has to cope with.
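To make the duplicate-results point concrete (a toy sketch, with a plain Map standing in for a cache; names are illustrative): the same value under two keys already yields two hits, and collapsing by value equality is one way a caller copes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Toy illustration: the same value stored under two keys means a scan can
// return it twice; deduplicating by value equality is up to the caller.
public class DedupResults {

    /** Removes duplicate values, keeping first-seen order. */
    static <V> List<V> distinctHits(Collection<V> hits) {
        return new ArrayList<>(new LinkedHashSet<>(hits));
    }

    public static void main(String[] args) {
        Map<String, String> cache = new LinkedHashMap<>();
        cache.put("k1", "X");
        cache.put("k2", "X"); // same value, different key
        System.out.println(cache.values());               // [X, X] -- duplicate hit
        System.out.println(distinctHits(cache.values())); // [X]
    }
}
```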
> Finally, if we move the search service to a global component, there
> might be an impact in how we explain security: an ACL filter applied
> on one cache - or the index metadata produced by that cache - might
> not be applied in the same way by an entity being matched through a
> second cache.
> Not least a user's permission to access one cache (or not) will affect
> his results in a rather complex way.
I'll let Tristan comment more on this, but is this really different from
an SQL database where you grant access on individual tables and run a query
involving multiple of them?
The difference would be that in a DB each table will have its own
index(es), so they only have to check the permissions once and not for
every row.
OTOH, if we plan to support key-level permissions, that would require
checking the permissions on each search result anyway, so this wouldn't
cost us anything.
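A toy sketch of that last point (plain Maps and a BiPredicate stand in for the cache and whatever security API would really be used; all names are illustrative): with key-level permissions, every hit must pass an ACL check anyway, so filtering per result is the cost we would pay regardless.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Toy illustration: post-filter search results with one ACL check per hit.
public class AclFilter {

    /** Keeps only the hits whose key the given user may read. */
    static <K, V> Map<K, V> visible(Map<K, V> hits, String user, BiPredicate<String, K> canRead) {
        Map<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : hits.entrySet())
            if (canRead.test(user, e.getKey())) // one ACL check per result
                out.put(e.getKey(), e.getValue());
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> hits = new LinkedHashMap<>();
        hits.put("public:1", "a");
        hits.put("secret:1", "b");
        // Hypothetical policy: "alice" may not read keys prefixed "secret:".
        System.out.println(visible(hits, "alice", (u, k) -> !k.startsWith("secret")));
        // {public:1=a}
    }
}
```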
>
> I'm wondering if we need to prevent such situations.
>
> Sanne
>
> On 25 February 2014 16:24, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>
>> On Feb 25, 2014, at 3:46 PM, Adrian Nistor <anistor(a)gmail.com> wrote:
>>
>>> They can do what they please. Either put multiple types in one basket
>>> or put them in separate caches (one type per cache). But allowing /
>>> recommending is one thing, mandating it is a different story.
>>>
>>> There's no reason to forbid _any_ of these scenarios / mandate one
>>> over the other! There was previously in this thread some suggestion of
>>> mandating the one type per cache usage. -1 for it
>>
>> Agreed. I actually don't see how we can stop people who declare a
>> Cache<Object,Object> from putting whatever they want in it. It also
>> makes total sense for smaller caches, as it is easy to set up etc.
>> The debate in this email, the way I understood it, was: are/should
>> people be using multiple caches for storing data? If yes, we should
>> consider querying functionality spanning multiple caches.
>>
>>>
>>>
>>>
>>> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>>
>>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>>>
>>>>> On 24 Feb 2014, at 17:39, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>>>>
>>>>>
>>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>>>>>>
>>>>>> By the way, Mircea, Sanne and I had quite a long discussion about
>>>>>> this one and the idea of one cache per entity. It turns out that the
>>>>>> right (as in easy) solution does involve a higher-level programming
>>>>>> model like OGM provides. You can simulate it yourself using the
>>>>>> Infinispan APIs but it is just cumbersome.
>>>>>
>>>>> Curious to hear the whole story :-)
>>>>> We cannot mandate that all the users use OGM though, one of the
>>>>> reasons being that OGM is not platform independent (Hot Rod).
>>>>
>>>> Then solve all the issues I have raised with a magic wand and come
>>>> back to me when you have done it, I'm interested.
>>>
>>> People are going to use Infinispan with one cache per entity, because
>>> it makes sense:
>>> - different config (repl/dist | persistent/non-persistent) for
>>> different data types
>>> - have map/reduce tasks running only over the Person entries, not over
>>> Dog as well, when you want to select (Person) where age > 18
>>> I don't see a reason to forbid this, on the contrary. The way I see it,
>>> the relation is (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be
>>> a better abstraction and should be recommended as such for the Java
>>> clients, but ultimately we're a general purpose storage engine that is
>>> available to different platforms as well.
>>>
>>>
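[Editor's sketch of the "one cache per entity" argument quoted above: plain Maps stand in for the two caches, and Person/Dog and all other names are illustrative. A task selecting Person where age > 18 only ever scans the Person cache and never touches Dog entries.]

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy sketch: one cache per entity means a per-type scan is naturally scoped.
public class PerEntityCaches {
    record Person(String name, int age) {}
    record Dog(String name) {}

    /** Names of all persons older than 18, sorted. */
    static List<String> adults(Map<String, Person> personCache) {
        return personCache.values().stream()
                .filter(p -> p.age() > 18)   // select (Person) where age > 18
                .map(Person::name)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Person> people = Map.of("p1", new Person("Ana", 30),
                                            "p2", new Person("Bob", 12));
        Map<String, Dog> dogs = Map.of("d1", new Dog("Rex")); // never touched by the scan
        System.out.println(adults(people)); // [Ana]
    }
}
```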