I have not forgotten; I'm just in the middle of a Bean Validation crisis
that has kept me from looking into this issue.
Could it be, BTW, that the mass indexer does not ask for these objects to
be loaded using Hibernate.initialize? It could also be a bug in OGM, but
not necessarily. In particular, is the association lazy or eager?
Emmanuel
On Mon 2013-03-11 11:00, Davide D'Alto wrote:
I have created a branch for OGM-228 (OGM MassIndexer) that includes
OGM-151 (Metamodel) and OGM-273 (load entities from tuple):
https://github.com/DavideD/hibernate-ogm/tree/OGM-228
A test I've added fails though (AssociationMassIndexerTest):
https://github.com/DavideD/hibernate-ogm/blob/74549a4d264af30fa88960c30e2...
The test uses two entities, IndexedNews and IndexedLabel, with a
one-to-many relationship from news to label.
The mass indexing works fine, but when I retrieve the list of indexed
labels with the query "FROM IndexedLabel", the result contains a list
of proxies, and the equals check fails because the class of the objects
in the list is not IndexedLabel.
If I first get the list of news and then call news.getLabels() on
each of them, everything works fine.
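One likely explanation, stated as an assumption rather than a diagnosis of the branch: if IndexedLabel.equals() compares getClass(), it will reject any proxy subclass returned by the query. The sketch below uses a simplified, hypothetical IndexedLabel holding only an id, plus a hand-written IndexedLabelProxy subclass standing in for the runtime-generated proxy, to contrast a getClass()-based check with an instanceof-based one.

```java
import java.util.Objects;

// Simplified, hypothetical stand-in for the IndexedLabel entity in the test.
class IndexedLabel {
    private final String id;

    IndexedLabel(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }

    // Fragile check: a proxy subclass never has exactly this runtime class.
    boolean strictEquals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(getId(), ((IndexedLabel) o).getId());
    }

    // Proxy-tolerant check: accepts subclasses and reads state via the getter.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IndexedLabel)) return false;
        return Objects.equals(getId(), ((IndexedLabel) o).getId());
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(getId());
    }
}

// Hand-written subclass playing the role of the generated proxy class.
class IndexedLabelProxy extends IndexedLabel {
    IndexedLabelProxy(String id) {
        super(id);
    }
}
```

Reading through the getter matters with real proxies, whose own fields are typically left unset while method calls are delegated to the loaded target.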
Any thoughts?
Thanks
On Thu, Mar 7, 2013 at 10:15 AM, Emmanuel Bernard
<emmanuel(a)hibernate.org> wrote:
> I have no more time for this one, so I have dumped what I have so far:
>
> https://github.com/hibernate/hibernate-ogm/pull/175
>
> Emmanuel
>
> On Wed 2013-03-06 19:18, Emmanuel Bernard wrote:
>> I've successfully implemented OGM-151 for EntityKey, which is the one we
>> need to move OGM-273 forward for now.
>> I am trying to implement it for AssociationKey, but caching here is
>> significantly harder as data is cross-referenced across associations.
>>
>> Sanne, when you worked on the profiling of OGM, do you remember
>> AssociationKey putting pressure on build time or memory? Because
>> caching them per persister means some rather complex race conditions and
>> more memory used permanently (as opposed to on demand).
>>
>> So I'm wondering if that's worth it. As an intermediary step, I could
>> introduce AssociationKeyMetadata but build it on-demand - that one is
>> easier to achieve.
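The on-demand variant can be sketched with a ConcurrentHashMap, whose computeIfAbsent applies the builder at most once per key, so concurrent callers never publish two different instances. AssociationKeyMetadata and AssociationKeyMetadataCache below are hypothetical simplifications (a table name and key column names only), not OGM's actual classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, immutable metadata: the association table and its key columns.
final class AssociationKeyMetadata {
    final String table;
    final String[] columnNames;

    AssociationKeyMetadata(String table, String[] columnNames) {
        this.table = table;
        this.columnNames = columnNames.clone();
    }
}

// Builds each metadata object the first time it is requested, then reuses it.
final class AssociationKeyMetadataCache {
    private final Map<String, AssociationKeyMetadata> cache = new ConcurrentHashMap<>();
    final AtomicInteger builds = new AtomicInteger(); // exposed for illustration

    AssociationKeyMetadata get(String table, String... columnNames) {
        String cacheKey = table + "|" + String.join(",", columnNames);
        // ConcurrentHashMap.computeIfAbsent applies the function at most once
        // per key, so all concurrent callers receive the same instance.
        return cache.computeIfAbsent(cacheKey, k -> {
            builds.incrementAndGet();
            return new AssociationKeyMetadata(table, columnNames);
        });
    }
}
```

Memory then grows only with the associations actually used, at the cost of a map lookup per request instead of a field read on the persister.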
>>
>> Emmanuel
>>
>> On Wed 2013-03-06 15:32, Davide D'Alto wrote:
>> > it's ok for me
>> >
>> > Davide
>> >
>> > On Wed, Mar 6, 2013 at 3:28 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>> > > I'm planning on working on OGM-151. Fine with everyone?
>> > > That will likely be my last one before I move back to BVAL and close the
>> > > final issues there.
>> > >
>> > > Emmanuel
>> > >
>> > > On Tue 2013-03-05 19:04, Sanne Grinovero wrote:
>> > >> Nice!
>> > >> n+1 is something Hibernate Search has to deal with too; that's why I
>> > >> was interested in the fetch profiles and graph loading in JPA 2.1.
>> > >>
>> > >> On 5 March 2013 17:44, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>> > >> > I have implemented a solution that gives an entity based on a tuple.
>> > >> >
>> > >> > https://hibernate.onjira.com/browse/OGM-273#comment-50082
>> > >> >
>> > >> > Note that it does not currently work for MongoDB, but that's waiting
>> > >> > for the dedicated GridDialect method as well as OGM-151.
>> > >> > Also note that I have no idea how that will work for associations. I
>> > >> > suspect some nasty n+1 is happening at best. Worst case, an exception :)
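To make the n+1 concern concrete: if each tuple becomes an entity and its association is then fetched separately, loading N entities costs N+1 round-trips to the store. The CountingStore and NPlusOne classes below are hypothetical stand-ins, and the batched lookup is shown only for comparison, not as what OGM does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical store mapping a news id to its label ids; counts round-trips.
final class CountingStore {
    int roundTrips = 0;
    private final Map<String, List<String>> labelsByNews = new HashMap<>();

    void put(String newsId, List<String> labelIds) {
        labelsByNews.put(newsId, labelIds);
    }

    List<String> allNewsIds() {
        roundTrips++;
        return new ArrayList<>(labelsByNews.keySet());
    }

    List<String> labelsOf(String newsId) {
        roundTrips++;
        return labelsByNews.getOrDefault(newsId, Collections.emptyList());
    }

    Map<String, List<String>> labelsOfAll(List<String> newsIds) {
        roundTrips++; // one batched call for every requested id
        Map<String, List<String>> result = new HashMap<>();
        for (String id : newsIds) {
            result.put(id, labelsByNews.getOrDefault(id, Collections.emptyList()));
        }
        return result;
    }
}

final class NPlusOne {
    // 1 call for the entities + 1 call per entity for its association.
    static int loadOneByOne(CountingStore store) {
        store.roundTrips = 0;
        for (String id : store.allNewsIds()) {
            store.labelsOf(id);
        }
        return store.roundTrips;
    }

    // 1 call for the entities + 1 batched call for all associations.
    static int loadBatched(CountingStore store) {
        store.roundTrips = 0;
        store.labelsOfAll(store.allNewsIds());
        return store.roundTrips;
    }
}
```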
>> > >> >
>> > >> > Emmanuel
>> > >> >
>> > >> > On Tue 2013-03-05 10:30, Emmanuel Bernard wrote:
>> > >> >> We might hope for a stable enough contract on Hibernate Search, and
>> > >> >> that we won't break serializability between micro or minor
>> > >> >> versions. That will need to be taken into account in the test suite and
>> > >> >> design.
>> > >> >> On the OGM side though, we are not at that level of maturity, and we will
>> > >> >> force a homogeneous Hibernate OGM version across the whole cluster. The grid
>> > >> >> will have to go down for upgrades, or we will have to enforce that no
>> > >> >> map/reduce job using OGM runs while the version rollout is in progress.
>> > >> >>
>> > >> >> Emmanuel
>> > >> >>
>> > >> >> On Mon 2013-03-04 18:30, Sanne Grinovero wrote:
>> > >> >> > Found an example; this is all the code needed to have a MassIndexer working
>> > >> >> > on top of Infinispan's Map/Reduce:
>> > >> >> >
>> > >> >> > https://github.com/infinispan/infinispan/blob/master/query/src/main/java/...
>> > >> >> >
>> > >> >> > Note its initialize method, which injects the needed components; the
>> > >> >> > implementation is serialized across nodes.
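The pattern described here (serialize a small task, then inject heavy node-local components once it arrives) can be sketched with plain Java serialization standing in for Infinispan's marshalling. ComponentRegistry, IndexingService and IndexingTask are hypothetical names; the transient field is what keeps the service from travelling with the task.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical node-local registry of components, looked up by type.
final class ComponentRegistry {
    private final Map<Class<?>, Object> components = new HashMap<>();

    <T> void register(Class<T> type, T instance) {
        components.put(type, instance);
    }

    <T> T lookup(Class<T> type) {
        return type.cast(components.get(type));
    }
}

// Hypothetical heavy, node-local service that must not travel with the task.
final class IndexingService {
    int indexed = 0;

    void index(String entityId) {
        indexed++;
    }
}

// Only the small task state is serialized; the service is re-injected.
final class IndexingTask implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String entityName;
    private transient IndexingService service; // skipped by serialization

    IndexingTask(String entityName) {
        this.entityName = entityName;
    }

    // Called on whichever node receives the task, after deserialization.
    void initialize(ComponentRegistry registry) {
        this.service = registry.lookup(IndexingService.class);
    }

    boolean isInitialized() {
        return service != null;
    }

    String entityName() {
        return entityName;
    }

    void process(String entityId) {
        if (service == null) throw new IllegalStateException("initialize() not called");
        service.index(entityId);
    }

    // Simulates shipping the task to another node.
    static IndexingTask roundTrip(IndexingTask task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (IndexingTask) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```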
>> > >> >> >
>> > >> >> > Sanne
>> > >> >> >
>> > >> >> > On 4 March 2013 18:26, Sanne Grinovero <sanne(a)hibernate.org> wrote:
>> > >> >> > > We finished this discussion on IRC, in case someone else was interested:
>> > >> >> > >
>> > >> >> > > <sanne> hum I forgot the first step.. transformation from entry into entity
>> > >> >> > > <sanne> updated
>> > >> >> > > <sanne> emmanuel, the "hydrate" step is what DavideD is bashing his
>> > >> >> > > head against, but let's assume he finds a workaround and we focus on
>> > >> >> > > the pattern as a first step?
>> > >> >> > > <emmanuel> https://gist.github.com/emmanuelbernard/5084039
>> > >> >> > > <emmanuel> sanne: ^ that's how I would do it if I had an Iterator from the tuple
>> > >> >> > > <emmanuel> assuming pushToExecutor pushes to whatever concurrent work
>> > >> >> > > mechanism you planned to use on consumes
>> > >> >> > > <emmanuel> Plus I am not following exactly how you plan consumes(Entry)
>> > >> >> > > to be executed concurrently
>> > >> >> > > <emmanuel> is that the GridDialect responsibility?
>> > >> >> > > <emmanuel> That looks like a lot of work on the dialect's side
>> > >> >> > > <sanne> emmanuel, imagine the backend is Infinispan and has some large
>> > >> >> > > amount of data per node, plus that each node has its own backend
>> > >> >> > > IndexManager (like an ideal sharding)
>> > >> >> > > <emmanuel> ie pool mgt and cap + queuing
>> > >> >> > > <sanne> then with your approach the iterator needs to fetch data from
>> > >> >> > > all remote nodes, and then enqueue in a local blocking queue which is
>> > >> >> > > returning the data to the original owners
>> > >> >> > > <sanne> but if you skip that step, you can just forward the stateless
>> > >> >> > > consumer to each node and have it run on data locality
>> > >> >> > > <emmanuel> I was thinking that if you had the Lucene index locally on
>> > >> >> > > each node you would have a different impl of the MassIndexer anyways
>> > >> >> > > <emmanuel> that would simply send a command to each local node
>> > >> >> > > <sanne> To answer your question: that would be an optional GridDialect
>> > >> >> > > responsibility. I would endorse a trivial first draft doing a
>> > >> >> > > single-threaded loop.
>> > >> >> > > <emmanuel> and have GridDialect.getDataFor() return local data
>> > >> >> > > <sanne> The "consumes" implementation can be implemented with a
>> > >> >> > > simple iterator, as in your design, so I don't think it pushes much
>> > >> >> > > complexity to the GridDialect implementor?
>> > >> >> > > <sanne> The benefit of the consumer is that *optionally* it can be
>> > >> >> > > mapped on the Map phase, and that's trivial if your backend supports
>> > >> >> > > Map/Reduce
>> > >> >> > > <emmanuel> sanne: I don't follow that, sorry
>> > >> >> > > <emmanuel> how does that make it mappable to the Map phase?
>> > >> >> > > <sanne> "public void consume(Entry e)" is a degenerate (simplified)
>> > >> >> > > form of map.
>> > >> >> > > <sanne> mm, Infinispan IDE crashes at just the right moment.
>> > >> >> > > <emmanuel> I thought Map was about *filtering*
>> > >> >> > > <emmanuel> not processing
>> > >> >> > > <sanne> you can decide to accept 100% of values (without filtering),
>> > >> >> > > but actually you might want to filter on the specified tables only.
>> > >> >> > > <sanne> also, the return type doesn't have to match the input type:
>> > >> >> > > hence you define a transformation function, which is inherently
>> > >> >> > > applied in parallel on all matching entries.
>> > >> >> > > <emmanuel> sanne: but then you require the OGM code to be everywhere
>> > >> >> > > (ie on each node of the target NoSQL store)
>> > >> >> > > <emmanuel> to be able to do tuple -> entity
>> > >> >> > > <emmanuel> that's not realistic
>> > >> >> > > <emmanuel> assuming your transform phase is about tuple -> entity and
>> > >> >> > > some HSearch ops
>> > >> >> > > <sanne> yes right
>> > >> >> > > <sanne> but isn't it worth it? it's optional and much more efficient,
>> > >> >> > > as you avoid transferring any data.
>> > >> >> > > <sanne> btw we often assume all nodes in the grid are equally
>> > >> >> > > configured, so having the same apps & libraries deployed.
>> > >> >> > > <emmanuel> sanne: let me try and summarize what I understand
>> > >> >> > > <emmanuel> it's more efficient if you store the Lucene index locally
>> > >> >> > > with the data, and if the grid is written in Java or at least can run
>> > >> >> > > Java code including libraries, and if you distribute the OGM
>> > >> >> > > configuration across the whole grid
>> > >> >> > > <emmanuel> Otherwise, it does not make any difference
>> > >> >> > > <emmanuel> Also the GridDialect implementation needs to know if you are
>> > >> >> > > doing this trick to only return local data
>> > >> >> > > <sanne> no, there are other drawbacks it avoids, but they're minor so
>> > >> >> > > I didn't mention them
>> > >> >> > > <emmanuel> am I right?
>> > >> >> > > <sanne> mainly, you skip the need for the contention point as there
>> > >> >> > > is no push to a shared blocking queue
>> > >> >> > > <sanne> no, the GridDialect doesn't need to know.
>> > >> >> > > <emmanuel> sanne: sure, if you can process the code on each node you
>> > >> >> > > avoid the shared blocking queue, at least until you reach the
>> > >> >> > > IndexManager
>> > >> >> > > <sanne> you'll just forward a simple (standard) M/R task, and it will
>> > >> >> > > need to execute it as always.
>> > >> >> > > <sanne> the IndexManager is parallel ;)
>> > >> >> > > <emmanuel> sanne: parallel on a single node
>> > >> >> > > <sanne> yes, but no contention points other than the internal
>> > >> >> > > structure of the IW
>> > >> >> > > <emmanuel> I mean updating the index for a given table is better done
>> > >> >> > > on a single node
>> > >> >> > > <sanne> IndexWriter
>> > >> >> > > <emmanuel> sorry I meant IndexWriter
>> > >> >> > > <emmanuel> ah but you mention perfect sharding
>> > >> >> > > <emmanuel> you need cosmological alignment for this shit to happen
>> > >> >> > > <sanne> not if we plan for it :)
>> > >> >> > > <sanne> you might remember the changes to Segments in the ISPN code,
>> > >> >> > > to accommodate index storage consistent with the data locality
>> > >> >> > > <sanne> that's expected in 6.0
>> > >> >> > > <emmanuel> So gridDialect.getData(Consumer consumer, String... tables) is wrong
>> > >> >> > > <emmanuel> it's more gridDialect.getData(ConsumerImpl.class, String... tables)
>> > >> >> > > <emmanuel> as you need to send the Consumer impl
>> > >> >> > > <emmanuel> not simply use it
>> > >> >> > > <sanne> hu, it needs a reference to the current SearchFactory at the very least
>> > >> >> > > <emmanuel> sanne: but you're telling me you send the M/R task
>> > >> >> > > <emmanuel> so you need to send the M/R code as well
>> > >> >> > > <sanne> yes but here we enter Infinispan-specific implementation
>> > >> >> > > <sanne> I would register the needed components in Infinispan and use
>> > >> >> > > the ServiceRegistry to look them up remotely
>> > >> >> > > <sanne> not to mention Infinispan could accommodate a custom command for it
>> > >> >> > > <emmanuel> What I am saying is that you don't pass the Consumer
>> > >> >> > > *instance* to the grid dialect but rather the impl, no?
>> > >> >> > > <sanne> the impl class definition?
>> > >> >> > > <emmanuel> sanne: you tell me. How do I send M/R code today?
>> > >> >> > > <emmanuel> certainly not an impl instance
>> > >> >> > > <sanne> yes you do
>> > >> >> > > <sanne> JBMar will take care of it, including state.
>> > >> >> > > <sanne> but in this case that would be wrong of course as I don't want
>> > >> >> > > to serialize the whole SearchFactory, so I'd use injection and lookup,
>> > >> >> > > but that's a detail of Infinispan.
>> > >> >> > > <sanne> But this shouldn't be MassIndexer-specific, right? it's good to
>> > >> >> > > expose a general "execute on all" method, and I think accepting
>> > >> >> > > instances would make life easier for most, even though we might need
>> > >> >> > > to document some limitations.
>> > >> >> > > <emmanuel> alright, I guess I'll have to live with a visitor pattern
>> > >> >> > > for a feature that has a 5% chance of happening :)
>> > >> >> > > <sanne> I'm going to punch Davide
>> > >> >> > > <sanne> as he's yelling "it's not a visitor" but doesn't have the guts
>> > >> >> > > to write it down :)
>> > >> >> > > <emmanuel> sanne: DavideD's would have nothing to do with it, that
>> > >> >> > > requires a lot of config and Infinispan machinery I'm not sure is there
>> > >> >> > > today
>> > >> >> > > <DavideD> :)
>> > >> >> > > <emmanuel> ah
>> > >> >> > > <emmanuel> I don't care what it's called, it's one of those patterns
>> > >> >> > > that make the code harder to follow
>> > >> >> > > <DavideD> I was actually trying to remember the name of the pattern
>> > >> >> > > <sanne> ok now we agree :)
>> > >> >> > > <emmanuel> Obfuscator pattern family
>> > >> >> > > <sanne> very popular among consultants, I don't understand why you complain :P
>> > >> >> > > <sanne> Anyway, let's wrap up and broaden the horizon:
>> > >> >> > > <emmanuel> ok so we are left with finding out how to load an entity from a tuple
>> > >> >> > > <sanne> you don't think it's useful as a general purpose method?
>> > >> >> > > <emmanuel> sanne: will be for queries
>> > >> >> > > <emmanuel> It's just that it's non obvious
>> > >> >> > > <sanne> Exactly. Also I think lambda methods are becoming much better known.
>> > >> >> > > <emmanuel> syntactically yes
>> > >> >> > > <emmanuel> VM wise, perf improvements will come later
>> > >> >> > > <sanne> what I mean is that by defining the SPI this way, I don't
>> > >> >> > > expect it to be more complex for the GridDialect implementors, while
>> > >> >> > > we can reuse it for a wider scope of needs.
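The consumer-style SPI discussed on IRC can be sketched as follows, assuming a hypothetical RawEntry type and a ScanningDialect interface (neither is an existing OGM API). forEachTuple is the trivial single-threaded draft; mapTuples shows the same filter-on-tables plus transformation viewed as a Map phase, run here with a parallel stream rather than a real Map/Reduce engine.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical raw entry: the table it belongs to plus its column values.
final class RawEntry {
    final String table;
    final Map<String, Object> columns;

    RawEntry(String table, Map<String, Object> columns) {
        this.table = table;
        this.columns = columns;
    }
}

// Hypothetical dialect SPI in the shape discussed on IRC.
interface ScanningDialect {
    // Hands every entry of the given tables to the consumer.
    void forEachTuple(Consumer<RawEntry> consumer, String... tables);
}

final class InMemoryDialect implements ScanningDialect {
    private final List<RawEntry> entries;

    InMemoryDialect(List<RawEntry> entries) {
        this.entries = entries;
    }

    // Trivial first draft: a single-threaded loop with a table filter.
    @Override
    public void forEachTuple(Consumer<RawEntry> consumer, String... tables) {
        Set<String> wanted = new HashSet<>(Arrays.asList(tables));
        for (RawEntry e : entries) {
            if (wanted.contains(e.table)) {
                consumer.accept(e);
            }
        }
    }

    // The same contract seen as a Map phase: filter on tables, then apply a
    // transformation whose output type differs from its input, in parallel.
    <R> List<R> mapTuples(Function<RawEntry, R> transform, String... tables) {
        Set<String> wanted = new HashSet<>(Arrays.asList(tables));
        return entries.parallelStream()
                .filter(e -> wanted.contains(e.table))
                .map(transform)
                .collect(Collectors.toList());
    }
}
```

Callers only ever supply the function; whether it runs in a loop, a thread pool or a grid-side Map phase stays a dialect detail.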
>> > >> >> > >
>> > >> >> > > --Sanne
>> > >> >> > >
>> > >> >> > > On 4 March 2013 17:02, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>> > >> >> > >>
>> > >> >> > >>
>> > >> >> > >> On 4 March 2013, at 17:39, Sanne Grinovero <sanne(a)hibernate.org> wrote:
>> > >> >> > >>
>> > >> >> > >>> On 4 March 2013 16:20, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>> > >> >> > >>>> I already gave what I knew on how to load an entity from a tuple (which
>> > >> >> > >>>> isn't much), but we can try and dig together. Something I thought about
>> > >> >> > >>>> is that ORM probably has a mechanism to load an entity from a result set
>> > >> >> > >>>> via the query parser. And that probably also looks like the second half
>> > >> >> > >>>> of OgmLoader.load. We could look at this part and see if we can make an
>> > >> >> > >>>> OGM version of it. We never needed it before, as we never had query
>> > >> >> > >>>> support (the way SQL does it).
>> > >> >> > >>>
>> > >> >> > >>> I would also need to study the ORM code, but to add a high-level observation,
>> > >> >> > >>> the methods currently defined by the GridDialect are focused on
>> > >> >> > >>> loading from well-known key instances;
>> > >> >> > >>> there is nothing that makes us able to scan/inspect all values.
>> > >> >> > >>>
>> > >> >> > >>> In other words: even if we wanted to load keys first, we don't have definitions
>> > >> >> > >>> of functions from raw -> primary key instances either.
>> > >> >> > >>
>> > >> >> > >> I understand that. I'm not denying the need for the method.
>> > >> >> > >>
>> > >> >> > >>>
>> > >> >> > >>>
>> > >> >> > >>>> On the visitor vs Iterator approach, I still don't see how implementing
>> > >> >> > >>>> an Iterator on a map / reduce backend would be harder than the visitor,
>> > >> >> > >>>> but maybe I'm missing something.
>> > >> >> > >>>>
>> > >> >> > >>>> class IteratorAsStream {
>> > >> >> > >>>>     final Query someMapReduceQuery = ...;
>> > >> >> > >>>>
>> > >> >> > >>>>     public Object next() {
>> > >> >> > >>>>         if (!someMapReduceQuery.started()) {
>> > >> >> > >>>>             // execute and collect results in parallel
>> > >> >> > >>>>             someMapReduceQuery.execute();
>> > >> >> > >>>>         }
>> > >> >> > >>>>         // block until the next result is available
>> > >> >> > >>>>         Object result = someMapReduceQuery.getNextOrBlock();
>> > >> >> > >>>>         return result;
>> > >> >> > >>>>     }
>> > >> >> > >>>> }
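A runnable counterpart to the IteratorAsStream pseudocode, assuming a java.util.concurrent CompletionService in place of the hypothetical Query: the first next() submits every task, and each call then blocks until some result is ready. ParallelResultIterator is an illustrative name, not an existing class.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Iterator whose first next() fans all tasks out in parallel; each call then
// returns whichever result completes next, blocking until one is ready.
final class ParallelResultIterator<T> implements Iterator<T> {
    private final List<Callable<T>> tasks;
    private final ExecutorService executor;
    private CompletionService<T> completion; // null until started
    private int remaining;

    ParallelResultIterator(List<Callable<T>> tasks, ExecutorService executor) {
        this.tasks = new ArrayList<>(tasks);
        this.executor = executor;
        this.remaining = this.tasks.size();
    }

    @Override
    public boolean hasNext() {
        return remaining > 0;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        if (completion == null) {
            // "execute and collect results in parallel"
            completion = new ExecutorCompletionService<>(executor);
            for (Callable<T> task : tasks) {
                completion.submit(task);
            }
        }
        try {
            Future<T> done = completion.take(); // the "getNextOrBlock" step
            remaining--;
            return done.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a result", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
    }
}
```

Results arrive in completion order rather than submission order, and all processing of those results still happens on the thread calling next().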
>> > >> >> > >>>
>> > >> >> > >>> That could work to *load* all entities in parallel, but I'd like to
>> > >> >> > >>> process the entities in parallel as well.
>> > >> >> > >>> And I'd rather not force the GridDialect implementors to write some
>> > >> >> > >>> Hibernate Search-specific code,
>> > >> >> > >>> so to break out we need some form of "Execute X on each": a closure or a lambda.
>> > >> >> > >>>
>> > >> >> > >>
>> > >> >> > >> I can't see how the visitor model helps you process entities in parallel.
>> > >> >> > >> To me both approaches are strictly equivalent. Care to show some pseudo-code?
>> > >> >> _______________________________________________
>> > >> >> hibernate-dev mailing list
>> > >> >> hibernate-dev(a)lists.jboss.org
>> > >> >> https://lists.jboss.org/mailman/listinfo/hibernate-dev