Re: [hibernate-dev] [OGM] Ogm mass indexer, how to convert Tuple/EntityKey to Entity/Id?

Thursday, 7 March 2013

I have no more coin for this one so I have dumped what I have so far
https://github.com/hibernate/hibernate-ogm/pull/175

Emmanuel

On Wed 2013-03-06 19:18, Emmanuel Bernard wrote:
...
 I've successfully implemented OGM-151 for EntityKey which is the one we
 need to move OGM-273 forward for now.
 I am trying to implement it for AssociationKey but caching here is
 significantly harder as data is cross-referenced across associations.

 Sanne, when you worked on the profiling of OGM, do you remember
 AssociationKey putting pressure on build time or memory? Because
 caching them per persister means some rather complex race conditions and
 more memory used permanently (as opposed to on demand).

 So I'm wondering if that's worth it. As an intermediary step, I could
 introduce AssociationKeyMetadata but build it on-demand - that one is
 easier to achieve.

 Emmanuel
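
 The trade-off described above can be sketched roughly as follows. This is a hypothetical, simplified illustration, not the code from OGM-151 or the pull request: `KeyMetadata`, `Key` and `PersisterSketch` are made-up names, and the real OGM classes may be shaped differently. The entity key metadata is built once and kept for the lifetime of the persister, while the association key metadata is rebuilt on each request, which avoids shared state at the cost of repeated allocation.

```java
import java.util.Arrays;

// Hypothetical: the part of a key that is identical for every row of a table.
final class KeyMetadata {
    final String table;
    final String[] columnNames;

    KeyMetadata(String table, String... columnNames) {
        this.table = table;
        this.columnNames = columnNames.clone();
    }
}

// Hypothetical: a per-row key is the shared metadata plus the row's values.
final class Key {
    final KeyMetadata metadata;
    final Object[] values;

    Key(KeyMetadata metadata, Object... values) {
        this.metadata = metadata;
        this.values = values.clone();
    }

    @Override
    public String toString() {
        return metadata.table + Arrays.toString(values);
    }
}

// Hypothetical persister showing both strategies side by side.
final class PersisterSketch {
    // Cached: built once, held in memory as long as the persister lives.
    private final KeyMetadata entityKeyMetadata;

    PersisterSketch(String table, String... idColumns) {
        this.entityKeyMetadata = new KeyMetadata(table, idColumns);
    }

    // Every entity key reuses the same metadata instance.
    Key entityKey(Object... idValues) {
        return new Key(entityKeyMetadata, idValues);
    }

    // On demand: nothing is retained, so there is no shared state to race on.
    KeyMetadata associationKeyMetadata(String associationTable, String... columns) {
        return new KeyMetadata(associationTable, columns);
    }
}

final class KeyMetadataDemo {
    public static void main(String[] args) {
        PersisterSketch persister = new PersisterSketch("Hike", "id");
        Key first = persister.entityKey(1L);
        Key second = persister.entityKey(2L);
        // Both keys point at the single cached metadata object.
        System.out.println(first.metadata == second.metadata);
    }
}
```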

 On Wed 2013-03-06 15:32, Davide D'Alto wrote:
 > it's ok for me
 > 
 > Davide
 > 
 > On Wed, Mar 6, 2013 at 3:28 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
 > > I'm planning on working on OGM-151. Fine with everyone?
 > > That will likely be my last before I move back to BVAL and close the
 > > final issues there.
 > >
 > > Emmanuel
 > >
 > > On Tue 2013-03-05 19:04, Sanne Grinovero wrote:
 > >> Nice!
 > >> n+1 is something Hibernate Search has to deal with too, that's why I
 > >> was interested in the fetch profiles and graph loading in JPA 2.1
 > >>
 > >> On 5 March 2013 17:44, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
 > >> > I have implemented a solution that gives an entity based on a tuple.
 > >> > https://hibernate.onjira.com/browse/OGM-273#comment-50082
 > >> >
 > >> > Note that it does not currently work for MongoDB, but that's waiting
 > >> > for the dedicated GridDialect method as well as OGM-151.
 > >> > Also note that I have no idea how that will work for associations. I
 > >> > suspect some nasty n+1 is happening at best. Worst case, an exception :)
 > >> >
 > >> > Emmanuel
 > >> >
 > >> > On Tue 2013-03-05 10:30, Emmanuel Bernard wrote:
 > >> >> We might hope for a stable enough contract on Hibernate Search and
 > >> >> hope that we won't break serializability between micro or minor
 > >> >> versions. That will need to be taken into account in the test suite and
 > >> >> design.
 > >> >> On the OGM side though, we are not at that level of maturity and we will
 > >> >> force a homogeneous Hibernate OGM version across the whole cluster. The grid
 > >> >> will have to go down for upgrades or enforce that no map reduce job
 > >> >> using OGM is used while the version roll out is in progress.
 > >> >>
 > >> >> Emmanuel
 > >> >>
 > >> >> On Mon 2013-03-04 18:30, Sanne Grinovero wrote:
 > >> >> > Found an example, this is all the code it needs to have a MassIndexer working
 > >> >> > on top of Infinispan's Map/Reduce:
 > >> >> >
 > >> >> > https://github.com/infinispan/infinispan/blob/master/query/src/main/java/...
 > >> >> >
 > >> >> > Note its initialize method which injects needed components; the
 > >> >> > implementation is serialized across nodes.
 > >> >> >
 > >> >> > Sanne
 > >> >> >
 > >> >> > On 4 March 2013 18:26, Sanne Grinovero <sanne(a)hibernate.org> wrote:
 > >> >> > > We finished this discussion on IRC, in case someone else was interested:
 > >> >> > >
 > >> >> > > <sanne> hum I forgot the first step.. transformation from entry into entity
 > >> >> > > <sanne> updated
 > >> >> > > <sanne> emmanuel, the "hydrate" step is what DavideD is bashing his
 > >> >> > > head against, but let's assume he finds a workaround and we focus on
 > >> >> > > the pattern as first step?
 > >> >> > > <emmanuel> https://gist.github.com/emmanuelbernard/5084039
 > >> >> > > <emmanuel> sanne: ^ that's how I would do it if I had an Iterator from the tuple
 > >> >> > > <emmanuel> assuming pushToExecutor pushes to whatever concurrent work
 > >> >> > > mechanism you planned to use on consumes
 > >> >> > > <emmanuel> Plus I am not following exactly how you plan consumes(Entry)
 > >> >> > > to be executed concurrently
 > >> >> > > <emmanuel> is that the GridDialect responsibility?
 > >> >> > > <emmanuel> That looks like a lot of work on the dialect's side
 > >> >> > > <sanne> emmanuel, imagine the backend is Infinispan and has some large
 > >> >> > > amount of data per node, plus that each node has its own backend
 > >> >> > > IndexManager (like an ideal sharding)
 > >> >> > > <emmanuel> ie pool mgt and cap + queuing
 > >> >> > > <sanne> then with your approach the iterator needs to fetch data from
 > >> >> > > all remote nodes, and then enqueue in a local blocking queue which is
 > >> >> > > returning the data to the original owners
 > >> >> > > <sanne> but if you skip that step, you can just forward the stateless
 > >> >> > > consumer to each node and have it run on data locality
 > >> >> > > <emmanuel> I was thinking that if you had the Lucene index locally on
 > >> >> > > each node you would have a different impl of the MassIndexer anyways
 > >> >> > > <emmanuel> that would simply send a command to each local node
 > >> >> > > <sanne> To answer your question: that would be an optional GridDialect
 > >> >> > > responsibility. I would endorse a trivial first draft doing a
 > >> >> > > single-threaded loop.
 > >> >> > > <emmanuel> and have GridDialect.getDataFor() return local data
 > >> >> > > <sanne> The "consumes" implementation can be implemented with a
 > >> >> > > simple iterator - as in your design - so I don't think it pushes much
 > >> >> > > complexity to the GridDialect implementor?
 > >> >> > > <sanne> The benefit of the consumer is that *optionally* it can be
 > >> >> > > mapped on the Map phase, and that's trivial if your backend supports
 > >> >> > > Map/Reduce
 > >> >> > > <emmanuel> sanne: I don't follow that sorry
 > >> >> > > <emmanuel> how does that make it mappable to the Map phase?
 > >> >> > > <sanne> "public void consume(Entry e) " is a degenerate (simplified)
 > >> >> > > form of map.
 > >> >> > > <sanne> mm infinispan IDE crashes at the right moment.
 > >> >> > > <emmanuel> I thought Map was about *filtering*
 > >> >> > > <emmanuel> not processing
 > >> >> > > <sanne> you can decide to accept 100% of values (without filtering),
 > >> >> > > but actually you might want to filter on the specified tables only.
 > >> >> > > <sanne> also, the return type doesn't have to match the input type:
 > >> >> > > hence you define a transformation function, which is inherently
 > >> >> > > applied in parallel on all matching entries.
 > >> >> > > <emmanuel> sanne: but then you require the OGM code to be everywhere
 > >> >> > > (ie on each node of the target NoSQL)
 > >> >> > > <emmanuel> to be able to do tuple -> entity
 > >> >> > > <emmanuel> that's not realistic
 > >> >> > > <emmanuel> assuming your transform phase is about tuple -> entity and
 > >> >> > > some HSearch ops
 > >> >> > > <sanne> yes right
 > >> >> > > <sanne> but isn't it worth it? it's optional and much more efficient,
 > >> >> > > as you avoid transferring any data.
 > >> >> > > <sanne> btw we often assume all nodes in the grid are equally
 > >> >> > > configured, so having same apps & libraries deployed.
 > >> >> > > <emmanuel> sanne: let me try and summarize what I understand
 > >> >> > > <emmanuel> it's more efficient if you store the Lucene index locally
 > >> >> > > with the data, and if the grid is written in Java or at least can run
 > >> >> > > code in Java including libraries and if you distribute the OGM
 > >> >> > > configuration across the whole grid
 > >> >> > > <emmanuel> Otherwise, it does not make any difference
 > >> >> > > <emmanuel> Also the GridDialect implementation needs to know if you are
 > >> >> > > doing this trick to only return local data
 > >> >> > > <sanne> no there are other drawbacks which get defeated, but minor so
 > >> >> > > I didn't mention them
 > >> >> > > <emmanuel> am I right?
 > >> >> > > <sanne> mainly, you skip the need for the contention point as there
 > >> >> > > is no push to a shared blocking queue
 > >> >> > > <sanne> no the GridDialect doesn't need to know.
 > >> >> > > <emmanuel> sanne: sure if you can process the code on each node you
 > >> >> > > avoid the shared blocking queue, at least until you reach the
 > >> >> > > IndexManager
 > >> >> > > <sanne> you'll just forward a simple (standard) M/R task, and it will
 > >> >> > > need to execute it as always.
 > >> >> > > <sanne> the IndexManager is parallel ;)
 > >> >> > > <emmanuel> sanne: parallel on a single node
 > >> >> > > <sanne> yes, but no contention points other than the internal
 > >> >> > > structure of the IW
 > >> >> > > <emmanuel> I mean updating the index for a given table is better done
 > >> >> > > on a single node
 > >> >> > > <sanne> IndexWriter
 > >> >> > > <emmanuel> sorry I meant IndexWriter
 > >> >> > > <emmanuel> ah but you mention perfect sharding
 > >> >> > > <emmanuel> you need cosmological alignment for this shit to happen
 > >> >> > > <sanne> not if we plan for it :)
 > >> >> > > <sanne> you might remember the changes to Segments in the ISPN code,
 > >> >> > > to accommodate index storage consistent with the data locality
 > >> >> > > <sanne> that's expected in 6.0
 > >> >> > > <emmanuel> So gridDialect.getData(Consumer consumer, String... tables) is wrong
 > >> >> > > <emmanuel> it's more gridDialect.getData(ConsumerImpl.class, String... tables)
 > >> >> > > <emmanuel> as you need to send the Consumer impl
 > >> >> > > <emmanuel> not simply use it
 > >> >> > > <sanne> hu, it needs a reference to the current SearchFactory at very least
 > >> >> > > <emmanuel> sanne: but you're telling me you send the M/R task
 > >> >> > > <emmanuel> so you need to send the M/R code as well
 > >> >> > > <sanne> yes but here we enter Infinispan specific implementation
 > >> >> > > <sanne> I would register the needed components in Infinispan and use
 > >> >> > > the ServiceRegistry to look them up remotely
 > >> >> > > <sanne> not to mention Infinispan could accommodate a custom command for it
 > >> >> > > <emmanuel> What I am saying is that you don't pass the Consumer
 > >> >> > > *instance* to the grid dialect but rather the impl, no?
 > >> >> > > <sanne> the impl class definition?
 > >> >> > > <emmanuel> sanne: you tell me. How do I send M/R code today?
 > >> >> > > <emmanuel> certainly not an impl instance
 > >> >> > > <sanne> yes you do
 > >> >> > > <sanne> JBMar will take care of it, including state.
 > >> >> > > <sanne> but in this case that would be wrong of course as I don't want
 > >> >> > > to serialize the whole SearchFactory so I'd use injection and lookup,
 > >> >> > > but that's a detail of Infinispan.
 > >> >> > > <sanne> But this shouldn't be MassIndexer specific right? it's good to
 > >> >> > > expose a general "execute on all" method, and I think accepting
 > >> >> > > instances would make life easier for most - even though we might need
 > >> >> > > to document some limitations.
 > >> >> > > <emmanuel> alright, I guess I'll have to live with a visitor pattern
 > >> >> > > for a feature that has 5% chance of happening :)
 > >> >> > > <sanne> I'm going to punch Davide
 > >> >> > > <sanne> as he's yelling "it's not a visitor" but doesn't have the guts
 > >> >> > > to write it down :)
 > >> >> > > <emmanuel> sanne: DavideD would have nothing to do with it, that
 > >> >> > > requires a lot of config and Infinispan machinery I'm not sure is here
 > >> >> > > today
 > >> >> > > <DavideD> :)
 > >> >> > > <emmanuel> ah
 > >> >> > > <emmanuel> I don't care how it's called, it's one of those patterns
 > >> >> > > that make the code harder to follow
 > >> >> > > <DavideD> I was actually trying to remember the name of the pattern
 > >> >> > > <sanne> ok now we agree :)
 > >> >> > > <emmanuel> Obfuscator pattern family
 > >> >> > > <sanne> very popular among consultants, I don't understand why you complain :P
 > >> >> > > <sanne> Anyway, let's wrap up and broaden the horizon:
 > >> >> > > <emmanuel> ok so we are left with finding how to load an entity from a tuple
 > >> >> > > <sanne> you don't think it's useful as a general purpose method?
 > >> >> > > <emmanuel> sanne: will be for queries
 > >> >> > > <emmanuel> It's just that it's non obvious
 > >> >> > > <sanne> Exactly. Also I think lambda methods are getting much better known.
 > >> >> > > <emmanuel> syntactically yes
 > >> >> > > <emmanuel> VM wise, perf improvements will come later
 > >> >> > > <sanne> what I mean is that by defining the SPI this way, I don't
 > >> >> > > expect it to be more complex for the GridDialect implementors, while
 > >> >> > > we can reuse it for a wider scope of needs.
 > >> >> > >
 > >> >> > >  --Sanne
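
The consumer-style SPI discussed in the IRC log above could look roughly like the sketch below. It assumes a "trivial first draft doing a single-threaded loop" and filtering on the requested tables. All names here (`TupleEntry`, `TupleConsumer`, `ScanningDialect`, `forEachTuple`, `InMemoryDialect`) are hypothetical and are not part of the real GridDialect interface; a Map/Reduce-capable backend would be free to run the same consumer on each node instead of looping locally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a datastore entry: a table name plus column values.
final class TupleEntry {
    final String table;
    final Map<String, Object> columns;

    TupleEntry(String table, Map<String, Object> columns) {
        this.table = table;
        this.columns = columns;
    }
}

// Hypothetical callback: "execute X on each" entry.
interface TupleConsumer {
    void consume(TupleEntry entry);
}

// Hypothetical slice of a dialect SPI; NOT the real GridDialect contract.
interface ScanningDialect {
    void forEachTuple(TupleConsumer consumer, String... tables);
}

// Trivial first draft: a single-threaded loop over local data, filtered by table.
final class InMemoryDialect implements ScanningDialect {
    private final List<TupleEntry> entries = new ArrayList<>();

    void put(String table, String column, Object value) {
        Map<String, Object> columns = new LinkedHashMap<>();
        columns.put(column, value);
        entries.add(new TupleEntry(table, columns));
    }

    @Override
    public void forEachTuple(TupleConsumer consumer, String... tables) {
        Set<String> wanted = new HashSet<>(Arrays.asList(tables));
        for (TupleEntry entry : entries) {
            if (wanted.contains(entry.table)) {
                consumer.consume(entry);
            }
        }
    }
}

final class ConsumerSketch {
    // Collects the "name" column of every entry stored in the given tables.
    static List<Object> namesIn(ScanningDialect dialect, String... tables) {
        List<Object> names = new ArrayList<>();
        dialect.forEachTuple(entry -> names.add(entry.columns.get("name")), tables);
        return names;
    }

    public static void main(String[] args) {
        InMemoryDialect dialect = new InMemoryDialect();
        dialect.put("Hike", "name", "Mont Blanc");
        dialect.put("Person", "name", "Bob");
        dialect.put("Hike", "name", "GR20");
        System.out.println(namesIn(dialect, "Hike")); // prints [Mont Blanc, GR20]
    }
}
```

The caller only supplies the consumer and the table names, so the dialect decides where and how the loop runs; that is the property the log argues makes the same SPI reusable beyond the MassIndexer.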
 > >> >> > >
 > >> >> > > On 4 March 2013 17:02, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
 > >> >> > >>
 > >> >> > >>
 > >> >> > >> On 4 March 2013, at 17:39, Sanne Grinovero <sanne(a)hibernate.org> wrote:
 > >> >> > >>
 > >> >> > >>> On 4 March 2013 16:20, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
 > >> >> > >>>> I already gave what I knew on how to load an entity from a tuple (which
 > >> >> > >>>> isn't much) but we can try and dig together. Something I thought about
 > >> >> > >>>> is that ORM probably has a mechanism to load an entity from a resultset
 > >> >> > >>>> via the query parser. And that probably also looks like the second half
 > >> >> > >>>> of OgmLoader.load. We could look at this part and see if we can make an
 > >> >> > >>>> OGM version of it. We never had the need before as we never had query
 > >> >> > >>>> support (the way SQL does it).
 > >> >> > >>>
 > >> >> > >>> I would also need to study the ORM code, but to add a high level observation,
 > >> >> > >>> the methods currently defined by the GridDialect are focusing on
 > >> >> > >>> loading from well known key instances,
 > >> >> > >>> there is nothing that lets us scan/inspect for all values.
 > >> >> > >>>
 > >> >> > >>> In other words: even if we wanted to load keys first, we don't have definitions
 > >> >> > >>> of functions from raw->primary key instances either.
 > >> >> > >>
 > >> >> > >> I understand that. I'm not denying the need for the method.
 > >> >> > >>
 > >> >> > >>>
 > >> >> > >>>
 > >> >> > >>>> On the visitor vs Iterator approach, I still don't see how implementing
 > >> >> > >>>> an Iterator on a map / reduce backend would be harder than the visitor
 > >> >> > >>>> but maybe I'm missing something.
 > >> >> > >>>>
 > >> >> > >>>>    class IteratorAsStream {
 > >> >> > >>>>        final Query someMapReduceQuery = ...;
 > >> >> > >>>>
 > >> >> > >>>>        public Object next() {
 > >> >> > >>>>            if (!someMapReduceQuery.started()) {
 > >> >> > >>>>                // execute and collect results in parallel
 > >> >> > >>>>                someMapReduceQuery.execute();
 > >> >> > >>>>            }
 > >> >> > >>>>            // wait for the next collected result
 > >> >> > >>>>            Object result = someMapReduceQuery.getNextOrBlock();
 > >> >> > >>>>            return result;
 > >> >> > >>>>        }
 > >> >> > >>>>    }
 > >> >> > >>>
 > >> >> > >>> That could work to *load* all entities in parallel, but I'd like to
 > >> >> > >>> process the entities in parallel as well.
 > >> >> > >>> And I'd rather not force the GridDialect implementors to write some
 > >> >> > >>> Hibernate Search specific code,
 > >> >> > >>> so to break out we need some form of "Execute X on each": a closure or a lambda.
 > >> >> > >>>
 > >> >> > >>
 > >> >> > >> I can't see how the visitor model helps in your processing of entities
 > >> >> > >> in parallel. To me both approaches are strictly equivalent. Care to
 > >> >> > >> show some pseudo-code?
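
For reference, the iterator side of this exchange can be made concrete with a small runnable sketch. It is an assumption-laden illustration, not code from the thread: `QueueBackedIterator` and `IteratorSketch` are hypothetical names, and a plain `Iterable` stands in for the map/reduce query. A background thread loads elements into a bounded blocking queue while the caller pulls them one at a time, so loading overlaps with consumption but every element is still handed to the single thread that calls `next()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical iterator fed by a background producer through a bounded queue.
// Source elements must be non-null, since BlockingQueue rejects null.
final class QueueBackedIterator implements Iterator<Object> {
    private static final Object END = new Object(); // end-of-stream marker
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>(16);
    private final Iterable<?> source;
    private boolean started;
    private Object buffered; // next element, or END once the stream is exhausted

    QueueBackedIterator(Iterable<?> source) {
        this.source = source;
    }

    // Starts the producer lazily, on first use, like the started() check above.
    private void startIfNeeded() {
        if (started) {
            return;
        }
        started = true;
        Thread producer = new Thread(() -> {
            try {
                for (Object element : source) {
                    queue.put(element); // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public boolean hasNext() {
        startIfNeeded();
        if (buffered == null) {
            try {
                buffered = queue.take(); // blocks until something is available
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                buffered = END;
            }
        }
        return buffered != END;
    }

    @Override
    public Object next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Object result = buffered;
        buffered = null;
        return result;
    }
}

final class IteratorSketch {
    // Drains the iterator on the calling thread: this is the serial step.
    static List<Object> drain(Iterator<Object> iterator) {
        List<Object> out = new ArrayList<>();
        while (iterator.hasNext()) {
            out.add(iterator.next());
        }
        return out;
    }
}
```

To process elements in parallel with this shape, the caller has to add its own executor after `next()`, whereas a consumer passed to the backend lets the backend choose where each element is processed.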
 > >> >> _______________________________________________
 > >> >> hibernate-dev mailing list
 > >> >> hibernate-dev(a)lists.jboss.org
 > >> >> https://lists.jboss.org/mailman/listinfo/hibernate-dev
