Re: [hibernate-dev] [OGM] Ogm mass indexer, how to convert Tuple/EntityKey to Entity/Id?

Wednesday, 6 March 2013

it's ok for me

Davide

On Wed, Mar 6, 2013 at 3:28 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
...
 I'm planning on working on OGM-151. Fine with everyone?
 That will likely be my last before I move back to BVAL and close the
 final issues there.

 Emmanuel

 On Tue 2013-03-05 19:04, Sanne Grinovero wrote:
> Nice!
> n+1 is something Hibernate Search has to deal with too, that's why I
> was interested in the fetch profiles and graph loading in JPA 2.1
>
> On 5 March 2013 17:44, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
> > I have implemented a solution that gives an entity based on a tuple.
> > https://hibernate.onjira.com/browse/OGM-273#comment-50082
> >
> > Note that it does not currently work for MongoDB, but that's waiting
> > for the dedicated GridDialect method as well as OGM-151.
> > Also note that I have no idea how that will work for associations. I
> > suspect some nasty n+1 is happening at best. Worst case, an exception :)
> >
> > Emmanuel
> >
> > On Tue 2013-03-05 10:30, Emmanuel Bernard wrote:
> >> We might hope for a stable enough contract on Hibernate Search and
> >> hope that we won't break serializability between micro or minor
> >> versions. That will need to be taken into account in the test suite and
> >> design.
> >> On the OGM side though, we are not at that level of maturity and we will
> >> force a homogeneous Hibernate OGM version across the whole cluster. The grid
> >> will have to go down for upgrades or enforce that no map reduce job
> >> using OGM is used while the version roll out is in process.
> >>
> >> Emmanuel
> >>
> >> On Mon 2013-03-04 18:30, Sanne Grinovero wrote:
> >> > Found an example, this is all the code it needs to have a MassIndexer working
> >> > on top of Infinispan's Map/Reduce:
> >> >
> >> > https://github.com/infinispan/infinispan/blob/master/query/src/main/java/...
> >> >
> >> > Note its initialize method which injects needed components; the
> >> > implementation is serialized across nodes.
> >> >
> >> > Sanne
> >> >
> >> > On 4 March 2013 18:26, Sanne Grinovero <sanne(a)hibernate.org> wrote:
> >> > > We finished this discussion on IRC, in case someone else was interested:
> >> > >
> >> > > <sanne> hum I forgot the first step.. transformation from entry into entity
> >> > > <sanne> updated
> >> > > <sanne> emmanuel, the "hydrate" step is what DavideD is bashing his
> >> > > head against, but let's assume he finds a workaround and we focus on
> >> > > the pattern as first step?
> >> > > <emmanuel> https://gist.github.com/emmanuelbernard/5084039
> >> > > <emmanuel> sanne: ^ that's how I would do it if I had an Iterator from the tuple
> >> > > <emmanuel> assuming pushToExecutor pushes to whatever concurrent work
> >> > > mechanism you planned to use on consumes
> >> > > <emmanuel> Plus I am not following exactly how you plan consumes(Entry)
> >> > > to be executed concurrently
> >> > > <emmanuel> is that the GridDialect responsibility?
> >> > > <emmanuel> That looks like a lot of work on the dialect's side
> >> > > <sanne> emmanuel, imagine the backend is Infinispan and has some large
> >> > > amount of data per node, plus that each node has its own backend
> >> > > IndexManager (like an ideal sharding)
> >> > > <emmanuel> ie pool mgt and cap +  queuing
> >> > > <sanne> then with your approach the iterator needs to fetch data from
> >> > > all remote nodes, and then enqueue in a local blocking queue which is
> >> > > returning the data to the original owners
> >> > > <sanne> but if you skip that step, you can just forward the stateless
> >> > > consumer to each node and have it run on data locality
> >> > > <emmanuel> I was thinking that if you had the Lucene index locally on
> >> > > each node you would have a different impl of the MassIndexer anyways
> >> > > <emmanuel> that would simply send a command to each local node
> >> > > <sanne> To answer your question: that would be an optional GridDialect
> >> > > responsibility. I would endorse a trivial first draft doing a
> >> > > single-threaded loop.
> >> > > <emmanuel> and have GridDialect.getDataFor() return local data
> >> > > <sanne> The "consumes" implementation can be either implemented with a
> >> > > simple iterator - as in your design - so I don't think it pushes much
> >> > > complexity to the GridDialect implementor?
> >> > > <sanne> The benefit of the consumer is that *optionally* it can be
> >> > > mapped on the Map phase, and that's trivial if your backend supports
> >> > > Map/Reduce
> >> > > <emmanuel> sanne: I don't follow that sorry
> >> > > <emmanuel> how does that make it mappable to the Map phase?
> >> > > <sanne> "public void consume(Entry e) " is a degenerate (simplified)
> >> > > form of map.
> >> > > <sanne> mm infinispan IDE crashes at the right moment.
> >> > > <emmanuel> I thought Map was about *filtering*
> >> > > <emmanuel> not processing
> >> > > <sanne> you can decide to accept 100% of values (without filtering),
> >> > > but actually you might want to filter on the specified tables only.
> >> > > <sanne> also, the return type doesn't have to match the input type:
> >> > > hence you define a transformation function, which is inherently
> >> > > applied in parallel on all matching entries.
> >> > > <emmanuel> sanne: but then you require the OGM code to be everywhere
> >> > > (ie on each node of the target NoSQL)
> >> > > <emmanuel> to be able to do tuple -> entity
> >> > > <emmanuel> that's not realistic
> >> > > <emmanuel> assuming your transform phase is about tuple -> entity and
> >> > > some HSearch ops
> >> > > <sanne> yes right
> >> > > <sanne> but isn't it worth it? it's optional and much more efficient,
> >> > > as you avoid transferring any data.
> >> > > <sanne> btw we often assume all nodes in the grid are equally
> >> > > configured, so having same apps & libraries deployed.
> >> > > <emmanuel> sanne: let me try and summarize what I understand
> >> > > <emmanuel> it's more efficient if you store the Lucene index locally
> >> > > with the data, and if the grid is written in Java or at least can run
> >> > > code in Java including libraries and if you distribute the OGM
> >> > > configuration across the whole grid
> >> > > <emmanuel> Otherwise, it does not make any difference
> >> > > <emmanuel> Also the GridDialect implementation needs to know if you are
> >> > > doing this trick to only return local data
> >> > > <sanne> no there are other drawbacks which get defeated, but minor so
> >> > > I didn't mention them
> >> > > <emmanuel> am I right?
> >> > > <sanne> mainly, you skip the need for the contention point as there
> >> > > is no push to a shared blocking queue
> >> > > <sanne> no the GridDialect doesn't need to know.
> >> > > <emmanuel> sanne: sure if you can process the code on each node you
> >> > > avoid the shared blocking queue, at least until you reach the
> >> > > IndexManager
> >> > > <sanne> you'll just forward a simple (standard) M/R task, and it will
> >> > > need to execute it as always.
> >> > > <sanne> the IndexManager is parallel ;)
> >> > > <emmanuel> sanne: parallel on a single node
> >> > > <sanne> yes, but no contention points other than the internal
> >> > > structure of the IW
> >> > > <emmanuel> I mean updating the index for a given table is better done
> >> > > on a single node
> >> > > <sanne> IndexWriter
> >> > > <emmanuel> sorry I meant IndexWriter
> >> > > <emmanuel> ah but you mention perfect sharding
> >> > > <emmanuel> you need cosmological alignment for this shit to happen
> >> > > <sanne> not if we plan for it :)
> >> > > <sanne> you might remember the changes to Segments in the ISPN code,
> >> > > to accommodate index storage consistent with the data locality
> >> > > <sanne> that's expected in 6.0
> >> > > <emmanuel> So gridDialect.getData(Consumer consumer, String... tables) is wrong
> >> > > <emmanuel> it's more gridDialect.getData(ConsumerImpl.class, String... tables)
> >> > > <emmanuel> as you need to send the Consumer impl
> >> > > <emmanuel> not simply use it
> >> > > <sanne> hu, it needs a reference to the current SearchFactory at very least
> >> > > <emmanuel> sanne: but you're telling me you send the M/R task
> >> > > <emmanuel> so you need to send the M/R code as well
> >> > > <sanne> yes but here we enter Infinispan specific implementation
> >> > > <sanne> I would register the needed components in Infinispan and use
> >> > > the ServiceRegistry to look them up remotely
> >> > > <sanne> not to mention Infinispan could accommodate a custom command for it
> >> > > <emmanuel> What I am saying is that you don't pass the Consumer
> >> > > *instance* to the grid dialect but rather the impl, no?
> >> > > <sanne> the impl class definition?
> >> > > <emmanuel> sanne: you tell me. How do I send M/R code today?
> >> > > <emmanuel> certainly not an impl instance
> >> > > <sanne> yes you do
> >> > > <sanne> JBMar will take care of it, including state.
> >> > > <sanne> but in this case that would be wrong of course as I don't want
> >> > > to serialize the whole SearchFactory so I'd use injection and lookup,
> >> > > but that's a detail of Infinispan.
> >> > > <sanne> But this shouldn't be MassIndexer specific right? it's good to
> >> > > expose a general "execute on all" method, and I think accepting
> >> > > instances would make life easier for most - even though we might need
> >> > > to document some limitations.
> >> > > <emmanuel> alright, I guess I'll have to live with a visitor pattern
> >> > > for a feature that has 5% chance of happening :)
> >> > > <sanne> I'm going to punch Davide
> >> > > <sanne> as he's yelling "it's not a visitor" but doesn't have the guts
> >> > > to write it down :)
> >> > > <emmanuel> sanne: DavideD 's would have nothing to do about it, that
> >> > > requires a lot of config and Infinispan machinery I'm not sure is here
> >> > > today
> >> > > <DavideD> :)
> >> > > <emmanuel> ah
> >> > > <emmanuel> I don't care how it's called, it's one of those patterns
> >> > > that make the code harder to follow
> >> > > <DavideD> I was actually trying to remember the name of the pattern
> >> > > <sanne> ok now we agree :)
> >> > > <emmanuel> Obfuscator pattern family
> >> > > <sanne> very popular among consultants, I don't understand why you complain :P
> >> > > <sanne> Anyway, let's wrap up and broaden the horizon:
> >> > > <emmanuel> ok so we are left with finding how to load an entity from a tuple
> >> > > <sanne> you don't think it's useful as a general purpose method?
> >> > > <emmanuel> sanne: will be for queries
> >> > > <emmanuel> It's just that it's non obvious
> >> > > <sanne> Exactly. Also I think lambda methods are getting widely better known.
> >> > > <emmanuel> syntactically yes
> >> > > <emmanuel> VM wise, perf improvements will come later
> >> > > <sanne> what I mean is that by defining the SPI this way, I don't
> >> > > expect it to be more complex for the GridDialect implementors, while
> >> > > we can reuse it for a wider scope of needs.
> >> > >
> >> > >  --Sanne
> >> > >
> >> > > On 4 March 2013 17:02, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
> >> > >>
> >> > >>
> >> > >> On 4 March 2013, at 17:39, Sanne Grinovero <sanne(a)hibernate.org> wrote:
> >> > >>
> >> > >>> On 4 March 2013 16:20, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
> >> > >>>> I already gave what I knew on how to load an entity from a tuple (which
> >> > >>>> isn't much) but we can try and dig together. Something I thought about
> >> > >>>> is that ORM probably has a mechanism to load an entity from a resultset
> >> > >>>> via the query parser. And that probably looks also like the second half
> >> > >>>> of OgmLoader.load. We could look at this part and see if we can make an
> >> > >>>> OGM version of it. We never had the need before as we never had query
> >> > >>>> support (the way SQL does it).
> >> > >>>
> >> > >>> I would also need to study the ORM code, but to add a high level observation,
> >> > >>> the methods currently defined by the GridDialect are focusing on
> >> > >>> loading from well known key instances,
> >> > >>> there is nothing that makes us able to scan/inspect for all values.
> >> > >>>
> >> > >>> In other words: even if we wanted to load keys first, we don't have definitions
> >> > >>> of functions from raw->primary key instances either.
> >> > >>
> >> > >> I understand that. I'm not denying the need for the method.
> >> > >>
> >> > >>>
> >> > >>>
> >> > >>>> On the visitor vs Iterator approach, I still don't see how implementing
> >> > >>>> an Iterator on a map / reduce backend would be harder than the visitor
> >> > >>>> but maybe I'm missing something.
> >> > >>>>
> >> > >>>>    class IteratorAsStream {
> >> > >>>>        final Query someMapReduceQuery = ...;
> >> > >>>>
> >> > >>>>        public Object next() {
> >> > >>>>            if (!someMapReduceQuery.started()) {
> >> > >>>>                // execute and collect results in parallel
> >> > >>>>                someMapReduceQuery.execute();
> >> > >>>>            }
> >> > >>>>            // block until the next result is available
> >> > >>>>            Object result = someMapReduceQuery.getNextOrBlock();
> >> > >>>>            return result;
> >> > >>>>        }
> >> > >>>>    }
> >> > >>>
> >> > >>> That could work to *load* all entities in parallel, but I'd like to
> >> > >>> process the entities in parallel as well.
> >> > >>> And I'd rather not force the GridDialect implementors to write some
> >> > >>> Hibernate Search specific code,
> >> > >>> so to break out we need some form of "Execute X on each": a closure or a lambda.
> >> > >>>
> >> > >>
> >> > >> I can't see how the visitor model helps in your processing of entities
> >> > >> in parallel. To me both approaches are strictly equivalent. Care to show
> >> > >> some pseudo-code?
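
The IteratorAsStream pseudo-code quoted above leaves the map/reduce query elided. As a rough illustration only, here is a minimal runnable sketch of the same pull-style idea, assuming in-memory partitions stand in for grid nodes. The class name BlockingQueueIterator and everything inside it are hypothetical and not part of OGM or Infinispan. Producer threads push into one shared blocking queue, which is the contention point raised in the IRC discussion.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: iterates over items loaded in parallel from several partitions. */
final class BlockingQueueIterator<T> implements Iterator<T> {
    // Sentinel each producer enqueues once it has no more items.
    private static final Object END = new Object();

    // Single shared (unbounded) queue that all producers write to.
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private final List<List<T>> partitions;
    private int remainingProducers;
    private boolean started;
    private Object peeked;

    /** Items must be non-null, because the queue rejects null elements. */
    BlockingQueueIterator(List<List<T>> partitions) {
        this.partitions = partitions;
        this.remainingProducers = partitions.size();
    }

    // Lazily start one producer thread per partition on first use.
    private void startIfNeeded() {
        if (started) {
            return;
        }
        started = true;
        for (List<T> partition : partitions) {
            Thread producer = new Thread(() -> {
                for (T item : partition) {
                    queue.add(item);
                }
                queue.add(END);
            });
            producer.setDaemon(true);
            producer.start();
        }
    }

    @Override
    public boolean hasNext() {
        startIfNeeded();
        try {
            // Block until an item arrives or every producer has signalled END.
            while (peeked == null && remainingProducers > 0) {
                Object taken = queue.take();
                if (taken == END) {
                    remainingProducers--;
                } else {
                    peeked = taken;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for data", e);
        }
        return peeked != null;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = (T) peeked;
        peeked = null;
        return result;
    }
}
```

Loading happens in parallel here, but every item still funnels through one queue and is processed by whichever single thread calls next(), which matches the objection that this loads in parallel without processing in parallel.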
> >> _______________________________________________
> >> hibernate-dev mailing list
> >> hibernate-dev(a)lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/hibernate-dev
 _______________________________________________
 hibernate-dev mailing list
 hibernate-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/hibernate-dev 
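
For readers following the consumer side of the discussion, the "execute X on each" idea can be sketched roughly as below. This is a sketch under assumptions, not the OGM SPI: EntryConsumer, EntrySource, SingleThreadedSource and PartitionedSource are hypothetical names, and in-memory lists stand in for the datastore. It shows the "trivial first draft doing a single-threaded loop" next to an optional variant where the same consumer runs once per partition, with each partition standing in for a node working on its local data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical callback, modelled on the "consume(Entry e)" signature from the chat. */
interface EntryConsumer<E> {
    void consume(E entry);
}

/** Hypothetical dialect-side contract: run the consumer on every entry. */
interface EntrySource<E> {
    void forEachEntry(EntryConsumer<E> consumer);
}

/** Trivial first draft: a single-threaded loop over all entries. */
final class SingleThreadedSource<E> implements EntrySource<E> {
    private final List<E> entries;

    SingleThreadedSource(List<E> entries) {
        this.entries = entries;
    }

    @Override
    public void forEachEntry(EntryConsumer<E> consumer) {
        for (E entry : entries) {
            consumer.consume(entry);
        }
    }
}

/** Optional variant: one task per partition, so the consumer must be thread-safe. */
final class PartitionedSource<E> implements EntrySource<E> {
    private final List<List<E>> partitions;

    PartitionedSource(List<List<E>> partitions) {
        this.partitions = partitions;
    }

    @Override
    public void forEachEntry(EntryConsumer<E> consumer) {
        if (partitions.isEmpty()) {
            return;
        }
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<E> partition : partitions) {
                // Each task processes its own partition; no shared queue is involved.
                pending.add(pool.submit(() -> {
                    for (E entry : partition) {
                        consumer.consume(entry);
                    }
                }));
            }
            // Wait for every partition to finish before returning.
            for (Future<?> done : pending) {
                done.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing entries", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("consumer failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller writes the consumer once and does not need to know which variant runs it. This sketch passes a consumer instance within one JVM; how an instance or its class would be shipped to remote nodes, the open question in the chat, is not modelled here.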
