[hibernate-dev] [OGM] Ogm mass indexer, how to convert Tuple/EntityKey to Entity/Id?

Mon Mar 4 11:39:53 EST 2013

On 4 March 2013 16:20, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> I already gave what I knew on how to load an entity from a tuple (which
> isn't much) but we can try and dig together. Something I thought about
> is that ORM probably has a mechanism to load an entity from a resultset
> via the query parser. And that probably also looks like the second half
> of OgmLoader.load. We could look at this part and see if we can make an
> OGM version of it. We never had the need before as we never had query
> support (the way SQL does it).

I would also need to study the ORM code, but to add a high-level observation:
the methods currently defined by the GridDialect focus on loading from
well-known key instances; there is nothing that lets us scan or inspect
all values.

In other words: even if we wanted to load the keys first, we don't have any
function mapping raw data to primary key instances either.
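For illustration, the missing "raw -> primary key" function could look roughly like the sketch below. It is hypothetical: `SimpleEntityKey` and `KeyExtractor` are made-up stand-ins (not OGM's `EntityKey`), a raw tuple is modeled as a map from column name to value, and the list of key columns is assumed to come from the entity mapping.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an entity key: the table name plus the primary
// key column names and values, in mapping order. This is NOT OGM's EntityKey.
record SimpleEntityKey(String table, List<String> columnNames, List<Object> columnValues) {}

// Hypothetical "raw -> primary key" function.
final class KeyExtractor {
    private KeyExtractor() {}

    // Picks the key columns out of a raw tuple (column name -> value).
    static SimpleEntityKey keyOf(String table, List<String> keyColumns, Map<String, Object> rawTuple) {
        List<Object> values = new ArrayList<>(keyColumns.size());
        for (String column : keyColumns) {
            if (!rawTuple.containsKey(column)) {
                throw new IllegalArgumentException(
                        "Tuple from " + table + " has no key column " + column);
            }
            values.add(rawTuple.get(column));
        }
        return new SimpleEntityKey(table, List.copyOf(keyColumns), Collections.unmodifiableList(values));
    }
}
```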

> On the visitor vs Iterator approach, I still don't see how implementing
> an Iterator on a map / reduce backend would be harder than the visitor,
> but maybe I'm missing something.
>
>     class IteratorAsStream {
>         final Query someMapReduceQuery = ...;
>
>         public Object next() {
>             if (!someMapReduceQuery.started()) {
>                 // execute and collect results in parallel
>                 someMapReduceQuery.execute();
>             }
>             Object result = someMapReduceQuery.getNextOrBlock(); // blocks until a result is available
>             return result;
>         }
>     }
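A runnable variant of the iterator idea, as a sketch under stated assumptions: the backend is assumed to split its data into partitions read by one thread each, the hypothetical `getNextOrBlock()` becomes `BlockingQueue.take()`, and the producers start lazily on the first call, mirroring the `started()` check. Elements arrive in no particular order and `null` elements are not supported.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: an Iterator whose elements are produced in parallel, one task per
// (hypothetical) data partition, and handed over through a blocking queue.
final class IteratorAsStream<T> implements Iterator<T> {
    private static final Object END = new Object(); // marks "all producers finished"

    private final List<List<T>> partitions;
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private ExecutorService executor; // null until the "query" is started
    private Object lookahead;         // element already taken from the queue, if any

    IteratorAsStream(List<List<T>> partitions) {
        this.partitions = partitions;
    }

    // Equivalent of "if (!query.started()) query.execute()": start producers lazily.
    private void startIfNeeded() {
        if (executor != null) {
            return;
        }
        executor = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        if (partitions.isEmpty()) {
            queue.add(END);
        }
        AtomicInteger running = new AtomicInteger(partitions.size());
        for (List<T> partition : partitions) {
            executor.submit(() -> {
                try {
                    partition.forEach(queue::add);
                } finally {
                    // The last producer to finish signals the end of the stream.
                    if (running.decrementAndGet() == 0) {
                        queue.add(END);
                    }
                }
            });
        }
        executor.shutdown(); // worker threads exit once every partition is drained
    }

    @Override
    public boolean hasNext() {
        startIfNeeded();
        if (lookahead == null) {
            try {
                lookahead = queue.take(); // the "getNextOrBlock()" step
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for results", e);
            }
        }
        return lookahead != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = (T) lookahead;
        lookahead = null;
        return result;
    }
}
```

Loading happens in parallel, but whoever calls `next()` still processes the elements one at a time on a single thread, which is the point raised in the reply.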

That could work to *load* all entities in parallel, but I'd like to
process the entities in parallel as well.
And I'd rather not force the GridDialect implementors to write
Hibernate Search-specific code, so to decouple the two we need some form of
"execute X on each": a closure or a lambda.
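A minimal sketch of the "execute X on each" style, assuming a hypothetical in-memory backend (`ParallelTupleScanner` is an illustrative name, not part of OGM): the backend chooses the parallelism and the caller only passes a closure, so nothing Hibernate Search-specific leaks into the backend. The closure must be thread-safe, because several threads invoke it at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Sketch: the backend owns the iteration and its parallelism; the caller only
// supplies the action. Tuples are modeled as column name -> value maps.
final class ParallelTupleScanner {
    private final List<Map<String, Object>> tuples; // hypothetical stand-in for the datastore

    ParallelTupleScanner(List<Map<String, Object>> tuples) {
        this.tuples = tuples;
    }

    // Runs action on every tuple using `threads` worker threads (threads >= 1).
    void forEachTuple(Consumer<Map<String, Object>> action, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            int chunkSize = Math.max(1, (tuples.size() + threads - 1) / threads);
            List<Future<?>> futures = new ArrayList<>();
            for (int from = 0; from < tuples.size(); from += chunkSize) {
                List<Map<String, Object>> chunk =
                        tuples.subList(from, Math.min(from + chunkSize, tuples.size()));
                futures.add(executor.submit(() -> chunk.forEach(action)));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for every chunk and surface the first failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while scanning tuples", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Tuple processing failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```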

Sanne

>
> On Mon 2013-03-04 14:33, Sanne Grinovero wrote:
>> The Hibernate Search / ORM approach does iterate on the primary keys to get a
>> consistent snapshot of the state to be reindexed, but subsequent phases avoid
>> the "iterator" approach as it makes parallel execution very hard.
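The keys-first strategy described here can be sketched as two phases: take a fixed copy of the primary keys, then load and process each entry by key in parallel. This is a hypothetical illustration: a `Map` stands in for the datastore and a parallel stream stands in for the multi-threaded loading phase; it is not how Hibernate Search's MassIndexer is actually implemented.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

// Sketch of the keys-first approach; `process` must be thread-safe and the
// store must tolerate concurrent reads.
final class KeysFirstIndexer {
    private KeysFirstIndexer() {}

    // Returns the number of entries that were processed.
    static <K, V> int run(Map<K, V> store, BiConsumer<K, V> process) {
        // Phase 1: copy the keys, so later phases work on a fixed set of entities.
        List<K> keys = List.copyOf(store.keySet());
        AtomicInteger processed = new AtomicInteger();
        // Phase 2: load each entry by key and process it; the parallel stream
        // stands in for a pool of loader threads.
        keys.parallelStream().forEach(key -> {
            V value = store.get(key);
            if (value != null) { // the entry may have been removed after phase 1
                process.accept(key, value);
                processed.incrementAndGet();
            }
        });
        return processed.get();
    }
}
```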
>>
>> With OGM/Infinispan I think the natural solution is to use Map/Reduce, and
>> that would be simpler than the multi-phase (stream) approach we
>> are forced to use on ORM.
>>
>> Depending on the underlying OGM backend, some might be able to support an
>> efficient Map/Reduce operation while others might need different approaches,
>> so the interface proposed by Davide is meant to be implemented by each
>> backend "optimally": we don't expect all backends to support Map/Reduce
>> directly, only to provide at least some form of "iteration" (which is not
>> an Iterator) over all data.
>>
>> Indeed the GridDialect would need to work on "Tuples", while Hibernate Search
>> only digests entities, so the consumer of this GridDialect would need to use
>> the OGM mapping engine itself to perform the transformation; but again, this
>> code needs to be written only once and can be shared across backends.
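A sketch of that shared, backend-independent transformation, under loud assumptions: `TupleToEntity` is a hypothetical registry where each table gets a hand-written function from a raw tuple (column name -> value) to an entity, and `Person` is an example entity. In OGM the mapping engine would play this role; this example does not reproduce it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry: one tuple -> entity function per table, written once
// and reusable by every backend that can hand over raw tuples.
final class TupleToEntity {
    private final Map<String, Function<Map<String, Object>, Object>> mappers = new HashMap<>();

    <E> TupleToEntity register(String table, Function<Map<String, Object>, E> mapper) {
        mappers.put(table, mapper::apply);
        return this;
    }

    Object toEntity(String table, Map<String, Object> tuple) {
        Function<Map<String, Object>, Object> mapper = mappers.get(table);
        if (mapper == null) {
            throw new IllegalArgumentException("No mapping registered for table " + table);
        }
        return mapper.apply(tuple);
    }
}

// Example entity used only for this sketch.
record Person(Long id, String name) {}
```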
>>
>> Davide needs advice on transforming the Tuple into entities; he could use a
>> Session and transform keys, but given the nature of our backends it seems
>> better suited to iterate on the data directly rather than on the keys only.
>>
>> Our idea is not to "feed" the existing MassIndexer implementation but to
>> implement a new one, which shares the same last phase (consumption of Lucene
>> Documents from multithreaded producers); this would be an extremely simple
>> one-phase processor invoking the DocumentBuilder, provided we have some way
>> for the GridDialect to expose all data (I say "expose" to avoid "iterate").
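A rough sketch of such a one-phase processor, assuming the data is already split into partitions: producer threads turn raw items into "documents" through a `build` function (a hypothetical stand-in for the DocumentBuilder), and a single writer thread drains a shared queue, playing the part of the shared last phase. None of these names come from Hibernate Search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;

final class OnePhaseIndexer {
    private static final Object POISON = new Object(); // tells the writer to stop

    private OnePhaseIndexer() {}

    static <T, D> void index(List<List<T>> partitions, Function<T, D> build, Consumer<D> writer) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>(); // unbounded: producers never block
        ExecutorService writerThread = Executors.newSingleThreadExecutor();
        ExecutorService producers = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            // Single consumer: the equivalent of the shared "write to the index" phase.
            Future<?> writing = writerThread.submit(() -> {
                for (Object item = queue.take(); item != POISON; item = queue.take()) {
                    @SuppressWarnings("unchecked")
                    D document = (D) item;
                    writer.accept(document);
                }
                return null;
            });
            // One producer per partition: raw item -> document, in a single phase.
            List<Future<?>> producing = new ArrayList<>();
            for (List<T> partition : partitions) {
                producing.add(producers.submit(() -> partition.forEach(raw -> queue.add(build.apply(raw)))));
            }
            try {
                for (Future<?> f : producing) {
                    f.get(); // surface producer failures
                }
            } finally {
                queue.add(POISON); // no more documents will be added
            }
            writing.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Indexing failed", e.getCause());
        } finally {
            producers.shutdownNow();
            writerThread.shutdownNow();
        }
    }
}
```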
>>
>> Sanne
>>
>> On 4 March 2013 10:50, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>> > The mass indexer does not work at the resultset level, so mixing tuples
>> > and the mass indexer seems wrong to me.
>> >
>> > Have you considered something like
>> >
>> >     Iterator<Tuple> getAllTuplesFrom(String... tableNames);
>> >
>> > And then expose an Iterator<Object> (ie entities) to the mass indexer?
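The adapter suggested here can be sketched as a lazy wrapper that converts one tuple per `next()` call. The proposed `getAllTuplesFrom` does not exist yet, so in this sketch the source is any `Iterator` of column-name -> value maps, and `toEntity` is a hypothetical placeholder for the tuple-to-entity step discussed in this thread.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

// Sketch: wraps an Iterator of raw tuples into an Iterator of entities,
// converting lazily so stream / flow style processing stays possible.
final class EntityIterator implements Iterator<Object> {
    private final Iterator<Map<String, Object>> tuples;
    private final Function<Map<String, Object>, Object> toEntity;

    EntityIterator(Iterator<Map<String, Object>> tuples, Function<Map<String, Object>, Object> toEntity) {
        this.tuples = tuples;
        this.toEntity = toEntity;
    }

    @Override
    public boolean hasNext() {
        return tuples.hasNext();
    }

    @Override
    public Object next() {
        // Converts exactly one tuple, only when the consumer asks for it;
        // throws NoSuchElementException when the underlying iterator is exhausted.
        return toEntity.apply(tuples.next());
    }
}
```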
>> > I mean we could make it work with your proposed consumer scheme but I
>> > find it unnecessarily complex and it might make stream / flow style
>> > processing impossible. I can be wrong but I'd like to see your arguments
>> > first.
>> >
>> > OgmLoader.getRowFromResultSet shows how to get an Object[] from a Tuple.
>> > OgmLoader.getRow is at the heart of it.
>> >
>> > But the process of initializing an entity involves several phases, so
>> > the best bet is to look at OgmLoader.load and look at what happens
>> > globally.
>> >
>> > In the end, to answer your question, there is no method to do what
>> > you want today; what you need is more or less the bottom half of OgmLoader.load.
>> >
>> > What about associations BTW?
>> >
>> > On Fri 2013-03-01 15:00, Davide D'Alto wrote:
>> >> Hello,
>> >> I'm trying to create a mass indexer that could work with OGM.
>> >> The idea is to have a way to scan all the elements of a certain type in
>> >> the data store and index them, so that it would be possible to create
>> >> an index starting from an existing, populated data store.
>> >>
>> >> The first prototype idea is to add a method to the GridDialect, something like:
>> >>
>> >> GridDialect#forEachTuple(Consumer consumer, String... tableName)
>> >>
>> >> Where the Consumer is an interface with a method Consumer#consume(Tuple tuple)
>> >>
>> >> The consumer will execute the indexing of the found tuple.
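A minimal sketch of the proposed extension point, assuming tuples can be modeled as column-name -> value maps. `Consumer#consume`, `forEachTuple` and the table-name varargs follow the proposal, but `TupleScanningDialect` and `InMemoryDialect` are hypothetical names; this is not an existing OGM API. The `Consumer` here is the one from the proposal, not `java.util.function.Consumer`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical callback from the proposal: receives one raw tuple at a time.
interface Consumer {
    void consume(Map<String, Object> tuple);
}

// Hypothetical dialect extension: visit every tuple of the given tables.
interface TupleScanningDialect {
    void forEachTuple(Consumer consumer, String... tableNames);
}

// Naive in-memory backend that visits the tables one after the other.
final class InMemoryDialect implements TupleScanningDialect {
    private final Map<String, List<Map<String, Object>>> tables;

    InMemoryDialect(Map<String, List<Map<String, Object>>> tables) {
        this.tables = tables;
    }

    @Override
    public void forEachTuple(Consumer consumer, String... tableNames) {
        for (String table : tableNames) {
            for (Map<String, Object> tuple : tables.getOrDefault(table, List.of())) {
                consumer.consume(tuple);
            }
        }
    }
}
```

A backend with Map/Reduce support could implement the same method by running the consumer inside its map tasks instead of looping sequentially.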
>> >>
>> >> The problem that I have now is how to convert the Tuple to the
>> >> corresponding entity so that I can index it using Hibernate Search.
>> >> An alternative idea would be to use the EntityKey and obtain the id
>> >> instead of using the Tuple.
>> >>
>> >> Is there a method somewhere that I can use to obtain an entity from a Tuple?
>> >>
>> >> Thanks,
>> >> Davide
>> >> _______________________________________________
>> >> hibernate-dev mailing list
>> >> hibernate-dev at lists.jboss.org
>> >> https://lists.jboss.org/mailman/listinfo/hibernate-dev