The Hibernate Search / ORM approach does iterate on the primary keys to get a
consistent snapshot of the state to be reindexed, but subsequent phases avoid
the "iterator" approach as it makes parallel execution very hard.
With OGM/Infinispan I think the natural solution is to use Map/Reduce, and
that would be simpler than the multi-phase (stream) approach we
are forced to use on ORM.
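To make the Map/Reduce idea concrete, here is a local simulation using Java 8 parallel streams. It is not Infinispan's Map/Reduce API, and the `Row` type, the table names and the string "documents" are all hypothetical. Each row is mapped to a document independently, and the reduce step groups the documents per table:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapReduceIndexingSketch {

    // Hypothetical stand-in for a stored row: the table it belongs to plus its columns.
    static final class Row {
        final String table;
        final Map<String, Object> columns;

        Row(String table, Object id) {
            this.table = table;
            this.columns = Collections.singletonMap("id", id);
        }
    }

    // "Map" step: each row becomes a placeholder document on its own, so rows
    // can be processed on any thread (or, in a data grid, on any node).
    static String toDocument(Row row) {
        return row.table + "#" + row.columns.get("id");
    }

    // "Reduce" step: documents are grouped per table, as if each group fed one index.
    static Map<String, List<String>> index(List<Row> rows) {
        return rows.parallelStream()
                .collect(Collectors.groupingBy(
                        row -> row.table,
                        Collectors.mapping(MapReduceIndexingSketch::toDocument, Collectors.toList())));
    }

    static List<Row> sampleRows() {
        return Arrays.asList(new Row("Book", 1), new Row("Book", 2), new Row("Author", 3));
    }

    public static void main(String[] args) {
        Map<String, List<String>> byTable = index(sampleRows());
        System.out.println(byTable.get("Book"));   // [Book#1, Book#2]
        System.out.println(byTable.get("Author")); // [Author#3]
    }
}
```

No phase has to hand keys over to a later phase: the map step already sees the full data.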
Depending on the underlying OGM backend, some might be able to support an
efficient Map/Reduce operation while others might take different approaches,
so the interface proposed by Davide is meant to be implemented "optimally" by
each backend: we don't expect all backends to support Map/Reduce directly, but
each should provide at least some form of "iteration" (which is not an
Iterator) over all data.
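A minimal sketch of what such a backend-neutral contract could look like, assuming a callback style like the one Davide proposed. `TupleScanner`, `TupleConsumer` and both implementations are hypothetical, and a `Map<String, Object>` stands in for OGM's `Tuple`. One backend walks the data sequentially, another fans it out over a thread pool, and the caller cannot tell the difference:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BackendNeutralScanSketch {

    // Hypothetical callback in the spirit of Consumer#consume(Tuple);
    // a Map<String, Object> stands in for OGM's Tuple.
    interface TupleConsumer {
        void consume(Map<String, Object> tuple);
    }

    // Hypothetical contract: "hand every tuple of these tables to the consumer",
    // with no promise about order, threads or an Iterator.
    interface TupleScanner {
        void forEachTuple(TupleConsumer consumer, String... tableNames);
    }

    // A backend with no better option: walk the tables on the caller's thread.
    static final class SequentialScanner implements TupleScanner {
        private final Map<String, List<Map<String, Object>>> tables;

        SequentialScanner(Map<String, List<Map<String, Object>>> tables) {
            this.tables = tables;
        }

        @Override
        public void forEachTuple(TupleConsumer consumer, String... tableNames) {
            for (String table : tableNames) {
                for (Map<String, Object> tuple : tables.getOrDefault(table, Collections.emptyList())) {
                    consumer.consume(tuple);
                }
            }
        }
    }

    // A backend that can parallelise: one task per table on a thread pool.
    // Consumers passed to it therefore have to be thread-safe.
    static final class ParallelScanner implements TupleScanner {
        private final Map<String, List<Map<String, Object>>> tables;

        ParallelScanner(Map<String, List<Map<String, Object>>> tables) {
            this.tables = tables;
        }

        @Override
        public void forEachTuple(TupleConsumer consumer, String... tableNames) {
            ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tableNames.length));
            for (String table : tableNames) {
                List<Map<String, Object>> rows = tables.getOrDefault(table, Collections.emptyList());
                pool.execute(() -> rows.forEach(consumer::consume));
            }
            pool.shutdown();
            try {
                pool.awaitTermination(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // The caller only supplies a callback and cannot tell which strategy ran.
    static int count(TupleScanner scanner) {
        AtomicInteger seen = new AtomicInteger();
        scanner.forEachTuple(tuple -> seen.incrementAndGet(), "Book", "Author");
        return seen.get();
    }

    static Map<String, List<Map<String, Object>>> sampleTables() {
        Map<String, List<Map<String, Object>>> tables = new HashMap<>();
        tables.put("Book", Arrays.asList(row(1), row(2)));
        tables.put("Author", Collections.singletonList(row(3)));
        return tables;
    }

    static Map<String, Object> row(int id) {
        return Collections.<String, Object>singletonMap("id", id);
    }

    public static void main(String[] args) {
        System.out.println(count(new SequentialScanner(sampleTables()))); // 3
        System.out.println(count(new ParallelScanner(sampleTables())));   // 3
    }
}
```

The price of leaving the strategy to the backend is that every consumer must tolerate concurrent calls.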
Indeed, the GridDialect would need to work on "Tuples", while Hibernate Search
only digests entities, so the consumer of this GridDialect would need to use
the OGM mapping engine itself to perform the transformation; but again, this is
code that needs to be written only once and can be shared across backends.
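The shared conversion step could be a single adapter sitting between any dialect and the indexing side. This is a sketch under assumptions: `EntityAssemblingConsumer`, `EntityConsumer` and `Book` are hypothetical, and the tuple-to-entity function is a plain lambda, whereas in OGM it would have to go through the mapping engine:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SharedTupleConversionSketch {

    // Hypothetical callbacks; a Map<String, Object> stands in for OGM's Tuple.
    interface TupleConsumer {
        void consume(Map<String, Object> tuple);
    }

    interface EntityConsumer {
        void consume(Object entity);
    }

    // A trivial example entity.
    static final class Book {
        final long id;
        final String title;

        Book(long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Written once, independent of any backend: each dialect only pushes tuples,
    // and this adapter turns them into entities for the indexing side.
    static final class EntityAssemblingConsumer implements TupleConsumer {
        private final Function<Map<String, Object>, ?> assembler;
        private final EntityConsumer downstream;

        EntityAssemblingConsumer(Function<Map<String, Object>, ?> assembler, EntityConsumer downstream) {
            this.assembler = assembler;
            this.downstream = downstream;
        }

        @Override
        public void consume(Map<String, Object> tuple) {
            downstream.consume(assembler.apply(tuple));
        }
    }

    // Placeholder for the real mapping logic.
    static Book toBook(Map<String, Object> tuple) {
        return new Book(((Number) tuple.get("id")).longValue(), (String) tuple.get("title"));
    }

    static Map<String, Object> tuple(long id, String title) {
        Map<String, Object> t = new HashMap<>();
        t.put("id", id);
        t.put("title", title);
        return t;
    }

    static List<Object> assemble(List<Map<String, Object>> tuples) {
        List<Object> entities = new ArrayList<>();
        TupleConsumer consumer =
                new EntityAssemblingConsumer(SharedTupleConversionSketch::toBook, entities::add);
        // A real backend would drive this loop itself; a plain List stands in for one here.
        tuples.forEach(consumer::consume);
        return entities;
    }

    public static void main(String[] args) {
        for (Object e : assemble(Arrays.asList(tuple(1, "Dune"), tuple(2, "Emma")))) {
            Book b = (Book) e;
            System.out.println(b.id + " " + b.title); // "1 Dune", then "2 Emma"
        }
    }
}
```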
Davide needs advice on transforming the Tuple into entities; he could use a
Session and transform keys, but given the nature of our backends it seems
better suited to iterate on the data directly rather than on the keys only.
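The cost argument for iterating on the data can be sketched with a toy store that counts requests. It treats one full scan as one request and each load by id as another, which is a simplifying assumption rather than a measurement of any real backend. `CountingStore` and its methods are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class KeysVersusDataSketch {

    // Hypothetical in-memory store that counts the requests it serves.
    static final class CountingStore {
        final Map<Long, Map<String, Object>> rows = new LinkedHashMap<>();
        int requests;

        // One scan that hands out whole tuples.
        void forEachTuple(Consumer<Map<String, Object>> consumer) {
            requests++;
            rows.values().forEach(consumer);
        }

        // One scan that hands out keys only.
        void forEachKey(Consumer<Long> consumer) {
            requests++;
            rows.keySet().forEach(consumer);
        }

        // One extra request per entity, as a load by id would need in this model.
        Map<String, Object> load(long id) {
            requests++;
            return rows.get(id);
        }
    }

    static CountingStore sampleStore(int size) {
        CountingStore store = new CountingStore();
        for (long id = 1; id <= size; id++) {
            store.rows.put(id, Collections.<String, Object>singletonMap("id", id));
        }
        return store;
    }

    // Iterate on the data: the tuples arrive with the scan itself.
    static int requestsIteratingData(CountingStore store, List<Object> indexed) {
        store.requests = 0;
        store.forEachTuple(tuple -> indexed.add(tuple.get("id")));
        return store.requests;
    }

    // Iterate on the keys, then load each entity separately.
    static int requestsIteratingKeys(CountingStore store, List<Object> indexed) {
        store.requests = 0;
        store.forEachKey(id -> indexed.add(store.load(id).get("id")));
        return store.requests;
    }

    public static void main(String[] args) {
        CountingStore store = sampleStore(1000);
        System.out.println(requestsIteratingData(store, new ArrayList<>())); // 1
        System.out.println(requestsIteratingKeys(store, new ArrayList<>())); // 1001
    }
}
```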
Our idea is not to "feed" the existing MassIndexer implementation but to
implement a new one, which shares the same last phase (consumption of Lucene
Documents from multithreaded producers); this would be an extremely trivial
one-phase processor invoking the DocumentBuilder, provided we have some way for
the GridDialect to expose all data (rather than "iterate" over it).
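A rough sketch of that one-phase shape: several producer threads turn entities into documents, and a single writer drains a queue, standing in for the shared last phase. `buildDocument` is a hypothetical placeholder for the DocumentBuilder call, plain strings stand in for Lucene Documents, and the end-of-stream marker is just one way to stop the writer:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class OnePhaseIndexerSketch {

    // Marker telling the writer that every producer has finished.
    private static final String END = "<end-of-documents>";

    // Hypothetical stand-in for the DocumentBuilder call: entity -> document.
    static String buildDocument(Object entity) {
        return "doc(" + entity + ")";
    }

    // Producers build documents in parallel; one writer thread plays the role
    // of the shared last phase that consumes documents.
    static List<String> index(List<?> entities, int producers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> written = new ArrayList<>();

        Thread writer = new Thread(() -> {
            try {
                for (String doc = queue.take(); !END.equals(doc); doc = queue.take()) {
                    written.add(doc); // only the writer thread touches this list
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService pool = Executors.newFixedThreadPool(producers);
        for (Object entity : entities) {
            pool.execute(() -> queue.add(buildDocument(entity)));
        }
        pool.shutdown();
        try {
            // Assumes the producers finish within the timeout.
            pool.awaitTermination(1, TimeUnit.MINUTES);
            queue.add(END);
            writer.join(); // join() also makes 'written' safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> docs = index(Arrays.asList("book-1", "book-2", "book-3"), 2);
        System.out.println(docs.size()); // 3
    }
}
```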
Sanne
On 4 March 2013 10:50, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
The mass indexer does not work at the result set level, so mixing tuples
and the mass indexer seems wrong to me.
Have you considered something like
Iterator<Tuple> getAllTuplesFrom(String... tableNames);
And then expose an Iterator<Object> (i.e. entities) to the mass indexer?
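A sketch of that suggestion, assuming an in-memory dialect: `getAllTuplesFrom` follows the signature proposed above, while `InMemoryDialect`, `toEntities` and the `Entity#` strings are hypothetical. The adapter converts each tuple only when the caller asks for the next element, which keeps stream-style processing possible:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class TupleIteratorAdapterSketch {

    // Hypothetical in-memory dialect offering getAllTuplesFrom(String...);
    // a Map<String, Object> stands in for OGM's Tuple.
    static final class InMemoryDialect {
        private final Map<String, List<Map<String, Object>>> tables = new HashMap<>();

        void insert(String table, long id) {
            tables.computeIfAbsent(table, t -> new ArrayList<>())
                  .add(Collections.<String, Object>singletonMap("id", id));
        }

        // Walks the requested tables one after another.
        Iterator<Map<String, Object>> getAllTuplesFrom(String... tableNames) {
            return Arrays.stream(tableNames)
                    .flatMap(table -> tables.getOrDefault(table, Collections.emptyList()).stream())
                    .iterator();
        }
    }

    // Lazily turns each tuple into an entity as the caller advances.
    static <T> Iterator<Object> toEntities(Iterator<T> tuples, Function<? super T, ?> assembler) {
        return new Iterator<Object>() {
            @Override
            public boolean hasNext() {
                return tuples.hasNext();
            }

            @Override
            public Object next() {
                return assembler.apply(tuples.next()); // NoSuchElementException when exhausted
            }
        };
    }

    static List<Object> drain(Iterator<Object> it) {
        List<Object> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    static InMemoryDialect sampleDialect() {
        InMemoryDialect dialect = new InMemoryDialect();
        dialect.insert("Book", 1);
        dialect.insert("Book", 2);
        dialect.insert("Author", 3);
        return dialect;
    }

    public static void main(String[] args) {
        Iterator<Object> entities = toEntities(
                sampleDialect().getAllTuplesFrom("Book", "Author"),
                tuple -> "Entity#" + tuple.get("id"));
        System.out.println(drain(entities)); // [Entity#1, Entity#2, Entity#3]
    }
}
```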
I mean we could make it work with your proposed consumer scheme, but I
find it unnecessarily complex and it might make stream / flow style
processing impossible. I could be wrong, but I'd like to see your arguments
first.
OgmLoader.getRowFromResultSet shows how to get an Object[] from a Tuple.
OgmLoader.getRow is at the heart of it.
But the process of initializing an entity involves several phases, so
the best bet is to look at OgmLoader.load and look at what happens
globally.
In the end, to answer your question, there is no method to do what
you want today; it's more or less the bottom half of OgmLoader.load.
What about associations BTW?
On Fri 2013-03-01 15:00, Davide D'Alto wrote:
> Hello,
> I'm trying to create a mass indexer that could work with OGM.
> The idea is to have a way to scan all the elements of a certain type in
> the data store and index them; this way it would be possible to create
> an index starting from an existing, populated data store.
>
> The first prototype idea is to add a method to the GridDialect, something like:
>
> GridDialect#forEachTuple(Consumer consumer, String... tableName)
>
> Where the Consumer is an interface with a method Consumer#consume(Tuple tuple)
>
> The consumer will index each tuple it receives.
>
> The problem that I have now is how to convert the Tuple to the
> corresponding entity so that I can index it using Hibernate Search.
> An alternative idea would be to use the EntityKey and obtain the id
> instead of using the Tuple.
>
> Is there a method somewhere that I can use to obtain an entity from a Tuple?
>
> Thanks,
> Davide
> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev(a)lists.jboss.org
>
https://lists.jboss.org/mailman/listinfo/hibernate-dev