This conversation is starting to get a bit complex, so I'll try to organize
my answers:
# Applying the same solution to HV and HSearch
@Emmanuel: right, I didn't see you were also talking about HV. I was only
considering the HSearch case.
I think I agree with you both: HV and HSearch are a bit different, and we
certainly cannot share all of the code.
Some principles could probably be shared, such as the abstraction over
accessing the input type with Emmanuel's "StructureTraverser".
But the traversal algorithms are probably very different. And in fact,
these traversals are at the core of each project's purpose, so it may not
be a good idea to try to make them "more similar".
# The requirements for HSearch
@Emmanuel: we didn't take much notes, but we did draw a diagram of the
target architecture:
https://drive.google.com/a/redhat.com/file/d/0B_z-zSf_hJiZamJkZFBlNG5CeDQ/view?usp=sharing
When you shared your recordings/pictures, I asked for write permission
on the shared folder so I could add the diagram, but you probably haven't
had time yet.
If I remember correctly, these were the main requirements:
- Separate the source data traversal from the actual output format.
  - This will help when implementing different indexing services
(Elasticsearch, Solr): we don't want to assume anything about the target
format.
- Make the implementation of JGroups/JMS as simple as possible.
  - In this case, we don't really want to build documents; we just want
to transform the entity into a serializable object and reduce the
information to transmit over the network to a minimum.
  - Ideally, we'd just want to "record" the output of the traversal,
transmit this recording to the master node, and let the master node
replay it to build a document (see the sketch after this list). This
would have the added benefit of not requiring any knowledge of the
underlying technology (Lucene/ES/Solr) on the client side.
- Requirements on the "mapping tree" (I'm not absolutely sure about
these; Sanne may want to clarify):
  - “depth” and navigational graph to be pre-computed: the tree of valid
fields and options is to be known in advance.
  - An immutable, threadsafe, easy-to-inspect/walk mapping tree.
- And on my end (I think Sanne shared this concern, but I may be wrong):
query metadata as little as possible at runtime.
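To make the "recording" idea above a bit more concrete, here's a rough
sketch of what the slave-side recorder could look like. All names here
(RecordedEvent, DocumentRecorder) are invented for the example; none of
this is existing API:

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// One entry in the recording: a single call made during the traversal.
final class RecordedEvent implements Serializable {
    enum Kind { NEST, ADD_FIELD, POP }

    final Kind kind;
    final String name;
    final Serializable value;

    RecordedEvent(Kind kind, String name, Serializable value) {
        this.kind = kind;
        this.name = name;
        this.value = value;
    }
}

// On the slave node, this records the traversal output instead of building
// a backend-specific document; the recording is serializable, so it can be
// sent over JGroups/JMS and replayed against a real document builder on the
// master node, the only place that needs to know about Lucene/ES/Solr.
final class DocumentRecorder {
    private final List<RecordedEvent> events = new ArrayList<>();

    void nest(String fieldName) {
        events.add( new RecordedEvent( RecordedEvent.Kind.NEST, fieldName, null ) );
    }

    void addField(String fieldName, Serializable value) {
        events.add( new RecordedEvent( RecordedEvent.Kind.ADD_FIELD, fieldName, value ) );
    }

    void pop() {
        events.add( new RecordedEvent( RecordedEvent.Kind.POP, null, null ) );
    }

    // The recording to transmit to the master node.
    List<RecordedEvent> getRecording() {
        return Collections.unmodifiableList( events );
    }
}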
# More info on my snippet
@Gunnar: you asked for some client code, but I'm not sure it'll be very
enlightening. The only client-facing interface (as far as document building
goes) is EntityDocumentConverter.
So, the parts of the application that need to convert an entity to a
document will do something like this:
EntityDocumentConverter<E, D> converter = indexManager.getEntityDocumentConverter();
D document = converter.convert( entity );
indexManager.performOperation( newAddOperation( document ) );
The idea behind this was to make runtime code as simple as possible, and
move the complexity to the bootstrapping.
Basically, when you call converter.convert(), it delegates to
ValueProcessors, which extract information from the entity and inject it
into the DocumentBuilder. What is extracted, and how to extract it, is
completely up to the ValueProcessor.
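To give an idea of the contracts involved, here's roughly what the two
interfaces look like. This is a simplified sketch, not the exact signatures
from my gist:

// Abstracts over the output format (Lucene document, ES/Solr payload, ...).
// nest()/pop() handle embedded structures.
interface DocumentBuilder<D> {
    void nest(String fieldName);
    void addField(String fieldName, Object value);
    void pop();
    D build();
}

// One node in the processing tree built at bootstrap time. Each
// implementation knows how to extract its bit of information from the
// source object and inject it into the builder.
interface ValueProcessor {
    void process(Object source, DocumentBuilder<?> documentBuilder);
}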
This means that, when bootstrapping, a tree of ValueProcessors will be
built according to the metadata. For instance, when a @Field is encountered,
we build an appropriate ValueProcessor (potentially nesting multiple ones
if we want to keep concerns separate: one for extracting the property's
value, one for transforming this value using a bridge). When an
@IndexedEmbedded is encountered, we build a different ValueProcessor. And
so on.
Here is an (admittedly very simple) example of what it'd look like in the
metadata processor:
List<ValueProcessor> collectedProcessors = new ArrayList<>();
for ( XProperty property : properties ) {
    Field fieldAnnotation = property.getAnnotation( Field.class );
    if ( fieldAnnotation != null ) {
        ValueProcessor fieldBridgeProcessor = createFieldBridgeProcessor(
                property.getType(), fieldAnnotation );
        // The value of the property will be passed to the fieldBridgeProcessor at runtime
        ValueProcessor propertyProcessor = new JavaPropertyProcessor(
                property, fieldBridgeProcessor );
        collectedProcessors.add( propertyProcessor );
    }
}
ValueProcessor rootProcessor = new CompositeProcessor( collectedProcessors );
return new EntityDocumentConverter( rootProcessor,
        indexManagerType.getDocumentBuilder() );
The actual code will obviously be more complex, first because we need to
support many more features than just @Field, but also because the
createFieldBridgeProcessor() method needs to somehow build backend-specific
metadata based on the nature of the field. But I think the snippet captures
the spirit.
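Just to give a rough idea of that method, here's a hypothetical sketch; the
bridgeFactory field, its buildFieldBridge() method and the
FieldBridgeProcessor class are invented names for the example:

// Somewhere in the metadata processor:
private ValueProcessor createFieldBridgeProcessor(XClass propertyType, Field fieldAnnotation) {
    // Inspect the property type (String, Date, numeric, ...) and the
    // annotation attributes (analyzer, store, explicit bridge, ...) to pick
    // an appropriate bridge; the backend-specific field metadata would be
    // derived here as well.
    FieldBridge bridge = bridgeFactory.buildFieldBridge( propertyType, fieldAnnotation );
    // Wrap the bridge in a processor that applies it to the incoming value
    // and injects the result into the DocumentBuilder under the field name.
    return new FieldBridgeProcessor( fieldAnnotation.name(), bridge );
}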
# Summary
Thinking about it a little, our solutions each have a different focus.
1. Emmanuel's solution focuses on abstracting over the input data
format (thanks to StructureTraverser), assuming the traversal algorithm
will be re-implemented for each output type.
2. My solution focuses on abstracting over the output data format
(thanks to DocumentBuilder), assuming the traversal algorithm will be
re-implemented for each input type using different ValueProcessors.
3. Gunnar's solution seems to focus on abstracting over the output data
format, reimplementing the traversal algorithm for each input type using a
different TreeTraversalSequence.
Solutions 1 and 2 are, in my opinion, compatible. We could have very generic
ValueProcessors that would make use of a StructureTraverser to extract data
and of a DocumentBuilder to inject it into a document. I'm not sure it is
necessary, because I expect metadata to be defined differently based on the
input type, and hence the traversal algorithms to be slightly different,
but I think we could do it.
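For instance, assuming the ValueProcessor contract sketched earlier, and
guessing at a StructureTraverser method (extractProperty() is not
Emmanuel's actual API), a combined processor could look like this:

// A generic processor combining both abstractions: the StructureTraverser
// abstracts over the input format, the DocumentBuilder over the output format.
final class TraversingPropertyProcessor implements ValueProcessor {
    private final StructureTraverser traverser;
    private final String propertyName;
    private final ValueProcessor delegate;

    TraversingPropertyProcessor(StructureTraverser traverser, String propertyName,
            ValueProcessor delegate) {
        this.traverser = traverser;
        this.propertyName = propertyName;
        this.delegate = delegate;
    }

    @Override
    public void process(Object source, DocumentBuilder<?> documentBuilder) {
        // Input side: the traverser knows how to read the property from a
        // POJO, a JSON node, a free-form structure, ...
        Object value = traverser.extractProperty( source, propertyName );
        // Output side: the delegate (e.g. a field bridge processor) injects
        // the value into the backend-agnostic builder.
        delegate.process( value, documentBuilder );
    }
}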
About solution 3: TreeTraversalSequence seems to implement the traversal
algorithm, while TreeTraversalEventConsumer abstracts over the output
format and TreeTraversalEvent abstracts over the information being
transferred.
I think the general principles are more or less equivalent to those of
solution 2. The main differences are:
- How the context around the data to transfer is propagated (see the
side-by-side sketch after this list).
In solution 2, we pass the context progressively by making calls to the
DocumentBuilder (documentBuilder.nest(...), documentBuilder.addField(...)).
In solution 3, the context is explicitly modeled as a TreeTraversalEvent.
- How metadata is looked up.
In solution 2, the metadata is built into the objects implementing the
traversal algorithm, so there is no lookup to speak of. In solution 3,
there is a metadata lookup for each node in the tree.
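To show the two styles side by side (the builder calls match my snippet;
the event shape below is only my guess at Gunnar's model, not his actual
API):

// Illustrative only; assumes the DocumentBuilder sketched earlier.
final class ContextPropagationComparison {

    // Solution 2: the context is carried implicitly by nested builder calls.
    void builderStyle(DocumentBuilder<?> documentBuilder, String city) {
        documentBuilder.nest( "address" );
        documentBuilder.addField( "city", city );
        documentBuilder.pop();
    }

    // Solution 3: the same information is reified as explicit event objects
    // that the consumer receives or pulls from the sequence.
    enum EventKind { AGGREGATE_START, VALUE, AGGREGATE_END }

    static final class TreeTraversalEvent {
        final EventKind kind;
        final String name;
        final Object value;

        TreeTraversalEvent(EventKind kind, String name, Object value) {
            this.kind = kind;
            this.name = name;
            this.value = value;
        }
    }

    interface TreeTraversalEventConsumer {
        void consume(TreeTraversalEvent event);
    }

    void eventStyle(TreeTraversalEventConsumer consumer, String city) {
        consumer.consume( new TreeTraversalEvent( EventKind.AGGREGATE_START, "address", null ) );
        consumer.consume( new TreeTraversalEvent( EventKind.VALUE, "city", city ) );
        consumer.consume( new TreeTraversalEvent( EventKind.AGGREGATE_END, "address", null ) );
    }
}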
Performance might differ between the two approaches, but I don't know
enough about this to give a definitive answer. In the end, it's probably
more a matter of taste.
Yoann Rodière <yoann(a)hibernate.org>
Hibernate NoORM Team
On 7 February 2017 at 11:17, Gunnar Morling <gunnar(a)hibernate.org> wrote:
Emmanuel,
In your PoC, how would a complete tree-like structure be traversed?
It's not clear to me who is driving StructureTraverser, i.e. which
component will call processSubstructureInContainer() et al. when
traversing an entire tree.
@Yoann, maybe you can add a usage example similar to Emmanuel's? You
have a lot of framework code, but I'm not sure how it'd be used.
For Hibernate Search, the traversal pattern I implemented for the
ScenicView PoC may be of interest. Its general idea is to represent a
tree traversal as a sequence of events which a traverser
implementation receives and can act on, e.g. to create a corresponding
de-normalized structure, Lucene document, etc. The retrieval of values
and associated objects happens lazily as the traverser
("TreeTraversalEventConsumer" in my lingo) pulls events from the
sequence, similar to what some XML parsers do.
The main contract can be found at [1]. There are two event sequence
implementations, one based on Hibernate's meta-model [2] and one for
java.util.Map [3]. An example event consumer implementation which
creates MongoDB documents can be found at [4].
As said, I think it'd fit nicely for Hibernate Search; for HV I'm not
so sure, the reason being that the order of traversal may vary,
depending on the defined validation groups and sequences. Sometimes we
need to go "depth first". I've been contemplating employing an
event-like approach as described above for HV, but it may look
different from the one used for HSEARCH.
--Gunnar
[1]
https://github.com/gunnarmorling/scenicview-mvp/blob/master/core/src/main/java/org/hibernate/scenicview/spi/backend/model/TreeTraversalSequence.java
[2]
https://github.com/gunnarmorling/scenicview-mvp/blob/master/core/src/main/java/org/hibernate/scenicview/internal/model/EntityStateBasedTreeTraversalSequence.java
[3]
https://github.com/gunnarmorling/scenicview-mvp/blob/master/core/src/test/java/org/hibernate/scenicview/test/traversal/MapTreeTraversalSequence.java
[4]
https://github.com/gunnarmorling/scenicview-mvp/blob/master/mongodb/src/main/java/org/hibernate/scenicview/mongodb/internal/MongoDbDenormalizationBackend.java#L91..L128
2017-02-06 16:49 GMT+01:00 Emmanuel Bernard <emmanuel(a)hibernate.org>:
> Your prototype is very Hibernate Search tainted. I wonder how or whether we want it reusable across Hibernate Validator, Search and possibly more.
>
> Have you captured somewhere the discussion about the new document builder, so I could get a better grip of what's at stake?
> Would this reversal of logic also be embraced by Hibernate Validator? There are runtime decisions made in HV during traversal that made me doubt it would be as pertinent.
>
>> On 30 Jan 2017, at 11:21, Yoann Rodiere <yrodiere(a)redhat.com> wrote:
>>
>> Hi,
>>
>> Did the same this weekend, and adapted your work to match the bigger picture of what we discussed on Friday.
>> Basically, the "StructureTraverser" is now called "ValueProcessor", because it's no longer responsible for exposing the internals of a structure, but only for processing a structure according to previously defined metadata, passing the output to the "DocumentContext". I think it's the second option you suggested. It makes sense in my opinion, since metadata will be defined differently for different source types (POJO, JSON, ...).
>> This design allows in particular what Sanne suggested: when bootstrapping, we can build some kind of "walker" (a composition of "ValueProcessors") from the metadata, and avoid metadata lookups at runtime.
>>
>> The snippet is here: https://gist.github.com/yrodiere/9ff8fe8a8c7f59c1a051b36db20fbd4d
>>
>> I'm sure it'll have to be refined to address additional constraints, but in its current state it seems to address all of our requirements.
>>
>> Yoann Rodière <yrodiere(a)redhat.com>
>> Software Engineer
>> Red Hat / Hibernate NoORM Team
>>
>> On 27 January 2017 at 18:23, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>> I took the flight home to play with free-form and specifically how we would retrieve data from the free-form structure.
>> By free-form I mean non-POJO, but they will have a schema (not expressed here).
>>
>>
>> https://github.com/emmanuelbernard/hibernate-search/commit/0bd3fbab137bdad81bfa5b9934063792a050f537
>>
>> And in particular
>>
>> https://github.com/emmanuelbernard/hibernate-search/blob/freeform/freeform/src/main/java/org/hibernate/freeform/StructureTraverser.java
>> https://github.com/emmanuelbernard/hibernate-search/blob/freeform/freeform/src/main/java/org/hibernate/freeform/pojo/impl/PojoStructureTraverser.java
>>
>> It probably does not compile; I could not make the build work.
>>
>> I figured it was important to dump this raw thinking, because it will influence and be influenced by the redesign of the DocumentBuilder of Hibernate Search.
>>
>> There are several options for traversing a free-form structure:
>> - expose the traversing API as a holder to navigate all properties per structure and substructure. This is what the prototype shows. Caching needs to be accessed via a hashmap get or other lookup. Metadata and the traversing structure will be navigated in parallel.
>> - expose a structure that is specialized to a single property or container-unwrapping aspect. The structures will be spread across and embedded in the Metadata.
>>
>>
>> Another angle:
>> - create a traversable object per payload to carry it (sharing metadata info per type)
>> - have a stateless traversable object that is provided the payload for each access
>>
>> The former seems better, as it does not create a traversable object per object navigated.
>> The latter is better for payloads that need parsing or are better at sequential access, since state could be cached.
>>
>> We need to discuss that, and know where DocumentBuilder is going, in order to properly design this API.
>>
>> Emmanuel