This conversation is starting to get a bit complex, so I'll try to organize
my answers:
# Applying the same solution to HV and HSearch
@Emmanuel: right, I didn't see you were also talking about HV. I was only
considering the HSearch case.
I think I agree with you both: HV and HSearch are a bit different, and we
certainly cannot share all of the code.
Some principles could probably be shared, such as the abstraction over
accessing the input type with Emmanuel's "StructureTraverser".
But the traversal algorithms are probably very different. And in fact,
these traversals are at the core of each project's purpose, so it may not
be a good idea to try to make them "more similar".
# The requirements for HSearch
@Emmanuel: we didn't take many notes, but we did draw a diagram of the
target architecture:
https://drive.google.com/a/redhat.com/file/d/0B_z-zSf_hJiZamJkZFBlNG5CeDQ/view?usp=sharing
When you shared your recordings/pictures, I asked for write permission on
the shared folder so I could add the diagram there, but you probably
haven't had time yet.
If I remember correctly, these were the main requirements:
   - Separate the source data traversal from the actual output format.
      - This will help when implementing different indexing services
      (Elasticsearch, Solr): we don't want to assume anything about the target
      format.
   - Make the implementation of JGroups/JMS as simple as possible.
      - In this case, we don't really want to build documents; we just
      want to transform the entity into a serializable object and reduce the
      information to transmit over the network to a minimum.
      - Ideally, we'd just want to "record" the output of the traversal,
      transmit this recording to the master node, and let the master node
      replay it to build a document (see the sketch after this list). This
      would have the added benefit of not requiring any knowledge of the
      underlying technology (Lucene/ES/Solr) on the client side.
   - Requirements on the "mapping tree" (I'm not absolutely sure about
   those, Sanne may want to clarify):
      - "depth" and navigational graph to be pre-computed: tree of valid
      fields and options to be known in advance.
      - Immutable, threadsafe, easy to inspect/walk mapping tree
      - And on my end (I think Sanne shared this concern, but I may be
      wrong): query metadata as little as possible at runtime.
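To illustrate the "recording" idea from the JGroups/JMS requirement above,
here is a minimal sketch. All the names here (TraversalRecording, FieldSink,
FieldRecord) are made up; it's only meant to show the principle:
  // Hypothetical sketch of the "recording" approach; all names are made up.
  import java.io.Serializable;
  import java.util.ArrayList;
  import java.util.List;

  // Whatever consumes the replay on the master node
  // (a Lucene/ES/Solr document builder)
  interface FieldSink {
    void addField(String name, Object value);
  }

  class TraversalRecording implements Serializable {
    private static class FieldRecord implements Serializable {
      final String name;
      final Serializable value;
      FieldRecord(String name, Serializable value) {
        this.name = name;
        this.value = value;
      }
    }

    private final List<FieldRecord> records = new ArrayList<>();

    // On the slave node: record the traversal output
    // instead of building a document
    void addField(String name, Serializable value) {
      records.add( new FieldRecord( name, value ) );
    }

    // On the master node: replay the recording against
    // the backend-specific builder
    void replayTo(FieldSink sink) {
      for ( FieldRecord record : records ) {
        sink.addField( record.name, record.value );
      }
    }
  }
The recording would be the only thing crossing the wire, so the client side
would never need to know which backend ends up consuming it.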
# More info on my snippet
@Gunnar: you asked for some client code, but I'm not sure it'll be very
enlightening. The only client-facing interface (as far as document building
goes) is EntityDocumentConverter.
So, the parts of the application that need to convert an entity to a
document will do something like this:
    EntityDocumentConverter<E, D> converter = indexManager.getEntityDocumentConverter();
    D document = converter.convert( entity );
    indexManager.performOperation( new AddOperation( document ) );
The idea behind this was to make the runtime code as simple as possible,
and move the complexity to the bootstrapping phase.
Basically, when you call converter.convert, it will delegate to
ValueProcessors, which will extract information from the entity and inject
it into the DocumentBuilder. What is extracted, and how to extract it, is
completely up to the ValueProcessor.
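To make the contracts clearer, here is a stripped-down sketch of what I
have in mind. The signatures are illustrative (the real ones are in the
gist), and I'm glossing over whether the converter holds one builder or
obtains a fresh one per document; I use a Supplier here for simplicity:
  // Illustrative sketch only; the actual signatures live in the gist.
  import java.util.function.Supplier;

  interface DocumentBuilder<D> {
    void addField(String name, Object value);
    void nest(String fieldName);   // open a nested context (e.g. for embeddeds)
    void unnest();                 // close the current nested context
    D build();
  }

  interface ValueProcessor {
    // Extracts whatever it needs from the source object
    // and injects it into the builder
    void process(Object source, DocumentBuilder<?> documentBuilder);
  }

  class EntityDocumentConverter<E, D> {
    private final ValueProcessor rootProcessor;
    private final Supplier<DocumentBuilder<D>> builderSupplier;

    EntityDocumentConverter(ValueProcessor rootProcessor,
        Supplier<DocumentBuilder<D>> builderSupplier) {
      this.rootProcessor = rootProcessor;
      this.builderSupplier = builderSupplier;
    }

    public D convert(E entity) {
      DocumentBuilder<D> builder = builderSupplier.get();
      rootProcessor.process( entity, builder );
      return builder.build();
    }
  }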
This means that, when bootstrapping, a tree of ValueProcessors will be
built according to the metadata. For instance, when a @Field is encountered,
we build an appropriate ValueProcessor (potentially nesting multiple ones
if we want to keep concerns separate: one for extracting the property's
value, one for transforming this value using a bridge). When an
@IndexedEmbedded is encountered, we build a different ValueProcessor. And
so on.
Here is an (admittedly very simple) example of what it'd look like in the
metadata processor:
  List<ValueProcessor> collectedProcessors = new ArrayList<>();
  for ( XProperty property : properties ) {
    Field fieldAnnotation = property.getAnnotation( Field.class );
    if ( fieldAnnotation != null ) {
      ValueProcessor fieldBridgeProcessor = createFieldBridgeProcessor( property.getType(), fieldAnnotation );
      // The value of the property will be passed to the fieldBridgeProcessor at runtime
      ValueProcessor propertyProcessor = new JavaPropertyProcessor( property, fieldBridgeProcessor );
      collectedProcessors.add( propertyProcessor );
    }
  }
  ValueProcessor rootProcessor = new CompositeProcessor( collectedProcessors );
  return new EntityDocumentConverter( rootProcessor, indexManagerType.getDocumentBuilder() );
The actual code will obviously be more complex, first because we need to
support many more features than just @Field, but also because the
createFieldBridgeProcessor() method needs to somehow build backend-specific
metadata based on the nature of the field. But I think the snippet captures
the spirit.
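For completeness, here is what a processor for the @IndexedEmbedded case
could look like, building on the hypothetical contracts sketched earlier
(and ignoring containers, depth limits and cycle detection):
  // Hypothetical processor for @IndexedEmbedded; reuses the sketched contracts.
  class IndexedEmbeddedProcessor implements ValueProcessor {
    private final String fieldName;
    private final ValueProcessor embeddedTypeProcessor; // built from the embedded type's metadata

    IndexedEmbeddedProcessor(String fieldName, ValueProcessor embeddedTypeProcessor) {
      this.fieldName = fieldName;
      this.embeddedTypeProcessor = embeddedTypeProcessor;
    }

    @Override
    public void process(Object source, DocumentBuilder<?> documentBuilder) {
      Object embedded = extractPropertyValue( source );
      if ( embedded != null ) {
        // Open a nested context, delegate to the processor tree
        // built for the embedded type, then close the context
        documentBuilder.nest( fieldName );
        embeddedTypeProcessor.process( embedded, documentBuilder );
        documentBuilder.unnest();
      }
    }

    private Object extractPropertyValue(Object source) {
      // Reflection-based property access, omitted for brevity
      throw new UnsupportedOperationException( "illustration only" );
    }
  }
The nest()/unnest() calls are the progressive context propagation I mention
in the summary below.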
# Summary
Thinking about it a little, our solutions each have a different focus.
   1. Emmanuel's solution focuses on abstracting over the input data
   format (thanks to StructureTraverser), assuming the traversal algorithm
   will be re-implemented for each output type.
   2. My solution focuses on abstracting over the output data format
   (thanks to DocumentBuilder), assuming the traversal algorithm will be
   re-implemented for each input type using different ValueProcessors.
   3. Gunnar's solution seems to focus on abstracting over the output data
   format, reimplementing the traversal algorithm for each input type using a
   different TreeTraversalSequence.
Solutions 1 and 2 are, in my opinion, compatible. We could have very generic
ValueProcessors that would make use of a StructureTraverser to extract data
and of a DocumentBuilder to inject it into a document. I'm not sure it is
necessary, because I expect metadata to be defined differently based on the
input type, and hence the traversal algorithms to be slightly different,
but I think we could do it.
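For instance, a generic property-level ValueProcessor combining the two
could look like the following. The StructureTraverser contract here is made
up (Emmanuel's actual prototype differs); it's only meant to show the shape
of the combination:
  // Hypothetical combination of solutions 1 and 2.
  interface StructureTraverser<S> {
    // Abstracts over the input format (POJO, JSON, ...)
    Object extractProperty(S structure, String propertyName);
  }

  class TraverserBackedPropertyProcessor<S> implements ValueProcessor {
    private final StructureTraverser<S> traverser;
    private final String propertyName;
    private final String fieldName;

    TraverserBackedPropertyProcessor(StructureTraverser<S> traverser,
        String propertyName, String fieldName) {
      this.traverser = traverser;
      this.propertyName = propertyName;
      this.fieldName = fieldName;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void process(Object source, DocumentBuilder<?> documentBuilder) {
      // The traverser abstracts over the input, the builder over the output
      Object value = traverser.extractProperty( (S) source, propertyName );
      documentBuilder.addField( fieldName, value );
    }
  }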
About solution 3: TreeTraversalSequence seems to implement the traversal
algorithm, while TreeTraversalEventConsumer abstracts over the output
format and TreeTraversalEvent abstracts over the information being
transferred.
I think the general principles are more or less equivalent to solution 2.
The main differences are:
   - How the context around the data to transfer is propagated (see the
   sketch after this list).
   In solution 2, we pass the context progressively by making calls to the
   DocumentBuilder (documentBuilder.nest(...), documentBuilder.addField(...)).
   In solution 3, the context is explicitly modeled as a TreeTraversalEvent.
   - How metadata is looked up.
   In solution 2, the metadata is built in the objects implementing the
   traversal algorithm, so there is no lookup to speak of. In solution 3,
   there is a metadata lookup for each node in the tree.
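To make the first difference concrete, here is roughly how the same output
would be produced in each style. This is illustrative only; the real
TreeTraversalEvent carries more information, and my TraversalEvent below is
just a simplified stand-in:
  import java.util.Iterator;

  class ContextPropagationComparison {
    // Solution 2: context propagated through successive calls on the builder.
    void solution2Style(DocumentBuilder<?> documentBuilder) {
      documentBuilder.addField( "title", "..." );
      documentBuilder.nest( "author" );
      documentBuilder.addField( "name", "..." );
      documentBuilder.unnest();
    }

    // Solution 3: context carried by explicit event objects
    // pulled by the consumer.
    enum EventType { VALUE, OBJECT_START, OBJECT_END }

    static class TraversalEvent {
      final EventType type;
      final String name;
      final Object value;
      TraversalEvent(EventType type, String name, Object value) {
        this.type = type;
        this.name = name;
        this.value = value;
      }
    }

    void solution3Style(Iterator<TraversalEvent> events,
        DocumentBuilder<?> documentBuilder) {
      while ( events.hasNext() ) {
        TraversalEvent event = events.next();
        switch ( event.type ) {
          case VALUE: documentBuilder.addField( event.name, event.value ); break;
          case OBJECT_START: documentBuilder.nest( event.name ); break;
          case OBJECT_END: documentBuilder.unnest(); break;
        }
      }
    }
  }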
Performance might differ between the two, but I don't know enough about
this to give a definitive answer. In the end, it's probably more a matter
of taste.
Yoann Rodière <yoann(a)hibernate.org>
Hibernate NoORM Team
On 7 February 2017 at 11:17, Gunnar Morling <gunnar(a)hibernate.org> wrote:
 Emmanuel,
 In your PoC, how would a complete tree-like structure be traversed?
 It's not clear to me who is driving StructureTraverser, i.e. which
 component will call processSubstructureInContainer() et al. when
 traversing an entire tree.
 @Yoann, maybe you can add a usage example similar to Emmanuel's? You
 have a lot of framework code, but I'm not sure how it'd be used.
 For Hibernate Search, the traversal pattern I implemented for the
 ScenicView PoC may be of interest. Its general idea is to represent a
 tree traversal as a sequence of events which a traverser
 implementation receives and can act on, e.g. to create a corresponding
 de-normalized structure, Lucene document etc. The retrieval of values
 and associated objects happens lazily as the traverser
 ("TreeTraversalEventConsumer" in my lingo) pulls events from the
 sequence, similar to what some XML parsers do.
 The main contract can be found at [1]. There are two event sequence
 implementations, one based on Hibernate's meta-model [2] and one for
 java.util.Map [3]. An example event consumer implementation which
 creates MongoDB documents can be found at [4].
 As said, I think it'd nicely fit Hibernate Search; for HV I'm not
 so sure, the reason being that the order of traversal may vary
 depending on the defined validation groups and sequences. Sometimes we
 need to go "depth first". I've been contemplating employing an
 event-like approach as described above for HV, but it may look
 different from the one used for HSEARCH.
 --Gunnar
 [1] https://github.com/gunnarmorling/scenicview-mvp/blob/master/core/src/main/java/org/hibernate/scenicview/spi/backend/model/TreeTraversalSequence.java
 [2] https://github.com/gunnarmorling/scenicview-mvp/blob/master/core/src/main/java/org/hibernate/scenicview/internal/model/EntityStateBasedTreeTraversalSequence.java
 [3] https://github.com/gunnarmorling/scenicview-mvp/blob/master/core/src/test/java/org/hibernate/scenicview/test/traversal/MapTreeTraversalSequence.java
 [4] https://github.com/gunnarmorling/scenicview-mvp/blob/master/mongodb/src/main/java/org/hibernate/scenicview/mongodb/internal/MongoDbDenormalizationBackend.java#L91..L128
 2017-02-06 16:49 GMT+01:00 Emmanuel Bernard <emmanuel(a)hibernate.org>:
 > Your prototype is very Hibernate Search tainted. I wonder how or whether
 we want it reusable across Hibernate Validator, Search and possibly more.
 >
 > Have you captured somewhere the discussion about the new document
 builder, so I could get a better grip of what's at stake?
 > Would this reversal of logic also be embraced by Hibernate Validator?
 There are runtime decisions made in HV during traversal that made me doubt
 it would be as pertinent.
 >
 >
 >
 >> On 30 Jan 2017, at 11:21, Yoann Rodiere <yrodiere(a)redhat.com> wrote:
 >>
 >> Hi,
 >>
 >> Did the same this weekend, and adapted your work to match the bigger
 picture of what we discussed on Friday.
 >> Basically, the "StructureTraverser" is now called "ValueProcessor",
 because it's not responsible for exposing the internals of a structure
 anymore, but only for processing a structure according to previously defined
 metadata, passing the output to the "DocumentContext". I think it's the
 second option you suggested. It makes sense in my opinion, since metadata
 will be defined differently for different source types (POJO, JSON, ...).
 >> This design allows in particular what Sanne suggested: when
 bootstrapping, we can build some kind of "walker" (a composition of
 "ValueProcessors") from the metadata, and avoid metadata lookups at runtime.
 >>
 >> The snippet is there: https://gist.github.com/yrodiere/9ff8fe8a8c7f59c1a051b36db20fbd4d
 >>
 >> I'm sure it'll have to be refined to address additional constraints,
 but in its current state it seems to address all of our requirements.
 >>
 >> Yoann Rodière <yrodiere(a)redhat.com>
 >> Software Engineer
 >> Red Hat / Hibernate NoORM Team
 >>
 >> On 27 January 2017 at 18:23, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
 >> I took the flight home to play with free-form and specifically how we
 would retrieve data from the free-form structure.
 >> By free-form I mean non-POJO, but the structures will have a schema (not
 expressed here).
 >>
 >> https://github.com/emmanuelbernard/hibernate-search/commit/0bd3fbab137bdad81bfa5b9934063792a050f537
 >>
 >> And in particular
 >> https://github.com/emmanuelbernard/hibernate-search/blob/freeform/freeform/src/main/java/org/hibernate/freeform/StructureTraverser.java
 >> https://github.com/emmanuelbernard/hibernate-search/blob/freeform/freeform/src/main/java/org/hibernate/freeform/pojo/impl/PojoStructureTraverser.java
 >>
 >> It probably does not compile; I could not get the build to work.
 >>
 >> I figured it was important to dump this raw thinking because it will
 influence and will be influenced by the redesign of the DocumentBuilder of
 Hibernate Search.
 >>
 >> There are several options for traversing a free-form structure:
 >> - expose the traversing API as a holder to navigate all properties
 per structure and substructure. This is what the prototype shows. Cached
 data needs to be accessed via a hashmap get or other lookup. Metadata and
 the traversed structure will be navigated in parallel.
 >> - expose a structure that is specialized to a single property or
 container-unwrapping aspect. These structures will be spread across and
 embedded in the metadata.
 >>
 >>
 >> Another angle:
 >> - create a traversable object per payload to carry it (sharing metadata
 info per type)
 >> - have a stateless traversable object that is provided the payload for
 each access
 >>
 >> The latter seems better, as it does not create a traversable object per
 object navigated.
 >> The former is better for payloads that need parsing or that favor
 sequential access, since state could be cached.
 >>
 >> We need to discuss this, and know where DocumentBuilder is going, in
 order to properly design this API.
 >>
 >> Emmanuel