[hibernate-dev] [HV/HSEARCH] Free form

Yoann Rodiere yoann at hibernate.org
Tue Feb 7 10:36:38 EST 2017


This conversation is starting to get a bit complex, so I'll try to organize
my answers:

# Applying the same solution to HV and HSearch

@Emmanuel: right, I didn't see you were also talking about HV. I was only
considering the HSearch case.

I think I agree with you both: HV and HSearch are a bit different, and we
certainly cannot share all of the code.
Some principles could probably be shared, such as the abstraction over
accessing the input type with Emmanuel's "StructureTraverser".
But the traversal algorithms are probably very different. And in fact,
these traversals are at the core of each project's purpose, so it may not
be a good idea to try to make them "more similar".

# The requirements for HSearch

@Emmanuel: we didn't take many notes, but we did draw a diagram of the
target architecture:

https://drive.google.com/a/redhat.com/file/d/0B_z-zSf_hJiZamJkZFBlNG5CeDQ/view?usp=sharing

When you shared your recordings/pictures, I asked for the write permission
on the shared folder to put the diagram, but you probably haven't had time
yet.

If I remember correctly, here were the main requirements:

   - Separate the source data traversal from the actual output format.
      - This will help when implementing different indexing services
      (Elasticsearch, Solr): we don't want to assume anything about the target
      format.
   - Make the implementation of JGroups/JMS as simple as possible.
      - In these cases, we don't really want to build documents; we just
      want to transform the entity into a serializable object and reduce the
      information to transmit over the network to a minimum.
      - Ideally, we'd just want to "record" the output of the traversal,
      transmit this recording to the master node, and let the master node
      replay it to build a document. This would have the added benefit of not
      requiring any knowledge of the underlying technology (Lucene/ES/Solr) on
      the client side.
   - Requirements on the "mapping tree" (I'm not absolutely sure about
   those, Sanne may want to clarify):
      - “depth” and navigational graph to be pre-computed: tree of valid
      fields and options to be known in advance.
      - Immutable, threadsafe, easy to inspect/walk mapping tree
      - And on my end (I think Sanne shared this concern, but I may be
      wrong): query metadata as little as possible at runtime.
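To give an idea of what the "record and replay" approach for JGroups/JMS could look like, here is a minimal sketch; all names (DocumentBuilder, FieldEvent, RecordingDocumentBuilder, ReplayingDispatcher) are hypothetical, not actual APIs:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical output contract, reduced to a single method for the sketch. */
interface DocumentBuilder {
    void addField(String name, Object value);
}

/** One recorded builder call; Serializable so it can travel over JMS/JGroups. */
class FieldEvent implements Serializable {
    final String name;
    final Object value;

    FieldEvent(String name, Object value) {
        this.name = name;
        this.value = value;
    }
}

/** On the slave node: records the traversal output instead of building a document. */
class RecordingDocumentBuilder implements DocumentBuilder {
    private final List<FieldEvent> events = new ArrayList<>();

    @Override
    public void addField(String name, Object value) {
        events.add(new FieldEvent(name, value));
    }

    List<FieldEvent> recording() {
        return events;
    }
}

/** On the master node: replays a recording against the real backend builder. */
class ReplayingDispatcher {
    static void replay(List<FieldEvent> recording, DocumentBuilder target) {
        for (FieldEvent event : recording) {
            target.addField(event.name, event.value);
        }
    }
}
```

The point being that the slave side only depends on the event classes, never on Lucene/ES/Solr types.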

# More info on my snippet

@Gunnar: you asked for some client code, but I'm not sure it'll be very
illuminating. The only client-facing interface (as far as document building
goes) is EntityDocumentConverter.
So, the parts of the application that need to convert an entity to a
document will do something like this:

    EntityDocumentConverter<E, D> converter =
            indexManager.getEntityDocumentConverter();
    D document = converter.convert( entity );
    indexManager.performOperation( newAddOperation( document ) );

The idea behind this was to make runtime code as simple as possible, and
move the complexity to the bootstrapping.
Basically, when you call converter.convert, it will delegate to
ValueProcessors, which will extract information from the entity and inject
it into the DocumentBuilder. What is extracted, and how to extract it, is
completely up to the ValueProcessor.
This means that, when bootstrapping, a tree of ValueProcessors will be
built according to the metadata. For instance, when a @Field is encountered,
we build an appropriate ValueProcessor (potentially nesting multiple ones
if we want to keep concerns separate: one for extracting the property's
value, one for transforming this value using a bridge). When an
@IndexedEmbedded is encountered, we build a different ValueProcessor. And
so on.
Here is an (admittedly very simple) example of what it'd look like in the
metadata processor:

  List<ValueProcessor> collectedProcessors = new ArrayList<>();
  for ( XProperty property : properties ) {
    Field fieldAnnotation = property.getAnnotation( Field.class );
    if ( fieldAnnotation != null ) {
      ValueProcessor fieldBridgeProcessor =
          createFieldBridgeProcessor( property.getType(), fieldAnnotation );
      // The value of the property will be passed to the fieldBridgeProcessor at runtime
      ValueProcessor propertyProcessor =
          new JavaPropertyProcessor( property, fieldBridgeProcessor );
      collectedProcessors.add( propertyProcessor );
    }
  }
  ValueProcessor rootProcessor = new CompositeProcessor( collectedProcessors );
  return new EntityDocumentConverter( rootProcessor,
      indexManagerType.getDocumentBuilder() );

The actual code will obviously be more complex, first because we need to
support many more features than just @Field, but also because the
createFieldBridgeProcessor() method needs to somehow build backend-specific
metadata based on the nature of the field. But I think the snippet captures
the spirit.
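For reference, here is a minimal sketch of the contracts the snippet above assumes; the names come from the snippet, but the signatures are my guesses, not actual code:

```java
import java.util.List;

/** Guessed contract: extracts data from a source object and pushes it to the builder. */
interface ValueProcessor {
    void process(Object source, DocumentBuilder builder);
}

/** Guessed output abstraction: the backend-specific document builder. */
interface DocumentBuilder {
    void addField(String name, Object value);
}

/** The composite from the snippet: runs each child processor against the same source. */
class CompositeProcessor implements ValueProcessor {
    private final List<ValueProcessor> children;

    CompositeProcessor(List<ValueProcessor> children) {
        this.children = children;
    }

    @Override
    public void process(Object source, DocumentBuilder builder) {
        for (ValueProcessor child : children) {
            child.process(source, builder);
        }
    }
}
```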

# Summary

Thinking about it a little, there's a different focus in our solutions.

   1. Emmanuel's solution focuses on abstracting over the input data
   format (thanks to StructureTraverser), assuming the traversal algorithm
   will be re-implemented for each output type.
   2. My solution focuses on abstracting over the output data format
   (thanks to DocumentBuilder), assuming the traversal algorithm will be
   re-implemented for each input type using different ValueProcessors.
   3. Gunnar's solution seems to focus on abstracting over the output data
   format, reimplementing the traversal algorithm for each input type using a
   different TreeTraversalSequence.

Solutions 1 and 2 are, in my opinion, compatible. We could have very generic
ValueProcessors that would make use of a StructureTraverser to extract data
and of a DocumentBuilder to inject it into a document. I'm not sure it is
necessary, because I expect metadata to be defined differently based on the
input type, and hence the traversal algorithms to be slightly different,
but I think we could do it.
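To illustrate how the two could combine, here is a rough sketch of a generic processor that relies on a StructureTraverser to stay input-agnostic and on a DocumentBuilder to stay output-agnostic; all names and signatures are my guesses, not actual APIs:

```java
/** Guessed input abstraction, in the spirit of Emmanuel's StructureTraverser. */
interface StructureTraverser<T> {
    Object extractProperty(T structure, String propertyName);
}

/** Guessed output abstraction. */
interface DocumentBuilder {
    void addField(String name, Object value);
}

/**
 * A fully generic processor: the traverser abstracts over the input type,
 * the builder abstracts over the output format.
 */
class TraversingPropertyProcessor<T> {
    private final StructureTraverser<T> traverser;
    private final String propertyName;
    private final String fieldName;

    TraversingPropertyProcessor(StructureTraverser<T> traverser,
            String propertyName, String fieldName) {
        this.traverser = traverser;
        this.propertyName = propertyName;
        this.fieldName = fieldName;
    }

    void process(T source, DocumentBuilder builder) {
        builder.addField(fieldName, traverser.extractProperty(source, propertyName));
    }
}
```

A Map-based traverser, for instance, would just be `(structure, propertyName) -> structure.get(propertyName)`.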

About solution 3: TreeTraversalSequence seems to implement the traversal
algorithm, while TreeTraversalEventConsumer abstracts over the output
format and TreeTraversalEvent abstracts over the information being
transferred.
I think the general principles are more or less equivalent to solution 2.
The main differences are:

   - How the context around the data to transfer is propagated.
   In solution 2, we pass the context progressively by making calls to the
   DocumentBuilder (documentBuilder.nest(...), documentBuilder.addField(...)).
   In solution 3, the context is explicitly modeled as a TreeTraversalEvent.
   - How metadata is looked up.
   In solution 2, the metadata is built into the objects implementing the
   traversal algorithm, so there is no lookup to speak of. In solution 3,
   there is a metadata lookup for each node in the tree.
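To make the first difference more concrete, here is a rough sketch of the two shapes; all types are hypothetical, loosely named after the classes mentioned in this thread:

```java
// Solution 2 style: the context is implicit in the sequence of calls on the builder.
interface DocumentBuilder {
    DocumentBuilder nest(String objectName); // enter a nested object, returns its context
    void addField(String name, Object value);
}

// Solution 3 style: the context travels explicitly with each event.
class TreeTraversalEvent {
    enum Kind { OBJECT_START, VALUE, OBJECT_END }

    final Kind kind;
    final String name;
    final Object value;

    TreeTraversalEvent(Kind kind, String name, Object value) {
        this.kind = kind;
        this.name = name;
        this.value = value;
    }
}

// The consumer receives events and reconstructs the context itself.
interface TreeTraversalEventConsumer {
    void consume(TreeTraversalEvent event);
}
```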

Maybe there's a performance angle too, but I don't know enough about this
to give a definitive answer. In the end it's probably more a matter of
taste.


Yoann Rodière <yoann at hibernate.org>
Hibernate NoORM Team

On 7 February 2017 at 11:17, Gunnar Morling <gunnar at hibernate.org> wrote:

> Emmanuel,
>
> In your PoC, how would a complete tree-like structure be traversed?
> It's not clear to me who is driving StructureTraverser, i.e. which
> component will call processSubstructureInContainer() et al. when
> traversing an entire tree.
>
> @Yoann, maybe you can add a usage example similar to Emmanuel's? You
> have a lot of framework code, but I'm not sure about how it'd be used.
>
> For Hibernate Search, the traversal pattern I implemented for the
> ScenicView PoC may be of interest. Its general idea is to represent a
> tree traversal as a sequence of events which a traverser
> implementation receives and can act on, e.g. to create a corresponding
> de-normalized structure, Lucene document etc. The retrieval of values
> and associated objects happens lazily as the traverser
> ("TreeTraversalEventConsumer" in my lingo) pulls events from the
> sequence, similar to what some XML parsers do.
>
> The main contract can be found at [1]. There are two event sequence
> implementations, one based on Hibernate's meta-model [2] and one for
> java.util.Map [3]. An example event consumer implementation which
> creates MongoDB documents can be found at [4].
>
> As said, I think it'd nicely fit Hibernate Search; for HV I'm not
> so sure. The reason is that the order of traversal may vary,
> depending on the defined validation groups and sequences. Sometimes we
> need to go "depth first". I've been contemplating employing an
> event-like approach as described above for HV, but it may look
> different from the one used for HSEARCH.
>
> --Gunnar
>
> [1] https://github.com/gunnarmorling/scenicview-mvp/blob/master/core/src/main/java/org/hibernate/scenicview/spi/backend/model/TreeTraversalSequence.java
> [2] https://github.com/gunnarmorling/scenicview-mvp/blob/master/core/src/main/java/org/hibernate/scenicview/internal/model/EntityStateBasedTreeTraversalSequence.java
> [3] https://github.com/gunnarmorling/scenicview-mvp/blob/master/core/src/test/java/org/hibernate/scenicview/test/traversal/MapTreeTraversalSequence.java
> [4] https://github.com/gunnarmorling/scenicview-mvp/blob/master/mongodb/src/main/java/org/hibernate/scenicview/mongodb/internal/MongoDbDenormalizationBackend.java#L91..L128
>
>
>
> 2017-02-06 16:49 GMT+01:00 Emmanuel Bernard <emmanuel at hibernate.org>:
> > Your prototype is very Hibernate Search tainted. I wonder how or whether
> we want it reusable across Hibernate Validator, Search and possibly more.
> >
> > Have you captured somewhere the discussion about the new document
> builder so I could get a better grip of what’s at bay?
> > Would this reverse of logic also be embraced by Hibernate Validator?
> There are runtime decisions done in HV during traversal that made me doubt
> that it would be as pertinent.
> >
> >
> >
> >> On 30 Jan 2017, at 11:21, Yoann Rodiere <yrodiere at redhat.com> wrote:
> >>
> >> Hi,
> >>
> >> I did the same this weekend, and adapted your work to match the bigger
> picture of what we discussed on Friday.
> >> Basically the "StructureTraverser" is now called "ValueProcessor",
> because it's not responsible for exposing the internals of a structure
> anymore, but only to process a structure according to previously defined
> metadata, passing the output to the "DocumentContext". I think it's the
> second option you suggested. It makes sense in my opinion, since metadata
> will be defined differently for different source types (POJO, JSON, ...).
> >> This design allows in particular what Sanne suggested: when
> bootstrapping, we can build some kind of "walker" (a composition of
> "ValueProcessors") from the metadata, and avoid metadata lookup at runtime.
> >>
> >> The snippet is there: https://gist.github.com/yrodiere/9ff8fe8a8c7f59c1a051b36db20fbd4d
> >>
> >> I'm sure it'll have to be refined to address additional constraints,
> but in its current state it seems to address all of our requirements.
> >>
> >> Yoann Rodière <yrodiere at redhat.com>
> >> Software Engineer
> >> Red Hat / Hibernate NoORM Team
> >>
> >> On 27 January 2017 at 18:23, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> >> I took the flight home to play with free form, and specifically with
> how we would retrieve data from the free-form structure.
> >> By free-form I mean non-POJO, but the structures will have a schema
> (not expressed here).
> >>
> >> https://github.com/emmanuelbernard/hibernate-search/commit/0bd3fbab137bdad81bfa5b9934063792a050f537
> >>
> >> And in particular
> >> https://github.com/emmanuelbernard/hibernate-search/blob/freeform/freeform/src/main/java/org/hibernate/freeform/StructureTraverser.java
> >> https://github.com/emmanuelbernard/hibernate-search/blob/freeform/freeform/src/main/java/org/hibernate/freeform/pojo/impl/PojoStructureTraverser.java
> >>
> >> It probably does not compile, I could not make the build work.
> >>
> >> I figured it was important to dump this raw thinking because it will
> influence and will be influenced by the redesign of the DocumentBuilder of
> Hibernate Search.
> >>
> >> There are several options for traversing a free-form structure:
> >> - expose the traversing API as a holder to navigate all properties
> per structure and sub-structure. This is what the prototype shows. Caches
> need to be accessed via a hashmap get or some other lookup. Metadata and
> the traversed structure will be navigated in parallel.
> >> - expose a structure that is specialized to a single property or
> container-unwrapping aspect. These structures will be spread across and
> embedded in the metadata.
> >>
> >>
> >> Another angle:
> >> - create a traversable object per payload to carry it (sharing metadata
> info per type)
> >> - have a stateless traversable object that is provided the payload for
> each access
> >>
> >> The latter seems better as it does not create a traversable object per
> navigated object.
> >> The former is better for payloads that need parsing or are better at
> sequential access, since state can be cached.
> >>
> >> We need to discuss that, and know where DocumentBuilder is going, in
> order to properly design this API.
> >>
> >> Emmanuel
> >> _______________________________________________
> >> hibernate-dev mailing list
> >> hibernate-dev at lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/hibernate-dev
> >>
> >
>

