[hibernate-dev] Shaping the requirements for the new DocumentBuilder to come in Hibernate Search 6

Sanne Grinovero sanne at hibernate.org
Thu Nov 17 08:13:08 EST 2016


Hi all,
among the various plans for Hibernate Search 6, one of the reasons we
had to do the Elasticsearch integration sooner as experimental was to
get ourselves a clearer picture of what's going to be needed in terms
of internal cleanup.

Our DocumentBuilder is ancient, and several new features have been
added since it was a well designed, simple piece of code.

So, while we have discussed several wishes already, I have now started
a document to try to get all our thoughts to converge.

For convenience, pasting the current content below.
 - https://docs.google.com/document/d/1JwKanIRHVTw1LvCdLGyY6EKuyvvQn6gvlkmPGqDjFxw/edit?usp=sharing

I'm not giving comment permissions to the world; anyone who's
interested please answer here or drop me a note, happy to give
permissions to comment to well-intentioned people.

N.B. The document will very likely evolve beyond this email; as it is
now it's an initial brain dump. For example, I haven't thought about
the ES capability of nesting structures yet.

Thanks,
Sanne

==== Pasting from document =====


DocumentBuilder and FieldBridge requirements for Hibernate Search 6.0

* Never import Lucene types; ideally make Lucene a dependency of the
Lucene backend only.
   * In a modular world, don’t expect end user code to be able to load
Lucene class definitions.
* Efficient lookup of “field name” -> field mapping and its indexing
options; not least:
   * Cardinality {always one, optional one, one-many, zero-many}
      * Needed for validation of queries, e.g. query for null can use
an “exists” query only in some of these cases, vs needing a null
token.
   * projectable alone vs part of multiple fields relating to a single
property (allow projection of Two-Way bridges using multiple fields)
   * Might need “group name”.”sub field name” for groups and index time joins
* IndexedEmbedded
   * “depth” and navigational graph to be pre-computed: tree of valid
fields and options to be known in advance.
   * Navigating into a relation must deal with possibly navigating
into subclasses of the relation type:
http://stackoverflow.com/questions/39516355/indexing-a-interface-in-hibernate-search
* Immutable, threadsafe, easy to inspect/walk mapping tree
   * Built and validated at bootstrap of the IndexManager
      * can’t be updated after that
   * Field names and custom FieldType not to be allocated at runtime
   * Efficient to validate Queries
   * Allow efficient production of an Entity instance into:
      * Elasticsearch “document”
      * Lucene “document”
      * An efficient-to-serialize “document”
         * If it gets easy enough, make our own simple serialization?
      * Extensible to other backends e.g. Apache Solr in the future (a
Walkable SPI)
      * Pretty printed text to dump the “schema” we’re using from a
given domain model
   * Validations and comparisons
      * Allow to validate compatibility with an Elasticsearch schema
      * Allow to validate compatibility with a Lucene schema
   * Walking tree to map to ORM loading strategies
      * allow to predict which paths we’ll need to initialize
(database load) for efficient batch loading (graph initialization)
      * Allow for accurate Dirty-checking to skip indexing operations
      * Allow generation of better MassIndexer queries (fetch join
some of the relations?)
* ID handling: specific care
   * ad-hoc encoders for ID
   * stricter validation (e.g. cardinality, DocValues, Two-Way fieldbridges)
   * Support multi-term IDs (composite keys, @IdClass)
   * Have different “index id strategies” to have them apply different
logic, i.e. “delete by term” and “update by term” only apply on
single-term IDs.
   * ID handling strategy might need to take into account if the index
is shared among types.
* Decoupling from Java “Class” as entity-type identifiers
* Sharding:
   * Allow reuse of the same schema for indexes using the same
      * Allow reuse of some elements for indexes sharing such elements
* Properties / Field relations
   * Handle one property -> multiple Fields as a bidirectional relation.
   * Disallow one index field being target of different properties
and/or bridges?
* Representation of “Join points” and Groups:
   * allow future production of Lucene documents with index-time join
(write in groups)
   * allow efficient Query validation for both index-time and
query-time join options
* Composable
   * @ClassBridge, @Field annotations to both contribute to field definitions
   * a @ClassBridge of an @IndexedEmbedded to both contribute to the
embedded field definitions
   * Include type-bound user custom Bridges (see BridgeProvider) in
the compositions
   * Both @ClassBridge and custom Bridges need to trigger on
polymorphic relations as well
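
To make a few of the points above more concrete (the "field name" ->
mapping lookup, the cardinality metadata, the immutable walkable tree
and the pretty-printed "schema" dump), here is a rough Java sketch.
Every type and method name in it (Cardinality, FieldNode, MappingTree,
MappingVisitor, MappingTreeSketch) is hypothetical and invented for
illustration; nothing here is existing or planned Hibernate Search API,
and it deliberately ignores bridges, IDs, joins and sharding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// All names below are hypothetical; this is a sketch, not Hibernate Search API.

/** Small demo: build a tree once, then dump its "schema". */
final class MappingTreeSketch {
    public static void main(String[] args) {
        FieldNode city = new FieldNode("city", Cardinality.OPTIONAL_ONE, Collections.emptyList());
        FieldNode address = new FieldNode("address", Cardinality.ZERO_MANY, Collections.singletonList(city));
        FieldNode title = new FieldNode("title", Cardinality.ALWAYS_ONE, Collections.emptyList());
        MappingTree tree = new MappingTree(Arrays.asList(title, address));
        System.out.print(tree.prettyPrint());
    }
}

/** The four cardinalities listed in the requirements. */
enum Cardinality {
    ALWAYS_ONE, OPTIONAL_ONE, ONE_MANY, ZERO_MANY;

    /** True when a document may carry no value at all for the field. */
    boolean mayBeAbsent() {
        return this == OPTIONAL_ONE || this == ZERO_MANY;
    }
}

/** One immutable node of the mapping tree; children model embedded fields. */
final class FieldNode {
    private final String name;
    private final Cardinality cardinality;
    private final List<FieldNode> children;

    FieldNode(String name, Cardinality cardinality, List<FieldNode> children) {
        this.name = name;
        this.cardinality = cardinality;
        // Defensive copy so later changes to the caller's list cannot leak in.
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    String name() { return name; }
    Cardinality cardinality() { return cardinality; }
    List<FieldNode> children() { return children; }
}

/** Callback for walking the tree; a backend could implement this. */
interface MappingVisitor {
    void visit(String absolutePath, FieldNode node, int depth);
}

/** Built and validated once; read-only and safe to share afterwards. */
final class MappingTree {
    private final List<FieldNode> roots;
    private final Map<String, FieldNode> byPath;

    MappingTree(List<FieldNode> roots) {
        this.roots = Collections.unmodifiableList(new ArrayList<>(roots));
        Map<String, FieldNode> index = new LinkedHashMap<>();
        // Pre-compute every absolute path and reject duplicates up front.
        walk(this.roots, "", 0, (path, node, depth) -> {
            if (index.put(path, node) != null) {
                throw new IllegalArgumentException("Duplicate field path: " + path);
            }
        });
        this.byPath = Collections.unmodifiableMap(index);
    }

    /** Hash lookup by absolute field path; returns null for unknown paths. */
    FieldNode lookup(String absolutePath) {
        return byPath.get(absolutePath);
    }

    /** Depth-first, parent-before-children walk over the whole tree. */
    void walk(MappingVisitor visitor) {
        walk(roots, "", 0, visitor);
    }

    private static void walk(List<FieldNode> nodes, String prefix, int depth, MappingVisitor visitor) {
        for (FieldNode node : nodes) {
            String path = prefix.isEmpty() ? node.name() : prefix + "." + node.name();
            visitor.visit(path, node, depth);
            walk(node.children(), path, depth + 1, visitor);
        }
    }

    /** Indented text dump of the tree, one field per line. */
    String prettyPrint() {
        StringBuilder sb = new StringBuilder();
        walk((path, node, depth) -> {
            for (int i = 0; i < depth; i++) {
                sb.append("  ");
            }
            sb.append(node.name()).append(" [").append(node.cardinality()).append("]\n");
        });
        return sb.toString();
    }
}
```

A query validator could call lookup("address.city") and inspect
mayBeAbsent() when deciding how to handle a query for null; the exact
rule for "exists" vs. a null token is left open above, so the sketch
does not encode one. A backend-specific document producer would be one
more MappingVisitor implementation.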


