Hi all,
among the various plans for Hibernate Search 6, one of the reasons we
did the Elasticsearch integration early, as an experimental feature,
was to get ourselves a clearer picture of what's going to be needed in
terms of internal cleanup.
Our DocumentBuilder is ancient, and several new features have been
added since it was a well designed, simple piece of code.
So, while we have discussed several wishes already, I have now started
a document to try to get all our thoughts to converge.
For convenience, pasting the current content below.
-
https://docs.google.com/document/d/1JwKanIRHVTw1LvCdLGyY6EKuyvvQn6gvlkmPG...
I'm not giving comment permissions to the world; anyone who's
interested please answer here or drop me a note, happy to give
permissions to comment to well-intentioned people.
N.B. The document will very likely evolve beyond this email; as it is
now it's an initial brain dump. For example, I haven't thought about
the ES capability of nesting structures yet.
Thanks,
Sanne
==== Pasting from document =====
DocumentBuilder and FieldBridge requirements for Hibernate Search 6.0
* Never import Lucene types; ideally make Lucene a dependency of the
Lucene backend only.
* In a modular world, don’t expect end user code to be able to load
Lucene class definitions.
* Efficient lookup from “field name” to its field mapping and indexing
options; not least:
* Cardinality {always one, optional one, one-many, zero-many}
* Needed for validation of queries, e.g. query for null can use
an “exists” query only in some of these cases, vs needing a null
token.
* projectable alone vs part of multiple fields relating to a single
property (allow projection of Two-Way bridges using multiple fields)
* Might need “group name”.”sub field name” for groups and index time joins
* IndexedEmbedded
* “depth” and navigational graph to be pre-computed: tree of valid
fields and options to be known in advance.
* Navigating into a relation must deal with possibly navigating
into subclasses of the relation type:
http://stackoverflow.com/questions/39516355/indexing-a-interface-in-hiber...
* Immutable, threadsafe, easy to inspect/walk mapping tree
* Built and validated at bootstrap of the IndexManager
* can’t be updated after that
* Field names and custom FieldType not to be allocated at runtime
* Efficient to validate Queries
* Allow efficient production of an Entity instance into:
* Elasticsearch “document”
* Lucene “document”
* An efficient-to-serialize “document”
* If it gets easy enough, make our own simple serialization?
* Extensible to other backends e.g. Apache Solr in the future (a
Walkable SPI)
* Pretty printed text to dump the “schema” we’re using from a
given domain model
* Validations and comparisons
* Allow to validate compatibility with an Elasticsearch schema
* Allow to validate compatibility with a Lucene schema
* Walking tree to map to ORM loading strategies
* allow to predict which paths we’ll need to initialize
(database load) for efficient batch loading (graph initialization)
* Allow for accurate Dirty-checking to skip indexing operations
* Allow generation of better MassIndexer queries (fetch join
some of the relations?)
* ID handling: specific care
* ad-hoc encoders for ID
* stricter validation (e.g. cardinality, DocValues, Two-Way fieldbridges)
* Support multi-term IDs (composite keys, @IdClass)
* Have different “index id strategies” to have them apply different
logic, i.e. “delete by term” and “update by term” only apply on
single-term IDs.
* ID handling strategy might need to take into account if the index
is shared among types.
* Decoupling from Java “Class” as entity-type identifiers
* Sharding:
* Allow reuse of the same schema for indexes using the same
* Allow reuse of some elements for indexes sharing such elements
* Properties / Field relations
* Handle one property -> multiple Fields as a bidirectional relation.
* Disallow one index field being target of different properties
and/or bridges?
* Representation of “Join points” and Groups:
* allow future production of Lucene documents with index-time join
(write in groups)
* allow efficient Query validation for both index-time and
query-time join options
* Composable
* @ClassBridge, @Field annotations to both contribute to field definitions
* a @ClassBridge of an @IndexedEmbedded to both contribute to the
embedded field definitions
* Include type-bound user custom Bridges (see BridgeProvider) in
the compositions
* Both @ClassBridge and custom Bridges need to trigger on
polymorphic relations as well
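
To make a few of the points above more concrete, here is a minimal Java sketch of an immutable "field name" -> mapping lookup that is built and validated once, records cardinality, and can be dumped as text. None of these names (MappingSketch, IndexSchema, FieldMapping, NullQueryStrategy) exist in Hibernate Search; they are hypothetical and only illustrate the shape. The rule used to pick between an "exists"-style query and a null token is also an assumption on my side, not something the document has settled.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: these types are NOT part of Hibernate Search.
final class MappingSketch {

    /** The four cardinalities listed in the requirements. */
    public enum Cardinality { ALWAYS_ONE, OPTIONAL_ONE, ONE_MANY, ZERO_MANY }

    /** How a "query for null" could be executed (assumed options). */
    public enum NullQueryStrategy { NEVER_MATCHES, NOT_EXISTS, NULL_TOKEN }

    /** Immutable description of a single index field; no Lucene types involved. */
    public static final class FieldMapping {
        public final String name;
        public final Cardinality cardinality;
        public final boolean projectable;

        FieldMapping(String name, Cardinality cardinality, boolean projectable) {
            this.name = name;
            this.cardinality = cardinality;
            this.projectable = projectable;
        }
    }

    /** Immutable, thread-safe lookup; cannot be updated after build(). */
    public static final class IndexSchema {
        private final Map<String, FieldMapping> fields;

        IndexSchema(Map<String, FieldMapping> fields) {
            // Defensive copy, then wrapped read-only.
            this.fields = Collections.unmodifiableMap(new LinkedHashMap<>(fields));
        }

        /** Query validation entry point: unknown fields fail fast. */
        public FieldMapping field(String name) {
            FieldMapping mapping = fields.get(name);
            if (mapping == null) {
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            return mapping;
        }

        /**
         * ASSUMPTION: an optional single value can be matched by absence,
         * multi-valued fields need a null token, a mandatory value is never null.
         */
        public NullQueryStrategy nullQueryStrategy(String name) {
            switch (field(name).cardinality) {
                case ALWAYS_ONE:
                    return NullQueryStrategy.NEVER_MATCHES;
                case OPTIONAL_ONE:
                    return NullQueryStrategy.NOT_EXISTS;
                default:
                    return NullQueryStrategy.NULL_TOKEN;
            }
        }

        /** Plain-text dump of the "schema", one field per line. */
        public String dump() {
            StringBuilder out = new StringBuilder();
            for (FieldMapping f : fields.values()) {
                out.append(f.name).append(" [").append(f.cardinality);
                if (f.projectable) {
                    out.append(", projectable");
                }
                out.append("]\n");
            }
            return out.toString();
        }
    }

    /** Mutable only during bootstrap; rejects a field targeted twice. */
    public static final class Builder {
        private final Map<String, FieldMapping> fields = new LinkedHashMap<>();

        public Builder add(String name, Cardinality cardinality, boolean projectable) {
            FieldMapping previous =
                    fields.putIfAbsent(name, new FieldMapping(name, cardinality, projectable));
            if (previous != null) {
                throw new IllegalStateException("Field mapped twice: " + name);
            }
            return this;
        }

        public IndexSchema build() {
            return new IndexSchema(fields);
        }
    }
}
```

Usage would be along the lines of new MappingSketch.Builder().add("title", Cardinality.ALWAYS_ONE, true).build(), after which the schema can only be read. Keeping the builder separate from the schema is one way to get the "can't be updated after that" property without locking.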