Shaping the requirements for the new DocumentBuilder to come in Hibernate Search 6
by Sanne Grinovero
Hi all,
among the various plans for Hibernate Search 6, one of the reasons we
had to do the Elasticsearch integration sooner as experimental was to
get ourselves a clearer picture of what's going to be needed in terms
of internal cleanup.
Our DocumentBuilder is ancient, and several new features have been
added since it was a well-designed, simple piece of code.
So, while we have discussed several wishes already, I have now started a
document to try to get all our thoughts to converge.
For convenience, pasting the current content below.
- https://docs.google.com/document/d/1JwKanIRHVTw1LvCdLGyY6EKuyvvQn6gvlkmPG...
I'm not giving comment permissions to the world; anyone who's
interested please answer here or drop me a note, happy to give
permissions to comment to well-intentioned people.
N.B. The document will very likely evolve beyond this email; as it is
now it's an initial brain dump. For example, I haven't thought about
the ES capability of nesting structures yet.
Thanks,
Sanne
==== Pasting from document =====
DocumentBuilder and FieldBridge requirements for Hibernate Search 6.0
* Never import Lucene types; ideally make Lucene a dependency of the
Lucene backend only.
* In a modular world, don’t expect end user code to be able to load
Lucene class definitions.
* Efficient lookup “field name” -> field mappings and its indexing
options; not least:
* Cardinality {always one, optional one, one-many, zero-many}
* Needed for validation of queries, e.g. query for null can use
an “exists” query only in some of these cases, vs needing a null
token.
* projectable alone vs part of multiple fields relating to a single
property (allow projection of Two-Way bridges using multiple fields)
* Might need “group name”.”sub field name” for groups and index time joins
* IndexedEmbedded
* “depth” and navigational graph to be pre-computed: tree of valid
fields and options to be known in advance.
* Navigating into a relation must deal with possibly navigating
into subclasses of the relation type:
http://stackoverflow.com/questions/39516355/indexing-a-interface-in-hiber...
* Immutable, threadsafe, easy to inspect/walk mapping tree
* Built and validated at bootstrap of the IndexManager
* can’t be updated after that
* Field names and custom FieldType not to be allocated at runtime
* Efficient to validate Queries
* Allow efficient production of an Entity instance into:
* Elasticsearch “document”
* Lucene “document”
* An efficient to serialize “document”
* If it gets easy enough, make our own simple serialization?
* Extensible to other backends e.g. Apache Solr in the future (a
Walkable SPI)
* Pretty printed text to dump the “schema” we’re using from a
given domain model
* Validations and comparisons
* Allow to validate compatibility with an Elasticsearch schema
* Allow to validate compatibility with a Lucene schema
* Walking tree to map to ORM loading strategies
* allow to predict which paths we’ll need to initialize
(database load) for efficient batch loading (graph initialization)
* Allow for accurate Dirty-checking to skip indexing operations
* Allow generation of better MassIndexer queries (fetch join
some of the relations?)
* ID handling: specific care
* ad-hoc encoders for ID
* stricter validation (e.g. cardinality, DocValues, Two-Way fieldbridges)
* Support multi-term IDs (composite keys, @IdClass)
* Have different “index id strategies” to have them apply different
logic, i.e. “delete by term” and “update by term” only apply on
single-term IDs.
* ID handling strategy might need to take into account if the index
is shared among types.
* Decoupling from Java “Class” as entity-type identifiers
* Sharding:
* Allow reuse of the same schema for indexes using the same
* Allow reuse of some elements for indexes sharing such elements
* Properties / Field relations
* Handle one property -> multiple Fields as a bidirectional relation.
* Disallow one index field being target of different properties
and/or bridges?
* Representation of “Join points” and Groups:
* allow future production of Lucene documents with index-time join
(write in groups)
* allow efficient Query validation for both index-time and
query-time join options
* Composable
* @ClassBridge, @Field annotations to both contribute to field definitions
* a @ClassBridge of an @IndexedEmbedded to both contribute to the
embedded field definitions
* Include type-bound user custom Bridges (see BridgeProvider) in
the compositions
* Both @ClassBridge and custom Bridges need to trigger on
polymorphic relations as well
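
To make the "immutable, easy to inspect mapping tree" and "efficient lookup" points above more concrete, here is a minimal sketch in plain Java. All type and method names (MappingTreeSketch, FieldMapping, IndexSchema, Builder) are hypothetical and not part of any existing Hibernate Search API; rejecting a field that is declared twice is only one possible answer to the open question in the list.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; none of these types exist in Hibernate Search.
final class MappingTreeSketch {

    /** The four cardinalities named in the requirements list. */
    enum Cardinality { ALWAYS_ONE, OPTIONAL_ONE, ONE_MANY, ZERO_MANY }

    /** Immutable description of a single index field. */
    static final class FieldMapping {
        final String path;
        final Cardinality cardinality;
        final boolean projectable;

        FieldMapping(String path, Cardinality cardinality, boolean projectable) {
            this.path = path;
            this.cardinality = cardinality;
            this.projectable = projectable;
        }
    }

    /** Immutable "field name" -> mapping lookup, meant to be built once at bootstrap. */
    static final class IndexSchema {
        private final Map<String, FieldMapping> byPath;

        private IndexSchema(Map<String, FieldMapping> fields) {
            // Defensive copy: later changes to the builder cannot leak in.
            this.byPath = Collections.unmodifiableMap(new LinkedHashMap<>(fields));
        }

        Optional<FieldMapping> field(String path) {
            return Optional.ofNullable(byPath.get(path));
        }

        /** Query validation: fail fast when a query targets an unknown field. */
        void validateQueryField(String path) {
            if (!byPath.containsKey(path)) {
                throw new IllegalArgumentException("Unknown field: " + path);
            }
        }

        /** Pretty-printed dump of the schema, one field per line. */
        String dump() {
            StringBuilder sb = new StringBuilder();
            for (FieldMapping f : byPath.values()) {
                sb.append(f.path).append(" [").append(f.cardinality)
                  .append(f.projectable ? ", projectable" : "").append("]\n");
            }
            return sb.toString();
        }
    }

    /** Mutable only during bootstrap; build() freezes the result. */
    static final class Builder {
        private final Map<String, FieldMapping> fields = new LinkedHashMap<>();

        Builder field(String path, Cardinality cardinality, boolean projectable) {
            // Assumption: one index field may be declared only once.
            if (fields.containsKey(path)) {
                throw new IllegalStateException("Field declared twice: " + path);
            }
            fields.put(path, new FieldMapping(path, cardinality, projectable));
            return this;
        }

        IndexSchema build() {
            return new IndexSchema(fields);
        }
    }
}
```

Since the schema copies the builder's map, nothing can change it after build(), which is what would make it safe to share between threads and cheap to consult when validating queries.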
8 years, 1 month
Vibur DBCP connection pool
by Sanne Grinovero
Hi all, Steve in particular,
there's a nice and friendly offer to integrate this connection pool in the
"Hibernate distribution".
- https://hibernate.atlassian.net/browse/HHH-10541
I asked Simeon - the project lead for Vibur and issue reporter - if he
would help to maintain it, and there seem to be good intentions, but then
we never gave him an answer.
What do you all think about this?
I understand we can't integrate everything but the approach seemed
promising to me; he has a point that it would just be simpler to maintain
if the integration point is incorporated, like we do with HikariCP and the
most popular 2nd level cache implementations.
Thanks,
Sanne
8 years, 1 month
Checkstyle checks in ORM
by Steve Ebersole
While developing the Byte Buddy Enhancer, Rafael ran into what I think is a
valid problem in the ORM build: the fact that we incorporate
non-fatal Checkstyle checks. In local builds this leads to a situation
where it is extremely difficult for new contributors to find out what
exactly they violated. Jenkins presents these better and makes it easier
to see just "fatal" (high priority) violations, but the Checkstyle report
itself does not. Even worse, the Checkstyle report does not show the
priority of individual violations at all.
I suggest we remove all non-fatal Checkstyle rules to make it easier on
contributors.
8 years, 1 month
HHH-11155 : problems updating lazy properties in lazy groups
by Gail Badner
Static update strings appear to cover only the following situations:
1) there are no uninitialized properties (so all updateable attributes
should be updated);
2) all lazy properties are uninitialized (so only
non-lazy, updateable attributes should be updated).
As of 5.1, we have "lazy groups". It is possible that some lazy groups are
initialized and some are uninitialized. We have a couple of alternatives
for dealing with the various combinations.
For example, suppose there are 3 lazy groups: lazyGroup1, lazyGroup2,
lazyGroup3.
1) Generate SQL update strings for all possible combinations of initialized
lazy groups:
SQL update strings are already generated for the following combinations:
* lazyGroup1: uninitialized; lazyGroup2: uninitialized; lazyGroup3:
uninitialized
* lazyGroup1: initialized; lazyGroup2: initialized; lazyGroup3:
initialized
SQL update strings for the following combinations need to be generated to
fix the bug:
* lazyGroup1: initialized; lazyGroup2: uninitialized; lazyGroup3:
uninitialized
* lazyGroup1: uninitialized; lazyGroup2: initialized; lazyGroup3:
uninitialized
* lazyGroup1: uninitialized; lazyGroup2: uninitialized; lazyGroup3:
initialized
* lazyGroup1: initialized; lazyGroup2: initialized; lazyGroup3:
uninitialized
* lazyGroup1: initialized; lazyGroup2: uninitialized; lazyGroup3:
initialized
* lazyGroup1: uninitialized; lazyGroup2: initialized; lazyGroup3:
initialized
The update strings could be stored in a Map with key containing the names
(or indexes?) of the corresponding initialized lazy groups.
2) Generate dynamic update strings when there is at least 1 uninitialized
group, or when there are more than N lazy groups. What should N be?
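
As a rough illustration of alternative 1, the sketch below builds one update string per combination of initialized lazy groups and stores it in a Map keyed by the set of initialized group names. This is not the actual persister code: the class name, the method signature and the SQL shape are assumptions made for the example only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not how the ORM entity persister is actually written.
final class LazyGroupUpdateStrings {

    /**
     * Returns one UPDATE string per subset of initialized lazy groups,
     * so 2^n entries for n lazy groups.
     */
    static Map<Set<String>, String> generate(
            String table,
            String idColumn,
            List<String> nonLazyColumns,
            Map<String, List<String>> lazyGroupColumns) {
        List<String> groups = new ArrayList<>(lazyGroupColumns.keySet());
        Map<Set<String>, String> result = new LinkedHashMap<>();

        // Each bit of the mask says whether the corresponding group is initialized.
        for (int mask = 0; mask < (1 << groups.size()); mask++) {
            Set<String> initialized = new LinkedHashSet<>();
            List<String> columns = new ArrayList<>(nonLazyColumns);
            for (int i = 0; i < groups.size(); i++) {
                if ((mask & (1 << i)) != 0) {
                    String group = groups.get(i);
                    initialized.add(group);
                    columns.addAll(lazyGroupColumns.get(group));
                }
            }
            if (columns.isEmpty()) {
                continue; // nothing to update for this combination
            }
            StringBuilder sql = new StringBuilder("update ").append(table).append(" set ");
            for (int c = 0; c < columns.size(); c++) {
                if (c > 0) {
                    sql.append(", ");
                }
                sql.append(columns.get(c)).append("=?");
            }
            sql.append(" where ").append(idColumn).append("=?");
            result.put(initialized, sql.toString());
        }
        return result;
    }
}
```

With 3 lazy groups this yields 8 strings; the count doubles with every additional group, which is why some threshold N for falling back to dynamic update strings (alternative 2) would be needed.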
Comments or suggestions?
A related bug is that calling a setter on a lazy property only initializes
that one lazy property. It should also initialize other properties in the
same lazy group. This one is pretty easy to fix.
Thanks,
Gail
8 years, 1 month
Hibernate Search and Elasticsearch 5 support
by Yoann Rodiere
Hello,
I just finished assessing the required changes for supporting Elasticsearch
5.0. I put the details in this ticket:
https://hibernate.atlassian.net/browse/HSEARCH-2434
Here is a quick summary:
- Some (non-breaking) changes are required in Jest [1]
- We'll have to update the way we do integration testing:
elasticsearch-maven-plugin doesn't work well with ES 5.0 and will require a
major overhaul to work. [2]
- There are some breaking changes that require us to either drop support
for ES 2.x or introduce dialects (for instance the string datatype has been
split into two datatypes: text and keyword, which behave quite differently).
- And, perhaps most importantly, support for defining analyzers in
elasticsearch.yml has been dropped. This means users have to resort to the
index settings API to define their analyzers. So this breaks our automatic
index creation / mapping generation feature: we put mappings just after
creating the index, but since the index was just created analyzer
definitions will be missing and the mapping will be rejected. See
HSEARCH-2434 (in the comments) for details.
Ultimately, what we have to decide is how and when we're going to support
ES 5. Several options have already been mentioned on HipChat:
1. Support ES 5.x right now and drop support for ES 2.x
2. Support both ES 5.x and 2.x right now by introducing dialects (that
could be chosen automatically by asking the server for its running version)
3. Only support ES 2.x for now and keep ES 5.x for later (probably HS
6.0)
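
To give an idea of what option 2 could look like, here is a minimal sketch of a dialect picked from the server's version number, using the string datatype split as the only dialect-specific behavior. The names (ElasticsearchDialectSketch, Dialect, forServerVersion, stringFieldType) are hypothetical, and the sketch assumes the version has already been fetched from the server as a string such as "5.0.0".

```java
// Hypothetical sketch; not an existing Hibernate Search or Jest API.
final class ElasticsearchDialectSketch {

    enum Dialect {
        ES2 {
            @Override
            String stringFieldType(boolean analyzed) {
                // ES 2.x has a single "string" datatype; whether it is analyzed
                // is controlled by a separate "index" option in the mapping.
                return "string";
            }
        },
        ES5 {
            @Override
            String stringFieldType(boolean analyzed) {
                // ES 5.x splits "string" into "text" (analyzed) and "keyword".
                return analyzed ? "text" : "keyword";
            }
        };

        abstract String stringFieldType(boolean analyzed);

        /** Picks a dialect from a version string such as "2.4.1" or "5.0.0". */
        static Dialect forServerVersion(String version) {
            int dot = version.indexOf('.');
            String majorPart = dot < 0 ? version : version.substring(0, dot);
            int major = Integer.parseInt(majorPart.trim());
            if (major == 2) {
                return ES2;
            }
            if (major == 5) {
                return ES5;
            }
            throw new IllegalArgumentException(
                    "Unsupported Elasticsearch version: " + version);
        }
    }
}
```

For instance, Dialect.forServerVersion("5.0.0").stringFieldType(false) would give "keyword", while the same call against a "2.4.1" server would give "string". A real implementation would need to cover many more differences than this one.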
We have to consider several things in order to make the decision:
- Deadlines and available resources. Supporting 5.x only should be easy
enough, I think (if we ignore the analyzer definition issue); actually I
already did some work [3]. The dialect solution would obviously require
some more work, but if we only target a quick, dirty fix (to be refactored in
6.0) it shouldn't be hell.
- Users. Dropping support for 2.x probably will make someone angry. But
then, we only published alpha/betas and advertised experimental support for
now, so there shouldn't be many people using it in a production
environment.
Thoughts, opinions?
[1] https://github.com/searchbox-io/Jest/pull/408
[2] https://github.com/alexcojocaru/elasticsearch-maven-plugin/pull/19
[3] https://github.com/yrodiere/hibernate-search/tree/HSEARCH-2434
Yoann Rodière <yoann(a)hibernate.org>
Hibernate NoORM Team
8 years, 1 month