[JIRA] (HSEARCH-3313) Add support for dynamic index "partitioning"
by Yoann Rodière (JIRA)
Yoann Rodière *updated* an issue
Hibernate Search / New Feature / HSEARCH-3313: Add support for dynamic index "partitioning"
Change By: Yoann Rodière
*WARNING:* See HSEARCH-3683 / HSEARCH-3971 before trying to implement this; it's possible that HSEARCH-3683 / HSEARCH-3971 will lay the groundwork for dynamic index partitioning.
*WARNING:* We should probably investigate this before 6.0.0.Final, at least to make sure it won't require API changes.
Dynamic sharding is a bit tricky to implement in Search 6, because:
# it doesn't fit the concept of sharding in Elasticsearch very well (Elasticsearch only allows "static" sharding)
# as implemented in Search 5, it mixes the concepts of the entity being mapped and the resulting document (see {{org.hibernate.search.store.ShardIdentifierProvider#getShardIdentifier}})
However, there are real use cases for "dynamic sharding", which one may want for two reasons:
# For performance. Sharding data according to a key extracted from business data, assigning one shard to each key, may improve performance. For example, if we know that most searches will focus on a given language, then let's shard on the "language" key and have one index per language.
# To allow for slightly different configuration for each shard. In particular, we may want to index the same field with a different analyzer depending on the language (see also HSEARCH-3311).
As far as I (Yoann) am concerned, combining "static", hash-based sharding with the solution mentioned in HSEARCH-3311 seems to do the trick and should be enough. It may be a bit less efficient (we'd need an additional "language = xxx" predicate to take into account the fact that multiple languages may end up in the same index, see [how it's usually done with Elasticsearch|https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html#search-routing]), but would also be much simpler.
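To make this trade-off concrete, here is a small in-memory Java simulation (not Hibernate Search or Elasticsearch code; {{Doc}}, {{shardFor}} and the two-shard setup are assumptions for illustration). Documents are routed to a fixed number of shards by hashing the language, and a search for one language only looks at the corresponding shard. With two shards, "de" and "en" happen to hash to the same shard, so the additional "language = xxx" predicate is what keeps English documents out of German results.

```java
import java.util.ArrayList;
import java.util.List;

class StaticShardingSketch {

    // Hypothetical document: only the fields needed for this illustration.
    record Doc(String id, String language, String text) {}

    static final List<Doc> DOCS = List.of(
            new Doc("1", "fr", "hotel bonjour"),
            new Doc("2", "de", "hotel guten tag"),
            new Doc("3", "en", "hotel hello"));

    // Static, hash-based routing: the routing key picks a shard,
    // but several keys may end up in the same shard.
    static int shardFor(String routingKey, int shardCount) {
        return Math.floorMod(routingKey.hashCode(), shardCount);
    }

    static List<List<Doc>> index(List<Doc> docs, int shardCount) {
        List<List<Doc>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
        for (Doc doc : docs) {
            shards.get(shardFor(doc.language(), shardCount)).add(doc);
        }
        return shards;
    }

    // Only the shard selected by the routing key is searched; the optional
    // "language = xxx" predicate removes other languages sharing that shard.
    static List<String> search(List<List<Doc>> shards, String language, String term,
            boolean filterOnLanguage) {
        List<String> hits = new ArrayList<>();
        for (Doc doc : shards.get(shardFor(language, shards.size()))) {
            boolean languageOk = !filterOnLanguage || doc.language().equals(language);
            if (languageOk && doc.text().contains(term)) {
                hits.add(doc.id());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<List<Doc>> shards = index(DOCS, 2);
        // With 2 shards, "de" and "en" hash to the same shard.
        System.out.println(search(shards, "de", "hotel", false)); // prints [2, 3]
        System.out.println(search(shards, "de", "hotel", true));  // prints [2]
    }
}
```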
Here is what we decided during the F2F meeting. To be revisited depending on the other priorities...
{quote}
DECISION: we don’t need dynamic sharding at the backend level. However:
# We want this at the mapper level because it addresses real business needs, such as having one index per language.
# Let’s call it “dynamic partitioning”? “dynamic index mapping”? The idea would be not to confuse it with sharding at the backend level.
# We will have to require users to implement an SPI that returns a list of all indexes created so far (called on each query) and gets notified when we create an index.
# Targeting class “Book” will, by default, target all relevant index managers; applying a filter will remove some index managers from the target.
# We should probably investigate this before 6.0.0.Final, at least to be sure it won’t require API changes.
# While we’re at it, we should investigate mapping a single entity object to multiple documents in different indexes. What does it mean when querying in particular? This could wait, though.
{quote}
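A rough sketch of the SPI described in the decision above, under stated assumptions: {{IndexPartitionRegistry}}, {{indexFor}} and {{targetIndexes}} are hypothetical names, the partition key is the language, and the registry is a naive in-memory set (a real implementation would presumably need storage shared by all application nodes). It shows the two calls mentioned in the decision (listing all indexes created so far on each query, and being notified when an index is created), as well as the default targeting of all indexes, with an optional filter removing some of them.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class DynamicPartitioningSketch {

    // Hypothetical user-implemented SPI; the names are illustrative, not an existing API.
    interface IndexPartitionRegistry {
        // Called on each query: returns every index created so far.
        List<String> listIndexes();

        // Called whenever a new index is created.
        void indexCreated(String indexName);
    }

    // Naive in-memory implementation, for illustration only.
    static class InMemoryRegistry implements IndexPartitionRegistry {
        private final Set<String> indexes = new LinkedHashSet<>();

        @Override
        public synchronized List<String> listIndexes() {
            return new ArrayList<>(indexes);
        }

        @Override
        public synchronized void indexCreated(String indexName) {
            indexes.add(indexName);
        }
    }

    // Indexing: the partition is derived from business data (here, the language);
    // the registry is notified the first time a partition is used.
    static String indexFor(IndexPartitionRegistry registry, String entityName, String language) {
        String indexName = entityName.toLowerCase() + "_" + language;
        if (!registry.listIndexes().contains(indexName)) {
            // This is where the index itself would actually be created.
            registry.indexCreated(indexName);
        }
        return indexName;
    }

    // Querying: target all known indexes by default; a filter removes some of them.
    static List<String> targetIndexes(IndexPartitionRegistry registry, Predicate<String> filter) {
        List<String> targets = new ArrayList<>();
        for (String indexName : registry.listIndexes()) {
            if (filter.test(indexName)) {
                targets.add(indexName);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        IndexPartitionRegistry registry = new InMemoryRegistry();
        indexFor(registry, "Book", "fr");
        indexFor(registry, "Book", "de");
        indexFor(registry, "Book", "fr"); // already known: no notification
        System.out.println(targetIndexes(registry, name -> true));                // prints [book_fr, book_de]
        System.out.println(targetIndexes(registry, name -> name.endsWith("_de"))); // prints [book_de]
    }
}
```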
Get Jira notifications on your phone! Download the Jira Cloud app for Android ( https://play.google.com/store/apps/details?id=com.atlassian.android.jira.... ) or iOS ( https://itunes.apple.com/app/apple-store/id1006972087?pt=696495&ct=EmailN... ) This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100140- sha1:fbb0377 )
4 years, 5 months
[JIRA] (HSEARCH-3903) Filters based exclusively on mapper metadata for @IndexedEmbedded
by Yoann Rodière (JIRA)
Yoann Rodière *updated* an issue
Hibernate Search / New Feature / HSEARCH-3903: Filters based exclusively on mapper metadata for @IndexedEmbedded
Change By: Yoann Rodière
The {{includePaths}} filter in {{@IndexedEmbedded}} refers to index field paths. This has several drawbacks:
* Part of the implementation has to be in the backend, which feels quite dirty.
* This is not very consistent with the {{maxDepth}} filter, which applies to the {{@IndexedEmbedded}} only (depth of fields created within an included bridge is unlimited).
* The filters cannot easily be applied to dynamic fields, so dynamic fields are always included as soon as their nearest static parent is included.
* The filter can end up including some fields declared by a custom field bridge, but not others.
** This does not make sense performance-wise as the fields will still be populated by the bridge, but ignored by the backend.
** Worse, when we introduce support for bridge-defined predicates (HSEARCH-3320), we may end up with dysfunctional predicates because only some fields are present, while the bridge expects all fields to be present.
* We are forced to use inference to detect which bridges should be included or excluded, based on the fields they declared.
** This code is unnecessarily complex.
** This code does not work correctly with field templates, since we cannot know in advance whether dynamic fields will be included. In particular:
*** Bridges that declare field templates, but only ever add dynamic fields that would not match the {{includePaths}}, are included nonetheless.
*** Bridges that do not declare anything and rely on field templates declared by a parent (which is legal) are excluded.
We could get rid of most of the complexity by implementing filters differently, based on mapper metadata exclusively (mapping annotations and/or entity model).
h3. Solution 1: property paths
We could rely on property paths instead of field paths. Only bridges applied to included properties are themselves included.
The major drawback is that there wouldn't be any way to filter out type bridges.
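The following Java sketch illustrates Solution 1 and its main drawback. It is a simulation only: {{BridgeBinding}} and {{includedBridges}} are hypothetical, and a null property path stands for a type bridge. Filtering on property paths can exclude property bridges, but a type bridge is not attached to any property path, so it is always included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PropertyPathFilterSketch {

    // A bridge bound either to a property (identified by its path) or to the type itself.
    record BridgeBinding(String bridgeName, String propertyPath) {
        boolean isTypeBridge() {
            return propertyPath == null;
        }
    }

    // Bridges bound to included property paths are included, others are excluded.
    // Type bridges have no property path, so this filter has no way to exclude them.
    static List<String> includedBridges(List<BridgeBinding> bindings, Set<String> includedPaths) {
        List<String> included = new ArrayList<>();
        for (BridgeBinding binding : bindings) {
            if (binding.isTypeBridge() || includedPaths.contains(binding.propertyPath())) {
                included.add(binding.bridgeName());
            }
        }
        return included;
    }

    public static void main(String[] args) {
        List<BridgeBinding> bindings = List.of(
                new BridgeBinding("titleBridge", "title"),
                new BridgeBinding("isbnBridge", "isbn"),
                new BridgeBinding("myTypeBridge", null));
        // prints [titleBridge, myTypeBridge]: the type bridge cannot be filtered out
        System.out.println(includedBridges(bindings, Set.of("title")));
    }
}
```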
h3. Solution 2: groups
We could rely on "groups", similarly to the {{@LazyGroup}} support in Hibernate ORM, or to the group support in Hibernate Validator.
One assigns groups to every {{@Field}}/{{@IndexedEmbedded}}, then references the groups in {{@IndexedEmbedded(includeGroups = ...)}}.
The main problem with this solution is its complexity: Validator uses groups, and I know they can be pretty complex to handle. We should definitely see what makes them so complex in Validator to avoid the same problems in Search.
For example:
{code}
@Indexed
public class Level1Entity {
    // Will include id only
    @IndexedEmbedded
    private Level2Entity level2_1;
    // Will include id, name
    @IndexedEmbedded(includeGroups = {BuiltinGroups.DEFAULT, "base"})
    private Level2Entity level2_2;
    // Will include id, name, category
    @IndexedEmbedded(includeGroups = {BuiltinGroups.DEFAULT, "base", "advanced"})
    private Level2Entity level2_3;
}

public class Level2Entity {
    @GenericField // Default group
    private String id;
    @GenericField(groups = "base")
    private String name;
    @GenericField(groups = "advanced")
    private String category;
}
{code}
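To check the expected results written in the comments of this example, here is a minimal simulation of group filtering in plain Java. It assumes that a field is included when at least one of its groups appears in {{includeGroups}}, and that an {{@IndexedEmbedded}} without {{includeGroups}} only includes the default group; {{GroupFilterSketch}} and the {{"default"}} string standing in for {{BuiltinGroups.DEFAULT}} are illustrative, not an existing API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GroupFilterSketch {

    // Stands in for the proposed BuiltinGroups.DEFAULT.
    static final String DEFAULT = "default";

    // Groups assigned to each field of Level2Entity in the example.
    static final Map<String, Set<String>> LEVEL2_FIELD_GROUPS = new LinkedHashMap<>();

    static {
        LEVEL2_FIELD_GROUPS.put("id", Set.of(DEFAULT));
        LEVEL2_FIELD_GROUPS.put("name", Set.of("base"));
        LEVEL2_FIELD_GROUPS.put("category", Set.of("advanced"));
    }

    // A field is included if at least one of its groups is listed in includeGroups.
    static List<String> includedFields(Map<String, Set<String>> fieldGroups, Set<String> includeGroups) {
        List<String> included = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : fieldGroups.entrySet()) {
            if (!Collections.disjoint(entry.getValue(), includeGroups)) {
                included.add(entry.getKey());
            }
        }
        return included;
    }

    public static void main(String[] args) {
        // level2_1: no includeGroups, assumed to mean "default group only"
        System.out.println(includedFields(LEVEL2_FIELD_GROUPS, Set.of(DEFAULT)));                     // prints [id]
        System.out.println(includedFields(LEVEL2_FIELD_GROUPS, Set.of(DEFAULT, "base")));             // prints [id, name]
        System.out.println(includedFields(LEVEL2_FIELD_GROUPS, Set.of(DEFAULT, "base", "advanced"))); // prints [id, name, category]
    }
}
```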
h4. Variation: overriding {{includeGroups}}
It would prevent us from supporting the use case mentioned in HSEARCH-1112 directly, but I believe the same effect could be achieved if we defined group filters as "overriding" instead of "composable": an {{@IndexedEmbedded(includeGroups = "a")}} that includes an {{@IndexedEmbedded(includeGroups = "b")}} would just act as if the contained {{@IndexedEmbedded}} included group "a", and only group "a".
For example:
{code}
@Indexed
public class Level1Entity {
    // Will include level2.level3.a only
    @IndexedEmbedded(includeGroups = "a")
    private Level2Entity level2;
}

public class Level2Entity {
    @GenericField(groups = "b")
    private String name;
    @IndexedEmbedded(includeGroups = "b") // includeGroups is overridden in Level1Entity
    private Level3Entity level3;
}

public class Level3Entity {
    @GenericField(groups = "a")
    private String a;
    @GenericField(groups = "b")
    private String b;
}
{code}
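The difference between "overriding" and "composable" group filters can be simulated on this example, where {{Level2Entity}} embeds {{Level3Entity}} through a {{level3}} property (as the {{level2.level3.a}} comment implies). This is a sketch under two assumptions that the text does not settle: "composable" is taken to mean the intersection of the outer and nested {{includeGroups}}, and an {{@IndexedEmbedded}} is always traversed, with only leaf fields being filtered by group. Under those assumptions, overriding yields {{level2.level3.a}} only, while composing {{"a"}} with {{"b"}} yields nothing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GroupOverrideSketch {

    interface Member {}

    record Field(String name, Set<String> groups) implements Member {}

    record Embedded(String name, Set<String> includeGroups, List<Member> members) implements Member {}

    // Level1Entity.level2 -> Level2Entity.level3 -> Level3Entity
    static final List<Member> LEVEL3 = List.of(
            new Field("a", Set.of("a")),
            new Field("b", Set.of("b")));
    static final List<Member> LEVEL2 = List.of(
            new Field("name", Set.of("b")),
            new Embedded("level3", Set.of("b"), LEVEL3));
    static final Embedded LEVEL1_TO_LEVEL2 = new Embedded("level2", Set.of("a"), LEVEL2);

    static List<String> includedPaths(Embedded root, boolean overriding) {
        List<String> out = new ArrayList<>();
        collect(root.name() + ".", root.members(), root.includeGroups(), overriding, out);
        return out;
    }

    private static void collect(String prefix, List<Member> members, Set<String> effectiveGroups,
            boolean overriding, List<String> out) {
        for (Member member : members) {
            if (member instanceof Field field) {
                if (!Collections.disjoint(field.groups(), effectiveGroups)) {
                    out.add(prefix + field.name());
                }
            } else if (member instanceof Embedded embedded) {
                Set<String> nested;
                if (overriding) {
                    // The outermost includeGroups wins; nested filters are ignored.
                    nested = effectiveGroups;
                } else {
                    // "Composable", assumed here to mean the intersection of both filters.
                    nested = new HashSet<>(effectiveGroups);
                    nested.retainAll(embedded.includeGroups());
                }
                collect(prefix + embedded.name() + ".", embedded.members(), nested, overriding, out);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(includedPaths(LEVEL1_TO_LEVEL2, true));  // prints [level2.level3.a]
        System.out.println(includedPaths(LEVEL1_TO_LEVEL2, false)); // prints []
    }
}
```

The overriding variant keeps resolution trivial (a single set of groups for the whole subtree), which is also why nested filters could no longer be used to cut cycles.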
There are pros and cons:
* Pro: Groups may be easier to implement and understand: the various filters defined in indexed-embedded entities would no longer be relevant. One could argue that it's the opposite, though: the fact that filters defined in indexed-embedded entities are ignored can be confusing.
* Con: it would become harder to manage cycles through group filtering: you would no longer be able to rely on indexed-embedded entities to filter out cycles through groups (since their group filters are ignored).
* Con: the behavior would not be consistent with that of {{maxDepth}}.
h3. Next
h4. Deprecation
As a second step, we should probably deprecate {{includePaths}} and mark it for removal in a later major version (7+).
h4. Going further: dynamic group selection
One could imagine allowing groups to be selected dynamically. All the groups that *can* be selected would be included in the index schema, and, when indexing, some fields would be enabled or disabled based on the dynamic group selection.
This would provide a feature similar to the {{AlternativeBinder}}, but much more powerful.
There is one unknown, though: how would {{@IndexedEmbedded.includeGroups}} interact with the dynamic group selection? If dynamic group selection is overridden by {{@IndexedEmbedded.includeGroups}}, it will likely not work as intended for the "multi-language" use case of {{AlternativeBinder}}. If dynamic group selection ignores {{@IndexedEmbedded.includeGroups}}, it will become impossible to exclude dynamically enabled fields from {{@IndexedEmbedded}}. See HSEARCH-3971.
Maybe we should separate the two concepts, e.g. {{@GenericField(groups = ..., dynamicGroups = ...)}}?
Or maybe we should assign a static group to the "dynamic group resolver": the resolver and all corresponding fields would be included in the schema if the resolver's assigned static group is included, even if the field's (dynamic) groups are not included. Then it would be the user's responsibility to make sure static and dynamic groups use different names, so as not to include dynamic groups statically by mistake.
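The last option (a static group assigned to the resolver) could behave roughly as in this Java sketch. Everything in it is hypothetical: {{DynamicGroupResolver}}, the static group {{"multilang"}}, and the dynamic groups {{"fr"}}/{{"de"}}, which deliberately use names distinct from static groups. Whether the resolver's fields are in the schema depends only on whether {{@IndexedEmbedded.includeGroups}} includes the resolver's static group; which of them get populated depends on the dynamic groups selected for each entity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class StaticGroupForResolverSketch {

    // Hypothetical "dynamic group resolver": it is itself assigned a static group,
    // and selects dynamic groups for each entity at indexing time.
    record DynamicGroupResolver(String staticGroup,
            Function<Map<String, String>, Set<String>> selectGroups) {}

    // A field governed by the resolver, tagged with a dynamic group.
    record DynamicField(String name, String dynamicGroup, String sourceProperty) {}

    static final DynamicGroupResolver LANGUAGE_RESOLVER = new DynamicGroupResolver(
            "multilang", entity -> Set.of(entity.get("language")));

    static final List<DynamicField> FIELDS = List.of(
            new DynamicField("text_fr", "fr", "text"),
            new DynamicField("text_de", "de", "text"));

    // Schema: all fields of the resolver are included if its static group is included
    // by @IndexedEmbedded.includeGroups, whatever their own dynamic groups are.
    static boolean inSchema(DynamicGroupResolver resolver, Set<String> includeGroups) {
        return includeGroups.contains(resolver.staticGroup());
    }

    // Indexing: among fields in the schema, only populate those whose dynamic group is selected.
    static Map<String, String> index(Map<String, String> entity, DynamicGroupResolver resolver,
            List<DynamicField> fields, Set<String> includeGroups) {
        Map<String, String> document = new LinkedHashMap<>();
        if (!inSchema(resolver, includeGroups)) {
            return document;
        }
        Set<String> selected = resolver.selectGroups().apply(entity);
        for (DynamicField field : fields) {
            if (selected.contains(field.dynamicGroup())) {
                document.put(field.name(), entity.get(field.sourceProperty()));
            }
        }
        return document;
    }

    public static void main(String[] args) {
        Map<String, String> post = Map.of("language", "fr", "text", "Bonjour");
        System.out.println(index(post, LANGUAGE_RESOLVER, FIELDS, Set.of("multilang"))); // prints {text_fr=Bonjour}
        System.out.println(index(post, LANGUAGE_RESOLVER, FIELDS, Set.of("default")));   // prints {}
    }
}
```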
h4. Going further: using field groups to select fields in the search DSL
We could address HSEARCH-3926 by contributing groups to the index metamodel: {{@GenericField(groups = "foo")}} would assign the group "foo" to the corresponding index field, which could then be targeted at query time by selecting the group "foo". See HSEARCH-3926 for more information.
This would be a reasonable use of groups, I believe?
[JIRA] (HSEARCH-3926) Predicate on multiple fields designated by a single label/group name/etc. ("_all", copy_to, ...)
by Yoann Rodière (JIRA)
Yoann Rodière *updated* an issue
Hibernate Search / New Feature / HSEARCH-3926: Predicate on multiple fields designated by a single label/group name/etc. ("_all", copy_to, ...)
Change By: Yoann Rodière
Sometimes it's not practical to explicitly list all the fields one wants to target when searching. Maybe there are lots of them, maybe you just want to target all fields that can be targeted, ...
Some related discussions on the forums:
* https://discourse.hibernate.org/t/hibernate-search-6-0-0-beta6-simplequer...
* https://discourse.hibernate.org/t/support-for-copy-to-mapping-on-fields/415
Elasticsearch offers several solutions to that problem; we should investigate and pick the most appropriate.
In particular:
* Some queries implicitly target all relevant fields when we don't pass a field name (simple query string in particular)
* The {{copy_to}} attribute in the mapping allows copying the content of a field to another at indexing time: https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html
* I remember seeing something about an {{_all}} field whose content is the content of all other fields? Though I believe it was disabled by default.
Alternatively, we could use our own concept of groups in Hibernate Search:
* Assign one or more groups to each field.
* When searching, specify the group name instead of the field name, which will select all fields assigned to that group.
We already need that concept of groups for other features, and will most likely introduce it in HSEARCH-3903, so... two birds with one stone?
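A minimal sketch of this group-based alternative, assuming a hypothetical metamodel that stores the groups of each index field: targeting a group expands into every field assigned to that group, and a naive case-insensitive substring match stands in for a real full-text predicate. The field names and groups are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class FieldGroupSelectionSketch {

    // Hypothetical index metamodel: each index field may carry several groups.
    static final Map<String, Set<String>> FIELD_GROUPS = new LinkedHashMap<>();

    static {
        FIELD_GROUPS.put("title", Set.of("text", "summary"));
        FIELD_GROUPS.put("description", Set.of("text"));
        FIELD_GROUPS.put("isbn", Set.of("identifiers"));
    }

    // Selecting a group expands to every field assigned to that group.
    static List<String> fieldsInGroup(String group) {
        List<String> fields = new ArrayList<>();
        FIELD_GROUPS.forEach((field, groups) -> {
            if (groups.contains(group)) {
                fields.add(field);
            }
        });
        return fields;
    }

    // Naive "match" predicate: the document matches if any targeted field contains the term.
    static boolean matchesGroup(Map<String, String> document, String group, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String field : fieldsInGroup(group)) {
            String value = document.get(field);
            if (value != null && value.toLowerCase(Locale.ROOT).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> book = Map.of(
                "title", "Hibernate Search guide",
                "description", "Full-text search for Java applications",
                "isbn", "0000000000");
        System.out.println(fieldsInGroup("text"));                     // prints [title, description]
        System.out.println(matchesGroup(book, "text", "java"));        // prints true
        System.out.println(matchesGroup(book, "identifiers", "java")); // prints false
    }
}
```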
[JIRA] (HSEARCH-3683) Mapping a single entity type to multiple indexes
by Yoann Rodière (JIRA)
Yoann Rodière *updated* an issue
Hibernate Search / Improvement / HSEARCH-3683: Mapping a single entity type to multiple indexes
Change By: Yoann Rodière
Introduce a way to map a single entity type to multiple indexes:
* on bootstrap, user code will provide a list of index names that this entity type will be mapped to (instead of just one usually).
* at runtime, each entity to index will be inspected by user code upon indexing, and routed to the correct index.
For example, we could repurpose the {{RoutingKeyBridge}}, renaming it to a {{RoutingBridge}}. Its APIs would change a lot, but something like this could do:
{code}
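public class MyRoutingBridge implements RoutingBridge {
    private final IndexReference indexReference1; // Obtained by the binder
    private final IndexReference indexReference2; // Obtained by the binder

    public MyRoutingBridge(IndexReference indexReference1, IndexReference indexReference2) {
        this.indexReference1 = indexReference1;
        this.indexReference2 = indexReference2;
    }

    @Override
    public void route(Route route,
            String tenantIdentifier, Object entityId, String documentId, Object entity,
            RoutingBridgeRouteContext context) {
        route.index( indexReference1 ); // Only necessary if there are indexes declared by the binder
        route.routingKey( computeRoutingKey( entity ) ); // Only necessary if the binder enabled routing keys
    }

    // Hypothetical helper, left to the implementor: compute some routing key from the entity
    private String computeRoutingKey(Object entity) {
        throw new UnsupportedOperationException( "Not implemented in this sketch" );
    }
}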
{code}
This should address:
* Part of the use cases mentioned in HSEARCH-3313: essentially all the use cases except those where the list of indexes is not known in advance. Even that one could be covered eventually, but let's keep it for later.
* Most, if not all, of the use cases for Indexing Event Interceptors (HSEARCH-3108). These interceptors are mainly used to prevent indexing of entities that are in a certain state. We could provide a way for users to route an entity to "nowhere" ({{route.discard()}}?), meaning it will be removed from all indexes and will not be added anywhere.
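To illustrate the routing idea, including routing an entity to "nowhere", here is an in-memory Java simulation. It does not use the proposed {{RoutingBridge}} API: {{Route}}, {{route()}} and {{apply()}} are hypothetical stand-ins, and the rule (drafts are discarded, archived books go to a separate index) is only an example. Applying a route first removes the document from every index the entity type is mapped to, then adds it to the selected index, if any, so a discarded entity ends up in no index at all.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class RoutingSketch {

    enum State { DRAFT, ACTIVE, ARCHIVED }

    record Book(long id, String language, State state) {}

    // Hypothetical routing decision: a target index plus routing key, or "discard".
    record Route(String index, String routingKey) {
        static final Route DISCARD = new Route(null, null);

        boolean isDiscarded() {
            return index == null;
        }
    }

    // User-provided routing logic (illustrative): drafts are not indexed at all,
    // which is the kind of thing indexing interceptors are used for.
    static Route route(Book book) {
        return switch (book.state()) {
            case DRAFT -> Route.DISCARD;
            case ARCHIVED -> new Route("book_archive", book.language());
            case ACTIVE -> new Route("book_active", book.language());
        };
    }

    // Applying a route: remove the document from every index, then add it
    // to the selected index, if any.
    static void apply(Map<String, Map<Long, Book>> indexes, Book book) {
        for (Map<Long, Book> index : indexes.values()) {
            index.remove(book.id());
        }
        Route route = route(book);
        if (!route.isDiscarded()) {
            indexes.get(route.index()).put(book.id(), book);
        }
    }

    static Map<String, Map<Long, Book>> newIndexes() {
        Map<String, Map<Long, Book>> indexes = new LinkedHashMap<>();
        indexes.put("book_active", new HashMap<>());
        indexes.put("book_archive", new HashMap<>());
        return indexes;
    }

    public static void main(String[] args) {
        Map<String, Map<Long, Book>> indexes = newIndexes();
        apply(indexes, new Book(1L, "fr", State.ACTIVE));
        apply(indexes, new Book(1L, "fr", State.ARCHIVED)); // moves the document
        System.out.println(indexes.get("book_active").keySet() + " " + indexes.get("book_archive").keySet()); // prints [] [1]
        apply(indexes, new Book(1L, "fr", State.DRAFT));    // removed everywhere
        System.out.println(indexes.get("book_active").keySet() + " " + indexes.get("book_archive").keySet()); // prints [] []
    }
}
```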
Note that initially, we'll force all indexes to have the exact same mapping. We'll allow each index to have a different mapping in HSEARCH-3971, and then we'll have addressed part of the use cases mentioned in HSEARCH-3313: essentially all the use cases except those where the list of indexes is not known in advance. Even that one could, in theory, be covered eventually, but that will have to be handled in HSEARCH-3313 (if ever).
[JIRA] (HSEARCH-3971) Mapping an entity differently based on a discriminator
by Yoann Rodière (JIRA)
Yoann Rodière *created* an issue
Hibernate Search / New Feature / HSEARCH-3971: Mapping an entity differently based on a discriminator
Issue Type: New Feature
Assignee: Unassigned
Components: mapper-pojo
Created: 27/Jul/2020 03:39 AM
Fix Versions: 6.1
Priority: Major
Reporter: Yoann Rodière
Follow-up on HSEARCH-3311.
In some cases, one could benefit from mapping an entity differently depending on a discriminator.
There are two use cases to distinguish:
* I want a different set of fields depending on a discriminator. This could be solved by including all fields in the schema, but populating them only when needed at runtime: e.g. only index largeTextField when entityState is ACTIVE, but not when it's ARCHIVED.
* I want the same fields, but configured differently depending on a discriminator. This could be solved by mapping the same entity type to multiple indexes (HSEARCH-3683) with *a different schema* (not included in HSEARCH-3683), and creating the documents in the relevant index at runtime; e.g. map "BlogPost" to blogposts_fr and blogposts_de, use a different analyzer on the text field in each index, and route blog posts with language = FR to blogposts_fr, and blog posts with language = DE to blogposts_de.
Use case 1: enable/disable fields based on a discriminator
----------------------------------------------------------
Both use cases could be addressed with the concept of "groups" proposed in HSEARCH-3903. The user would provide a component that enables or disables "groups" dynamically, based on the state of the entity. All the groups that *can* be selected would be included in the index schema, and, when indexing, some fields would be enabled or disabled based on the dynamic group selection.
This would provide a feature similar to the AlternativeBinder, but much more powerful.
There is one unknown, though: how would @IndexedEmbedded.includeGroups interact with the dynamic group selection? If dynamic group selection is overridden by @IndexedEmbedded.includeGroups, it will likely not work as intended for the "multi-language" use case of AlternativeBinder. If dynamic group selection ignores @IndexedEmbedded.includeGroups, it will become impossible to exclude dynamically enabled fields from @IndexedEmbedded.
Maybe we should separate the two concepts, e.g. @GenericField(groups = ..., dynamicGroups = ...)?
Or maybe we should assign a static group to the "dynamic group resolver": the resolver and all corresponding fields would be included in the schema if the resolver's assigned static group is included, even if the field's (dynamic) groups are not included. Then it would be the user's responsibility to make sure static and dynamic groups use different names, so as not to include dynamic groups statically by mistake.
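Use case 1 could look roughly like this minimal Java sketch, based on the example given earlier in this issue (only index largeTextField when entityState is ACTIVE). The group names, FieldDeclaration and selectGroups are hypothetical: the schema contains every field whose group *can* be selected, and at indexing time only the fields whose group is selected for that entity get populated.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class DynamicGroupSelectionSketch {

    enum EntityState { ACTIVE, ARCHIVED }

    record Article(String title, String largeTextField, EntityState entityState) {}

    // A field declaration tagged with a (hypothetical) group.
    record FieldDeclaration(String name, String group, Function<Article, String> valueExtractor) {}

    static final List<FieldDeclaration> FIELDS = List.of(
            new FieldDeclaration("title", "default", Article::title),
            new FieldDeclaration("largeTextField", "activeOnly", Article::largeTextField));

    // Hypothetical user component: selects groups based on the entity's state.
    static Set<String> selectGroups(Article article) {
        return article.entityState() == EntityState.ACTIVE
                ? Set.of("default", "activeOnly")
                : Set.of("default");
    }

    // Schema: every field whose group can be selected, regardless of any particular entity.
    static Set<String> schema() {
        Set<String> fields = new LinkedHashSet<>();
        for (FieldDeclaration field : FIELDS) {
            fields.add(field.name());
        }
        return fields;
    }

    // Indexing: only populate fields whose group is selected for this entity.
    static Map<String, String> index(Article article) {
        Set<String> selected = selectGroups(article);
        Map<String, String> document = new LinkedHashMap<>();
        for (FieldDeclaration field : FIELDS) {
            if (selected.contains(field.group())) {
                document.put(field.name(), field.valueExtractor().apply(article));
            }
        }
        return document;
    }

    public static void main(String[] args) {
        System.out.println(schema()); // prints [title, largeTextField]
        System.out.println(index(new Article("Title", "Lots of text", EntityState.ACTIVE)));   // prints {title=Title, largeTextField=Lots of text}
        System.out.println(index(new Article("Title", "Lots of text", EntityState.ARCHIVED))); // prints {title=Title}
    }
}
```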
Use case 2: route to a different index with a different schema based on a discriminator
---------------------------------------------------------------------------------------
This should only be a matter of building upon what is described above and the work done in HSEARCH-3683: when declaring the indexes an entity type is mapped to, we would also declare the groups of fields that are enabled for each index. Then at runtime, when an entity is routed to an index, it would use the appropriate mapping.
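Here is a Java sketch of use case 2 with the BlogPost example given earlier in this issue. It assumes, hypothetically, that a field may be declared several times under the same name in different groups, each declaration carrying its own analyzer, and that each declared index lists its enabled groups. The schema of blogposts_fr then contains text with the French analyzer, the schema of blogposts_de contains text with the German analyzer, and routing on the language selects the index. The analyzer names are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class PerIndexSchemaSketch {

    // One possible declaration of a field; several declarations may share a field name
    // as long as they belong to different groups.
    record FieldMapping(String fieldName, String analyzer, String group) {}

    // An index the entity type is mapped to, with the groups enabled for that index.
    record IndexDeclaration(String indexName, Set<String> enabledGroups) {}

    static final List<FieldMapping> BLOG_POST_MAPPINGS = List.of(
            new FieldMapping("title", "standard", "default"),
            new FieldMapping("text", "french", "fr"),
            new FieldMapping("text", "german", "de"));

    static final List<IndexDeclaration> BLOG_POST_INDEXES = List.of(
            new IndexDeclaration("blogposts_fr", Set.of("default", "fr")),
            new IndexDeclaration("blogposts_de", Set.of("default", "de")));

    // Schema of one index (field name -> analyzer), restricted to the enabled groups.
    static Map<String, String> schemaOf(IndexDeclaration index) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (FieldMapping mapping : BLOG_POST_MAPPINGS) {
            if (!index.enabledGroups().contains(mapping.group())) {
                continue;
            }
            String previous = schema.putIfAbsent(mapping.fieldName(), mapping.analyzer());
            if (previous != null) {
                throw new IllegalStateException("Conflicting mappings for field '"
                        + mapping.fieldName() + "' in index " + index.indexName());
            }
        }
        return schema;
    }

    // Routing at runtime: language = FR -> blogposts_fr, language = DE -> blogposts_de.
    static IndexDeclaration routeTo(String language) {
        String indexName = "blogposts_" + language.toLowerCase(Locale.ROOT);
        for (IndexDeclaration index : BLOG_POST_INDEXES) {
            if (index.indexName().equals(indexName)) {
                return index;
            }
        }
        throw new IllegalArgumentException("No index declared for language " + language);
    }

    public static void main(String[] args) {
        System.out.println(schemaOf(routeTo("FR"))); // prints {title=standard, text=french}
        System.out.println(schemaOf(routeTo("DE"))); // prints {title=standard, text=german}
    }
}
```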