Thomas Göttlich created an issue

HSEARCH-1403

Issue Type:	Bug
Affects Versions:	4.3.0.Final
Assignee:	Unassigned
Components:	analyzer
Created:	11/Sep/13 5:51 AM
Priority:	Major
Reporter:	Thomas Göttlich

From my understanding, an analyzer discriminator is meant to be defined on classes/fields that should not use the standard analyzer. However, DocumentBuilderIndexedEntity.allowAnalyzerDiscriminatorOverride() applies the discriminator to all unprocessed fields in the document, which may include fields that the analyzer should not be applied to.

The discriminator itself cannot tell whether the field passed to it belongs to the entity on which the discriminator is defined.

Consider the following setup (irrelevant parts omitted):

@Indexed
class Article {
  String articleNumber;
  String internalText;
  
  @IndexedEmbedded
  Set<ArticleText> texts;
}

@AnalyzerDiscriminator( impl = ArticleTextAnalyzerDiscriminator.class )
@ClassBridge( impl = TextClassBridge.class )
class ArticleText {
  String language;
  String name;
  String description;
}

The bridge creates custom fields per language, i.e. we'd get fields like "texts.name_<language>" and "texts.description_<language>".

When the document is created/updated, the fields from Article (articleNumber and internalText) are added to the document first. Then the first text is processed and during that operation the analyzer discriminator is called.

At that moment, if the text was a German text, the document would contain the following fields:

_hibernate_class,
articleNumber,
internalText,
texts.name_de,
texts.description_de

The discriminator would determine the analyzer to be, let's say, analyzer_de. Because the fields from Article have not been processed yet, the fieldToAnalyzerMap would now look like this:

_hibernate_class=analyzer_de,
articleNumber=analyzer_de,
internalText=analyzer_de,
texts.name_de=analyzer_de,
texts.description_de=analyzer_de

As you can see, fields like internalText would now use analyzer_de instead of the default analyzer, which does not seem to be intended.
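To make the effect easier to follow, here is a minimal, self-contained sketch of the behaviour as I understand it. It is not the actual Hibernate Search code: the class DiscriminatorOverrideDemo and the method applyToAllUnprocessed() are hypothetical names that only model "apply the analyzer to every field not yet marked as processed".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of the reported behaviour; this is not Hibernate Search code.
final class DiscriminatorOverrideDemo {

    // Assigns the analyzer to every document field not yet marked as processed,
    // no matter which entity added the field.
    static Map<String, String> applyToAllUnprocessed(
            List<String> documentFields, Set<String> processedFields, String analyzerName) {
        Map<String, String> fieldToAnalyzerMap = new LinkedHashMap<>();
        for (String field : documentFields) {
            // Set.add returns true only if the field was not processed before.
            if (processedFields.add(field)) {
                fieldToAnalyzerMap.put(field, analyzerName);
            }
        }
        return fieldToAnalyzerMap;
    }

    public static void main(String[] args) {
        // State of the document when the first (German) ArticleText is handled.
        List<String> documentFields = Arrays.asList(
                "_hibernate_class", "articleNumber", "internalText",
                "texts.name_de", "texts.description_de");
        Map<String, String> result =
                applyToAllUnprocessed(documentFields, new HashSet<>(), "analyzer_de");
        // The Article fields are mapped to analyzer_de as well.
        System.out.println(result);
    }
}
```

In this model the fields added by Article are still unprocessed when the discriminator of ArticleText runs, so they are swept up together with the texts.* fields.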

Possible approaches to a solution:

1. only apply the discriminator on the fields which have been added by the current entity (possibly even passing it to buildDocumentFields() in order to use it for embedded entities as well)
2. pass some information to the discriminator whether the field was added by the current entity or not, enabling the discriminator to decide itself

I'd prefer solution no. 1, since it would make it easy to pass the analyzer down to embedded entities and use it there, unless another discriminator overrides it.
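A rough sketch of what the first approach could look like, under the assumption that the document size can be recorded before the current entity adds its fields. ScopedDiscriminatorSketch and applyToFieldsAddedSince() are hypothetical names for illustration only and do not exist in Hibernate Search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of approach 1; not an existing Hibernate Search API.
final class ScopedDiscriminatorSketch {

    // Assigns the analyzer only to fields added at or after sizeBeforeEntity,
    // i.e. the fields contributed by the entity carrying the discriminator.
    static Map<String, String> applyToFieldsAddedSince(
            List<String> documentFields, int sizeBeforeEntity, String analyzerName) {
        Map<String, String> fieldToAnalyzerMap = new LinkedHashMap<>();
        for (String field : documentFields.subList(sizeBeforeEntity, documentFields.size())) {
            fieldToAnalyzerMap.put(field, analyzerName);
        }
        return fieldToAnalyzerMap;
    }

    public static void main(String[] args) {
        // Fields added by Article before the embedded ArticleText is handled.
        List<String> documentFields = new ArrayList<>(
                Arrays.asList("_hibernate_class", "articleNumber", "internalText"));
        int sizeBeforeEntity = documentFields.size();

        // Fields added by the class bridge of the German ArticleText.
        documentFields.add("texts.name_de");
        documentFields.add("texts.description_de");

        // Only the two texts.* fields receive analyzer_de.
        System.out.println(
                applyToFieldsAddedSince(documentFields, sizeBeforeEntity, "analyzer_de"));
    }
}
```

Fields that do not appear in the resulting map would keep the default analyzer, which is the behaviour I would expect for articleNumber and internalText.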

Currently, I can think of a few workarounds:

- define a discriminator on each entity
- set the default analyzer on the root entity (or each embedded entity as well? - I use the programmatic API so I'm not sure here, also the programmatic API doesn't seem to support that directly)
- define a discriminator that determines the analyzer based on the field suffix
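
The suffix-based workaround could be sketched roughly as follows. FieldDiscriminator is a local stand-in so that the example compiles on its own; its method shape mirrors what I believe the Hibernate Search Discriminator interface looks like, so treat that as an assumption. The "texts." prefix check, the "analyzer_<language>" naming and the use of null to mean "do not override" are assumptions as well.

```java
// Local stand-in for the discriminator contract; the method shape is assumed.
interface FieldDiscriminator {
    String getAnalyzerDefinitionName(Object value, Object entity, String field);
}

// Hypothetical discriminator that derives the analyzer from the field suffix.
final class SuffixAnalyzerDiscriminator implements FieldDiscriminator {

    private static final String PREFIX = "texts.";

    @Override
    public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
        // Ignore fields that were not created by the ArticleText class bridge.
        if (field == null || !field.startsWith(PREFIX)) {
            return null;
        }
        int separator = field.lastIndexOf('_');
        // No language suffix after the prefix: leave the analyzer untouched.
        if (separator < PREFIX.length() || separator == field.length() - 1) {
            return null;
        }
        return "analyzer_" + field.substring(separator + 1);
    }

    public static void main(String[] args) {
        FieldDiscriminator discriminator = new SuffixAnalyzerDiscriminator();
        String[] fields = {
            "_hibernate_class", "articleNumber", "internalText",
            "texts.name_de", "texts.description_de"
        };
        for (String field : fields) {
            System.out.println(field + " -> "
                    + discriminator.getAnalyzerDefinitionName(null, null, field));
        }
    }
}
```

The prefix check matters because _hibernate_class also contains an underscore; without it the suffix logic alone would pick an analyzer for that field too.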

This message was sent by Atlassian JIRA