Change By:	Christoph Straßer

Same root cause as HSEARCH-3038.

{code:java}
queryBuilder.keyword().wildcard().onFields("field1", "field2").matching("Word1 word2").createQuery();
{code}

We indexed field1 and field2 with LowerCaseFilterFactory, so the indexed terms are all lowercase. The wildcard search above does not find "Word1".

This happens because org.hibernate.search.query.dsl.impl.ConnectedMultiFieldsTermQueryBuilder skips the analyzer for wildcard queries:

{code:java}
private List<String> getAllTermsFromText(String fieldName, String localText, Analyzer analyzer) {
  //it's better not to apply the analyzer with wildcard as * and ? can be mistakenly removed
  List<String> terms = new ArrayList<String>();
  if ( termContext.getApproximation() == TermQueryContext.Approximation.WILDCARD ) {
    terms.add( localText );
  }
  else {
    try {
      terms = Helper.getAllTermsFromText( fieldName, localText, analyzer );
    }
    catch (IOException e) {
      throw new AssertionFailure( "IO exception while reading String stream??", e );
    }
  }
  return terms;
}
{code}
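
The effect of skipping analysis can be reproduced without Lucene. The following is only a minimal sketch: `WildcardCaseSketch`, `indexTerms`, `toRegex` and `matches` are hypothetical names, and the lowercasing whitespace split is an assumed stand-in for the real index-time analyzer, not Hibernate Search code.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration; none of these names exist in Hibernate Search.
class WildcardCaseSketch {

    // Stand-in for index-time analysis: split on whitespace and lowercase.
    static List<String> indexTerms(String text) {
        return List.of(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Translate the wildcards * and ? into a regex, quoting every other character.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // A wildcard query matches when any indexed term matches the whole pattern.
    static boolean matches(List<String> terms, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        return terms.stream().anyMatch(term -> pattern.matcher(term).matches());
    }

    public static void main(String[] args) {
        List<String> terms = indexTerms("Word1 word2"); // indexed as word1, word2

        // Unanalyzed pattern keeps the capital W and misses the lowercase term.
        System.out.println(matches(terms, "Word*")); // false

        // Lowercasing the pattern first restores the match.
        System.out.println(matches(terms, "Word*".toLowerCase(Locale.ROOT))); // true
    }
}
```

Until the builder analyzes wildcard terms, lowercasing the search text on the caller side before passing it to matching() should have the same effect for fields that only apply a lowercase filter.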

A clean solution would be to follow Solr's approach. Solr solves this issue by letting a field type define separate analyzers for indexing and for querying.

{code:xml}
<fieldType name="nametext" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="syns.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}

The default behaviour should be to use the defined analyzer for both indexing and querying. Optionally, it should be possible to define a separate analyzer for querying.
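
The proposed fallback could look roughly like the sketch below. Everything in it is an assumption for illustration: `FieldAnalysisSketch` and its members are hypothetical and not part of the Hibernate Search API, and plain string functions stand in for real analyzers. The assumed index analyzer drops `*` and `?` to mimic the loss of wildcards that the comment in the builder warns about.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical per-field analysis configuration; not Hibernate Search API.
final class FieldAnalysisSketch {

    // Assumed index analyzer: lowercases and drops * and ?, as a tokenizer might.
    static final UnaryOperator<String> INDEX_ANALYZER =
            s -> s.toLowerCase(Locale.ROOT).replaceAll("[*?]", "");

    // Assumed query analyzer: lowercases only, so wildcards survive.
    static final UnaryOperator<String> QUERY_ANALYZER =
            s -> s.toLowerCase(Locale.ROOT);

    private final UnaryOperator<String> indexAnalyzer;
    private final UnaryOperator<String> queryAnalyzer; // null: reuse the index analyzer

    FieldAnalysisSketch(UnaryOperator<String> indexAnalyzer, UnaryOperator<String> queryAnalyzer) {
        this.indexAnalyzer = indexAnalyzer;
        this.queryAnalyzer = queryAnalyzer;
    }

    // Default behaviour: one analyzer for both indexing and querying.
    static FieldAnalysisSketch sameForBoth(UnaryOperator<String> analyzer) {
        return new FieldAnalysisSketch(analyzer, null);
    }

    String analyzeForIndex(String text) {
        return indexAnalyzer.apply(text);
    }

    // Use the query analyzer when one is defined, otherwise fall back to the index analyzer.
    String analyzeForQuery(String text) {
        return (queryAnalyzer != null ? queryAnalyzer : indexAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        FieldAnalysisSketch byDefault = sameForBoth(INDEX_ANALYZER);
        FieldAnalysisSketch withQueryAnalyzer = new FieldAnalysisSketch(INDEX_ANALYZER, QUERY_ANALYZER);

        System.out.println(byDefault.analyzeForQuery("Word*"));         // word
        System.out.println(withQueryAnalyzer.analyzeForQuery("Word*")); // word*
    }
}
```

With only the index analyzer the wildcard is lost, which is the risk the current code avoids by skipping analysis altogether. A dedicated query analyzer would allow the term to be lowercased while keeping the wildcard.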
