[hibernate-dev] [Wildcard] Search: changing the way we search

Tue Mar 4 07:24:32 EST 2014

On 04 Mar 2014, at 12:09, Guillaume Smet <guillaume.smet at gmail.com> wrote:

> On Tue, Mar 4, 2014 at 11:09 AM, Emmanuel Bernard
> <emmanuel at hibernate.org> wrote:
>> I would like to separate the notion of autosuggestion from the wildcard problem. To me they are separate, and I would love for Hibernate Search to offer an autosuggest and spell checker API.
> 
> AFAICS from the changelog of each version, autosuggest is still very
> much a work in progress in Lucene/Solr.

So? :)

> 
>> Back to wildcard. If we have an analyser stack that separates normaliser filters from filters generating additional tokens (see my email [AND]), then it is a piece of cake to apply the right filters, raise an exception if someone tries a wildcard query on n-grams, and simply ignore the synonym filter.
> 
> In theory, yes.
> 
> But for the tokenization, we use WhitespaceTokenizer and
> WordDelimiterFilter, which generates new tokens (for example, depending
> on the options you use, you can index wi-fi as wi and fi, as wi-fi, and
> as wifi).
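A minimal sketch of the proposal quoted above, assuming a hypothetical analyzer model in which every filter declares whether it normalises or generates tokens. The `WildcardAnalysis`, `Kind` and `Filter` names are illustrative only and exist in neither Lucene nor Hibernate Search. A wildcard term goes through the normalisers only, synonym filters are skipped, and an n-gram filter triggers an exception. The folding lambda uses `java.text.Normalizer` as a rough approximation of ASCIIFoldingFilter.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical model: none of these types exist in Lucene or Hibernate Search.
final class WildcardAnalysis {

    // Assumed classification of filters in a "separated" analyzer stack.
    enum Kind { NORMALIZER, SYNONYM, NGRAM }

    static final class Filter {
        final String name;
        final Kind kind;
        final UnaryOperator<String> op; // used only when kind == NORMALIZER

        Filter(String name, Kind kind, UnaryOperator<String> op) {
            this.name = name;
            this.kind = kind;
            this.op = op;
        }
    }

    // Example stack: accent folding (approximation), lowercasing, then synonyms.
    static List<Filter> exampleStack() {
        return List.of(
            new Filter("folding", Kind.NORMALIZER,
                s -> Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "")),
            new Filter("lowercase", Kind.NORMALIZER, s -> s.toLowerCase(Locale.ROOT)),
            new Filter("synonym", Kind.SYNONYM, null));
    }

    // Runs only the normalising filters over a wildcard term, skips synonyms,
    // and rejects fields analysed with n-grams.
    static String normalizeWildcardTerm(String term, List<Filter> stack) {
        String result = term;
        for (Filter f : stack) {
            switch (f.kind) {
                case NORMALIZER:
                    result = f.op.apply(result);
                    break;
                case SYNONYM:
                    break; // synonyms only add tokens: ignore them for wildcards
                case NGRAM:
                    throw new IllegalArgumentException(
                        "Wildcard query on an n-gram analysed field (filter '" + f.name + "')");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalizeWildcardTerm("H\u00f4t*", exampleStack())); // prints hot*
    }
}
```

The `*` and `?` characters survive here only because neither normaliser touches them; a real folding filter applied to wildcard terms would need the same guarantee.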

OK, that poses a problem for the wildcard if wi and fi are split apart. But I don’t think it’s an issue for the AND case, as we would get the expected query:
- hotel AND wi-fi
- hotel AND wi AND fi
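To make the wi-fi case concrete, here is a simplified, hypothetical emulation of WordDelimiterFilter. It is not Lucene's implementation, and its two boolean flags only loosely mirror the filter's PRESERVE_ORIGINAL and CATENATE_WORDS options. With splitting alone, the AND query becomes `hotel AND wi AND fi`, while a prefix query on `wi-f` matches no indexed token unless the original form is preserved.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for Lucene's WordDelimiterFilter (not its real logic).
final class DelimiterDemo {

    // Splits one token on non-alphanumeric (ASCII) characters, optionally
    // keeping the original form and/or the concatenated parts.
    static List<String> delimit(String token, boolean preserveOriginal, boolean catenateWords) {
        List<String> parts = new ArrayList<>();
        for (String p : token.split("[^\\p{Alnum}]+")) {
            if (!p.isEmpty()) parts.add(p);
        }
        Set<String> out = new LinkedHashSet<>(parts);                            // wi, fi
        if (preserveOriginal && parts.size() > 1) out.add(token);               // wi-fi
        if (catenateWords && parts.size() > 1) out.add(String.join("", parts)); // wifi
        return new ArrayList<>(out);
    }

    // Whitespace tokenization followed by delimiting each token.
    static List<String> analyze(String text, boolean preserveOriginal, boolean catenateWords) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.addAll(delimit(t, preserveOriginal, catenateWords));
        }
        return out;
    }

    // Every analysed term becomes a required clause.
    static String andQuery(List<String> terms) {
        return String.join(" AND ", terms);
    }

    // A prefix query such as wi-f* matches if any indexed token starts with the prefix.
    static boolean prefixMatches(String prefix, List<String> indexedTokens) {
        return indexedTokens.stream().anyMatch(t -> t.startsWith(prefix));
    }

    public static void main(String[] args) {
        System.out.println(delimit("wi-fi", true, true));                           // [wi, fi, wi-fi, wifi]
        System.out.println(andQuery(analyze("hotel wi-fi", false, false)));         // hotel AND wi AND fi
        System.out.println(prefixMatches("wi-f", delimit("wi-fi", false, false)));  // false
    }
}
```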

And to be fair, how do you plan to make wildcard and wi-fi work together in Lucene (is any solution available)? The solution I can think of is to index the property with an analyzer stack that does not split words like that into two tokens.
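That workaround can be sketched as indexing the same property into two fields: the usual one with delimiter splitting, and a hypothetical wildcard-only field whose analyzer just splits on whitespace and lowercases. In Hibernate Search this would presumably mean mapping the property twice (for instance via `@Fields`) with different analyzers; the code below only simulates the two analyzers in plain Java, and its class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical simulation of one property indexed into two fields.
final class WildcardField {

    // Main field: splits on any non-alphanumeric character, so "wi-fi" -> wi, fi.
    static List<String> splittingAnalyzer(String text) {
        return tokens(text.toLowerCase(Locale.ROOT), "[^\\p{Alnum}]+");
    }

    // Wildcard-only field: whitespace split + lowercasing, words kept whole.
    static List<String> wildcardAnalyzer(String text) {
        return tokens(text.toLowerCase(Locale.ROOT), "\\s+");
    }

    private static List<String> tokens(String text, String separatorRegex) {
        List<String> out = new ArrayList<>();
        for (String t : text.split(separatorRegex)) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // True if the wildcard (* = any run, ? = one char) matches an indexed token.
    static boolean anyMatches(String wildcard, List<String> indexedTokens) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        Pattern p = Pattern.compile(regex.toString());
        return indexedTokens.stream().anyMatch(t -> p.matcher(t).matches());
    }

    public static void main(String[] args) {
        String description = "Free Wi-Fi";
        System.out.println(anyMatches("wi-f*", splittingAnalyzer(description))); // false
        System.out.println(anyMatches("wi-f*", wildcardAnalyzer(description)));  // true
    }
}
```

Wildcard queries would then target the second field, while regular full-text queries keep using the first.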

> 
> Another problem with this particular filter is that we put it after the
> ASCIIFoldingFilter, because we want the input to be as clean as
> possible, but before the LowerCaseFilter, as WordDelimiterFilter can
> also split on case changes.
> 
> If you separate the normalizers from the tokenizer, I don't think it's going to
> be easy to order them adequately.
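The ordering constraint can be illustrated with simplified stand-ins for the three filters. This is not Lucene's code: folding is approximated with Unicode decomposition, and the delimiter step only splits on non-alphanumerics and on lower-to-upper case changes. Running folding, then delimiting, then lowercasing keeps the case-change split; running all the normalisers first, as a naive normaliser/token-generator separation would, lowercases the input before the delimiter sees it, so `HôtelWi` is no longer split.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-ins, not Lucene's filters.
final class FilterOrder {

    // Rough approximation of ASCIIFoldingFilter: strip combining marks.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Rough approximation of WordDelimiterFilter: split on non-alphanumerics
    // and between a lowercase letter followed by an uppercase one.
    static List<String> delimit(String s) {
        List<String> out = new ArrayList<>();
        for (String p : s.split("[^\\p{Alnum}]+|(?<=\\p{Lower})(?=\\p{Upper})")) {
            if (!p.isEmpty()) out.add(p);
        }
        return out;
    }

    // Order described in the thread: folding -> word delimiter -> lowercase.
    static List<String> interleaved(String token) {
        List<String> out = new ArrayList<>();
        for (String part : delimit(fold(token))) out.add(lower(part));
        return out;
    }

    // Naive separation: run every normaliser first, then the token generator.
    static List<String> normalizersFirst(String token) {
        return delimit(lower(fold(token)));
    }

    public static void main(String[] args) {
        String token = "H\u00f4telWi-Fi";
        System.out.println(interleaved(token));      // [hotel, wi, fi]
        System.out.println(normalizersFirst(token)); // [hotelwi, fi]
    }
}
```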