[hibernate-dev] [Wildcard] Search: changing the way we search

Guillaume Smet guillaume.smet at gmail.com
Tue Mar 4 06:09:00 EST 2014


On Tue, Mar 4, 2014 at 11:09 AM, Emmanuel Bernard
<emmanuel at hibernate.org> wrote:
> I would like to separate the notion of autosuggestion from the wildcard problem. To me they are separate and I would love Hibernate Search to offer an autosuggest and spell checker API.

AFAICS from the changelog of each version, autosuggest is still very
much a work in progress in Lucene/Solr.

> Back to wildcard. If we have an analyser stack that separates normaliser filters from filters generating additional tokens (see my email [AND]), then it is a piece of cake to apply the right filters, raise an exception if someone tries to wildcard on ngrams, and simply ignore the synonym filter.

In theory, yes.

But for the tokenization, we use WhitespaceTokenizer followed by
WordDelimiterFilter, which generates new tokens (for example, depending
on the options you use, wi-fi can be indexed as wi and fi, as wi-fi, and
as wifi).
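
To make the wi-fi example concrete, here is a minimal plain-Java sketch of
that kind of token expansion. It is only an illustration, not Lucene's
WordDelimiterFilter: the class WordDelimiterSketch, the method expand and
its three boolean options are hypothetical names, and only the hyphen is
treated as a delimiter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a word-delimiter step; NOT Lucene's WordDelimiterFilter.
final class WordDelimiterSketch {

    /**
     * Expands one whitespace-delimited token into the tokens that would be indexed.
     * The three flags are illustrative options, not the real filter's flags.
     */
    static List<String> expand(String token, boolean generateParts,
                               boolean preserveOriginal, boolean catenate) {
        // Collect the non-empty pieces found between hyphens.
        List<String> parts = new ArrayList<>();
        for (String p : token.split("-")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        List<String> out = new ArrayList<>();
        if (parts.size() < 2) {
            // Nothing to split: the token goes through unchanged.
            out.add(token);
            return out;
        }
        if (generateParts) {
            out.addAll(parts);                 // wi, fi
        }
        if (preserveOriginal) {
            out.add(token);                    // wi-fi
        }
        if (catenate) {
            out.add(String.join("", parts));   // wifi
        }
        return out;
    }
}
```

With all three options enabled, expand("wi-fi", true, true, true) yields
wi, fi, wi-fi and wifi, so one input token becomes four. That is why such a
filter cannot be treated as a pure normalizer.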

The other problem with this particular filter is its position: we put it
after the ASCIIFoldingFilter, because we want the input to be as clean as
possible, but before the LowerCaseFilter, because WordDelimiterFilter can
also do its magic on case changes.
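
The ordering constraint can be sketched in plain Java. This is a
simulation under stated assumptions, not the actual Lucene filters: fold
only approximates ASCII folding by stripping combining marks,
splitOnCaseChange only models the case-change rule, and all names in
FilterOrderSketch are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical simulation of the filter ordering; NOT the Lucene filter classes.
final class FilterOrderSketch {

    // Rough approximation of ASCII folding: decompose, then drop combining marks.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Split wherever a lower-case letter is directly followed by an upper-case one.
    static List<String> splitOnCaseChange(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < s.length(); i++) {
            if (Character.isLowerCase(s.charAt(i - 1)) && Character.isUpperCase(s.charAt(i))) {
                parts.add(s.substring(start, i));
                start = i;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }

    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Order described above: fold, then split on case change, then lowercase.
    static List<String> foldSplitLower(String token) {
        return lowerCase(splitOnCaseChange(fold(token)));
    }

    // Lowercasing before the split: the case change is gone, so nothing is split.
    static List<String> foldLowerSplit(String token) {
        return splitOnCaseChange(fold(token).toLowerCase(Locale.ROOT));
    }
}
```

For the input PowerShot, foldSplitLower gives power and shot, while
foldLowerSplit gives the single token powershot. Lowercasing is a
normalizing step, yet it has to run after a token-generating one.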

If you separate the normalizing filters from the token-generating ones,
I don't think it's going to be easy to order them correctly.


