On 04 Mar 2014, at 12:09, Guillaume Smet <guillaume.smet(a)gmail.com> wrote:
> On Tue, Mar 4, 2014 at 11:09 AM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>> I would like to separate the notion of autosuggestion from the wildcard problem. To me they are separate, and I would love Hibernate Search to offer an autosuggest and spell checker API.
> AFAICS from the changelog of each version, autosuggest is still a vast work in progress in Lucene/Solr.
So? :)
>> Back to wildcards. If we have an analyser stack that separates normaliser filters from filters generating additional tokens (see my email [AND]), then it is a piece of cake to apply the right filters, raise an exception if someone tries to wildcard on ngrams, and simply ignore the synonym filter.
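To illustrate the idea, a toy sketch (Python, purely illustrative; the filter names and the two categories are assumptions for this sketch, not any Hibernate Search or Lucene API):

```python
# Illustrative only: classify an analyzer stack so that wildcard terms get
# the normalizing filters but not the token-generating ones. The filter
# names and the classification below are assumptions, not a real API.

NORMALIZERS = {"ASCIIFoldingFilter", "LowerCaseFilter"}

def wildcard_filters(stack):
    """Filters to apply to a wildcard term: keep normalizers, silently
    skip the synonym filter, refuse ngrams with an exception."""
    kept = []
    for name in stack:
        if name in NORMALIZERS:
            kept.append(name)
        elif name == "SynonymFilter":
            continue  # simply ignore synonyms for wildcard terms
        elif name == "NGramFilter":
            raise ValueError("cannot run a wildcard query on an ngram field")
    return kept
```

For example, `wildcard_filters(["ASCIIFoldingFilter", "SynonymFilter", "LowerCaseFilter"])` keeps only the two normalizers, while any stack containing an ngram filter is rejected outright.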
> In theory, yes.
> But for the tokenization, we use WhitespaceTokenizer and WordDelimiterFilter, which generates new tokens (for example, depending on the options you use, wi-fi can be indexed as wi and fi, as wi-fi, and as wifi).
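For illustration, a toy imitation of that wi-fi behavior (Python; the option names mirror WordDelimiterFilter's flags, but the implementation is just a sketch, not Lucene's filter):

```python
import re

def word_delimiter(token, generate_word_parts=True,
                   catenate_words=False, preserve_original=False):
    """Toy imitation of WordDelimiterFilter on a single token: split on
    intra-word delimiters and optionally emit the catenated and/or
    original forms as extra tokens."""
    parts = [p for p in re.split(r"[\W_]+", token) if p]
    out = []
    if generate_word_parts:
        out.extend(parts)            # wi-fi -> wi, fi
    if catenate_words:
        out.append("".join(parts))   # wi-fi -> wifi
    if preserve_original:
        out.append(token)            # keep wi-fi itself
    return out
```

With the defaults this yields `["wi", "fi"]`; enabling all three options yields `["wi", "fi", "wifi", "wi-fi"]`, which is what makes the index side rich but the wildcard side hard to match.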
Ok, that poses a problem for the wildcard if wi and fi are indexed as separate tokens. But I don’t think it’s an issue for the AND case, as we would get the expected query:
- hotel AND wi-fi
- hotel AND wi AND fi
And to be fair, how do you plan to make wildcards and wi-fi work together in Lucene with any available solution? The only solution I can think of is to index the property with an analyzer stack that does not split such words into two tokens.
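A toy inverted index makes the asymmetry concrete (Python, illustrative only): the AND query still matches a document whose wi-fi was indexed as wi and fi, while wildcard expansion of a prefix like wi-f finds no indexed term to expand against.

```python
# Document 1 was indexed with the split tokens: "hotel wi-fi" -> hotel, wi, fi.
index = {"hotel": {1}, "wi": {1}, "fi": {1}}

def and_query(terms):
    """Intersect the posting lists of all terms (empty set if any term is missing)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def wildcard_terms(prefix):
    """Wildcard expansion enumerates the *indexed* terms with the given prefix."""
    return [t for t in index if t.startswith(prefix)]
```

Here `and_query(["hotel", "wi", "fi"])` finds document 1, but `wildcard_terms("wi-f")` is empty because no indexed term starts with wi-f, so a wi-f* query matches nothing.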
> The problem with this particular filter is also that we put it after the ASCIIFoldingFilter, because we want the input to be as clean as possible, but before the LowerCaseFilter, as WordDelimiterFilter can do its magic on case changes too.
> If you separate the normalizers from the tokenizer, I don't think it's going to be easy to order them adequately.
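A small sketch of why that order matters (Python, illustrative; the case-change split imitates WordDelimiterFilter, it is not the real filter):

```python
import re
import unicodedata

def ascii_fold(token):
    # Rough stand-in for ASCIIFoldingFilter: strip diacritics.
    return unicodedata.normalize("NFKD", token).encode("ascii", "ignore").decode()

def split_on_case_change(token):
    # WordDelimiterFilter-style split on lower-to-upper transitions.
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", token)

# Correct order: fold, split on case change, then lowercase -> wi, fi.
good = [p.lower() for p in split_on_case_change(ascii_fold("WiFi"))]
# Broken order: lowercasing first erases the case boundary the splitter needs.
bad = split_on_case_change(ascii_fold("WiFi").lower())
```

A LowerCaseFilter moved in front of the word-delimiter step silently changes what gets indexed (one token instead of two), which is exactly the ordering trap a normalizer/tokenizer split would have to avoid.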