On 04 Mar 2014, at 12:09, Guillaume Smet <guillaume.smet(a)gmail.com> wrote:
> On Tue, Mar 4, 2014 at 11:09 AM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>> I would like to separate the notion of autosuggestion from the wildcard problem. To me they are separate, and I would love Hibernate Search to offer an autosuggest and spell checker API.
> AFAICS from the changelog of each version, autosuggest is still a vast work in progress in Lucene/Solr.
So? :)
>> Back to wildcards. If we have an analyser stack that separates normaliser filters from filters generating additional tokens (see my email [AND]), then it is a piece of cake to apply the right filters, raise an exception if someone tries to wildcard on ngrams, and simply ignore the synonym filter.
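To illustrate the idea, a toy sketch (Python, purely illustrative; the filter names and the two categories are assumptions for this sketch, not any Hibernate Search or Lucene API):

```python
# Illustrative only: classify an analyzer stack so that wildcard terms get
# the normalizing filters but not the token-generating ones. The filter
# names and the classification below are assumptions, not a real API.

NORMALIZERS = {"ASCIIFoldingFilter", "LowerCaseFilter"}

def wildcard_filters(stack):
    """Filters to apply to a wildcard term: keep normalizers, silently
    skip the synonym filter, refuse ngrams with an exception."""
    kept = []
    for name in stack:
        if name in NORMALIZERS:
            kept.append(name)
        elif name == "SynonymFilter":
            continue  # simply ignore synonyms for wildcard terms
        elif name == "NGramFilter":
            raise ValueError("cannot run a wildcard query on an ngram field")
    return kept
```

For example, `wildcard_filters(["ASCIIFoldingFilter", "SynonymFilter", "LowerCaseFilter"])` keeps only the two normalizers, while any stack containing an ngram filter is rejected outright.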
> In theory, yes.
> But for the tokenization, we use WhitespaceTokenizer and WordDelimiterFilter, which generates new tokens (for example, depending on the options you use, wi-fi can be indexed as wi and fi, as wi-fi, and as wifi).
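For illustration, a toy imitation of that wi-fi behavior (Python; the option names mirror WordDelimiterFilter's flags, but the implementation is just a sketch, not Lucene's filter):

```python
import re

def word_delimiter(token, generate_word_parts=True,
                   catenate_words=False, preserve_original=False):
    """Toy imitation of WordDelimiterFilter on a single token: split on
    intra-word delimiters and optionally emit the catenated and/or
    original forms as extra tokens."""
    parts = [p for p in re.split(r"[\W_]+", token) if p]
    out = []
    if generate_word_parts:
        out.extend(parts)            # wi-fi -> wi, fi
    if catenate_words:
        out.append("".join(parts))   # wi-fi -> wifi
    if preserve_original:
        out.append(token)            # keep wi-fi itself
    return out
```

With the defaults this yields `["wi", "fi"]`; enabling all three options yields `["wi", "fi", "wifi", "wi-fi"]`, which is what makes the index side rich but the wildcard side hard to match.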
Ok, that poses a problem for the wildcard if wi and fi are indexed as separate tokens. But I don’t think it’s an issue for the AND case, as we would get the expected query:
- hotel AND wi-fi
- hotel AND wi AND fi
And to be fair, how do you plan to make wildcards and wi-fi work together in Lucene with any available solution? The only solution I can think of is to index the property with an analyzer stack that does not split such words into two tokens.
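A toy inverted index makes the asymmetry concrete (Python, illustrative only): the AND query still matches a document whose wi-fi was indexed as wi and fi, while wildcard expansion of a prefix like wi-f finds no indexed term to expand against.

```python
# Document 1 was indexed with the split tokens: "hotel wi-fi" -> hotel, wi, fi.
index = {"hotel": {1}, "wi": {1}, "fi": {1}}

def and_query(terms):
    """Intersect the posting lists of all terms (empty set if any term is missing)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def wildcard_terms(prefix):
    """Wildcard expansion enumerates the *indexed* terms with the given prefix."""
    return [t for t in index if t.startswith(prefix)]
```

Here `and_query(["hotel", "wi", "fi"])` finds document 1, but `wildcard_terms("wi-f")` is empty because no indexed term starts with wi-f, so a wi-f* query matches nothing.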
> The problem with this particular filter is also that we put it after the ASCIIFoldingFilter, because we want the input to be as clean as possible, but before the LowerCaseFilter, as WordDelimiterFilter can do its magic on case changes too.
> If you separate the normalizers from the tokenizer, I don't think it's going to be easy to order them adequately.
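A small sketch of why that order matters (Python, illustrative; the case-change split imitates WordDelimiterFilter, it is not the real filter):

```python
import re
import unicodedata

def ascii_fold(token):
    # Rough stand-in for ASCIIFoldingFilter: strip diacritics.
    return unicodedata.normalize("NFKD", token).encode("ascii", "ignore").decode()

def split_on_case_change(token):
    # WordDelimiterFilter-style split on lower-to-upper transitions.
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", token)

# Correct order: fold, split on case change, then lowercase -> wi, fi.
good = [p.lower() for p in split_on_case_change(ascii_fold("WiFi"))]
# Broken order: lowercasing first erases the case boundary the splitter needs.
bad = split_on_case_change(ascii_fold("WiFi").lower())
```

A LowerCaseFilter moved in front of the word-delimiter step silently changes what gets indexed (one token instead of two), which is exactly the ordering trap a normalizer/tokenizer split would have to avoid.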