[hibernate-dev] [Wildcard] Search: changing the way we search

Emmanuel Bernard emmanuel at hibernate.org
Tue Mar 4 05:09:32 EST 2014


This email is about wildcard queries.

On 03 Mar 2014, at 17:11, Guillaume Smet <guillaume.smet at hibernate.org> wrote:
> 
> And why is it not ideal:
> 3/ wildcard and analyzers are really a pain with Lucene and you need
> to implement your own cleaning stuff to get a working wildcard query.
> 
> 
> IV. About wildcard queries
> --------------------------------------
> 
> Let's say it frankly: wildcard queries are a pain in Lucene.
> 
> Let's take an example:
> - You index "Parking" and you have a LowerCaseFilter so your index
> contains "parking";
> - You search for Parking without wildcard, it will work;
> - You search for Parki* with wildcard, yeah, it won't work.
> 
> This is because analyzers are ignored for wildcard queries, usually
> because the ? or * characters could be replaced or removed by the
> filters you use in your analyzers.
> 
> While we all understand the Lucene point of view from a technical
> perspective, I don't think we can keep this position for Hibernate
> Search as a user friendly search framework on top of Hibernate.
> 
> At Open Wide, we have a quite complex method which rewrites a search
> as a working autocompletion search which might work most of the time
> (with a high value of most...). It's kinda ugly, far from perfect and
> I'm wondering if we could have something more clever in Search. I once
> talked with Emmanuel about having different analyzers for Indexing,
> Querying (this is the Solr way) and Wildcards/Fuzzy search (this is
> IMHO a good idea as the way you want to normalize your wildcard query
> highly depends on the analyzer used to index your data).

I would like to separate the notion of autosuggestion from the wildcard problem. To me they are separate, and I would love Hibernate Search to offer an autosuggest and spell checker API.

Back to wildcard. If we have an analyser stack that separates normaliser filters from filters generating additional tokens (see my email [AND]), then it is a piece of cake to apply the right filters to a wildcard query: apply the normaliser filters, raise an exception if someone tries to wildcard on ngrams, and simply ignore the synonym filter.
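
To make the idea concrete, here is a rough sketch in plain Java. It is not the Lucene or Hibernate Search API: the Filter class, the Kind enum, and the normalizeWildcard and matches methods are all made-up names for illustration, and the wildcard matching is a naive regex translation. It only shows the rule described above, applied to the "Parking" / "Parki*" example: normalisers are applied to the wildcard pattern, synonym filters are skipped, and ngram filters are rejected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

public class WildcardNormalizationSketch {

    // Hypothetical filter categories, invented for this sketch.
    enum Kind { NORMALIZER, SYNONYM, NGRAM }

    // Hypothetical filter description: a name, a category and a per-term transformation.
    static final class Filter {
        final String name;
        final Kind kind;
        final UnaryOperator<String> transform;

        Filter(String name, Kind kind, UnaryOperator<String> transform) {
            this.name = name;
            this.kind = kind;
            this.transform = transform;
        }
    }

    // Applies only the normalising filters of the stack to a wildcard pattern.
    static String normalizeWildcard(String pattern, List<Filter> stack) {
        String result = pattern;
        for (Filter filter : stack) {
            switch (filter.kind) {
                case NORMALIZER:
                    result = filter.transform.apply(result);
                    break;
                case SYNONYM:
                    // Token-generating filter: ignored for wildcard queries.
                    break;
                case NGRAM:
                    throw new IllegalArgumentException(
                            "Wildcard query on a field using an ngram filter: " + filter.name);
            }
        }
        return result;
    }

    // Naive wildcard match: '*' is any sequence of characters, '?' is exactly one.
    static boolean matches(String indexedTerm, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return indexedTerm.matches(regex.toString());
    }

    public static void main(String[] args) {
        List<Filter> stack = Arrays.asList(
                new Filter("lowercase", Kind.NORMALIZER, s -> s.toLowerCase(Locale.ROOT)),
                new Filter("synonym", Kind.SYNONYM, s -> s));

        // The index contains "parking" because of the lowercase filter.
        String query = normalizeWildcard("Parki*", stack);
        System.out.println(query + " matches parking: " + matches("parking", query));
        System.out.println("Parki* matches parking: " + matches("parking", "Parki*"));
    }
}
```

The sketch assumes that a normaliser leaves the ? and * characters untouched, which holds for lowercasing but would have to be checked for each real filter.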



