[hibernate-dev] Hibernate web search

Hardy Ferentschik hibernate at ferentschik.de
Mon Sep 22 04:37:13 EDT 2008


Hi Adam,

in fact I was facing similar problems before; however, the problem is more
of a Lucene problem than a Hibernate Search one.
There are some threads about "error tolerant" query parsers on the
Lucene mailing list, and several approaches to the problem.
You could catch the ParseException and inspect the error message.
According to some posts it should be possible to extract the cause of the
error from the exception and maybe modify the query. A simpler approach
might be to call QueryParser.escape() when a ParseException occurs and
just escape the whole query string.
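
As a rough sketch of that fallback (not tested against a real index): the
class, interface and method names here are made up for illustration, and the
parser and escaper are passed in so the snippet does not depend on a
particular Lucene version. With Lucene you would pass queryParser::parse and
QueryParser::escape, and catch ParseException rather than Exception.

```java
import java.util.function.UnaryOperator;

public class LenientParsing {

    /** Hypothetical stand-in for a strict parser such as QueryParser.parse(). */
    @FunctionalInterface
    public interface StrictParser<Q> {
        Q parse(String query) throws Exception;
    }

    /**
     * Tries the raw user input first, so valid phrases, +/- and ~ keep their
     * meaning. Only if that fails is the whole string escaped and parsed again.
     */
    public static <Q> Q parseLeniently(String userInput,
                                       StrictParser<Q> parser,
                                       UnaryOperator<String> escaper) {
        try {
            return parser.parse(userInput);
        } catch (Exception firstFailure) {
            try {
                // Fallback: every special character becomes a literal.
                return parser.parse(escaper.apply(userInput));
            } catch (Exception secondFailure) {
                // An escaped query should always parse; treat this as a bug.
                throw new IllegalStateException(
                        "query failed even after escaping", secondFailure);
            }
        }
    }
}
```

The drawback is the one Adam mentions for his solution 2: once the fallback
kicks in, all special constructs in that query lose their meaning.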

Alternatively, you can write your own (more forgiving) parser. All Lucene's
QueryParser does is translate the query string into a combination
of org.apache.lucene.search.Query subclasses.
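
A minimal sketch of the regular-expression variant of such a pre-parser,
following the rules you list below. It only rewrites the query string and
leaves the actual parsing to QueryParser. The class name ForgivingQuery and
the method repair() are invented for this example, and the set of escaped
characters is my assumption of what counts as "special". It does not handle a
dangling AND/OR or a bare NOT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ForgivingQuery {

    // A balanced "phrase", or a run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|\\S+");

    // Characters treated as special here (assumed set); escaped inside words.
    private static final Pattern SPECIAL =
            Pattern.compile("[\\\\+\\-!(){}\\[\\]^\"~*?:/&|]");

    public static String repair(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String token = m.group();
            // Balanced phrase: keep it, but neutralise backslashes inside.
            if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
                String phrase = token.substring(1, token.length() - 1);
                if (!phrase.trim().isEmpty()) {
                    out.add('"' + phrase.replace("\\", "\\\\") + '"');
                }
                continue;
            }
            // Boolean operators pass through unchanged.
            if (token.equals("AND") || token.equals("OR")) {
                out.add(token);
                continue;
            }
            // Optional leading + or - (include/exclude).
            String prefix = "";
            String word = token;
            if (word.startsWith("+") || word.startsWith("-")) {
                prefix = word.substring(0, 1);
                word = word.substring(1);
            }
            // Leading ~ (Google style) becomes a trailing ~ (Lucene style).
            boolean fuzzy = word.startsWith("~");
            if (fuzzy) {
                word = word.substring(1);
            }
            if (word.isEmpty()) {
                out.add(escape(token));   // dangling +, - or ~: make it literal
            } else {
                out.add(prefix + escape(word) + (fuzzy ? "~" : ""));
            }
        }
        return String.join(" ", out);
    }

    // Prefixes every special character with a backslash.
    private static String escape(String s) {
        return SPECIAL.matcher(s).replaceAll("\\\\$0");
    }
}
```

An unmatched quote never matches the phrase pattern, so it ends up inside a
plain word and gets escaped there, while boosting, field search and wildcards
are disabled simply because ^, :, * and ? are escaped.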

I think just using QueryParser.parse() out of the box is often not
sufficient, and you nearly always have to write some custom code around
query generation and handling. The question is whether any custom parser is
generic enough to be used in a wide range of applications. I like the idea
of a Google-like query parser though. I wonder how Nutch works in this area.

--Hardy


On Sun, 21 Sep 2008 11:01:36 +0200, Adam Warski <adam at warski.org> wrote:

> Hello,
>
> one feature I find missing from Hibernate Search is a possibility to  
> easily implement a web search.
>
> A good example is a blog app, where you can search contents of posts. A  
> post is a simple entity with a "body" field, which is indexed by  
> Hibernate Search/Lucene.
> You then have the normal search box, where the user enters his/her
> query. And now the problem: what to do with this query?
>
> Solution 1.
> Pass it unchanged to the query parser, as is done for example in the  
> blog example in Seam. But that will in many cases generate exceptions.  
> For example, when there is an unclosed " or a :. You can of course
> catch that exception and return to the user an empty result list - but
> that's not what the user expects.
>
> Solution 2.
> Escape any special characters (using QueryParser.escape) and then pass
> it safely to the query parser - but then the semantics of all special
> constructs (like phrases: "...", including/excluding words: +/-, fuzzy
> searches: ~ etc) stop working. That is also not what the user expects.
>
> My proposed solution.
> The best way out, in my opinion, is to create a custom query pre-parser.  
> This parser would be very "forgiving" in case of any syntax errors.
> I think it would be best to support the query syntax that Google uses
> (that's what the users are accustomed to):
> * standard boolean operators AND, OR
> * quotes "..."
> * fuzzy/synonym search ~ (but in front of the word, not at the end)
> * word inclusion/exclusion: +/-
>
> Some Lucene constructs would be disabled, like boosting (^),
> field-search (field_name:), *, ?.
>
> Any special characters in invalid positions would be escaped, for
> example an unmatched quote or a + without a word following it. The
> parser wouldn't only repair syntax, but would also perform other
> operations, like moving the ~ from the beginning of a word to the end,
> or (maybe) adding a * to the end of each term.
>
> The implementation shouldn't be too complicated using either
> ANTLR/JavaCC or simply regular expressions and a string builder.
>
> Needless to say, this would also be useful in Seam :).
>
> What do you think?



