[hibernate-dev] Hibernate web search

Emmanuel Bernard emmanuel at hibernate.org
Mon Sep 22 13:08:56 EDT 2008


I like the idea of a parser using the Google syntax (you don't have to
disable explicit fields BTW - recognizing a term:term syntax should be
doable). The hard problem to crack is what sits behind the parser. As
I explain in Hibernate Search in Action, a lot of good search engines
run searches in tiers:
  - exact search
  - phonetic search
  - fuzzy search
  - replace ANDs with ORs
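
Read as a fallback chain, the tiers above could look like the sketch
below: each tier is tried in order and the first one that returns hits
wins. This is only one interpretation (tiers could also be merged and
ranked), and TieredSearch and the tier functions are hypothetical
names, not Hibernate Search API.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class TieredSearch {
    // Hypothetical helper: each tier maps the user's query to a list of hits.
    // Tiers are expected in order: exact, phonetic, fuzzy, ANDs replaced by ORs.
    static <R> List<R> search(String query, List<Function<String, List<R>>> tiers) {
        for (Function<String, List<R>> tier : tiers) {
            List<R> hits = tier.apply(query);
            if (!hits.isEmpty()) {
                return hits; // stop at the first tier that finds something
            }
        }
        return Collections.emptyList(); // no tier matched
    }
}
```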

I guess you can simulate part of it by boosting exact fields over
approximation fields in the multi-field query parser. This was not
really possible until recently, but
SearchFactory.getAnalyzer(MyEntity.class) makes it much easier.
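
To illustrate the boosting idea, here is a minimal sketch that expands
one term across an exact field and two approximation fields using
Lucene's field:term^boost query syntax. The field names (body,
body_phonetic, body_ngram), the boost values and the TieredBoosts class
are assumptions made up for the example, and the term is expected to be
escaped already.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class TieredBoosts {
    // Hypothetical field names and boosts: the exact field outranks the approximations.
    static Map<String, Float> defaultBoosts() {
        Map<String, Float> boosts = new LinkedHashMap<>(); // keeps insertion order
        boosts.put("body", 4.0f);          // exact match
        boosts.put("body_phonetic", 2.0f); // phonetic approximation
        boosts.put("body_ngram", 1.0f);    // n-gram approximation
        return boosts;
    }

    // Builds "(field1:term^b1 field2:term^b2 ...)" for an already escaped term.
    static String expand(String term, Map<String, Float> boosts) {
        StringJoiner clauses = new StringJoiner(" ", "(", ")");
        for (Map.Entry<String, Float> e : boosts.entrySet()) {
            clauses.add(e.getKey() + ":" + term + "^" + e.getValue());
        }
        return clauses.toString();
    }
}
```

As far as I know, MultiFieldQueryParser can also take a per-field boost
map directly, which would avoid building the string by hand.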

We should add the Google-like feature to the 3.2 list, among other
higher-level query enhancements such as spell checking.
Who wants to take the lead? I have always found grammar and parser
development awkward for my tastes :)


--
Emmanuel Bernard
http://in.relation.to/Bloggers/Emmanuel | http://blog.emmanuelbernard.com 
  | http://twitter.com/emmanuelbernard
Hibernate Search in Action (http://is.gd/Dl1)

On Sep 21, 2008, at 05:01, Adam Warski wrote:

> Hello,
>
> one feature I find missing from Hibernate Search is an easy way to
> implement a web search.
>
> A good example is a blog app, where you can search the contents of
> posts. A post is a simple entity with a "body" field, which is
> indexed by Hibernate Search/Lucene.
> You then have the normal search box, where the user enters his or her
> query. And here is the problem: what to do with this query?
>
> Solution 1.
> Pass it unchanged to the query parser, as is done for example in the
> blog example in Seam. But that will in many cases generate
> exceptions, for example when there is an unclosed " or a :. You
> can of course catch that exception and return an empty result list
> to the user, but that's not what the user expects.
>
> Solution 2.
> Escape any special characters (using QueryParser.escape) and then
> pass it safely to the query parser - but then all the special
> constructs (like phrases: "...", including/excluding words:
> +/-, fuzzy searches: ~ etc) stop working. That is also not what the
> user expects.
>
> My proposed solution.
> The best way out, in my opinion, is to create a custom query
> pre-parser. This parser would be very "forgiving" in case of any
> syntax errors.
> I think it would be best to support the query syntax that Google
> uses (that's what users are accustomed to):
> * standard boolean operators AND, OR
> * quotes "..."
> * fuzzy/synonym search ~ (but in front of the word, not at the end)
> * word inclusion/exclusion: +/-
>
> Some Lucene constructs would be disabled, like boosting (^), field
> search (field_name:), *, ?.
>
> Any special characters in invalid positions would be escaped, for
> example an unmatched quote or a + without a word following it. The
> parser wouldn't only repair syntax, but would also perform other
> operations, like moving the ~ from the beginning of a word to the
> end, or (maybe) adding a * to the end of each term.
>
> The implementation shouldn't be too complicated using either
> ANTLR/JavaCC or simply regular expressions and a string builder.
>
> Needless to say, this would also be useful in Seam :).
>
> What do you think?
>
> -- 
> Adam
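
As a starting point for the discussion, the "regular expressions and a
string builder" variant of the pre-parser described in the quoted
message could be sketched as follows. It keeps closed phrases, +/-
prefixes and AND/OR, moves a leading ~ to the end of the word, and
escapes everything else (including field_name:, ^, * and ? as Adam
proposes). ForgivingQueryPreParser is a hypothetical name, the escaping
is hand-rolled rather than QueryParser.escape so the sketch has no
Lucene dependency, and dangling AND/OR operators and the optional
trailing * are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ForgivingQueryPreParser {
    // A token is an optionally +/- prefixed closed "phrase", or a run of non-blank characters.
    private static final Pattern TOKEN = Pattern.compile("([+-]?)\"([^\"]*)\"|(\\S+)");
    // Optional +/- prefix, optional leading ~, then the word itself.
    private static final Pattern WORD = Pattern.compile("([+-]?)(~?)(.+)", Pattern.DOTALL);
    // Characters with a special meaning in Lucene's query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String rewrite(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String piece;
            if (m.group(3) == null) {
                // Closed phrase: drop it if empty, otherwise keep the quotes and prefix.
                if (m.group(2).trim().isEmpty()) continue;
                piece = m.group(1) + '"' + escape(m.group(2)) + '"';
            } else {
                piece = rewriteWord(m.group(3));
            }
            if (out.length() > 0) out.append(' ');
            out.append(piece);
        }
        return out.toString();
    }

    private static String rewriteWord(String word) {
        if (word.equals("AND") || word.equals("OR")) return word; // boolean operators pass through
        Matcher m = WORD.matcher(word);
        m.matches(); // always succeeds, since a token is never empty
        // Keep the +/- prefix, escape the word, and move a leading ~ to the end.
        return m.group(1) + escape(m.group(3)) + m.group(2);
    }

    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }
}
```

With this sketch, the input ~fast +cat - is rewritten to
fast~ +cat \- and an unclosed quote such as hibernate "web search
becomes hibernate \"web search.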
