I've spent the last day thinking about a DSL to build Lucene queries.
What do you think of this proposal?

A few remarks:
 - it asks for the analyzer up front so that terms are analyzed correctly
 - it has a few query factory methods
 - it contains a few orthogonal operations
 - I am not quite satisfied with how boolean is handled, any ideas?



Examples


SealedQueryBuilder qb = searchFactory.withEntityAnalyzer(Address.class);


Query luceneQuery = 
qb.boolean(Occurs.MUST)
    .add(
        qb.boolean(Occurs.SHOULD)
            .add( qb.term("city", "Atlanta").boostedTo(4).createQuery() )
            .add( qb.term("address1", "Peachtree").fuzzy().createQuery() )
    )
    .add(
        qb.from("movingDate", "200604").to("201201").exclusive().createQuery()
    )
    .createQuery();
                     


Analyzer choice
queryBuilder.withAnalyzer(Analyzer)
queryBuilder.withEntityAnalyzer(Class<?>)
queryBuilder.basedOnEntityAnalyzer(Class<?>)
                    .overridesForField(String field, Analyzer)
                    .overridesForField(String field, Analyzer)
                    .build() //sucky name
returns a SealedQueryBuilder //sucky name


SealedQueryBuilder contains the factory methods
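
To make the analyzer choice above concrete, here is a minimal sketch of the per-field override idea. It makes several assumptions: `Analyzer` is a one-method stand-in and not Lucene's class, the entity analyzer is passed in directly instead of being looked up from a `Class<?>`, and `AnalyzerOverrides` and `analyzerFor` are hypothetical names that are not part of the proposal.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Stand-in for Lucene's Analyzer (an assumption): it only normalizes one term.
interface Analyzer {
    String analyze(String text);
}

// Sealed builder: the analyzer choice is fixed once it has been built.
final class SealedQueryBuilder {
    private final Analyzer entityAnalyzer;
    private final Map<String, Analyzer> overrides;

    SealedQueryBuilder(Analyzer entityAnalyzer, Map<String, Analyzer> overrides) {
        this.entityAnalyzer = entityAnalyzer;
        this.overrides = new HashMap<>(overrides); // defensive copy
    }

    // Hypothetical helper: a field override wins over the entity analyzer.
    Analyzer analyzerFor(String field) {
        return overrides.getOrDefault(field, entityAnalyzer);
    }
}

// Hypothetical intermediate step, standing in for basedOnEntityAnalyzer(Class<?>).
final class AnalyzerOverrides {
    private final Analyzer entityAnalyzer;
    private final Map<String, Analyzer> overrides = new HashMap<>();

    AnalyzerOverrides(Analyzer entityAnalyzer) {
        this.entityAnalyzer = entityAnalyzer;
    }

    AnalyzerOverrides overridesForField(String field, Analyzer analyzer) {
        overrides.put(field, analyzer);
        return this;
    }

    SealedQueryBuilder build() {
        return new SealedQueryBuilder(entityAnalyzer, overrides);
    }
}

final class AnalyzerChoiceDemo {
    public static void main(String[] args) {
        Analyzer lowercase = text -> text.toLowerCase(Locale.ROOT);
        Analyzer keyword = text -> text; // leaves the term untouched

        SealedQueryBuilder qb = new AnalyzerOverrides(lowercase)
                .overridesForField("zipCode", keyword)
                .build();

        System.out.println(qb.analyzerFor("city").analyze("Atlanta"));
        System.out.println(qb.analyzerFor("zipCode").analyze("GA-30303"));
    }
}
```

Sealing the builder means every factory method can resolve its analyzer from the field name alone, without the caller passing one around.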




Factory methods
Hosted on SealedQueryBuilder


.term(String field, String text) //define a new query
.term(String field, String text) //define a new query
   .ignoreAnalyzer() //ignore the analyzer, optional
   .fuzzy() //API prevents wildcard calls, optional
     .threshold() //optional
     .prefixLength() //optional
.term(String field, String value)
   .wildcard() //API prevents fuzzy calls, optional
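
One way the API could prevent mixing fuzzy and wildcard is through return types: each call returns a narrower context that simply lacks the other method. The sketch below is only an illustration of that idea. The class names are hypothetical, the default values are assumptions, and rendering to Lucene's classic query parser syntax stands in for building real Query objects.

```java
// Entry point for term queries in this sketch.
final class TermSketch {
    private final String field;
    private final String text;

    private TermSketch(String field, String text) {
        this.field = field;
        this.text = text;
    }

    static TermSketch term(String field, String text) {
        return new TermSketch(field, text);
    }

    // fuzzy() returns a type that has no wildcard() method...
    FuzzyTermSketch fuzzy() {
        return new FuzzyTermSketch(field, text);
    }

    // ...and wildcard() returns a type that has no fuzzy() method.
    WildcardTermSketch wildcard() {
        return new WildcardTermSketch(field, text);
    }

    String createQuery() {
        return field + ":" + text;
    }
}

final class FuzzyTermSketch {
    private final String field;
    private final String text;
    private float threshold = 0.5f; // assumed default
    private int prefixLength = 0;   // assumed default

    FuzzyTermSketch(String field, String text) {
        this.field = field;
        this.text = text;
    }

    FuzzyTermSketch threshold(float threshold) {
        this.threshold = threshold;
        return this;
    }

    // Only stored here: the string rendering below does not show it.
    FuzzyTermSketch prefixLength(int prefixLength) {
        this.prefixLength = prefixLength;
        return this;
    }

    int getPrefixLength() {
        return prefixLength;
    }

    String createQuery() {
        return field + ":" + text + "~" + threshold;
    }
}

final class WildcardTermSketch {
    private final String field;
    private final String pattern;

    WildcardTermSketch(String field, String pattern) {
        this.field = field;
        this.pattern = pattern;
    }

    String createQuery() {
        return field + ":" + pattern;
    }
}

final class TermSketchDemo {
    public static void main(String[] args) {
        System.out.println(TermSketch.term("address1", "Peachtree").fuzzy().threshold(0.8f).createQuery());
        System.out.println(TermSketch.term("city", "Atl*").wildcard().createQuery());
        // TermSketch.term("city", "Atl*").fuzzy().wildcard() would not compile.
    }
}
```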


//range query
.from(String field, String text)
       .exclusive() //optional
    .to(String text)
       .exclusive() //optional
    .constantScore() //optional, due to ConstantScoreRangeQuery but in practice inherited from the common operations
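
To pin down what `.exclusive()` means on each bound, here is a small sketch. It does not build a Lucene query: `matches` is a hypothetical helper that applies the same lexicographic comparison a term range relies on, and the class names are made up for the illustration.

```java
final class RangeSketch {
    static From from(String field, String text) {
        return new From(field, text);
    }

    // Context after from(...): exclusive() here applies to the lower bound.
    static final class From {
        private final String field;
        private final String lower;
        private boolean lowerExclusive;

        From(String field, String lower) {
            this.field = field;
            this.lower = lower;
        }

        From exclusive() {
            lowerExclusive = true;
            return this;
        }

        To to(String upper) {
            return new To(field, lower, lowerExclusive, upper);
        }
    }

    // Context after to(...): exclusive() here applies to the upper bound.
    static final class To {
        private final String field;
        private final String lower;
        private final boolean lowerExclusive;
        private final String upper;
        private boolean upperExclusive;

        To(String field, String lower, boolean lowerExclusive, String upper) {
            this.field = field;
            this.lower = lower;
            this.lowerExclusive = lowerExclusive;
            this.upper = upper;
        }

        To exclusive() {
            upperExclusive = true;
            return this;
        }

        String field() {
            return field;
        }

        // Hypothetical helper: would a term with this value fall in the range?
        boolean matches(String value) {
            int lo = value.compareTo(lower);
            int hi = value.compareTo(upper);
            boolean aboveLower = lowerExclusive ? lo > 0 : lo >= 0;
            boolean belowUpper = upperExclusive ? hi < 0 : hi <= 0;
            return aboveLower && belowUpper;
        }
    }
}

final class RangeSketchDemo {
    public static void main(String[] args) {
        RangeSketch.To range = RangeSketch.from("movingDate", "200604").to("201201").exclusive();
        System.out.println(range.matches("200604")); // lower bound is inclusive
        System.out.println(range.matches("201201")); // upper bound is exclusive
    }
}
```

Because `exclusive()` lives on two different context types, its position in the chain decides which bound it affects.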


//match all docs
.all() 


//phrase query
.phrase(String field)
    .ignoreAnalyzer() //ignore the analyzer, optional
    .addWord(String text) //at least one
    .addWord(String text)
    .sentence(String text) //do we need that?
    .slop() //optional


//search multiple fields for same value
.searchInMultipleFields()
  .onField(String field)
      .boostedTo(float) //optional
      .ignoreAnalyzer() //optional
  .onField(String field)
  .forWords(String) //do we need that?
  .forWord(String)
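
The multi-field search could expand to one clause per field. The sketch below assumes that expansion and renders it in Lucene's classic query parser syntax purely for illustration. `MultiFieldSketch` is a hypothetical name, and `ignoreAnalyzer()` and `forWords()` are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiFieldSketch {
    private final List<String> fields = new ArrayList<>();
    private final List<Float> boosts = new ArrayList<>();
    private String word;

    static MultiFieldSketch searchInMultipleFields() {
        return new MultiFieldSketch();
    }

    MultiFieldSketch onField(String field) {
        fields.add(field);
        boosts.add(1.0f); // neutral boost until boostedTo(...) says otherwise
        return this;
    }

    // Applies to the most recently added field.
    MultiFieldSketch boostedTo(float boost) {
        if (fields.isEmpty()) {
            throw new IllegalStateException("call onField(...) first");
        }
        boosts.set(boosts.size() - 1, boost);
        return this;
    }

    MultiFieldSketch forWord(String word) {
        this.word = word;
        return this;
    }

    String createQuery() {
        if (word == null || fields.isEmpty()) {
            throw new IllegalStateException("need at least one field and a word");
        }
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            float boost = boosts.get(i);
            String clause = fields.get(i) + ":" + word;
            clauses.add(boost == 1.0f ? clause : clause + "^" + boost);
        }
        return "(" + String.join(" ", clauses) + ")";
    }
}

final class MultiFieldSketchDemo {
    public static void main(String[] args) {
        String query = MultiFieldSketch.searchInMultipleFields()
                .onField("city").boostedTo(4)
                .onField("address1")
                .forWord("Peachtree")
                .createQuery();
        System.out.println(query);
    }
}
```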




Boolean operations
SealedQueryBuilder contains the boolean methods


.boolean(Occurs occurs)

  .add( qb.from().to() )
  .add( ... )
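
Here is a rough sketch of the boolean handling, mostly to have something concrete to criticize. Since `boolean` is a reserved word in Java, the sketch uses the hypothetical name `bool` instead. Clauses are plain strings in Lucene's classic query parser syntax, which is an assumption made to keep the example free of dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Prefixes follow the classic query parser: + for required, - for prohibited.
enum Occurs {
    MUST("+"), SHOULD(""), MUST_NOT("-");

    final String prefix;

    Occurs(String prefix) {
        this.prefix = prefix;
    }
}

final class BooleanSketch {
    private final Occurs occurs;
    private final List<String> clauses = new ArrayList<>();

    private BooleanSketch(Occurs occurs) {
        this.occurs = occurs;
    }

    static BooleanSketch bool(Occurs occurs) {
        return new BooleanSketch(occurs);
    }

    // Every clause added to this context gets the same occurrence.
    BooleanSketch add(String clause) {
        clauses.add(occurs.prefix + clause);
        return this;
    }

    // Nested boolean contexts are accepted without an explicit createQuery().
    BooleanSketch add(BooleanSketch nested) {
        return add(nested.createQuery());
    }

    String createQuery() {
        return "(" + String.join(" ", clauses) + ")";
    }
}

final class BooleanSketchDemo {
    public static void main(String[] args) {
        String query = BooleanSketch.bool(Occurs.MUST)
                .add(BooleanSketch.bool(Occurs.SHOULD)
                        .add("city:Atlanta^4")
                        .add("address1:Peachtree~"))
                .add("movingDate:{200604 TO 201201}")
                .createQuery();
        System.out.println(query);
    }
}
```

The limitation is visible right away: one occurrence per context means mixing MUST and SHOULD clauses always requires nesting.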




Works on all queries
    .boostedTo()
    .constantScore() 
    .filter(Filter) //filter the current query
    .scoreMultipliedByField(field) //FieldScoreQuery + FunctionQuery?? //Not backed
    .createQuery()
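
For the operations that work on all queries, one option is a shared base class with a self-referencing type parameter, so that a common method like `boostedTo()` still returns the specific context and the chain can continue. The sketch below shows this with a phrase context. All class names are hypothetical, only `boostedTo()` is covered, and the string output in classic query parser syntax is an assumption used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Base class for every query context; T is the concrete context type.
abstract class QueryContextSketch<T extends QueryContextSketch<T>> {
    private float boost = 1.0f;

    @SuppressWarnings("unchecked")
    T boostedTo(float boost) {
        this.boost = boost;
        return (T) this; // safe as long as subclasses pass themselves as T
    }

    // Each concrete context renders its own query text.
    abstract String render();

    String createQuery() {
        return boost == 1.0f ? render() : render() + "^" + boost;
    }
}

final class PhraseSketch extends QueryContextSketch<PhraseSketch> {
    private final String field;
    private final List<String> words = new ArrayList<>();
    private int slop = 0;

    PhraseSketch(String field) {
        this.field = field;
    }

    PhraseSketch addWord(String text) {
        words.add(text);
        return this;
    }

    PhraseSketch slop(int slop) {
        this.slop = slop;
        return this;
    }

    @Override
    String render() {
        String phrase = field + ":\"" + String.join(" ", words) + "\"";
        return slop > 0 ? phrase + "~" + slop : phrase;
    }
}

final class CommonOperationsDemo {
    public static void main(String[] args) {
        String query = new PhraseSketch("address1")
                .addWord("Peachtree")
                .boostedTo(2)      // common operation, still returns PhraseSketch
                .addWord("Street") // so phrase-specific calls keep working
                .slop(1)
                .createQuery();
        System.out.println(query);
    }
}
```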




Todo
Span*Queries
  
MultiPhraseQuery - needs to fill in all accepted terms
FieldScoreQuery
ValueSourceQuery
FuzzyLikeThis
MoreLikeThis


On 25 Aug 09, at 16:43, Manik Surtani wrote:


On 25 Aug 2009, at 13:34, Emmanuel Bernard wrote:


On 25 Aug 09, at 14:27, Manik Surtani wrote:

A DSL would work, but I'd rather not define our own language here,
which is why I asked for a standard. Perhaps something based on
SQL/JPA-QL? Or are you thinking of a DSL specific to Lucene, which could
be used by any/all of {Lucene, Hibernate Search, Infinispan}? In
that case the DSL should ideally be a Lucene project.

Yes, I was thinking about a DSL used for Hibernate Search, and maybe for
all of Lucene if the HS integration brings no benefit in terms of
simplicity (but I think it can offer value).


OK, this should be interesting. Let's chat about this some more. Have
you drafted any thoughts around this DSL somewhere?