[HSearch] DSL for Lucene queries (was: Re: [infinispan-dev] Query module new API and configurations)
by Emmanuel Bernard
I've been thinking about a DSL to build Lucene queries over the last day.
What do you think of this proposal?
A few remarks:
- it asks for the analyzer so that we correctly apply it to terms
- it has a few query factory methods
- it contains a few orthogonal operations
- I am not quite satisfied with how boolean queries are handled; any ideas?
Examples
SealedQueryBuilder qb = searchFactory.withEntityAnalyzer(Address.class);
Query luceneQuery =
    qb.must(Occurs.MUST)
        .add(
            qb.boolean(Occurs.Should)
                .add( qb.term("city", "Atlanta").boostedTo(4).createQuery() )
                .add( qb.term("address1", "Peachtree").fuzzy().createQuery() )
        )
        .add(
            qb.from("movingDate", "200604").to("201201").exclusive().createQuery()
        )
        .createQuery();
Analyzer choice
queryBuilder.withAnalyzer(Analyzer)
queryBuilder.withEntityAnalyzer(Class<?>)
queryBuilder.basedOnEntityAnalyzer(Class<?>)
    .overridesForField(String field, Analyzer)
    .overridesForField(String field, Analyzer)
    .build() //sucky name
returns a SealedQueryBuilder //sucky name
SealedQueryBuilder contains the factory methods
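One way to read the final `build()` is as the boundary between two phases: analyzer configuration first, then query creation on a builder that can no longer be reconfigured. The following is a minimal sketch of that two-phase shape, not Hibernate Search code. Assumptions: analyzers are modeled as plain `String` names instead of Lucene `Analyzer` instances, the entity class is replaced by its default analyzer name, and `AnalyzerConfigSketch`, `QueryBuilderConfig` and `analyzerFor` are hypothetical names. Only `basedOnEntityAnalyzer`, `overridesForField`, `build()` and `SealedQueryBuilder` come from the proposal.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: analyzers are represented by their names (String)
// instead of Lucene Analyzer instances, to keep the example self-contained.
final class AnalyzerConfigSketch {

    // Mutable configuration phase: only analyzer-related methods are exposed.
    static final class QueryBuilderConfig {
        private final String entityAnalyzer;
        private final Map<String, String> overrides = new HashMap<>();

        QueryBuilderConfig(String entityAnalyzer) {
            this.entityAnalyzer = entityAnalyzer;
        }

        QueryBuilderConfig overridesForField(String field, String analyzer) {
            overrides.put(field, analyzer);
            return this;
        }

        // Freezes the configuration into an immutable builder.
        SealedQueryBuilder build() {
            return new SealedQueryBuilder(entityAnalyzer, overrides);
        }
    }

    // Immutable phase: would host the query factory methods; no analyzer setters.
    static final class SealedQueryBuilder {
        private final String defaultAnalyzer;
        private final Map<String, String> overrides;

        private SealedQueryBuilder(String defaultAnalyzer, Map<String, String> overrides) {
            this.defaultAnalyzer = defaultAnalyzer;
            // Defensive copy: later changes to the config do not leak in.
            this.overrides = Collections.unmodifiableMap(new HashMap<>(overrides));
        }

        // Per-field analyzer, falling back to the entity-level default.
        String analyzerFor(String field) {
            return overrides.getOrDefault(field, defaultAnalyzer);
        }
    }

    static QueryBuilderConfig basedOnEntityAnalyzer(String entityAnalyzer) {
        return new QueryBuilderConfig(entityAnalyzer);
    }

    public static void main(String[] args) {
        SealedQueryBuilder qb = basedOnEntityAnalyzer("standard")
                .overridesForField("city", "keyword")
                .build();
        System.out.println(qb.analyzerFor("city"));     // keyword
        System.out.println(qb.analyzerFor("address1")); // standard
    }
}
```

Because `SealedQueryBuilder` has no `overridesForField`, the compiler rejects attempts to change analyzers once query building has started, and the defensive copy keeps later changes to the configuration object out of an already sealed builder.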
Factory methods
Hosted on SealedQueryBuilder
.term(String field, String text) //define a new query
    .ignoreAnalyzer() //ignore the analyzer, optional
    .fuzzy() //API prevents wildcard calls, optional
        .threshold() //optional
        .prefixLength() //optional
.term(String field, String value)
    .wildcard() //API prevents fuzzy calls, optional
//range query
.from(String field, String text)
    .exclusive() //optional
    .to(String text)
    .exclusive() //optional
    .constantScore() //optional, due to ConstantScoreRangeQuery, but in practice inherited from the common operations
//match all docs
.all()
//phrase query
.phrase(String field)
    .ignoreAnalyzer() //ignore the analyzer, optional
    .addWord(String text) //at least one
    .addWord(String text)
    .sentence(String text) //do we need that?
    .slop() //optional
//search multiple fields for same value
.searchInMultipleFields()
    .onField(String field)
        .boostedTo(float) //optional
        .ignoreAnalyzer() //optional
    .onField(String field)
    .forWords(String) //do we need that?
    .forWord(String)
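The "API prevents wildcard/fuzzy calls" comments rely on return types: each call returns a context that only exposes the methods that make sense next. A minimal sketch of that technique, covering only the fuzzy/wildcard branch of `.term(...)`. Assumptions: `createQuery()` returns a readable description string rather than a Lucene `Query`, and the context class names, the `threshold(float)`/`prefixLength(int)` parameters (the proposal leaves them unspecified) and their default values are hypothetical.

```java
// Hypothetical sketch: createQuery() returns a readable description instead of a
// Lucene Query, so the example compiles without Lucene on the classpath.
final class TermContextSketch {

    static TermContext term(String field, String text) {
        return new TermContext(field, text);
    }

    // Entry context: the query may become fuzzy OR wildcard, not both.
    static final class TermContext {
        private final String field;
        private final String text;

        TermContext(String field, String text) {
            this.field = field;
            this.text = text;
        }

        FuzzyContext fuzzy() { return new FuzzyContext(field, text); }

        WildcardContext wildcard() { return new WildcardContext(field, text); }

        String createQuery() { return "term(" + field + ":" + text + ")"; }
    }

    // Only fuzzy options here: term(..).fuzzy().wildcard() does not compile.
    static final class FuzzyContext {
        private final String field;
        private final String text;
        private float threshold = 0.5f; // assumed default
        private int prefixLength = 0;   // assumed default

        FuzzyContext(String field, String text) {
            this.field = field;
            this.text = text;
        }

        FuzzyContext threshold(float threshold) {
            this.threshold = threshold;
            return this;
        }

        FuzzyContext prefixLength(int prefixLength) {
            this.prefixLength = prefixLength;
            return this;
        }

        String createQuery() {
            return "fuzzy(" + field + ":" + text + ", threshold=" + threshold
                    + ", prefixLength=" + prefixLength + ")";
        }
    }

    // No fuzzy options here: term(..).wildcard().threshold(..) does not compile.
    static final class WildcardContext {
        private final String field;
        private final String pattern;

        WildcardContext(String field, String pattern) {
            this.field = field;
            this.pattern = pattern;
        }

        String createQuery() { return "wildcard(" + field + ":" + pattern + ")"; }
    }

    public static void main(String[] args) {
        System.out.println(term("address1", "Peachtree").fuzzy().threshold(0.7f).prefixLength(2).createQuery());
        System.out.println(term("city", "Atl*").wildcard().createQuery());
    }
}
```

The same trick is what lets a fluent API "hide methods that are meaningless in a given context": invalid chains are compile errors rather than runtime exceptions, and IDE completion only offers the valid next steps.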
Boolean operations
SealedQueryBuilder contains the boolean methods
.boolean(Occurs occurs)
    .add( qb.from().to() )
    .add( ... )
Works on all queries
.boostedTo()
.constantScore()
.filter(Filter) //filter the current query
.scoreMultipliedByField(field) //FieldScoreQuery + FunctionQuery?? //Not backed
.createQuery()
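To see how the boolean operations and the shared operations compose, the example query at the top of this proposal can be modeled with a toy builder that renders Lucene query-parser syntax (`+` for MUST, no prefix for SHOULD, `-` for MUST_NOT, `^` for a boost, `~` for fuzzy, `{a TO b}` for an exclusive range). This is a sketch under stated assumptions, not Hibernate Search or Lucene API: `Occurs`, `Q` and the context classes are hypothetical, analyzers are ignored, and a single `exclusive()` flag covers both range bounds. Note that `boolean` is a reserved word in Java, so the factory cannot literally be called `qb.boolean(...)`; the sketch uses `bool(...)`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed DSL: queries are rendered as Lucene query-parser
// syntax strings instead of org.apache.lucene.search.Query objects.
final class BooleanDslSketch {

    // Stand-in for Lucene's BooleanClause.Occur.
    enum Occurs { MUST, SHOULD, MUST_NOT }

    // A finished query; here just its query-parser representation.
    static final class Q {
        final String syntax;
        Q(String syntax) { this.syntax = syntax; }
        @Override public String toString() { return syntax; }
    }

    // Operations shared by every query context ("Works on all queries").
    abstract static class QueryContext<T extends QueryContext<T>> {
        private float boost = 1f;

        @SuppressWarnings("unchecked")
        T boostedTo(float boost) {
            this.boost = boost;
            return (T) this;
        }

        abstract String body();

        Q createQuery() {
            return new Q(boost == 1f ? body() : body() + "^" + boost);
        }
    }

    static final class TermContext extends QueryContext<TermContext> {
        private final String field;
        private final String text;
        private boolean fuzzy;

        TermContext(String field, String text) { this.field = field; this.text = text; }

        TermContext fuzzy() { fuzzy = true; return this; }

        @Override String body() { return field + ":" + text + (fuzzy ? "~" : ""); }
    }

    // Simplification: one exclusive() flag applies to both bounds.
    static final class RangeContext extends QueryContext<RangeContext> {
        private final String field;
        private final String from;
        private String to;
        private boolean exclusive;

        RangeContext(String field, String from) { this.field = field; this.from = from; }

        RangeContext to(String to) { this.to = to; return this; }

        RangeContext exclusive() { exclusive = true; return this; }

        @Override String body() {
            return field + ":" + (exclusive ? "{" : "[") + from + " TO " + to + (exclusive ? "}" : "]");
        }
    }

    // The Occurs value given at creation applies to every added clause.
    static final class BooleanContext extends QueryContext<BooleanContext> {
        private final Occurs occurs;
        private final List<Q> clauses = new ArrayList<>();

        BooleanContext(Occurs occurs) { this.occurs = occurs; }

        BooleanContext add(Q clause) { clauses.add(clause); return this; }

        // Accepts a nested builder directly, as the original example does.
        BooleanContext add(QueryContext<?> nested) { return add(nested.createQuery()); }

        @Override String body() {
            String prefix = occurs == Occurs.MUST ? "+" : occurs == Occurs.MUST_NOT ? "-" : "";
            List<String> parts = new ArrayList<>();
            for (Q clause : clauses) parts.add(prefix + clause.syntax);
            return "(" + String.join(" ", parts) + ")"; // parentheses keep nesting explicit
        }
    }

    static TermContext term(String field, String text) { return new TermContext(field, text); }

    static RangeContext from(String field, String text) { return new RangeContext(field, text); }

    static BooleanContext bool(Occurs occurs) { return new BooleanContext(occurs); }

    public static void main(String[] args) {
        Q query = bool(Occurs.MUST)
                .add(bool(Occurs.SHOULD)
                        .add(term("city", "Atlanta").boostedTo(4).createQuery())
                        .add(term("address1", "Peachtree").fuzzy().createQuery()))
                .add(from("movingDate", "200604").to("201201").exclusive().createQuery())
                .createQuery();
        // prints (+(city:Atlanta^4.0 address1:Peachtree~) +movingDate:{200604 TO 201201})
        System.out.println(query);
    }
}
```

The self-typed `QueryContext<T>` is one way to let `boostedTo()` be inherited by every query type while still returning the concrete context, so type-specific calls such as `fuzzy()` or `exclusive()` remain available after a boost.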
Todo
Span*Queries
MultiPhraseQuery - needs to fill up all accepted terms
FieldScoreQuery
ValueSourceQuery
FuzzyLikeThis
MoreLikeThis
On 25 Aug 09, at 16:43, Manik Surtani wrote:
>
> On 25 Aug 2009, at 13:34, Emmanuel Bernard wrote:
>
>>
>> On 25 Aug 09, at 14:27, Manik Surtani wrote:
>>
>>> A DSL would work, but I'd rather not define our own language here,
>>> which is why I asked for a standard. Perhaps something based on
>>> SQL/JPA-QL? Or are you thinking of a DSL specific to Lucene, which could
>>> be used by any/all of {Lucene, Hibernate Search, Infinispan}? In
>>> that case the DSL should ideally be a Lucene project.
>>
>> Yes, I was thinking about a DSL used for Hibernate Search, and maybe for
>> all of Lucene if the HS integration benefits offer no value towards
>> simplicity (but I think I can offer value).
>
>
> Ok, this should be interesting. Let's chat about this some more - have
> you drafted any thoughts around this DSL somewhere?
Feedback on Infinispan patch
by Emmanuel Bernard
Hey Lukasz,
Your patch looks quite good and passes the tests on my side.
I encourage others to check out the patch before we apply it (ideally
another person from HSearch and one person from Infinispan).
Lukasz, I have a few questions/remarks though before applying it. Can
you answer / adjust the patch?
IndexWriterSetting
Why did parsing move from returning the initial int to returning Object?
Move DPHelper#createInfinispanCacheManager to IDP
This is not something that can be shared, as it would otherwise create a
hard dependency on Infinispan.
in createInfinispanCacheManager
Don't log at error level the fact that the XML is not used when a default
config is used. At most, log it at trace level.
Rename InfinispanCacheManagerConfigurationImpl to
DefaultInfinispanCacheManagerConfiguration or, even better, to a name
that describes the behavior of the Infinispan config.
In InfinispanIndexOutput, is it possible for writeBytes to receive more
bytes than the buffer size? If yes, does newCheck create the appropriate
number of chunks?
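The question boils down to whether a single write larger than one chunk is split across several chunks. A minimal sketch of the usual technique (copy at most the remaining space of the current chunk, store the chunk when it is full, repeat until the input is consumed). Assumptions: a hypothetical `ChunkedOutput` class keeps chunks in a list where the patch would store them in the cache; the `writeBytes(byte[], int, int)` signature mirrors Lucene's `IndexOutput`, but this is not the patch's `InfinispanIndexOutput`, and `newCheck` is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an IndexOutput that stores data in fixed-size chunks.
final class ChunkedOutput {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>(); // stored chunks
    private final byte[] buffer;                            // current chunk being filled
    private int positionInBuffer = 0;

    ChunkedOutput(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be > 0");
        this.chunkSize = chunkSize;
        this.buffer = new byte[chunkSize];
    }

    // Handles writes of any length, including ones larger than the buffer.
    void writeBytes(byte[] b, int offset, int length) {
        while (length > 0) {
            int n = Math.min(chunkSize - positionInBuffer, length); // fits in current chunk
            System.arraycopy(b, offset, buffer, positionInBuffer, n);
            positionInBuffer += n;
            offset += n;
            length -= n;
            if (positionInBuffer == chunkSize) {
                storeChunk(); // chunk full: store it and start a new one
            }
        }
    }

    // Stores the filled part of the buffer as a chunk and resets the buffer.
    private void storeChunk() {
        chunks.add(Arrays.copyOf(buffer, positionInBuffer));
        positionInBuffer = 0;
    }

    // Stores any trailing partial chunk; returns all chunks written so far.
    List<byte[]> close() {
        if (positionInBuffer > 0) storeChunk();
        return chunks;
    }

    public static void main(String[] args) {
        ChunkedOutput out = new ChunkedOutput(4);
        out.writeBytes(new byte[10], 0, 10); // one write of 2.5 buffers
        for (byte[] chunk : out.close()) {
            System.out.println(chunk.length); // 4, 4, 2
        }
    }
}
```

A regression test of this shape (one write of 2.5 buffers, then a check of the chunk count and of the last bytes) would answer the question for the actual implementation.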
InfinispanDirectoryProvider
Put the available configuration properties in the
InfinispanDirectoryProvider javadoc.
I think the default cache name should be "Hibernate Search" instead of
"HSInfinispanCache". We know it's in Infinispan :)
What is the try/catch that opens and closes an IW (IndexWriter) about? It looks weird.
in stop()
You don't close the CacheManager? Why is that?
InfinispanCacheManagerConfigurationImpl
What does "Infinispan-Cluster" correspond to? Why this name? Shouldn't
it be "Hibernate Search cluster"?
Is it safe to override the GlobalConfiguration? What if JBoss AS uses
Infinispan to run?
Why use DummyTransactionManagerLookup? Doesn't Infinispan guess the
right TM depending on the environment, e.g. use the JBoss one in JBoss AS?
I think GenericTransactionManagerLookup does that.
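The behavior expected from `GenericTransactionManagerLookup` amounts to probing well-known locations in order and falling back to a standalone manager only when none is found. The sketch below shows just that pattern with plain `Supplier`s. Assumptions: `TransactionManagerProbe`, the candidate lookups and the `String` stand-ins for transaction managers are hypothetical; it does not reproduce Infinispan's actual lookup list or its JNDI code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of "try known locations in order, otherwise fall back".
// T stands in for javax.transaction.TransactionManager; in real code each
// candidate would wrap e.g. a JNDI lookup or a vendor-specific accessor.
final class TransactionManagerProbe<T> {
    private final List<Supplier<Optional<T>>> candidates = new ArrayList<>();
    private final Supplier<T> fallback;

    TransactionManagerProbe(Supplier<T> fallback) {
        this.fallback = fallback;
    }

    // Candidates are tried in registration order.
    TransactionManagerProbe<T> candidate(Supplier<Optional<T>> lookup) {
        candidates.add(lookup);
        return this;
    }

    // The first candidate that finds a manager wins; the fallback (e.g. a
    // dummy manager) is used only when the environment offers none.
    T lookup() {
        for (Supplier<Optional<T>> candidate : candidates) {
            Optional<T> found = candidate.get();
            if (found.isPresent()) {
                return found.get();
            }
        }
        return fallback.get();
    }

    public static void main(String[] args) {
        TransactionManagerProbe<String> probe = new TransactionManagerProbe<String>(() -> "dummy")
                .candidate(Optional::empty)                 // e.g. not running inside JBoss AS
                .candidate(() -> Optional.of("container")); // e.g. found in JNDI
        System.out.println(probe.lookup()); // container
    }
}
```

With this shape, a hard-coded dummy lookup is only the last resort, which is the property the question above is asking about.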
InfinispanCacheManagerConfiguration
Some javadoc on the methods would be useful; I don't know what to
implement here.
Is there a better name for Metadata? Like FileMetadata maybe?
Where is ispn-cache-default-conf.xml used? For tests only? If not, is
it possible to use a programmatic version instead? And what does "It's a
movie cache" mean?
Emmanuel
Begin forwarded message:
> From: Łukasz Moreń <lukasz.moren(a)gmail.com>
> Date: 21 Aug 2009 02:11:03 CEST
> To: Emmanuel Bernard <emmanuel(a)hibernate.org>
> Subject: GSoC patch with Infinispan Directory Provider
>
> I'm sending the patch and a piece of documentation: not much, but the
> necessary information is included.
> There are some TODOs that I haven't managed to finish yet.
> I changed the Maven JGroups dependency to 2.8.beta2; the previous version
> clashed with the one used by Infinispan.
> In the pom file there was a dependency on Hibernate Commons Annotations
> 3.2.snapshot. Shouldn't it be 3.5?
>
> Cheers,
> Lukasz
Re: [hibernate-dev] [infinispan-dev] [HSearch] DSL for Lucene queries (was: Re: Query module new API and configurations)
by Sanne Grinovero
Sure, I like it! I'm wading through a swamp of old mails, so I'll give you
only my first impression:
Even if it's fluent, it's not (yet) intuitive to me which methods I should call:
Query luceneQuery =
    qb.must(Occurs.MUST)
        .add(
            qb.boolean(Occurs.Should)
                .add( qb.term("city", "Atlanta").boostedTo(4).createQuery() )
                .add( qb.term("address1", "Peachtree").fuzzy().createQuery() )
        )
        .add(
            qb.from("movingDate", "200604").to("201201").exclusive().createQuery()
        )
        .createQuery();
I guess there is a typo? "must(MUST)" is a bit confusing to me.
Why not
qb.booleanQuery()
    .Must( qb.otherQuery(...).. )
    .Should( qb.secondQuery(..).. )
    .build();
and
qb.termQuery("city", "Atlanta").boostedTo(4).createQuery()
or even overloading
qb.termQuery("city", "Atlanta").createQuery()
with
qb.termQuery("city", "Atlanta", 4f).createQuery()
This is not as readable as the "boostedTo" method, but it is more immediate;
intelligent IDEs should propose the options to devs while typing, even
guessing the parameter name and making its meaning self-evident.
qb.rangeQuery could be either
rangeQuery("field", "fromX", "toY")
or
rangeQuery("field").from("x").to("y")
So why did you choose ("field", "from").to("to")?
Thinking about the RangeQuery on dates, it would be cool to accept any
type for which we have Bridges, like accepting Date type or even a
user-defined FieldBridge together with an Object.
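Accepting typed bounds means running them through the same bridge used at indexing time, so that the range compares the exact strings stored in the index. A minimal sketch under stated assumptions: a hypothetical `ValueBridge` interface stands in for Hibernate Search's bridges, `java.time.YearMonth` stands in for `java.util.Date`, and a fixed-width `yyyyMM` encoding matches the `"200604"`/`"201201"` bounds of the original example. It uses the field-first `rangeQuery(field).from(x).to(y)` shape mentioned above, supports inclusive bounds only, and renders Lucene query syntax instead of building a `RangeQuery`.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: typed range bounds converted to index terms by a bridge.
// Renders Lucene query syntax (inclusive bounds) instead of building a RangeQuery.
final class RangeBridgeSketch {

    // Stand-in for a field bridge: typed value -> string stored in the index.
    interface ValueBridge<T> {
        String objectToString(T value);
    }

    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("uuuuMM");

    // Month resolution: 2006-04 -> "200604". Fixed width (for four-digit years),
    // so string order matches chronological order, which a term range relies on.
    static final ValueBridge<YearMonth> MONTH_BRIDGE = value -> value.format(MONTH);

    static <T> RangeFrom<T> rangeQuery(String field, ValueBridge<T> bridge) {
        return new RangeFrom<>(field, bridge);
    }

    static final class RangeFrom<T> {
        private final String field;
        private final ValueBridge<T> bridge;

        RangeFrom(String field, ValueBridge<T> bridge) {
            this.field = field;
            this.bridge = bridge;
        }

        RangeTo<T> from(T lower) {
            return new RangeTo<>(field, bridge, bridge.objectToString(lower));
        }
    }

    static final class RangeTo<T> {
        private final String field;
        private final ValueBridge<T> bridge;
        private final String lower;

        RangeTo(String field, ValueBridge<T> bridge, String lower) {
            this.field = field;
            this.bridge = bridge;
            this.lower = lower;
        }

        String to(T upper) {
            return field + ":[" + lower + " TO " + bridge.objectToString(upper) + "]";
        }
    }

    public static void main(String[] args) {
        String query = rangeQuery("movingDate", MONTH_BRIDGE)
                .from(YearMonth.of(2006, 4))
                .to(YearMonth.of(2012, 1));
        System.out.println(query); // movingDate:[200604 TO 201201]
    }
}
```

Here the bridge is passed explicitly; in Hibernate Search it could presumably be looked up from the field's mapping instead, which is what would make accepting any bridged type practical.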
I like the Analyzer choices, it would be very cool if we could by
default guess the correct one from the searched-for entity types.
We could even consider a Query-By-Example query builder, reading
indexed fields from an instance of an indexed type, or something like the
HSEARCH-119 proposal (for term vector similarity).
cheers,
Sanne
2009/8/28 Emmanuel Bernard <emmanuel(a)hibernate.org>:
> Hey Sanne,
> What do you think of the API proposal itself?
> Like it? See improvements?
>
> On 28 Aug 09, at 10:37, Sanne Grinovero wrote:
>
>> I've nothing against a separate maven module, still Hibernate Search
>> already has lots of "goodies" to work with Lucene which are not
>> necessarily linked to Hibernate (e.g. Analyzer definition helpers,
>> pojo mapping through annotations, enhanced filtering, IndexReader
>> pooling, nice Infinispan Directory...) so this new query builder is
>> not much different. Just a thought.
>>
>> So even if Emmanuel has shown this builder to be useful even with these
>> limited features, it could become even more useful when strongly
>> combined with the other features; two come to mind, maybe more later:
>>
>> A) adding filters to the builders; I don't think it would be easy to
>> have named filters without the full Search package
>>
>> B) Letting the users forget about the Analyzer matches complexity
>> (optionally), as by using the mapping information we could default to
>> a reasonable Analyzer for each field. Most users on the forum get into
>> trouble because they select the wrong analyzer or forget to use one when
>> building the FullTextQuery.
>>
>> IMHO these are good reasons to couple it to the rest of the code;
>> Maybe it would be possible in future to have Hibernate optional.
>>
>> Sanne
>>
>>
>> 2009/8/27 Manik Surtani <manik(a)jboss.org>:
>>>
>>> On 27 Aug 2009, at 16:10, Emmanuel Bernard wrote:
>>>
>>>
>>> queryBuilder.withAnalyzer(Analyzer)
>>> queryBuilder.withEntityAnalyzer(Class<?>)
>>> queryBuilder.basedOnEntityAnalyzer(Class<?>)
>>> .overridesForField(String field, Analyzer)
>>> .overridesForField(String field, Analyzer)
>>> .build() //sucky name
>>>
>>> Perhaps rename the static factory methods to something like:
>>> QueryBuilder.getQueryBuilder(Analyzer)
>>> QueryBuilder.getQueryBuilder(Class<?>)
>>> and QueryBuilder instances have overrideAnalyzerForField(String,
>>> Analyzer).
>>> Why do you need the build() method at the end?
>>>
>>> If you do that, all of a sudden a QB can change its analyzer on the
>>> fly, so it is no longer immutable.
>>> Also, the overridesForField methods would pollute the API when it's time
>>> to create a query.
>>> One of the advantages of a fluent API in a strongly typed environment is
>>> that we can hide methods that are meaningless in a given context.
>>>
>>> That being said, if the API ends up being pure Lucene, once we
>>> stabilize it we can contribute it back, even though I am not
>>> necessarily a huge fan of the ASL.
>>>
>>> No, it doesn't have to be either ASL or even hosted at Apache. I guess
>>> what I am suggesting is perhaps even a separate project -
>>> LuceneQueryBuilder or something - which plain-old-Lucene users could
>>> use as well. It doesn't matter where it's hosted or what the license
>>> is - as long as it's ASL or LGPL :)
>>>
>>> Let's start it under the Hibernate Search umbrella due to potential
>>> synergies and spin it out if needed.
>>>
>>> Ok. Just make sure we use a different maven module or something so that
>>> there are no dependencies on the rest of HS or Hibernate. Otherwise
>>> spinning out will be a PITA. Lucene should be the only dependency of
>>> this code.
>>> Cheers
>>> --
>>> Manik Surtani
>>> manik(a)jboss.org
>>> Lead, Infinispan
>>> Lead, JBoss Cache
>>> http://www.infinispan.org
>>> http://www.jbosscache.org
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev(a)lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>
>
Fwd: [Hibernate-JIRA] Updated: (HHH-1803) Allow fetching with criteria when scrolling
by Sanne Grinovero
On the Search forums it appears that many people get hit by this one; I
agree it looks important, so I promised to "scale the question up".
Kai Hoerder attached a patch recently; could some Core expert take a
look at it? It appears to have a serious impact on the "rebuild
indexes" time for Hibernate Search.
thanks,
Sanne
---------- Forwarded message ----------
From: Kai Hoerder (JIRA) <noreply(a)atlassian.com>
Date: 2009/8/27
Subject: [Hibernate-JIRA] Updated: (HHH-1803) Allow fetching with
criteria when scrolling
To: sanne.grinovero(a)gmail.com
[ http://opensource.atlassian.com/projects/hibernate/browse/HHH-1803?page=c...
]
Kai Hoerder updated HHH-1803:
-----------------------------
Attachment: Diffs-applied-to-3.3.2.GA.zip
Reports on the provided fix applied to the current Hibernate version (3.3.2.GA)
> Allow fetching with criteria when scrolling
> -------------------------------------------
>
> Key: HHH-1803
> URL: http://opensource.atlassian.com/projects/hibernate/browse/HHH-1803
> Project: Hibernate Core
> Issue Type: Improvement
> Components: query-criteria
> Affects Versions: 3.2.0.cr2
> Reporter: Maarten Winkels
> Attachments: Child.java, criteria-scroll-fetch-collection.patch, CriteriaScrollFetchTest.java, Diffs-applied-to-3.3.2.GA.zip, Parent.java, ParentChild.hbm.xml
>
>
> When querying by criteria, fetching is allowed, but when scrolling a criteria, the fetching corrupts the result.
how to disable the annotation where
by Tiago Mesquita
Hi Guys,
I'm implementing a soft delete using @SQLDelete and @Where with a flag called
"active": when active = 1 the object is not deleted; otherwise, the object was
removed and doesn't come back in search queries. But I want to bring it back
even when active = 0, so maybe I'll have to disable the @Where annotation
programmatically.
Does anyone know how to do that?
Thanks, all!
--
Use "BCC" when forwarding your messages: this prevents spyware from
capturing your contacts' e-mail addresses and targeting them with spam.
Regards,
Tiago Mesquita A. C.
http://twitter.com/tiagomac
Hibernate-dev@lists.jboss.org
by The Post Office
This message was undeliverable due to the following reason:
Your message could not be delivered because the destination computer was
not reachable within the allowed queue period. The amount of time
a message is queued before it is returned depends on local configuration
parameters.
Most likely there is a network problem that prevented delivery, but
it is also possible that the computer is turned off, or does not
have a mail system running right now.
Your message was not delivered within 6 days:
Host 147.180.151.28 is not responding.
The following recipients did not receive this message:
<hibernate-dev(a)lists.jboss.org>
Please reply to postmaster(a)lists.jboss.org
if you feel this message to be in error.
Re: [hibernate-dev] Package renaming
by Emmanuel Bernard
Hi
I think it would be slightly better to do the package renaming.
I can't find any downside to it. The diff is too small to be useful to
developers anyway.
On 26 Aug 09, at 10:25, Hardy Ferentschik wrote:
> Hi,
>
> Regarding HV-185 - I checked the class names and there wouldn't be
> any conflicts. What's your opinion on a package refactoring now?
>
> --Hardy
Hibernate Search test suite
by Emmanuel Bernard
Hibernate Search test suite used to run in about 2 mins. Now it runs
in a solid 7 minutes. Can we get that number back?
It seems like maven is pausing between tests or something like that.