From hibernate-commits at lists.jboss.org Mon Aug 23 13:57:18 2010
Content-Type: multipart/mixed; boundary="===============2960383044670994046=="
MIME-Version: 1.0
From: hibernate-commits at lists.jboss.org
To: hibernate-commits at lists.jboss.org
Subject: [hibernate-commits] Hibernate SVN: r20235 -
search/trunk/hibernate-search/src/main/docbook/en-US/modules.
Date: Mon, 23 Aug 2010 13:57:18 -0400
Message-ID: <201008231757.o7NHvI5b028504@svn01.web.mwc.hst.phx2.redhat.com>
--===============2960383044670994046==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Author: epbernard
Date: 2010-08-23 13:57:17 -0400 (Mon, 23 Aug 2010)
New Revision: 20235
Modified:
search/trunk/hibernate-search/src/main/docbook/en-US/modules/query.xml
Log:
HSEARCH-563 Finish documentation on Hibernate Search query DSL
Add JAVA role for better formatting
Modified: search/trunk/hibernate-search/src/main/docbook/en-US/modules/query.xml
===================================================================
--- search/trunk/hibernate-search/src/main/docbook/en-US/modules/query.xml 2010-08-23 17:56:45 UTC (rev 20234)
+++ search/trunk/hibernate-search/src/main/docbook/en-US/modules/query.xml 2010-08-23 17:57:17 UTC (rev 20235)
@@ -66,7 +66,7 @@
Creating a FullTextSession

- Session session = sessionFactory.openSession();
+ Session session = sessionFactory.openSession();
...
FullTextSession fullTextSession = Search.getFullTextSession(session);
@@ -77,7 +77,7 @@
If you use the Hibernate Search query DSL, it will look like
this:

- final QueryBuilder b = fullTextSession.getSearchFactory().buildQueryBuilder().forEntity( Myth.class ).get();
+ final QueryBuilder b = fullTextSession.getSearchFactory().buildQueryBuilder().forEntity( Myth.class ).get();
org.apache.lucene.search.Query luceneQuery =3D
b.keyword()
.onField("history").boostedTo(3)
@@ -95,7 +95,7 @@
Creating a Lucene query from scratch via the query parser

- org.apache.lucene.queryParser.QueryParser parser =
+ org.apache.lucene.queryParser.QueryParser parser =
new QueryParser("title", fullTextSession.getSearchFactory().getAnalyzer(Myth.class) );
try {
org.apache.lucene.search.Query luceneQuery = parser.parse( "history:storm^3" );
@@ -121,7 +121,7 @@
Creating a Search query using the JPA API

- EntityManager em = entityManagerFactory.createEntityManager();
+ EntityManager em = entityManagerFactory.createEntityManager();

FullTextEntityManager fullTextEntityManager =
org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
@@ -158,7 +158,7 @@
You have several options: use the query parser (fine for simple
queries) or the Lucene programmatic API (for more complex use cases).
Particularly if you plan on using the programmatic API, we highly
- recommend you have a look at the Hibernate Search query DSL.
+ recommend you have a look at the Hibernate Search query DSL.

It is out of the scope of this documentation on how to exactly
build a Lucene query. Please refer to the online Lucene documentation or
@@ -173,7 +173,7 @@
quite complex. It's even more complex to understand the code once
written. Besides the inherent API complexity, you have to remember to
convert your parameters to their string equivalent as well as make sure
- to apply the correct analyzer to the right field (an ngram analyzer will
+ to apply the correct analyzer to the right field (a ngram analyzer will
for example use several ngrams as the tokens for a given word and should
be searched as such).

@@ -214,13 +214,13 @@
QueryBuilder from the
SearchFactory.

- QueryBuilder mythQB = searchFactory.buildQueryBuilder().forEntity( Myth.class ).get();
+ QueryBuilder mythQB = searchFactory.buildQueryBuilder().forEntity( Myth.class ).get();

You can also override the analyzer used for a given field or
fields. This is rarely needed and should be avoided unless you know what
you are doing (like many things :)).

- QueryBuilder mythQB = searchFactory.buildQueryBuilder()
+ QueryBuilder mythQB = searchFactory.buildQueryBuilder()
.forEntity( Myth.class )
.overridesForField("history","stem_analyzer_definition")
.get();
@@ -234,42 +234,303 @@

Here is how you search for a specific word:

- Query luceneQuery = mythQB.keyword().onField("history").matching("storm").createQuery();
+ Query luceneQuery = mythQB.keyword().onField("history").matching("storm").createQuery();

keyword() means that you are trying to
find a specific word. onField() tells in which
- lucene field to look. matching() tells what to
+ Lucene field to look. matching() tells what to
look for. And finally createQuery() does create
- the Lucene query object. A lot is going on under this line of code.
- First the value storm is passed through the history
- FieldBridge: it does not matter here but you will
- see that it's quite handy when dealing with numbers or dates. Second the
- field bridge value is then passed to the analyzer used to index
- history.
+ the Lucene query object. A lot is going on with this line of
+ code.

- fluent api contextual autocompletion
+
+
+ The value storm is passed through the
+ history FieldBridge: it
+ does not matter here but you will see that it's quite handy when
+ dealing with numbers or dates.
+

- analyzer
+
+ The field bridge value is then passed to the analyzer used to
+ index history. This ensures that the query uses
+ the same term transformation as the indexing (lower case, n-gram,
+ stemming and so on). If the analyzing process generates several
+ terms for a given word, a boolean query is used with the
+ SHOULD logic (roughly an OR
+ logic).
+
+

- query several words
+ Let's see how you can search a property that is not of type
+ string.

- ignore analyzer
+ @Entity @Indexed class Myth {
+ @Field(index = Index.UN_TOKENIZED) @DateBridge(resolution = Resolution.YEAR)
+ public Date getCreationDate() { return creationDate; }
+ public void setCreationDate(Date creationDate) { this.creationDate = creationDate; }
+ private Date creationDate;
+
+ [...]
+}

- field bridge (ignore)
+Date birthdate = ...;
+Query luceneQuery = mythQB.keyword().onField("creationDate").matching(birthdate).createQuery();

- fuzzy wildcard
+
+ In plain Lucene, you would have had to convert the
+ Date object to its string representation (in
+ this case the year).
+

- range query (form to above below excludeLimit
+ This works for any object, not just Date,
+ provided that the FieldBridge has an
+ objectToString method (all built-in
+ FieldBridge implementations do).

- phrase query
+ Let's now have a look at how to search a field that uses ngram
+ analyzers. ngram analyzers index the succession of ngrams of your
+ words, which helps to recover from user typos. For example the 3-grams of
+ the word hibernate are hib, ibe, ber, ern, rna, nat, ate.

- boolean queries (must, should must not, all, except)
+ @AnalyzerDef(name = "ngram",
+ tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class ),
+ filters = {
+ @TokenFilterDef(factory = StandardFilterFactory.class),
+ @TokenFilterDef(factory = LowerCaseFilterFactory.class),
+ @TokenFilterDef(factory = StopFilterFactory.class),
+ @TokenFilterDef(factory = NGramFilterFactory.class,
+ params = {
+ @Parameter(name = "minGramSize", value = "3"),
+ @Parameter(name = "maxGramSize", value = "3") } )
+ }
+)
+@Entity @Indexed class Myth {
+ @Field(analyzer=@Analyzer(definition="ngram"))
+ public String getName() { return name; }
+ public void setName(String name) { this.name = name; }
+ private String name;
+
+ [...]
+}

- multiple fields
+Query luceneQuery = mythQB.keyword().onField("name").matching("Sisiphus").createQuery();

- boosted
+ The matching word "Sisiphus" will be lower-cased and then split
+ into 3-grams: sis, isi, sip, iph, phu, hus. Each of these n-grams will be part
+ of the query. We will then be able to find the Sisyphus myth (with a
+ y). All that is transparently done for you.
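
The 3-gram splitting described above can be sketched in plain Java. This is only an illustration that assumes lower-casing followed by a sliding window over the characters; the class and method names (NGramSketch, ngrams) are hypothetical, and in Hibernate Search the actual work is done by the configured analyzer, not by code like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Hibernate Search or Lucene.
final class NGramSketch {

    // Lower-cases the word and returns every substring of length n, in order.
    static List<String> ngrams(String word, int n) {
        String lower = word.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= lower.length(); i++) {
            grams.add(lower.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints the 3-grams of the indexed word and of the misspelled query word.
        System.out.println(ngrams("hibernate", 3));
        System.out.println(ngrams("Sisiphus", 3));
    }
}
```

A misspelled query word still shares some of its n-grams with the correctly spelled indexed word, which is why the match can succeed despite the typo.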

- list of options
+
+ If for some reason you do not want a specific field to use the
+ field bridge or the analyzer you can call the
+ ignoreAnalyzer() or
+ ignoreFieldBridge() functions.
+
+
+ To search for multiple possible words in the same field, simply
+ add them all in the matching clause.
+
+ //search document with storm or lightning in their history
+Query luceneQuery = mythQB.keyword().onField("history").matching("storm lightning").createQuery();
+
+ To search the same word on multiple fields, use the
+ onFields method.
+
+ Query luceneQuery = mythQB.keyword().onFields("history","description","name").matching("storm").createQuery();
+
+ Sometimes one field should be treated differently from another
+ field even when searching for the same term; use the
+ andField() method for that.
+
+ Query luceneQuery = mythQB.keyword()
+ .onField("history")
+ .andField("name")
+ .boostedTo(5)
+ .andField("description")
+ .matching("storm")
+ .createQuery();
+
+ In the previous example, only field name is boosted to 5.
+
+ To do a fuzzy query (using the Levenshtein distance), start with a
+ keyword query and add the fuzzy flag.
+
+ Query luceneQuery = mythQB
+ .keyword()
+ .fuzzy()
+ .withThreshold( .8f )
+ .withPrefixLength( 1 )
+ .onField("history")
+ .matching("starm")
+ .createQuery();
+
+ threshold is the limit above which two terms
+ are considered matching. It's a decimal between 0 and 1 and defaults to
+ 0.5. prefixLength is the length of the prefix ignored
+ by the "fuzziness": while it defaults to 0, a non-zero value is
+ recommended for indexes containing a huge number of distinct
+ terms.
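
To make the threshold more concrete, here is a sketch of the Levenshtein distance together with one way of turning it into a similarity between 0 and 1. The normalization used (one minus the distance divided by the length of the shorter term, clamped at 0) is an assumption made for illustration; the exact formula Lucene applies, and how it accounts for the prefix length, should be checked against the Lucene version in use. FuzzySketch and its methods are hypothetical names.

```java
// Hypothetical helper for illustration; not part of Hibernate Search or Lucene.
final class FuzzySketch {

    // Classic dynamic-programming Levenshtein distance, keeping two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / length of the shorter term, never below 0.
    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0 : 0.0;
        }
        return Math.max(0.0, 1.0 - (double) levenshtein(a, b) / shorter);
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("starm", "storm"));
        System.out.println(similarity("starm", "storm"));
    }
}
```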
+
+ You can also do wildcard queries (queries where some parts of
+ the word are unknown). ? represents a single character
+ and * represents any character sequence. Note that
+ for performance purposes, it is recommended that the query does not
+ start with either ? or *.
+
+ Query luceneQuery = mythQB
+ .keyword()
+ .wildcard()
+ .onField("history")
+ .matching("sto*")
+ .createQuery();
+
+
+ Wildcard queries do not apply the analyzer on the matching
+ terms. Otherwise the risk of * or
+ ? being mangled is too high.
+
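
The meaning of ? and * described above can be illustrated by translating a wildcard pattern into a regular expression. This is only a sketch of the matching semantics; it is not how Lucene evaluates wildcard queries, and WildcardSketch is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Hibernate Search or Lucene.
final class WildcardSketch {

    // Translates '?' to "any single character" and '*' to "any character sequence";
    // every other character is quoted so that it only matches itself.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // True when the whole term matches the wildcard pattern.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("sto*", "storm"));
        System.out.println(matches("st?rm", "storm"));
        System.out.println(matches("sto?", "storm"));
    }
}
```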
+
+ So far we have been looking for words or sets of words; you can
+ also search for exact or approximate sentences. Use the
+ phrase() query.
+
+ Query luceneQuery = mythQB
+ .phrase()
+ .onField("history")
+ .matching("Thou shalt not kill")
+ .createQuery();
+
+ You can search approximate sentences by adding a slop factor. The
+ slop factor represents the number of other words permitted in the
+ sentence: this works like a within or near operator.
+
+ Query luceneQuery = mythQB
+ .phrase()
+ .withSlop(3)
+ .onField("history")
+ .matching("Thou kill")
+ .createQuery();
+
+ We are done with queries related to a given word. You can also do
+ range queries (on numbers, dates, strings etc). You can look for a value
+ in between boundaries (included or not) and for a value below or above a
+ given boundary (included or not).
+
+ //look for 0 <= starred < 3
+Query luceneQuery = mythQB
+ .range()
+ .onField("starred")
+ .from(0).to(3).excludeLimit()
+ .createQuery();
+
+//look for myths strictly BC
+Date beforeChrist = ...;
+Query luceneQuery = mythQB
+ .range()
+ .onField("creationDate")
+ .below(beforeChrist).excludeLimit()
+ .createQuery();
+
+ Finally, you can aggregate queries together to create more complex
+ queries. These aggregation operators are known as boolean queries where
+ the operators are:
+
+
+
+ SHOULD: the query should contain the matching elements
+ of the subquery
+
+
+
+ MUST: the query must contain the matching elements of the
+ subquery
+
+
+
+ MUST NOT: the query must not contain the matching elements of
+ the subquery
+
+
+
+ The subqueries can be any Lucene query including a boolean query
+ itself. Let's look at a few examples:
+
+//look for popular modern myths that are not urban
+Date twentiethCentury = ...;
+Query luceneQuery = mythQB
+ .bool()
+ .must( mythQB.keyword().onField("description").matching("urban").createQuery() )
+ .not()
+ .must( mythQB.range().onField("starred").above(4).createQuery() )
+ .createQuery();
+
+//look for myths that are preferably urban
+Query luceneQuery = mythQB
+ .bool()
+ .should( mythQB.keyword().onField("description").matching("urban").createQuery() )
+ .must( mythQB.range().onField("starred").above(4).createQuery() )
+ .createQuery();
+
+//look for all myths except religious ones
+Query luceneQuery = mythQB
+ .all()
+ .except( mythQB.keyword().onField( "description_stem" ).matching( "religion" ).createQuery() )
+ .createQuery();
+
+ You can apply some options to query types and fields:
+
+
+
+ boostedTo (on query type and on
+ field): boost the whole query or the specific field to a given
+ factor
+
+
+
+ withConstantScore (on query): all
+ results matching the query have a constant score equal to the
+ boost
+
+
+
+ filteredBy(Filter) (on query): filter
+ query results using the Filter
+ instance
+
+
+
+ ignoreAnalyzer (on field): ignore the
+ analyzer when processing this field
+
+
+
+ ignoreFieldBridge (on field): ignore
+ field bridge when processing this field
+
+
+
+ Let's check out an example using some of these options:
+
+ Query luceneQuery = mythQB
+ .bool()
+ .should( mythQB.keyword().onField("description").matching("urban").createQuery() )
+ .should( mythQB
+ .keyword()
+ .onField("name")
+ .boostedTo(3)
+ .ignoreAnalyzer()
+ .matching("urban").createQuery() )
+ .must( mythQB
+ .range()
+ .boostedTo(5).withConstantScore()
+ .onField("starred").above(4).createQuery() )
+ .createQuery();
+
+ As you can see, the Hibernate Search query DSL is a fairly high-level and
+ easy to read query API. By accepting and producing Lucene queries, you
+ can easily incorporate query types not (yet) supported by the DSL.
+ Please give us feedback!
--===============2960383044670994046==--