Spellchecker in HSearch
by Emmanuel Bernard
OK, I looked at the SpellChecker code from Lucene contrib, and
there is no trivial way to integrate it more nicely into Hibernate
Search. They did not really plan for flexibility.
But it's doable to fork this code and make a spellchecker feature in
Hibernate Search. Here are a few limitations they have that we need to
remove. Here is a proposal.
- more than one field and even more than one index should be the
source of the dictionary
- declaring which field is part of a dictionary should be
declarative @AddToDictionary(name="main-dictionary") (with probably an
optional field name(s))
- for a given dictionary name, we need to associate an index and
probably a field prefix (to share the same index for all
dictionaries). This could be done by annotations or in the configuration
(not sure)
- this index should be kept like any other DirectoryProvider in the
HSearch infrastructure (we could then reuse ReaderProvider internally
and avoid leaks).
- we will probably add a manual dictionary reindex method and the
SpellChecker API should be in the SearchFactory.
sf.getDictionary("main-dictionary").spellcheck(word, suggestion);
sf.getDictionary("main-dictionary").spellcheck(word, suggestion,
entity, properties...); //not sure how useful this one really is
sf.getDictionary("main-dictionary").reindex();
Of course this needs to be refined.
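A rough sketch of how the proposal above might look from the caller's side. Everything here is hypothetical: @AddToDictionary is only the annotation name proposed above, SketchDictionary and Book are invented for illustration, the in-memory word set stands in for a real index, and reading "suggestion" as a maximum number of suggestions is an assumption. None of this exists in Hibernate Search.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical annotation: marks a field as a word source for a named dictionary.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface AddToDictionary {
    String name();
}

// Hypothetical in-memory dictionary; a real one would be backed by an index.
class SketchDictionary {
    private final String name;
    private final Set<String> words = new TreeSet<>();

    SketchDictionary(String name) {
        this.name = name;
    }

    // Collects words from every String field annotated for this dictionary.
    void index(Object entity) {
        for (Field f : entity.getClass().getDeclaredFields()) {
            AddToDictionary a = f.getAnnotation(AddToDictionary.class);
            if (a == null || !a.name().equals(name) || f.getType() != String.class) {
                continue;
            }
            try {
                f.setAccessible(true);
                String value = (String) f.get(entity);
                if (value == null) {
                    continue;
                }
                for (String w : value.toLowerCase().split("\\W+")) {
                    if (!w.isEmpty()) {
                        words.add(w);
                    }
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Returns up to maxSuggestions known words within edit distance 2 of the input.
    List<String> spellcheck(String word, int maxSuggestions) {
        List<String> result = new ArrayList<>();
        for (String candidate : words) {
            if (result.size() == maxSuggestions) {
                break;
            }
            if (distance(word.toLowerCase(), candidate) <= 2) {
                result.add(candidate);
            }
        }
        return result;
    }

    // Levenshtein edit distance, computed row by row.
    private static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }
}

// Hypothetical entity: only the annotated field feeds the dictionary.
class Book {
    @AddToDictionary(name = "main-dictionary")
    String title = "Hibernate Search in Action";

    String internalNote = "unindexed"; // not annotated, so ignored
}
```

In this sketch a reindex would simply call index(...) again for each entity; sharing one index between dictionaries through a field prefix is not modelled.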
This is probably 3 or 4 days of work, but unfortunately it seems we cannot do
that for the book. It would be worth mentioning in the section
that a more integrated solution is on its way.
--
Emmanuel Bernard
http://in.relation.to/Bloggers/Emmanuel | http://blog.emmanuelbernard.com
| http://twitter.com/emmanuelbernard
Hibernate Search in Action (http://is.gd/Dl1)
On Sep 12, 2008, at 09:15, John Griffin wrote:
>
> Here's the file as a zip. Maybe the pdf wasn't passing.
>
> Also, Nermina pointed out to me that I skipped section 7.3. So ANY
> reference
16 years, 3 months
RE: Hibernate Search Spell Checker
by John Griffin
Leave the sf.getDictionary("main-dictionary").spellcheck(word, suggestion,
entity, properties...); variant until a later version, if at all. I'll look at your
suggestions and compare them to how I was thinking of going about it. I
believe it should be controlled exclusively by annotations, and the multiple
fields and dictionaries are in line with my thoughts.
Couldn't possibly be done for the book so I'll put a NOTE in saying we're
working on a better :>) solution for Hibernate Search. I really want to work
on this since I've been away from the project for so long. It WILL take more
than 3 or 4 days since I have limited time for working on the source.
Do you want me to put a Jira issue in on this?
Did you get my full statement on section 7.3?
John Griffin
http://thediningphilosopher.blogspot.com
-----Original Message-----
From: hibernate-dev-bounces(a)lists.jboss.org
[mailto:hibernate-dev-bounces@lists.jboss.org] On Behalf Of
hibernate-dev-request(a)lists.jboss.org
Sent: Saturday, September 13, 2008 10:00 AM
To: hibernate-dev(a)lists.jboss.org
Subject: hibernate-dev Digest, Vol 27, Issue 8
Date: Sat, 13 Sep 2008 10:19:41 -0400
From: Emmanuel Bernard <emmanuel(a)hibernate.org>
Subject: [hibernate-dev] Spellchecker in HSearch
To: John Griffin <griffij(a)sosstaffing.com>, John Griffin
<jgriffin(a)thebluezone.net>
Cc: hibernate-dev(a)lists.jboss.org
Message-ID: <878228FD-9EC3-4948-A363-D9C4F41D92F9(a)hibernate.org>
Content-Type: text/plain; charset="us-ascii"
16 years, 3 months
Search: backend refactoring
by Sanne Grinovero
The short question:
may I add some methods to the implementations of LuceneWork?
I'm refactoring the backends and it would help, but there
is a warning there in the javadoc about not changing it freely.
Sanne
The same question, a bit more verbose:
Hi,
I've been puzzling over several optimizations in Search I would like
to implement, but I need to do some refactoring in the
org.hibernate.search.backend package
(mostly done actually, but I need your ideas).
Most changes affect the "lucene" implementation, but the code would be
greatly simplified, more readable and (IMHO) better performing if I'm
permitted to change the current implementations of LuceneWork; however,
there's a big warning there about a requirement to be backwards
compatible with the serialized form.
(btw OptimizeLuceneWork is missing the "magic serialization number")
I would like to add some methods to them, and a single field which could
actually be transient, so I could attempt to maintain the compatibility.
Additionally, I've been thinking that if you'd like to keep LuceneWork as
a very simple transport and prefer not to add methods, it would be nicer to
have just one class instead of AddLuceneWork/DeleteLuceneWork/..., and
differentiate the type of work by a field
(using org.hibernate.search.backend.WorkType ?),
so I could add
the methods I need to the enum.
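A minimal sketch of that single-class idea, to make it concrete. SketchWork and SketchWorkType are hypothetical names, not the existing backend classes, and the writerOnly flag is an assumption loosely based on the reader/writer preference described in point d) further down.

```java
import java.io.Serializable;

// Hypothetical enum: behaviour lives here instead of in one subclass per work type.
enum SketchWorkType {
    ADD(true),       // assumed to need an IndexWriter
    DELETE(false),   // assumed to be able to run on either a reader or a writer
    OPTIMIZE(true);  // assumed to need an IndexWriter

    private final boolean writerOnly;

    SketchWorkType(boolean writerOnly) {
        this.writerOnly = writerOnly;
    }

    boolean isWriterOnly() {
        return writerOnly;
    }
}

// Hypothetical single transport class: plain serializable data plus a type field.
final class SketchWork implements Serializable {
    private static final long serialVersionUID = 1L;

    private final SketchWorkType type;
    private final String entityName;
    private final Serializable id;

    SketchWork(SketchWorkType type, String entityName, Serializable id) {
        this.type = type;
        this.entityName = entityName;
        this.id = id;
    }

    SketchWorkType getType() {
        return type;
    }

    String getEntityName() {
        return entityName;
    }

    Serializable getId() {
        return id;
    }
}
```

The transport class carries only data, so its serialized form stays simple, while new behaviour can be added to the enum without touching it.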
Also I could see some use in having an UpdateLuceneWork too, so that it is
the backend implementation's business to decide whether to split it into a
delete+insert or do something more clever:
the order in which messages are received would be less critical, and some clever
optimizations
could be applied by the backend by reordering received Work(s) or repackaging
several queues into one.
What I've done already:
a) early division into different queues, based on the affected DirectoryProviders
b) refactoring/simplification of Workspace, which no longer needs to keep track of
state for different DPs as there is only one in the context.
c) shorter lock times: no thread ever needs more than one lock;
work is sorted by DP, and each lock is released before acquiring the next one.
(deadlockFreeQueue is removed as it is not needed anymore)
before if we needed lock on DP's A,B,C the time of acquisition looked like:
Alock *********
Block    ******
Clock       ***
now it is more like
Alock ***
Block    ***
Clock       ***
And my goal is to make this possible, in separate threads when async:
Alock ***
Block ***
Clock ***
(not implemented yet: will need a new backend, but I'm preparing the common
stuff to make this possible)
d) the QueueProcessor can ask each Work whether it needs an IndexWriter or an
IndexReader, or has any preference for one when there is a
possibility to make a choice (when we open both a reader and a writer
anyway because of a strict requirement of other Work in the same queue).
e) based on d), DeleteLuceneWork is able to run on either a reader or a writer
(when it's possible to do so, depending on (the number of different
classes using the same DP) == 1); in this last case the work is able to
tell that it "prefers" to be executed on an IndexWriter, but it will be able
to do its task with an IndexReader too (or the opposite?)
f) "batch mode" is currently set on all DPs if even one Work is of type batch;
with the division of Workspace per DP this is no longer needed, and batch
mode can be set independently.
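Points a) and c) above could be sketched roughly as follows. This is not the actual refactored backend: PerProviderProcessor is a hypothetical class, providers are identified by plain strings, and "applying" a piece of work just records it. It only shows the queue being split per DirectoryProvider and each lock being released before the next one is taken.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical processor: one lock per provider, never more than one held at a time.
class PerProviderProcessor {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Splits a queue of {provider, work} pairs into one sub-queue per provider.
    // The TreeMap keeps providers sorted, so locks are always taken in the same order.
    static Map<String, List<String>> splitByProvider(List<String[]> queue) {
        Map<String, List<String>> byProvider = new TreeMap<>();
        for (String[] item : queue) {
            byProvider.computeIfAbsent(item[0], k -> new ArrayList<>()).add(item[1]);
        }
        return byProvider;
    }

    // Applies each sub-queue under its provider's lock and releases it before moving on.
    List<String> process(List<String[]> queue) {
        List<String> applied = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : splitByProvider(queue).entrySet()) {
            ReentrantLock lock = locks.computeIfAbsent(group.getKey(), k -> new ReentrantLock());
            lock.lock();
            try {
                for (String work : group.getValue()) {
                    applied.add(group.getKey() + ":" + work);
                }
            } finally {
                lock.unlock();
            }
        }
        return applied;
    }
}
```

Since each sub-queue touches exactly one provider, handing the groups to separate threads (the async goal in the last diagram) would not require any thread to hold two locks.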
Another goal I have with this design is the possibility to aggregate
different committed queues into one, making it possible to
optimize away work (insert then delete => noop) while respecting the original
order, but also to call the strategy optimization again
to reorder the newly created work for best efficiency.
The final effect would be to obtain the same behavior as
my custom batch indexer, but optimizing not only indexing from scratch
but any type of load.
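The "insert then delete => noop" idea can be illustrated with a small sketch. QueuedOp, OpKind and QueueCoalescer are hypothetical names, and the sketch assumes an entity was not already in the index before its ADD, so that cancelling an ADD against a later DELETE of the same id is safe.

```java
import java.util.ArrayList;
import java.util.List;

enum OpKind { ADD, DELETE }

// Hypothetical queued operation on one entity id.
final class QueuedOp {
    final OpKind kind;
    final String entityId;

    QueuedOp(OpKind kind, String entityId) {
        this.kind = kind;
        this.entityId = entityId;
    }

    @Override
    public String toString() {
        return kind + "#" + entityId;
    }
}

class QueueCoalescer {
    // Walks the merged queue in its original order; a DELETE cancels the most
    // recent surviving operation on the same id if that operation is an ADD.
    static List<QueuedOp> coalesce(List<QueuedOp> ops) {
        List<QueuedOp> out = new ArrayList<>();
        for (QueuedOp op : ops) {
            int pendingAdd = -1;
            if (op.kind == OpKind.DELETE) {
                for (int i = out.size() - 1; i >= 0; i--) {
                    QueuedOp earlier = out.get(i);
                    if (earlier.entityId.equals(op.entityId)) {
                        if (earlier.kind == OpKind.ADD) {
                            pendingAdd = i;
                        }
                        break; // only the latest operation on this id matters
                    }
                }
            }
            if (pendingAdd >= 0) {
                out.remove(pendingAdd); // ADD + DELETE cancel out, neither is kept
            } else {
                out.add(op);
            }
        }
        return out;
    }
}
```

The backward scan makes this quadratic in the worst case, which is fine for a sketch; a map from id to the last surviving operation would avoid it.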
I hope this doesn't scare you; the resulting code is quite simple and I
think there are actually fewer LOC than the current trunk has.
I've not prepared any special-case tests, I just ran all the existing ones.
kind regards,
Sanne
16 years, 3 months
Envers
by Adam Warski
Hello,
Envers is an entity versioning (or: auditing) library for JPA/
Hibernate, which I've been developing for the last half year.
I presented Envers at yesterday's R&D meeting; maybe some of you
were there.
If not, here are the slides: http://www.jboss.org/files/envers/presentations/envers_rd.pdf
And here's the project page: http://www.jboss.org/envers/
Some people suggested that there may be some interest from the
Hibernate Team in Envers. I also get asked quite frequently if I've
talked to the "Hibernate guys" :).
So if you've got any questions, or suggestions for the future
direction of Envers, please write. Or any other comments, of course.
--
Adam
16 years, 3 months
Build failed in Hudson: hibernate-3.2 » oracle10g,linux-slim #101
by jboss-qa-internal@redhat.com
See http://hudson.qa.jboss.com/hudson/job/hibernate-3.2/./TEST_DATABASE=oracl...
Changes:
[steve.ebersole(a)jboss.com] HHH-3471 : add NumericBooleanType
------------------------------------------
started
Building remotely on dev91-linux
Updating http://anonhibernate.labs.jboss.com/core/branches/Branch_3_2
A src/org/hibernate/type/NumericBooleanType.java
At revision 15170
[linux-slim] $ bash -xe /tmp/hudson30552.sh
+ . /home/hudson/config_repository/scripts/common/common_bash.sh
++ basedir=/home/hudson/config_repository
++ common_variables=/home/hudson/config_repository/resources/common.variables
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/chprop
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/copy-eap-results
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/get-soa
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/kill-children
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/kill-derby
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/kill-glassfish
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/kill-jboss
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/kill-pointbase
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/kill-rmiiiop-server
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/kill-sunri
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/load_all_vars
+++ . /home/hudson/config_repository/resources/common.variables
++++ JAVA14=/qa/tools/opt/j2sdk1.4.2_15
++++ JAVA15=/qa/tools/opt/jdk1.5.0_15
++++ JAVA16=/qa/tools/opt/jdk1.6.0_03
++++ ANT162=/qa/tools/opt/apache-ant-1.6.2
++++ ANT165=/qa/tools/opt/apache-ant-1.6.5
++++ ANT165_jbpm=/home/hudson/static_build_env/jbpm/apache-ant-1.6.5
++++ ANT165_jbpm_soa=/home/hudson/static_build_env/jbpm.3_soa/apache-ant-1.6.5
++++ ANT170=/qa/tools/opt/apache-ant-1.7.0
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/runejb3postresql
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/runejb3tck
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/runtck
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/setant
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/setjava
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/setmaven2
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/stacktrace
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/start-jboss
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/stop-jboss
++ for function in '$basedir/scripts/function.d/*'
++ . /home/hudson/config_repository/scripts/function.d/versions
+ setjava 14
+ '[' 14 ']'
+ local 'command=. /home/hudson/config_repository/resources/common.variables && echo $JAVA14'
++ eval . /home/hudson/config_repository/resources/common.variables '&&' echo '$JAVA14'
+++ . /home/hudson/config_repository/resources/common.variables
++++ JAVA14=/qa/tools/opt/j2sdk1.4.2_15
++++ JAVA15=/qa/tools/opt/jdk1.5.0_15
++++ JAVA16=/qa/tools/opt/jdk1.6.0_03
++++ ANT162=/qa/tools/opt/apache-ant-1.6.2
++++ ANT165=/qa/tools/opt/apache-ant-1.6.5
++++ ANT165_jbpm=/home/hudson/static_build_env/jbpm/apache-ant-1.6.5
++++ ANT165_jbpm_soa=/home/hudson/static_build_env/jbpm.3_soa/apache-ant-1.6.5
++++ ANT170=/qa/tools/opt/apache-ant-1.7.0
+++ echo /qa/tools/opt/j2sdk1.4.2_15
+ local javapath=/qa/tools/opt/j2sdk1.4.2_15
+ '[' -d /qa/tools/opt/j2sdk1.4.2_15/bin ']'
+ export JAVA_HOME=/qa/tools/opt/j2sdk1.4.2_15
+ JAVA_HOME=/qa/tools/opt/j2sdk1.4.2_15
+ export PATH=/qa/tools/opt/j2sdk1.4.2_15/bin:/qa/tools/opt/apache-ant-1.6.5/bin:/qa/tools/opt/maven-2.0.9/bin:/qa/tools/opt/jdk1.5.0_15/bin:/qa/tools/opt/subversion-1.4.0/bin:/usr/xpg4/bin:/usr/xpg4/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/hudson/bin
+ PATH=/qa/tools/opt/j2sdk1.4.2_15/bin:/qa/tools/opt/apache-ant-1.6.5/bin:/qa/tools/opt/maven-2.0.9/bin:/qa/tools/opt/jdk1.5.0_15/bin:/qa/tools/opt/subversion-1.4.0/bin:/usr/xpg4/bin:/usr/xpg4/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/hudson/bin
+ DRIVER_JAR=
+ case "$TEST_DATABASE" in
+ DRIVER_JAR=/qa/tools/opt/jdbc-drivers/ojdbc14.jar
+ cd src
+ [[ -n '' ]]
+ cp /home/hudson/config_repository/resources/hibernate/oracle10g.properties etc/hibernate.properties
+ [[ -n /qa/tools/opt/jdbc-drivers/ojdbc14.jar ]]
+ DRIVER_JAR=-Djar.driver=/qa/tools/opt/jdbc-drivers/ojdbc14.jar
+ cp /home/hudson/config_repository/resources/hibernate/log4j.properties etc/
+ bash build.sh clean
Buildfile: build.xml
[taskdef] Could not load definitions from resource clovertasks. It could not be found.
splash:
[splash] failed to display SplashScreen, caught java.lang.InternalError with message: Can't connect to X11 window server using ':0.0' as the value of the DISPLAY variable.
init:
[echo] Build Hibernate-3.2.6.ga (2008-09-10 11:08:58)
[echo] JDK version: 1.4
clean:
[delete] Deleting directory /qa/services/hudson/hudson_workspace/workspace/hibernate-3.2/TEST_DATABASE/oracle10g/label/linux-slim/src/build
BUILD FAILED
/qa/services/hudson/hudson_workspace/workspace/hibernate-3.2/TEST_DATABASE/oracle10g/label/linux-slim/src/build.xml:148: Unable to delete directory /qa/services/hudson/hudson_workspace/workspace/hibernate-3.2/TEST_DATABASE/oracle10g/label/linux-slim/src/build
Total time: 9 seconds
Recording test results
16 years, 3 months
AnyType mappings
by Felix Gnass
Hello,
when using <any> mappings it's quite likely that you get
ObjectNotFoundExceptions, as it's impossible to use on-delete cascades
in such a scenario. Unlike other relation mappings, <any> currently does
not support the not-found="ignore" setting.
The only work-around I came up with so far is to use a custom Type that
extends AnyType and calls session.internalLoad(entityName, id, false,
*true* ) instead of session.internalLoad(entityName, id, false, *false* ).
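To illustrate the difference the last flag makes, here is a toy model rather than real Hibernate code. ToySession only imitates the shape of internalLoad(entityName, id, eager, nullable) quoted above, and the two resolver classes are hypothetical stand-ins for AnyType and the custom subclass.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the session; only the effect of the 'nullable' flag is modelled.
class ToySession {
    private final Map<String, Object> rows = new HashMap<>();

    void store(String entityName, long id, Object entity) {
        rows.put(entityName + "#" + id, entity);
    }

    // Same parameter order as the call quoted above; 'eager' is ignored here.
    Object internalLoad(String entityName, long id, boolean eager, boolean nullable) {
        Object found = rows.get(entityName + "#" + id);
        if (found == null && !nullable) {
            throw new IllegalStateException("No row with the given identifier exists: "
                    + entityName + "#" + id);
        }
        return found;
    }
}

// Models the default behaviour: a dangling reference raises an exception.
class StrictAnyResolver {
    Object resolveAny(ToySession session, String entityName, Long id) {
        return entityName == null || id == null
                ? null
                : session.internalLoad(entityName, id, false, false);
    }
}

// Models the work-around: pass nullable=true so a dangling reference resolves to null.
class LenientAnyResolver extends StrictAnyResolver {
    @Override
    Object resolveAny(ToySession session, String entityName, Long id) {
        return entityName == null || id == null
                ? null
                : session.internalLoad(entityName, id, false, true);
    }
}
```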
I have no idea how many people actually use the <any> mapping feature.
While it might look like a bad design decision at first glance,
there are indeed some valid use-cases for such a mapping (like the audit
log example quoted in the documentation). For none of the possible
applications that come to mind does the default behavior of throwing an
ObjectNotFoundException really make much sense. So I wonder whether it
would be feasible to always use nullable=true in AnyType.java?
If this isn't an option, I'd like to vote for adding a
not-found="ignore|exception" setting :-)
Best regards,
-Felix
16 years, 3 months
@FullTextFilterDef cache
by Emmanuel Bernard
I am thinking about caching and filters.
In 3.0, we cache instances of Filter by using
@FullTextFilterDef.cache=true
In 3.1, we can still cache instances of Filter using
@FullTextFilterDef(cache=true, cacheBitResult=CacheBitResults.NO)
and we can cache BitSet results by using
@FullTextFilterDef(cache=true) //, cacheBitResult=CacheBitResults.AUTOMATIC
I think we introduced cacheBitResult to keep @FullTextFilterDef
backward compatible. But the reality is that this feature is not backward
compatible:
- a new layer of caching is automatically added if you keep the
annotation as in 3.0. It probably won't do too much harm, but it adds an
unnecessary indirection
Fundamentally, it seems that the annotation should be
@FullTextFilterDef(cache=FilterCacheMode.RESULTS)
@FullTextFilterDef(cache=FilterCacheMode.FILTER_INSTANCE)
@FullTextFilterDef(cache=FilterCacheMode.NO)
This will force a change from 3.0 apps to 3.1 apps but it's much
clearer.
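As a sketch of what the declaration side of this proposal could look like: FilterCacheMode and its three values come from the lines above, while SketchFilterDef and SecuredDocument are hypothetical names used so as not to pretend this is the real @FullTextFilterDef. Leaving cache without a default is also an assumption.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// The three modes proposed above, replacing the boolean plus cacheBitResult pair.
enum FilterCacheMode {
    NO,
    FILTER_INSTANCE,
    RESULTS
}

// Hypothetical stand-in for @FullTextFilterDef with a single cache attribute.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SketchFilterDef {
    String name();

    FilterCacheMode cache(); // no default in this sketch: every definition states its mode
}

// Example usage on a hypothetical entity.
@SketchFilterDef(name = "security", cache = FilterCacheMode.FILTER_INSTANCE)
class SecuredDocument {
}
```

With one enum attribute there is no combination of flags to misread, at the price of the 3.0 to 3.1 source change already mentioned.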
I can't seem to find a compelling reason to have instance caching and
result caching configurable independently.
WDYT?
--
Emmanuel Bernard
http://in.relation.to/Bloggers/Emmanuel | http://blog.emmanuelbernard.com
| http://twitter.com/emmanuelbernard
Hibernate Search in Action (http://is.gd/Dl1)
16 years, 3 months
How join for two arbitrary tables?
by ManiKanta G
Hi,
How can I perform an outer join (left outer, right outer) between two
independent, arbitrary tables (with no relation between them) in HQL? When I
tried, I got a 'path expected' exception.
All the examples I see have tables with a relation: table1 left join
table1.table2.
Is there any solution for this apart from native SQL?
Regards,
ManiKanta G
16 years, 3 months