[SEARCH] Translating analyzer definitions from HSearch to Elasticsearch
by Yoann Rodiere
Hello everyone,
I'm currently working on HSEARCH-2219, "Define analyzers via the REST API",
whose purpose is to automatically translate @AnalyzerDefs in Hibernate
Search to settings in Elasticsearch, removing the need for users to
configure analyzers separately in their Elasticsearch instance.
The thing is, the structure of our configuration in Hibernate Search is
different from the one in Elasticsearch. In particular, we can't name
instances of token filters, char filters, etc, while in Elasticsearch one
*has* to name them in order to provide parameters.
See for instance:
    @AnalyzerDef(
        name = "myAnalyzer",
        tokenizer = @TokenizerDef(
            factory = StandardTokenizerFactory.class,
            params = @Parameter(name = "maxTokenLength", value = "900")
        )
    )
compared to the Elasticsearch way:
    index :
        analysis :
            analyzer :
                myAnalyzer :
                    type : custom
                    tokenizer : myTokenizer1
            tokenizer :
                myTokenizer1 :
                    type : standard
                    max_token_length : 900
The analyzer name is there on both sides, @TokenizerDef.factory would give
me the tokenizer type, and parameters are pretty obvious too. But
"myTokenizer1", the tokenizer name, has absolutely no equivalent in
Hibernate Search.
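To make the gap concrete, here is a minimal sketch of the target structure as nested maps. The class and method names are hypothetical and belong to neither Hibernate Search nor any Elasticsearch client; the point is only that the tokenizer name has to be supplied from somewhere, because the analyzer refers to the tokenizer by that name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not a Hibernate Search or Elasticsearch API.
final class AnalysisSettingsSketch {

    /** Builds the nested "analysis" settings for one custom analyzer with one named tokenizer. */
    static Map<String, Object> analysisFor(String analyzerName, String tokenizerName,
            String tokenizerType, Map<String, String> tokenizerParams) {
        // The tokenizer definition holds its type plus any parameters.
        Map<String, Object> tokenizerDef = new LinkedHashMap<>();
        tokenizerDef.put("type", tokenizerType);
        tokenizerDef.putAll(tokenizerParams);

        // The analyzer refers to the tokenizer by name only.
        Map<String, Object> analyzerDef = new LinkedHashMap<>();
        analyzerDef.put("type", "custom");
        analyzerDef.put("tokenizer", tokenizerName);

        Map<String, Object> analyzers = new LinkedHashMap<>();
        analyzers.put(analyzerName, analyzerDef);
        Map<String, Object> tokenizers = new LinkedHashMap<>();
        tokenizers.put(tokenizerName, tokenizerDef);

        Map<String, Object> analysis = new LinkedHashMap<>();
        analysis.put("analyzer", analyzers);
        analysis.put("tokenizer", tokenizers);
        return analysis;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("max_token_length", "900");
        System.out.println(analysisFor("myAnalyzer", "myTokenizer1", "standard", params));
    }
}
```

Three of the four arguments have an obvious source in the annotations; the second one, the tokenizer name, does not.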
I could try to generate names automatically, but those would need to be
more or less stable across multiple executions in order for schema
validation to work properly. And there's nothing we could really use as an
identifier in our annotations, at least not reliably.
To fill the gap, I'd like to add a "name" attribute to the TokenizerDef,
CharFilterDef and TokenFilterDef annotations. This attribute would be
optional and the documentation would mention that it's useless for embedded
Lucene.
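As a rough illustration of the proposal, the sketch below uses a stand-in annotation, TokenizerDefSketch, rather than the real @TokenizerDef. The stand-in, its "type" attribute, and the choice to fail when no name is given are all assumptions made for the example; how a missing name should actually be handled is still open.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class NamedTokenizerSketch {

    // Hypothetical stand-in for @TokenizerDef with the proposed optional "name" attribute.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface TokenizerDefSketch {
        String name() default ""; // optional; would be ignored for embedded Lucene
        String type();            // simplification of the real "factory" attribute
    }

    @TokenizerDefSketch(name = "myTokenizer1", type = "standard")
    static class NamedExample { }

    @TokenizerDefSketch(type = "standard")
    static class UnnamedExample { }

    /** Returns the explicit name; fails if none was given (an assumed policy, not a decision). */
    static String elasticsearchName(Class<?> annotated) {
        TokenizerDefSketch def = annotated.getAnnotation(TokenizerDefSketch.class);
        if (def == null) {
            throw new IllegalArgumentException("No tokenizer definition on " + annotated.getName());
        }
        if (def.name().isEmpty()) {
            throw new IllegalStateException("A name is required to translate this tokenizer");
        }
        return def.name();
    }

    public static void main(String[] args) {
        System.out.println(elasticsearchName(NamedExample.class));
    }
}
```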
Another solution would be to have a "magic" @Parameter, named after a
constant (ElasticsearchParameters.TOKENIZER_NAME for instance), and detect
that parameter automatically, but it feels wrong... mainly because
@AnalyzerDef already has its own "name" attribute, so why wouldn't
@TokenizerDef?
And finally, we could bring our annotations closer to the Elasticsearch
way, by providing a way to define tokenizers/char filters/token filters and
a separate way to reference those definitions, but I don't think that's 5.6
material, since we'd likely have to break things or lose consistency.
WDYT?
Yoann Rodière <yoann(a)hibernate.org>
Hibernate NoORM Team
JDK 9 b148 including a refresh of the module system is available on java.net
by Rory O'Donnell
Hi Sanne,
As requested, I will send availability emails to hibernate-dev mailing
list in future.
JDK 9 build b148 <https://jdk9.java.net/download/> includes an important
refresh of the module system [1]; a summary of the changes is listed here
<http://download.java.net/java/jdk9/changes/jdk-9+148.html>.
*This refresh includes a disruptive change that is important to understand.*
For those who have been trying out modules with regular JDK 9 builds,
be aware that `requires public` changes to `requires transitive`.
In addition, the binary representation of the module declaration
(module-info.class) has changed so that you need to recompile any
modules that were compiled with previous JDK 9 builds.
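For readers who have not met the directive yet, a module declaration after the rename might look like the fragment below. Both module names are placeholders invented for this illustration.

```java
// module-info.java for a hypothetical module; both module names are placeholders.
module com.example.app {
    // With earlier JDK 9 builds this line read "requires public com.example.api;".
    requires transitive com.example.api;
}
```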
As things stand today in JDK 9, you can use setAccessible to break into
non-public elements of any type in exported packages. However, it cannot
be used to break into any type in a non-exported package. The current
specified behavior was a compromise for the initial integration of the
module system. It is of course not very satisfactory, hence the
#AwkwardStrongEncapsulation issue [2] on the JSR 376 issues list. With
the updated proposal in the JSR, this refresh changes setAccessible
further so that it cannot be used to break into non-public types, or
non-public elements of public types, in exported packages. Code that
uses setAccessible to hack into the private constructor of
java.lang.invoke.MethodHandles.Lookup will be disappointed for example.
This change will expose hacks in many existing libraries and tools. As a
workaround, a new command line option `--add-opens` can be used to
open specific packages for "deep reflection". For example, a really
popular build tool fails with this refresh because it uses setAccessible
+ core reflection to hack into a private field of an unmodifiable
collection so that it can mutate it, facepalm! This code will continue
to work as before when run with `--add-opens
java.base/java.util=ALL-UNNAMED` to open the package java.util in module
java.base to "all unnamed modules" (think class path).
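A small probe along these lines can show whether deep reflection into java.util is permitted in a given JVM. It is only a sketch: the class name is made up, the private field "size" of java.util.ArrayList is an implementation detail chosen as a convenient target, and the result depends on the JDK build and on whether `--add-opens` was passed.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;

// Hypothetical probe class for illustration only.
final class DeepReflectionProbe {

    /** Returns true if setAccessible succeeds on a private field of java.util.ArrayList. */
    static boolean canOpenArrayListSize() {
        try {
            // "size" is a private field of ArrayList; looking it up does not need access.
            Field size = ArrayList.class.getDeclaredField("size");
            // This is the call that the refresh restricts for packages that are not open.
            size.setAccessible(true);
            return true;
        } catch (NoSuchFieldException e) {
            // The implementation detail relied on here is absent in this JDK.
            return false;
        } catch (RuntimeException e) {
            // Access was refused, for example by the module system or a security manager.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("deep reflection into java.util: "
                + (canOpenArrayListSize() ? "allowed" : "blocked"));
    }
}
```

Running it once as is and once with `--add-opens java.base/java.util=ALL-UNNAMED` is a quick way to compare the two behaviours on a given build.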
*Any help reporting issues to popular tools and libraries would be
appreciated.*
A debugging aid that is useful to identify issues is to run with
-Dsun.reflect.debugModuleAccessChecks=true to get a stack trace when
setAccessible fails; this is particularly useful when code swallows
exceptions without any logging.
Rgds,Rory
[1]
http://mail.openjdk.java.net/pipermail/jdk9-dev/2016-November/005276.html
<http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2016-October/000...>
[2]
http://openjdk.java.net/projects/jigsaw/spec/issues/#AwkwardStrongEncapsu...
--
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland
RH Summit 2017 CFP
by Chris Cranford
The Red Hat Summit 2017 CFP is closing on December 16th and I would
recommend we try and submit our abstracts this week if we could. From
the ORM side, are there any particular topics we'd care to present or
anyone who would like to present?
Chris
AutoCloseable is great.. what about JPA?
by Sanne Grinovero
We updated several Hibernate APIs to allow using the
try-with-resources pattern a while back,
and I've been finding that quite handy.
I had to write some code using JPA's EntityManager recently; in
contrast, it feels like it's stuck somewhere between the stone age and
the dark ages not to be able to do the same.
Should we propose the same API fixes to the JPA expert group?
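Until something like that exists, one possible stopgap is to adapt close() through a method reference, since AutoCloseable has a single abstract method. The sketch below uses a stand-in class, LegacyResource, instead of the real EntityManager so that it runs without any JPA dependency; the class and method names are invented for the example.

```java
final class CloseableAdapterSketch {

    // Stand-in for a resource such as EntityManager: it has close() but is not AutoCloseable.
    static final class LegacyResource {
        private boolean open = true;

        boolean isOpen() {
            return open;
        }

        void close() {
            open = false;
        }
    }

    /** Reports whether the resource was open during the block, then closes it on exit. */
    static boolean useAndClose(LegacyResource resource) {
        // The method reference adapts close() to AutoCloseable for try-with-resources.
        try (AutoCloseable ignored = resource::close) {
            return resource.isOpen();
        } catch (Exception e) {
            // AutoCloseable.close() is declared to throw Exception, so it must be handled.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LegacyResource resource = new LegacyResource();
        System.out.println("open inside block: " + useAndClose(resource));
        System.out.println("open afterwards: " + resource.isOpen());
    }
}
```

It works, but the extra variable and the forced catch clause are exactly the kind of noise a proper AutoCloseable type would avoid.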
Thanks,
Sanne