[Search] Native Java serialization support
by Hardy Ferentschik
Hi,
I would like to summarize a discussion we had on IRC to get some more feedback and come to
a decision on how to move forward.
I currently need to extend our serialization support for the distributed Search
deployment scenarios. Basically, in this case we are serializing our different LuceneWork instances from slave
to master. This includes things like Lucene's Document instances, which are part
of add/update operations. Historically, this need arose when Lucene dropped all serialization
support for its classes, so we were forced to implement our own custom serialization. To do so we
defined an SPI (org.hibernate.search.indexes.serialization.spi.*) and provided two implementations,
one based on native Java serialization and one based on Avro [1]. The two implementations are provided
as separate artifacts (the serialization/java and serialization/avro modules in our build) and
theoretically it should be possible to switch between them by exchanging jar files.
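To make the "switch by exchanging jar files" idea concrete, here is a minimal sketch of how a pluggable serializer could be discovered at runtime with java.util.ServiceLoader. WorkSerializer, JavaWorkSerializer and WorkSerializers are hypothetical names, not the actual contract in org.hibernate.search.indexes.serialization.spi; the sketch assumes discovery through a META-INF/services registration, and the Java-serialization class only stands in for the serialization/java module.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ServiceLoader;

/** Hypothetical serializer contract; the real Hibernate Search SPI differs. */
interface WorkSerializer {
    byte[] serialize(Serializable work);

    Object deserialize(byte[] data);
}

/** Implementation based on native Java serialization (java.io object streams). */
final class JavaWorkSerializer implements WorkSerializer {
    @Override
    public byte[] serialize(Serializable work) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(work);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    @Override
    public Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class WorkSerializers {
    private WorkSerializers() {
    }

    /**
     * Returns the first WorkSerializer registered through META-INF/services on the
     * classpath, so exchanging the jar that carries the registration swaps the format.
     * Falls back to Java serialization when nothing is registered (a choice of this sketch only).
     */
    static WorkSerializer discover() {
        for (WorkSerializer candidate : ServiceLoader.load(WorkSerializer.class)) {
            return candidate;
        }
        return new JavaWorkSerializer();
    }
}
```

With an Avro-based jar carrying its own registration on the classpath, discover() would return that implementation instead, without any code change on the caller's side.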
I say theoretically because I found out during my recent work that the Java serialization module
is broken in several places. In its current state it would not work. I guess we never noticed since
the default is Avro and we do not even document the possibility of changing the implementation. However, it also
shows that no one has ever tried.
The question is, what do we do now? Do we want two implementations, with the Java serialization
fixed and then extended with the new functionality (btw, I need to serialize DocValues now)? Or is
it time to drop this module, reducing the amount of code we have to maintain and making it a bit
easier to implement new serialization requirements? By dropping the module I mean
removing the serialization/java module and leaving everything else in place. So you can still write
your own serialization implementation; however, we would provide no alternative to our preferred choice
of Avro (which is afaik considerably faster than native Java serialization, one of the
driving factors for using it).
I think on IRC we already "kind of" agreed that we should drop native Java serialization. I
just wanted to put it out once more for everyone to comment/vote.
--Hardy
[1] http://avro.apache.org/
9 years, 10 months
[Search] Deprecating the @Key annotation
by Sanne Grinovero
Gunnar made a nice patch [1] to simplify the usage of parameterized,
cacheable FullTextFilters:
these no longer require the user to create a custom key to identify
the parameter set.
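The idea behind the patch can be illustrated with a small, self-contained sketch: if the cache key is derived from the filter name plus the parameter values themselves, two requests with equal parameters hit the same cache entry without any user-defined @Key method. ParameterizedFilterCache and its methods are hypothetical, and this is not necessarily how the patch implements it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache of filter instances keyed by filter name plus parameter values. */
final class ParameterizedFilterCache<F> {
    private final Map<List<Object>, F> cache = new HashMap<>();
    private int builds;

    /**
     * The key is built from the parameters themselves: Map.equals/hashCode compare
     * contents, so equal parameter sets share one cache entry and no custom key is needed.
     */
    F getOrBuild(String filterName, Map<String, Object> parameters,
                 Function<Map<String, Object>, F> factory) {
        // Defensive copy so later changes to the caller's map cannot corrupt the key.
        Map<String, Object> snapshot = new HashMap<>(parameters);
        List<Object> key = Arrays.asList(filterName, snapshot);
        return cache.computeIfAbsent(key, k -> {
            builds++;
            return factory.apply(parameters);
        });
    }

    /** Number of times the factory was actually invoked. */
    int builds() {
        return builds;
    }
}
```

The trade-off is that every parameter value must have a sensible equals/hashCode; a hand-written key remains the escape hatch when that is not the case.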
Documentation-wise, we're struggling to find a good explanation of
why someone might still want to use the @Key annotation, besides
backward compatibility.
It would be nice to simply remove the references to @Key from the
documentation, and avoid trying to explain when to choose between the
two alternatives.
This would imply deprecating the annotation. Any objections?
Sanne
1 - https://github.com/hibernate/hibernate-search/pull/775
9 years, 10 months
Some migration pains HSearch 5
by Marc Schipperheyn
So, I've started migrating our production environment to HSearch 5 at long
last.
Some of the initial pains that may warrant some documentation love:
* @IndexedEmbedded effectively inverts a default: before HSearch 5,
the default was essentially @IndexedEmbedded(includeEmbeddedObjectId=true),
whereas now it's essentially @IndexedEmbedded(includeEmbeddedObjectId=false).
Inverting defaults seems like a dangerous upgrade choice to me.
* I use a lot of @IndexedEmbedded(includePaths="id")-style includes, and my entities map their id like this:
public class MyClass {
    private Long id;

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "userId", nullable = false, insertable = false, updatable = false)
    @DocumentId
    public Long getId() {
        return id;
    }
}
I always queried these as follows:
qb.keyword().onField("id").matching(myLongId).createQuery()
where the Long would be implicitly converted to a String, and the term query
would be
+id:1
Now it becomes a NumericRangeQuery, because I'm passing a
Long. But document ids apparently are still indexed as strings by default, so this
query fails to deliver results.
Converting the @DocumentId to a numeric field makes the most sense to me, and adding
@NumericField to it seems to fix the issue, but I'm not sure whether this could create
problems in other areas, since this is the document id. In any case, this is
undocumented.
* The way to access a MutableSearchFactory has changed and is not
documented. This is more of an edge case, though.
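A toy model (plain Java, not Lucene) of the mismatch described in the second point: it assumes, as reported, that the DSL builds a numeric query when given a Long and a term query otherwise, while the @DocumentId field is indexed as a string term by default. ToyIndex and its methods are hypothetical and only mimic that behaviour.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model, not Lucene: a field is indexed either as string terms or as numeric values. */
final class ToyIndex {
    private final Map<String, Set<String>> stringTerms = new HashMap<>();
    private final Map<String, Set<Long>> numericValues = new HashMap<>();

    /** Stands in for a @DocumentId without @NumericField (string term). */
    void addStringField(String field, String value) {
        stringTerms.computeIfAbsent(field, f -> new HashSet<>()).add(value);
    }

    /** Stands in for an id additionally mapped with @NumericField. */
    void addNumericField(String field, long value) {
        numericValues.computeIfAbsent(field, f -> new HashSet<>()).add(value);
    }

    /**
     * Assumed dispatch, as reported above: a Long argument becomes a numeric lookup,
     * anything else a term lookup on its string form.
     */
    boolean matches(String field, Object value) {
        if (value instanceof Long) {
            Set<Long> values = numericValues.getOrDefault(field, Collections.emptySet());
            return values.contains(value);
        }
        Set<String> terms = stringTerms.getOrDefault(field, Collections.emptySet());
        return terms.contains(String.valueOf(value));
    }
}
```

In this model both passing String.valueOf(myLongId) and indexing the id numerically make the lookup succeed; whether the latter is safe for a document id is exactly the open question raised above.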
That's it for now.
Cheers,
Marc
9 years, 10 months
Hibernate ORM 4.2.18.Final released
by Gail Badner
I am having problems creating a weblog entry on in.relation.to. I will send another announcement with details when that is resolved.
Gail Badner
Red Hat, Hibernate ORM
9 years, 10 months
[HSEARCH] *.next Jira versions
by Gunnar Morling
Hi,
There are many (new?) versions in Jira such as 3.1.next, 3.2.next etc.
Are those all needed? 5.x makes sense to me, and maybe 4.5.next, but all
the old ones? All these unreleased versions make it a bit unwieldy when
assigning a fix version to an issue.
Thx,
--Gunnar
9 years, 10 months
[Hibernate Search] Donating some of our source code to Infinispan
by Sanne Grinovero
All,
we've discussed several times the issues we have because of the
circular dependency between Hibernate Search and Infinispan.
Normally the pain point we aim to address with such discussions is the
need to carefully coordinate between releases [of Search and
Infinispan] in our quest for a final stable release, which brings some
release pressure.
More recently it has also become a compatibility problem, as the two
projects target different platforms/environments and have different
life-cycle expectations. This creates more maintenance work, such as lots
of back-porting of patches, distracting us from our goals.
We already discussed some solutions, but none of them was very convincing. I have a
fresh proposal which I feel is more interesting:
# New plan
1) the module "/infinispan" from the Hibernate Search source tree is
moved to the Infinispan project into some "hibernate-search-directory"
Maven module.
2) Hibernate Search drops any dependency on Infinispan
Reminder: this "/infinispan" module we have contains just a couple of
classes, and represents the "DirectoryProvider" implementation which
integrates with the Hibernate Search autodiscovery and creates a
Directory instance (whose implementation always lived in Infinispan).
Most of the code is about applying configuration properties, plus
integration tests.
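The effect of the plan on the circular dependency can be checked with a small sketch: module-level dependencies are lifted to project-level edges (edges within one project are ignored), and a depth-first search looks for a cycle. ProjectGraph is hypothetical and the module names only approximate the artifacts involved; moving the Directory provider module into the Infinispan project removes the edge from Search to Infinispan.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: project-level dependency graph with DFS cycle detection. */
final class ProjectGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    /** Lifts module dependencies to project edges, ignoring edges inside one project. */
    static ProjectGraph fromModules(Map<String, String> ownerOfModule,
                                    Map<String, List<String>> moduleDependencies) {
        ProjectGraph graph = new ProjectGraph();
        for (Map.Entry<String, List<String>> entry : moduleDependencies.entrySet()) {
            String fromProject = ownerOfModule.get(entry.getKey());
            for (String dependency : entry.getValue()) {
                String toProject = ownerOfModule.get(dependency);
                if (!fromProject.equals(toProject)) {
                    graph.dependsOn(fromProject, toProject);
                }
            }
        }
        return graph;
    }

    void dependsOn(String from, String to) {
        edges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        edges.computeIfAbsent(to, k -> new HashSet<>());
    }

    boolean hasCycle() {
        Set<String> finished = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String project : edges.keySet()) {
            if (visit(project, finished, onPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean visit(String project, Set<String> finished, Set<String> onPath) {
        if (onPath.contains(project)) {
            return true; // back edge: we returned to a project on the current path
        }
        if (finished.contains(project)) {
            return false; // already fully explored without finding a cycle
        }
        onPath.add(project);
        for (String next : edges.get(project)) {
            if (visit(next, finished, onPath)) {
                return true;
            }
        }
        onPath.remove(project);
        finished.add(project);
        return false;
    }
}
```

Feeding it today's ownership (the Directory provider module owned by Search) reports a cycle; reassigning only that module to Infinispan leaves a single edge from Infinispan to Search and no cycle.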
It's a very simple plan, but has some valuable consequences:
- Search can move on without ever needing to wait for Infinispan -
and vice versa.
- There is no longer a circular dependency
- Search would be in control of its Lucene dependency, and can
upgrade as needed. We could experiment with different Lucene versions
without necessarily waiting for Infinispan to solve compatibility issues
first. That is often more complex, as it means Infinispan then has to
guarantee compatibility with a *range* of Lucene versions, including
both the target Lucene version and the version currently consumed
via Infinispan Query / Hibernate Search Engine.
- Infinispan can *opt* to stick with an older version of Search, or to
update, provided it can handle both a) integration with possible
changes to the Search SPI and b) the updates needed for possible new requirements
of the new Lucene version (the Lucene Directory might need
compatibility fixes)
- The dependency structure would better match the one in
our Enterprise Products. For example, it's the JDG distribution, not
Hibernate Search, that provides these integration bits, so it makes more
sense to build the Directory against the specific version of
Infinispan than against the specific version of Search.
- It will be easier to have new Infinispan code take advantage of
features exposed on the Infinispan Lucene Directory if all changed
parts are in the same [Infinispan] repository. This is actually
a problem right now, as it's holding back a POC meant to deliver great
improvements to the Directory implementation.
And not least we'll have faster builds ;-)
# Drawbacks
First, our users will probably need to change some details
of how they build.
Infinispan might need to fix some SPI related code before being able
to upgrade, but historically our SPI has been extremely stable.
The real problem is that when such a thing happens, after we've
released a version of Search there might be some time before a
compatible version of Infinispan is made available as well.
In practice this means the time we need to catch up on
API changes becomes "exposed" to end users who want to use our latest release,
and they might be blocked in the meantime. But while they would then see the tip of the
iceberg of our integration work, I believe the calendar time until a working
pair of releases is available to them would be the same: with the current model, in such a
situation we need to wait for the same Infinispan release to happen
before we can release ours.
So: the same waiting time, but we'd have a leaner process, and possibly quicker
releases for all users not interested in the Infinispan integration, plus clear benefits in
all those scenarios in which we don't break APIs, which is the common case.
I've not identified other problems, so in my opinion the benefits are
well worth these drawbacks.
# Consequences for our users
Not much. Even today we expect our users to depend on several jar
files provided by the Infinispan team; this would be just one more.
It does open some questions, though:
A) Should the Maven group id be changed? I'd expect it to be
transferred to "org.infinispan" group at least, and probably need a
better artifact id too.
B) License. Our code is LGPL, most of Infinispan is ASL - but not all
of it. So I expect it would be possible to keep the existing license
at least for now, and defer eventual license changes as a separate
step (if people feel the need for any change at all).
C) Documentation. Besides the needed updates in Maven coordinates /
download sources, I don't expect much of a difference: we'd still
explain how to set this integration up.
D) Distribution. Today we distribute this module, and its
dependencies, in our release bundles, which implies we distribute
copies of various Infinispan jars.
I think we should drop these from our distribution, even though it
might seem counter-intuitive:
while it might seem convenient to have these included, the whole point
of the change would be to allow more flexibility in which
versions of Infinispan work with Search. And the
integration tests and this specific knowledge would become the responsibility
of Infinispan.
Am I failing to see a more critical issue?
How would you all feel about our code being transferred to the
different project?
Sanne
9 years, 11 months
Search 5 migration pains: MultiFieldQueryParser and Numeric Fields
by Sanne Grinovero
As reported on SO [1], it's not a straightforward migration for those
who embraced the convenience of using a MultiFieldQueryParser, or one
of the other parsers provided by Lucene.
In that specific example, I think the right answer would be to use the
programmatic API or our DSL. But what about the use case in which
you want to parse user input from a text field in your application?
Using the parser is quite convenient in such a case, as people can
express boolean operators and field names, while the UI stays very
simple.
Wouldn't it be nice to have a custom "parser" in our DSL which
essentially mimics the functionality of the MultiFieldQueryParser but
takes advantage of our indexing metadata, as we do for the HQL
parser?
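A toy sketch of what "taking advantage of our indexing metadata" could mean for such a parser: given which fields are indexed as text and which as numbers, a bare token is expanded across the applicable fields and numeric fields get an exact-match range instead of a text term. MetadataAwareParser and FieldKind are hypothetical, only whitespace-separated tokens and "field:value" are supported, and the output is a Lucene-syntax-like string rather than a Query object, to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy sketch, hypothetical names: expands user input across fields using index metadata. */
final class MetadataAwareParser {
    enum FieldKind { TEXT, NUMERIC }

    private final Map<String, FieldKind> fields; // field name -> how the field is indexed

    MetadataAwareParser(Map<String, FieldKind> fields) {
        this.fields = new LinkedHashMap<>(fields); // preserves the caller's iteration order
    }

    /** "field:value" targets one field; a bare token is tried on every applicable field. */
    String parse(String input) {
        List<String> clauses = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int colon = token.indexOf(':');
            if (colon > 0) {
                clauses.add(clause(token.substring(0, colon), token.substring(colon + 1)));
                continue;
            }
            List<String> alternatives = new ArrayList<>();
            for (Map.Entry<String, FieldKind> field : fields.entrySet()) {
                // Numeric fields are only tried when the token actually is a number.
                if (field.getValue() == FieldKind.TEXT || isNumber(token)) {
                    alternatives.add(clause(field.getKey(), token));
                }
            }
            clauses.add("(" + String.join(" OR ", alternatives) + ")");
        }
        return String.join(" AND ", clauses);
    }

    private String clause(String field, String value) {
        FieldKind kind = fields.get(field);
        if (kind == null) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        if (kind == FieldKind.NUMERIC) {
            if (!isNumber(value)) {
                throw new IllegalArgumentException("Not a number for field " + field + ": " + value);
            }
            return field + ":[" + value + " TO " + value + "]"; // exact match as a numeric range
        }
        return field + ":" + value;
    }

    private static boolean isNumber(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

For example, with "title" declared as text and "year" as numeric, the input `hibernate year:2015` becomes `(title:hibernate) AND year:[2015 TO 2015]`.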
Sanne
1 - http://stackoverflow.com/questions/28138308/hibernate-search-5-0-numeric-...
9 years, 11 months
Re: [hibernate-dev] [Hibernate Search] Donating some of our source code to Infinispan
by Sanne Grinovero
On 26 January 2015 at 09:52, Tristan Tarrant <tristan(a)infinispan.org> wrote:
> On 26/01/2015 10:45, Hardy Ferentschik wrote:
>>
>>
>> A) Should the Maven group id be changed? I'd expect it to be
>> transferred to "org.infinispan" group at least, and probably need a
>> better artifact id too.
>> +1 At least the group id needs to change.
>
> Yes.
>>
>>
>>> B) License. Our code is LGPL, most of Infinispan is ASL - but not all
>>> of it. So I expect it would be possible to keep the existing license
>>> at least for now, and defer eventual license changes as a separate
>>> step (if people feel need for any change at all).
>>
>> Would it not be easier to change to ASL?
>
> All of the embeddable Infinispan code is ASL. The exception is for the
> server itself which is LGPL (being based on WildFly which is LGPL itself).
> So I'd rather not confuse the issues any more.
>
> Tristan
As long as we're clear that wherever you expose/include Search and
parser code it's LGPL ;-)
And JPACacheStore, and ..
So the licensing story is already not very simple to explain to users today;
one more module wouldn't be a significant change, especially since
it's not public API! And remember you're already using the same code
today, with the same strings attached. If anything, this should make it
easier to change the license.
I'm fine to change this to ASL, but I'd prefer we could move forward
with it without needing to involve legal matters as a blocking
process.
As mentioned above, this would make it far easier to develop some
improvements which would greatly benefit both projects in the short
term.
Sanne
9 years, 11 months
Search: upgrade to Apache Lucene 5
by Sanne Grinovero
Apache Lucene 5 is at the release candidate stage now, and might be released
before the end of the month.
I've been testing it this weekend... and it's still Java7 compatible!
Sorry, I was confused when I previously mentioned it would require
Java8: they did indeed switch the "trunk" branch to require Java8, but
"trunk" is meant to become 6.0, while the "5.0" branch was
created before that change in requirements.
So we could propose a timely update to Lucene 5 without necessarily
waiting for it to be a good time for Hibernate Search to upgrade to
Java8 / major release.
The initial "damage" amounts to about 600 compile errors; I have
already resolved approx. 230 of them as they were trivial.
The remaining ones require some more care & investigation, but it seems
like we could get it done in 1-2 weeks if we were to focus on
this... it definitely looks better than the migration from 3 to 4.
Changes would again (obviously) affect users wherever they use
"native" Lucene APIs, but it seems that this time we could make
zero changes to our APIs, so we could do this in a minor release
without violating our "backwards compatibility policy". I guess
that's arguable, though: while our APIs would be "drop-in" compatible, the
user application wouldn't necessarily survive all the changes in Lucene code.
We would need to rewrite areas relating to:
- Faceting
- Filters and filter stacking
- Custom Collectors (i.e. most code of Spatial)
- FieldCache
- Some IndexWriter code related to I/O error handling and locking
- Analyzer
A more extensive preview of changes is documented here:
http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC1-rev16...
Should we start working on a new major version soon, then?
Whatever we do, we can't make our users wait as long as we
did for 4, especially since this upgrade is not as nasty.
Sanne
9 years, 11 months
[OGM] Transaction type "RESOURCE_LOCAL", still JTA is required
by Gunnar Morling
Hi,
For a demo I have an OGM application which defines a persistence unit with
transaction type RESOURCE_LOCAL.
I thus assumed I wouldn't have to add a JTA implementation to the class
path, but actually I'm getting a CNFE (see [1] for the complete trace):
ClassNotFoundException: Could not load requested class :
com.arjuna.ats.jta.TransactionManager
Indeed Arjuna is what we use as TM by default. It is set by OGM's
JtaPlatform implementation which in turn is used by transactions created by
OGM's default TransactionFactory [2].
Unless I'm doing something wrong configuration-wise, I feel that requiring
a JTA implementation for a non-transactional backend such as MongoDB is
confusing and may make users ask whether OGM is doing the right thing.
Would it be feasible to provide an "OGM local" TransactionImplementor +
TransactionFactory to be used in such cases where the store does not
support transactions (so no rollbacks etc.), but we'd "only" need a trigger
for writing out changes to the datastore?
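A minimal sketch of the idea, assuming the "transaction" only needs to buffer pending changes and write them out on commit. FlushOnCommitTransaction is hypothetical and does not implement Hibernate ORM's TransactionImplementor or TransactionFactory contracts; a Map stands in for the datastore.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical "local" transaction for a store without transactions: it only
 * buffers pending writes and flushes them on commit. Not an OGM or ORM API.
 */
final class FlushOnCommitTransaction {
    private final Map<String, String> datastore; // stands in for e.g. a MongoDB collection
    private final Map<String, String> pending = new LinkedHashMap<>();
    private boolean active;

    FlushOnCommitTransaction(Map<String, String> datastore) {
        this.datastore = datastore;
    }

    void begin() {
        if (active) {
            throw new IllegalStateException("Transaction already active");
        }
        active = true;
    }

    void write(String key, String value) {
        if (!active) {
            throw new IllegalStateException("No active transaction");
        }
        pending.put(key, value);
    }

    /** The commit is merely the trigger for writing out the buffered changes. */
    void commit() {
        if (!active) {
            throw new IllegalStateException("No active transaction");
        }
        datastore.putAll(pending);
        pending.clear();
        active = false;
    }

    /** Only changes not yet written can be discarded; the store itself cannot roll back. */
    void rollback() {
        pending.clear();
        active = false;
    }
}
```

Nothing here needs a JTA TransactionManager, which is the point of the proposal; the obvious limitation is that anything already flushed stays written.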
Any thoughts?
--Gunnar
[1] https://gist.github.com/gunnarmorling/ba193caecb7d5cdbd0a4
[2] https://github.com/hibernate/hibernate-ogm/blob/master/core/src/main/java...
9 years, 11 months