HSEARCH-3000 Pick Jigsaw Automatic Module names for all published modules
by Sanne Grinovero
Picking automatic module names for Hibernate Search isn't going to be
straightforward, as our two main jars (hibernate-search-engine &
hibernate-search-orm) suffer from split packages between them.
We can't really fix the split package problem without breaking all
users. If we want to consider that we can debate it, but it will need
to happen in another round as we're doing a minor release, so let's
focus on:
# Which names to pick
# Should we pick the names at all
# Which modules should have a name
For a great background on the possible strategies and pitfalls I
recommend reading Stephen Colebourne's blog on this subject [1].
He persuaded me there are good reasons to use the "reverse DNS, the
top level package" scheme; however, since we have the split package
problem we can't simply go with that.
Still, we can respect the principles he recommends with a small
variation. It's fair to assume that the `org.hibernate.search` prefix
is "ours"; since the nature of the suggestion is focused on making
sure there are no misunderstandings in the community about which names
you can choose - as there is no central authority making sure module
names aren't clashing - we should be fine within Hibernate projects
with any `org.hibernate.X` prefix, as long as we coordinate and reach
an agreement on this list.
So, I propose we use:
Engine module:
- org.hibernate.search.engine
ORM integration module:
- org.hibernate.search.orm
JGroups, JMS backends:
[ no automatic module name ? Excepting some "guidelines" in the JMS
module, these are not public API so nobody would benefit from it -
also we think we might want to phase out the name "backend" in the
future ]
Elasticsearch integration module [hibernate-search-elasticsearch.jar]:
- org.hibernate.search.elasticsearch
Elasticsearch / AWS security integration:
[ no automatic module name: no public API ]
Serialization / Avro:
[ no automatic module name: no public API ]
WDYT?
We could also pick names for the ones which I've listed as "no public
API" but I see no point: as we're only assigning an "Automatic Module
Name" we won't be able to explicitly state that the other modules
depend on these. So nobody will use them, and things are a bit in flux
anyway in this area because of Hibernate Search 6 plans.
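For reference, an automatic module name is declared through the
`Automatic-Module-Name` main attribute of a jar's META-INF/MANIFEST.MF.
The following is a minimal sketch of reading that attribute with the
JDK's `java.util.jar.Manifest`; the class and method names are made up
for illustration, and the manifest text is an assumed example of what
the engine jar could carry under this proposal, not the actual build
output.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Hibernate Search.
final class AutomaticModuleNameSketch {

    // Returns the Automatic-Module-Name main attribute of the given
    // manifest text, or null when the manifest does not declare one.
    static String automaticModuleName(String manifestText) {
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(
                    manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes().getValue("Automatic-Module-Name");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed manifest content for hibernate-search-engine.jar.
        String engineManifest =
                "Manifest-Version: 1.0\n"
              + "Automatic-Module-Name: org.hibernate.search.engine\n";
        System.out.println(automaticModuleName(engineManifest));
    }
}
```

A modular application would then refer to the jar with
`requires org.hibernate.search.engine;` in its module-info.java. Jars
left without the attribute fall back to a name derived from the jar
file name.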
Another option altogether: since we have split packages which we'll
have to resolve before we can actually transform these into fully
fledged modules, I think an acceptable position is also to say we
won't be publishing any automatic module name yet. Personally I'm
inclined to go with the names suggested above, so that at least others
can start making baby steps, even if it's not all there.
# What I don't like:
For one, that the typical application will need to import both
`org.hibernate.search.engine` and `org.hibernate.search.orm`, and
likely more as well (e.g. Elasticsearch API, Lucene API module is
coming, ..).
Maybe, similar to BOMs today, we could publish a module which
statically imports several of these; that could be nicer to use, but
we risk needing to publish (and document) one for each of a selection
of combinations. So let's not start with such things yet.
Thanks,
Sanne
[1] http://blog.joda.org/2017/05/java-se-9-jpms-automatic-modules.html
6 years, 7 months
Why hibernate-jcache is not enough (for Infinispan)
by Radim Vansa
Hi Steve,
on HipChat you've asked why hibernate-jcache + Infinispan's
implementation of JCache is not enough. It ultimately boils down to
1. performance
2. correctness
where (2) can be fine with some providers but then (1) suffers.
Infinispan offers a transactional mode, but it's not serializable (gosh,
sometimes it's even read_uncommitted) and has its quirks. The
performance can't be as good as with non-tx mode, either. That's why the
native transactional caches support will be dropped in 5.3 and we'll
just emit a warning asking users to update their configuration (and
continue with non-tx config).
As a demonstration of this we can use putFromLoad. If you implement
this as an (ideally serializable) transactional cache putIfAbsent, the
provider must either
a) lock the key (pessimistic approach) - but we don't want to block
other nodes removing data from the cache (on write) or putFromLoading
(on read!)
b) resolve conflicts when the transaction is committing: you figure out
that there are two concurrent updates and roll back one of the
transactions - that's not acceptable to us either, as any failure in
the cache should not affect the DB transaction. And there's a risk of
blocking between the 2 phases of commit, too.
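To illustrate why putFromLoad is delicate even before transactions
enter the picture, here is a small single-threaded sketch that replays
one problematic interleaving by hand. It assumes a plain
ConcurrentHashMap as a stand-in for the cache and a second map as a
stand-in for the database; nothing here is Infinispan or Hibernate API
and all names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model: two maps stand in for the database and the cache.
final class PutFromLoadRaceSketch {

    final Map<String, String> database = new ConcurrentHashMap<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();

    // putFromLoad modeled as a non-blocking putIfAbsent: it never
    // overwrites an existing entry and never waits for a lock.
    boolean putFromLoad(String key, String loadedValue) {
        return cache.putIfAbsent(key, loadedValue) == null;
    }

    // Write path: update the database, then remove the cached entry.
    void update(String key, String newValue) {
        database.put(key, newValue);
        cache.remove(key);
    }

    // Replays the interleaving: the reader loads a value, a writer
    // updates and invalidates, then the reader caches what it loaded.
    String replayStaleInterleaving() {
        database.put("k", "v1");
        String loaded = database.get("k"); // reader loads v1
        update("k", "v2");                 // writer stores v2 and invalidates
        putFromLoad("k", loaded);          // reader caches its earlier load
        return cache.get("k");
    }
}
```

After the replay the cache holds the value the reader loaded before the
update, while the database holds the newer one. Avoiding this without
locking (a) or aborting (b) is the hard part.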
Theoretically you could just wipe any modified data on conflict - I
don't know if anyone does that, 'drop everything and proceed with
commit' is not something I'd expect from a general-purpose (NoSQL) DB. I
recall Alex's JCache implementation (for 5.2) storing some 'lock'
objects in the cache, and you probably don't want to wipe those.
Interaction with evictAll/removeAll could also be problematic: not sure
about the other providers but Infinispan's clear() operation is
non-transactional even on tx cache (since Infinispan 7 or so) because
it's impractical to resolve all conflicts. I don't know the details of
how others provide that operation but there may be a hidden problem.
Last but not least, you assume that the provider is transactional and
that it provides the JCache interface. JCache does not define
interaction with JTA,
because it was hard to get agreement on non-tx behaviour (why did it
take 13 years to complete the JSR?) and it would be even harder for JTA.
So what you expect is just your extrapolation or wishful thinking, and
it's up to integrators to verify that the unwritten contract is
fulfilled within the scope of hibernate-jcache module use. Not that SPI
implementors would be in a better position, but at least we are aware
that (for us) it's not enough to implement those 3 classes and job's done.
Of course the correctness aspect may be ignored with 'it's just a cache'
implying 'users expect stale/uncommitted data' as Sanne (who is much
closer to the customers than me) keeps repeating. However this is not
what 2LC promises as I understand it: the same results as the DB would
give.
I am really grateful that in 5.3 you've provided the
CacheTransactionSynchronization that will help us boost (1) even further
by allowing us to execute all operations in parallel. And it's good that
you've made the SPI more expressive with the intent; there'll be a bunch
of TODOs in the 5.3 implementation to cover use cases that were not
handled in previous versions but now are obvious.
Cheers
Radim
--
Radim Vansa <rvansa(a)redhat.com>
JBoss Performance Team
6 years, 8 months
API differences in Hibernate ORM 5.1 vs 5.3
by Gail Badner
Hi,
There were lots of differences in the compatibility report, so as a first
step, I've excluded packages/classes that I considered SPI, internal, or
"grey area". This reduced the differences to a more manageable amount.
You can see a summary of the incompatibilities along with suggested
mitigation at [1].
The report is attached to [1], along with a zip with instructions for
running the report.
I believe there are some "false positives" in the report, and I have
documented them in the section, "False Positives?".
Feel free to comment on the article.
Thanks,
Gail
[1]
https://developer.jboss.org/wiki/HibernateORMBinaryCompatibilityBetween51...
6 years, 9 months
Hibernate OGM mapping for "server side Hibernate Search" via Infinispan Remote
by Sanne Grinovero
Hi all,
this one is a very desirable feature, yet tricky as there's a high
chance of ambiguity and confusion for end users.
# Infinispan Remote indexing
Infinispan embeds the Hibernate Search engine, and uses it to index
data being inserted in any cache having indexing enabled. As you know
Infinispan can be used to store Java POJOs, which get serialized using
JBoss Marshalling - or encoded into Protobuf entries using Infinispan
Protostream as a helper layer.
Hibernate OGM supports both modes, one meant for "Infinispan Embedded"
and one for "Infinispan Remote" as that's what each encoding strategy
is suited for.
# Protobuf & indexing
Protobuf is a well defined format with plenty of documentation which
focuses on a "schema" of the encoding; Hibernate OGM is able to
generate such schemas dynamically and will generate encoders and
decoders which follow the encoding guidelines for Java objects.
The meta schema of protobuf is not super flexible, yet there's the
option of annotating the Protobuf schema elements using "annotations"
in comments.
Protostream allows inserting Hibernate Search annotations directly in
these comments and will use them to generate the server side indexing
configuration, implicitly also allowing such properties to be queried
using the index.
For example you might have this string literally within the comments:
"@Field(store = Store.YES, analyze = Analyze.YES)"
A full example of schema can be found here [1].
(The Infinispan documentation is a bit sparse on this as they
encourage people to use another code gen tool, so it's best to refer
to the tests as examples when working on OGM)
# What should OGM users experience?
A naive solution would be to allow people to use the Hibernate Search
annotations on their JPA entities, and have OGM copy these into the
generated schema; there are a number of problems with that:
- not all such annotations "translate" equally well [2]
- there's a mismatch between JPA properties and underlying encoding fields
- if I run a FullTextQuery do I expect it to work remotely?
- what if I want to use Hibernate Search locally as well or instead?
- references to local classes obviously won't work (custom
fieldbridges, analyzers, etc..)
An alternative is to look at these as "indexes" of the underlying
store, so we'd use them to pass the Infinispan server user-provided
hints such as those expressed via `javax.persistence.Index`.
I do think this is the cleaner approach, yet has two drawbacks:
A- I guess ORM might implicitly generate some indexes in its metadata
which the user might not have explicitly asked for, e.g. to accelerate
unique constraints and foreign keys; it's possible these might not be
as useful as expected in the Infinispan case.
B- we won't be able to leverage the awesome full-text capabilities :-(
I believe A is something we could ignore for now and revisit if
there's actual demand.
B is also not urgent, yet a disappointing limitation as this capability
is a distinguishing feature of this NoSQL. Would we agree that
exposing such full-text capabilities would best be left to an ad-hoc
backend in Hibernate Search 6?
Thanks,
Sanne
1 - http://blog.infinispan.org/2018/02/restful-queries-coming-to-infinispan-9...
2 - https://github.com/infinispan/infinispan/blob/master/remote-query/remote-...
6 years, 9 months
Why do we have the date in the URL of blog posts?
by Gunnar Morling
Hi,
While talking to a few bloggers from the Java ecosphere at JavaLand last
week, the question came up why we have the date in the URL of blog posts.
Arguably, it doesn't add value there (we show the date on the actual posts
themselves), and makes the URLs slightly worse to read. In particular, we
don't allow for browsing posts by year or month (e.g.
http://in.relation.to/2018/), so it's even a bit misleading. Omitting the
date would also make the original idea of the URL fly again ("in relation
to xyz").
Anyone with thoughts whether we should change the scheme (keeping existing
ones of course)?
That all said, I've no idea whether the date in there is good to have or
not in terms of SEO. I suppose it doesn't matter.
--Gunnar
6 years, 9 months
Hibernate ORM 5.3 CR2 ?
by Sanne Grinovero
Steve, all,
would it be possible to tag a CR2 as soon as the Caching SPI changes
are complete?
That would help the Infinispan team so they can get started
immediately on their side of things.
Thanks,
Sanne
6 years, 9 months
Defaultable service strategies
by Steve Ebersole
Thoughts on allowing certain services to be defaulted to the single
implementor registered with the `StrategySelector` service when there is
just one?
For example, when resolving the RegionFactory, caching support will be
disabled unless both (a) `hibernate.cache.region.factory_class` and (b)
one of `hibernate.cache.use_second_level_cache` or
`hibernate.cache.use_query_cache` are defined. What I am proposing would
kick in in this case: check whether there is just a single RegionFactory
registered with the StrategySelector and, if so, use that one.
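The proposed fallback could be sketched roughly as follows. This is a
minimal illustration under simplifying assumptions, not Hibernate's
actual `StrategySelector` API: the registry is modeled as a plain map
from short name to implementor class name, the class and method names
are hypothetical, and the interplay with the `use_second_level_cache` /
`use_query_cache` flags is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for strategy registrations of one service type.
final class DefaultableStrategySketch {

    // short name -> implementor class name (assumed representation)
    private final Map<String, String> registered = new LinkedHashMap<>();

    void register(String shortName, String implementorClassName) {
        registered.put(shortName, implementorClassName);
    }

    // Resolves the implementor to use. An explicit setting always wins;
    // without one, fall back to the implementor only if exactly one is
    // registered; otherwise the service stays disabled (empty result).
    Optional<String> resolve(String explicitSetting) {
        if (explicitSetting != null) {
            // Treat the setting as a short name if known, else as a class name.
            return Optional.of(registered.getOrDefault(explicitSetting, explicitSetting));
        }
        if (registered.size() == 1) {
            return Optional.of(registered.values().iterator().next());
        }
        return Optional.empty();
    }
}
```

With zero or several registrations and no explicit setting, the result
is empty, which matches today's behavior of leaving caching disabled.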
It would allow Hibernate to more seamlessly operate as a JPA provider too,
as currently, to use caching with JPA and Hibernate, users have to do the
normal JPA stuff and then also define these Hibernate settings. It would
be nicer if they could just do the JPA stuff.
6 years, 9 months