The current Hibernate Search sprint: lots of topics!
by Sanne Grinovero
All,
let me clarify the general goal of this sprint. I don't expect to
celebrate with a 5.2.0.Final this time, but I'd aim at getting some of
the long standing big tasks done, and finish these three weeks with a
5.2.Beta1. We need to organize in several parallel significant themes.
There are some "big" themes going on which you need to be aware of
beyond the granularity of JIRA.
Your help in properly inspecting these with experiments and then breaking
them down into smaller tasks is what I need most right now. I'd
highly appreciate if each of you could take on leadership of one of
these themes, and get at least one other team member as primary
reviewer and brainstorming mate.
These are the primary themes:
- the Faceting refactoring - led by Hardy
- the dynamic types work - led by me
- Hibernate ORM 5 compatibility and testing - almost done
- getting rid of the Infinispan module - led by Gustavo
- a discussion with the WildFly team about how to share the module
structure / build / definitions (more on this soon)
- Lucene 5
- R&D: explore better clustering strategies, better master election
(or no-master architectures)
- Better integration with ORM's Multi-tenancy - being quite requested
recently - Davide?
If we really could upgrade both ORM and Lucene to 5, then we could
promote this to a new major release. Of course I'm dreaming and that's
not going to happen in practice - not least because that would require
an ORM 5.0.0.Final.
So what I'm expecting is that we explore the needs for these, and you
help me identify which steps are needed to get these both upgraded in
the near future. That means we might be raising more issues than
solving them, but that's good as it clarifies which atomic, self
contained and consistent steps we then need to perform to get there.
I'm currently working on ORM5 tasks, will soon share some PRs of
things which could already be merged, but of course the final step
won't be applied as we're not really going to upgrade yet - unless we
agree we're only releasing betas until ORM is final too.
For Lucene 5: the work which Hardy is doing is essential:
- update the Faceting code
- move our code to use the new FieldDocs
After that, the upgrade won't be that bad (not as hard as Lucene 4)
I just created some JIRAs as "container" for these larger themes, just
please keep in mind that I'm not setting the version to be "5.2" as
they will probably span multiple releases. The goal should not be to
resolve them, but to start them and split them up in subtasks which
can be merged already.
I'm pretty sure that several resulting sub-tasks can be merged already.
There is a new label in JIRA: "current_sprint", so we can identify
them all even though they are not marked to be fixed for version 5.2.
The "R&D" tasks are not in JIRA at all, I'm still gathering
requirements - still we'll need to dedicate some time to
experimentation and brainstorming.
I realize these are many parallel paths to work on; we're many
experienced devs though, and these should be workable in parallel.
If each of you can take some leadership on an area I hope we can close
them all by the next iteration (except probably the R&D task).
===
That said on the larger themes, there is of course a list of
traditional tasks which will shape the 5.2 improvements.
These are marked "5.2" on JIRA; some are trivial, like missing javadoc
or a paragraph of documentation but need some figuring out to craft
the right docs.
Let me comment on these briefly to see if any piques your interest.
# HSEARCH-1848 Replace the Infinispan Directory provider with the one
distributed by the Infinispan project
As discussed: we'll remove the module, but need to make sure we can
plug in the one distributed by Infinispan. Needs Infinispan to release
it first.
# HSEARCH-1214 Review SearchFactory initialization
For our own peace of mind... the boot process is hard to understand. I
have some ideas, and there are many things to keep in mind so I'll
probably try to take this myself but otherwise I'll transfer my brain
dump.. best over voice.
# HSEARCH-1472 Broaden collection of built in IndexManager
implementations to simplify choice of sensible configurations
As discussed at the team meeting. The goal is to simplify
configuration and documentation, prevent sick configuration choices.
# HSEARCH-1474 MassIndexer needs to avoid being timed out by the
TransactionManager
This is high value and long standing, but complex. Gunnar started
working on a test.
# HSEARCH-1536 Improve the test suite around MoreLikeThis
(association, custom fieldbridge, class bridges)
There are several open tasks around MLT. This is the warm-up point to
finalize MLT... I didn't schedule the other tasks for this sprint.
# HSEARCH-1589 ServiceManager closes services too aggressively
A sensible optimisation, probably easy. Beware: concurrency and
bootstrap related.
# HSEARCH-1654 Disable merge policy during Massindexing
A great performance optimisation for mass indexing people. I think
it's trivial, but to verify it you'll need to set up a relatively
long run - we have a repository with instructions to reindex
Wikipedia.
# HSEARCH-1681 Index optimisation should commit to publish the
performed optimisation
Trivial to do - one liner - but not so trivial to test for.
# HSEARCH-1684 ResultTransformer ignores transformList on tuples
No idea, needs to be looked at to make Marc S. happy.
# HSEARCH-1708 Using DistanceSortField does not verify the field
parameter passed to the constructor
# HSEARCH-1711 EntityIndexingInterceptor executes on different part of
the hierarchy
# HSEARCH-1729 Document the Infinispan configuration property
`metadata_writes_async`
This was not documented as it's a highly experimental property. I was
hoping we could run some more tests, but I won't have the time for
that at the moment, so either someone volunteers for the test, or we
keep it a secret, or decide to document it with warnings.
# HSEARCH-1762 Improve javadocs of builtin bridges
# HSEARCH-1773 org.hibernate.search.backend.impl.WorkVisitor not
exported by engine osgi bundle
Or find some alternative way... but whatever the solution we need to
get OSGi as "done" status.
# HSEARCH-1783 Reproduce transaction timeouts during mass indexing
Gunnar already on it.
# HSEARCH-1793 CriteriaObjectInitializer causes too many object loads
in cross hierarchy queries
This one is nasty, we should get rid of it.
# HSEARCH-1803 Infinispan integration test search in the wrong node
Since we're removing the code, we need to apply this as
https://issues.jboss.org/browse/ISPN-5339
# HSEARCH-1804 Boost on IndexedEmbedded properties
This really should just work as the user requests
# HSEARCH-1811 Wildcard with multiple fields
Another sensible usability improvement
# HSEARCH-1812 Documentation doesn't clearly explain how one obtains
the existing SearchIntegrator
Start a documentation section "integrators and framework developers" ?
# HSEARCH-1815 Clarify the need to depend on an implementation of
SerializationProvider
Apparently we don't state one will be needed ;)
# HSEARCH-1816 Explicitly validate the version of Hibernate ORM
A usability improvement, as proposed on the mailing list. +1 for Gunnar's ideas.
# HSEARCH-1826 Make it possible to test Hibernate Search with preview
builds of Hibernate ORM 5
I'm working on this one.
# HSEARCH-1828 Clarify documentation about ways to disable Hibernate Search
# HSEARCH-1839 FieldBridge instance initialization might use reference
access to the booting framework
This is needed by the jBPM / Drools teams. At least the programmatic
configuration should be trivial.
# HSEARCH-1844 Review which components should no longer be tagged as
experimental
# HSEARCH-1847 Create a FSDirectory extension which doesn't ever sync to disk
Requested by Infinispan - might become an urgent requirement soon,
better have this ready.
[Hibernate Search] Repository Notice: migrating Infinispan integration
by Sanne Grinovero
All,
please do not make changes (or propose patches) to any sources for the
Maven module
org.hibernate:hibernate-search-infinispan
In other words, anything under the path /infinispan in the repository.
We're currently working to move this module to the Infinispan project,
at the following repository:
https://github.com/infinispan/infinispan
Of course we'll still maintain and love our integration, it's just
much easier to maintain if it is released together with the Infinispan
core modules.
We'll also migrate some of the integration tests.
Thanks,
Sanne
ORM Jenkins Builds
by Steve Ebersole
I was curious why it took so long to run the master ORM jobs on the CI
machine compared to running the job locally. Locally I run `clean test` at
the root project quite often and it takes roughly 9-10 minutes. The master
CI jobs generally take 45-50 minutes to complete.
So I enabled "Gradle build profiling" in our job. The results were
surprising in terms of ratios.
I figured findBugs, checkStyle etc probably added significant times to the
build. But I was shocked how much it added.
BTW, you can view these profile reports in {root}/build/reports/profile...
So hibernate-core, overall took 17m22.19s to run for one job. Of that,
12.5 minutes was findBugs! checkStyle was actually "reasonable" at just under 30s.
The ratios were similar across all modules.
The aggregatedJavadoc task took a shade over 2m.
Considering that these jobs are run on every check-in (and eventually it
would be great to auto-run them against PRs too), plus the fact that we
aren't even failing the build for the majority of findBugs/checkStyle hits, I
think we should define these jobs a little differently. It's not just the
time it takes. Yes we all hate to wait. But it's also the CI resources
taken up.
I'm going to put some thought into this after the 5.0 Beta release, but I
wanted to get some thoughts and feedback in the meantime. Things to
consider.
Re: [hibernate-dev] Bytecode enhanced, Reference Cached immutable Entities
by Sanne Grinovero
[adding the mailing list]
Generally speaking, looks like we agree on the direction: EntityEntry
needs to be an interface, and some clever logic to select the
appropriate implementations.
In your draft you're having a single EntityEntryFactory as a global
service; I'm wondering if we shouldn't have the possibility to have a
different factory implementation per Entry type.. more on this below.
What is your primary differentiator between 'SharedEntityEntry' and
'StatefulEntityEntry' ?
For our purposes I'd have used different names, but since there's no
javadoc yet I wonder if you had different intentions.
Personally I'd have chosen something like "ImmutableEntityEntry" and
"MutableEntityEntry", where the Mutable one is a rename of the
existing implementation, and the Immutable would be a slimmed down
version which might not need fields such as:
- loadedState (not needed for readonly)
- version (what would be the point)
- ..
A concern I have is to avoid ever needing to "promote" an
ImmutableEntityEntry into a MutableEntityEntry: it's easy to mark an
existing instance of ImmutableEntityEntry as READ_ONLY, but there is
no going back if the entity entry was initially loaded as READ_ONLY.
One could think of swapping the existing entityentry, but that could
get hairy and defeats the point of optimising object allocations.
Is there a strong guarantee which we can rely on, that if an
EntityEntry is marked READ_ONLY at first load, no one will ever need to
re-mark it as mutable?
If not, the current check in DefaultEntityEntryFactory basing the
choice on the current status of the Entity might not be enough, we
might need to be a bit more conservative and only base that on
getPersister().isMutable() ?
The READ_ONLY point which you're leveraging for this specific
optimisation seems to be key for the specific optimisation we have in
mind at this point; but generalizing the concept it seems to me that
the choice of EE implementation to use for a specific Entity type will
be a consistent choice for the lifecycle of the EntityPersister, and
depending on immutable flags on the EntityPersister. Which is why I'm
suggesting that the EntityPersister should have a dedicated
EntityEntryFactory. Making the EntityEntryFactory a global service
would force to go through all the checks of the EE implementation
choice each time, while the choice should always be the same. I
wouldn't argue to save a couple of simple "if" evaluations, but it's
very possible that some more clever EntityEntryFactory implementations
than this current draft might need to do more work, for example
consult more Services to call back into OGM metadata.
Not least, having a per-type EntityEntryFactory would make it possible
to refer to it from some EntityEntry implementations and save some
memory around the common state.
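To make the per-type factory idea a bit more concrete, here is a minimal sketch. All names in it (EntrySketch, ImmutableEntrySketch, MutableEntrySketch, EntryFactorySketch, PersisterSketch) are hypothetical, simplified stand-ins and not the real Hibernate ORM types; the only point is that the implementation choice is made once, when the persister is built, from the mapping-level mutability flag rather than from the current status of each entity.

```java
// Hypothetical, simplified stand-ins; none of these are real Hibernate ORM types.
interface EntrySketch {
    Object getId();
    boolean isModifiable();
}

// Slimmed-down entry: carries neither loadedState nor version.
final class ImmutableEntrySketch implements EntrySketch {
    private final Object id;

    ImmutableEntrySketch(Object id) {
        this.id = id;
    }

    public Object getId() { return id; }
    public boolean isModifiable() { return false; }
}

// Stands in for the existing, fully stateful implementation.
final class MutableEntrySketch implements EntrySketch {
    private final Object id;
    private Object[] loadedState;
    private Object version;

    MutableEntrySketch(Object id, Object[] loadedState, Object version) {
        this.id = id;
        this.loadedState = loadedState;
        this.version = version;
    }

    public Object getId() { return id; }
    public boolean isModifiable() { return true; }
    Object[] getLoadedState() { return loadedState; }
    Object getVersion() { return version; }
}

interface EntryFactorySketch {
    EntrySketch createEntry(Object id, Object[] loadedState, Object version);
}

// Each persister owns its factory, so no checks are repeated per entry creation.
final class PersisterSketch {
    private final String entityName;
    private final boolean mutable;
    private final EntryFactorySketch entryFactory;

    PersisterSketch(String entityName, boolean mutable) {
        this.entityName = entityName;
        this.mutable = mutable;
        // Decided once, from the mapping-level flag only.
        this.entryFactory = mutable
                ? (id, state, version) -> new MutableEntrySketch(id, state, version)
                : (id, state, version) -> new ImmutableEntrySketch(id);
    }

    String getEntityName() { return entityName; }
    boolean isMutable() { return mutable; }
    EntryFactorySketch getEntryFactory() { return entryFactory; }
}

final class PerPersisterFactoryDemo {
    public static void main(String[] args) {
        PersisterSketch countries = new PersisterSketch("Country", false);
        EntrySketch entry = countries.getEntryFactory()
                .createEntry(1L, new Object[] {"IT"}, null);
        System.out.println(entry.getClass().getSimpleName());
    }
}
```

A more elaborate factory (for example one consulting other services) would do that work in the persister constructor as well, which is where the saving over a global service would come from in this sketch.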
Concurrency
Since the goal is to share the ImmutableEntityEntry instance among
multiple threads reading it, I'd rather make sure that it is really
immutable. For example it now holds onto potentially lazy initialized
fields, such as getEntityKey().
If it's not possible to make it really immutable (all final fields),
we'll need to make it threadsafe and question the name I'm proposing.
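As a rough illustration of what "really immutable" could look like, the sketch below computes the key eagerly and keeps every field final, so there is no lazily initialized state left to race on. KeySketch and SharedEntrySketch are hypothetical stand-ins, not the real EntityKey or the SharedEntityEntry from the draft.

```java
import java.util.Objects;

// Hypothetical stand-in for an entity key; not the real Hibernate ORM EntityKey.
final class KeySketch {
    private final String entityName;
    private final Object id;

    KeySketch(String entityName, Object id) {
        this.entityName = entityName;
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof KeySketch)) {
            return false;
        }
        KeySketch that = (KeySketch) other;
        return entityName.equals(that.entityName) && id.equals(that.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(entityName, id);
    }
}

// All fields are final and assigned in the constructor, so an instance can be
// shared between reading threads without any synchronization.
final class SharedEntrySketch {
    private final String entityName;
    private final Object id;
    private final KeySketch key; // built eagerly, not on first call to the getter

    SharedEntrySketch(String entityName, Object id) {
        this.entityName = Objects.requireNonNull(entityName);
        this.id = Objects.requireNonNull(id);
        this.key = new KeySketch(entityName, id);
    }

    String getEntityName() { return entityName; }
    Object getId() { return id; }
    KeySketch getEntityKey() { return key; }
}
```

The trade-off in this sketch is that the key is always allocated, even for entries whose key is never requested.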
LockMode
From a logical perspective of users, one might think that an entity
being "immutable" doesn't necessarily imply I can't lock on it..
right? I'm not sure how far it could be a valid use case, but these
patches are implying that one can't lock an immutable entity anymore,
as the lock state would be as immutable as anything else on the
EntityEntry.
Are we good with that? Alternatively one might need to think to
separate the lock state handling from the EntityEntry.
On smaller details:
# org.hibernate.engine.internal.SharedEntityEntry is hosted in an
.internal package, I don't think it's right to refactor all the public
API javadoc which was referring to EntityEntry to now refer to the
internal implementation.
# things like EntityEntryExtraState should probably get moved to
.internal packages as well now - we couldn't do that before without
breaking either encapsulation or APIs.
In terms of git patches, the complexity of the changeset risks to get
a bit out of hand. How about we focus on creating a clean pull
request which focuses exclusively on making EntityEntry an interface,
and move things to the right packages and javadoc?
You'd have a trivial EntityEntryFactory, and we can then build the
evolution on top of that, not least maybe helping out by challenging
some points in parallel work.
These are the things I'd leave for a second iteration:
- add various implementations of EntityEntry iteratively, as needed
- the strategy such a Factory would use to pick an implementation
- ultimately, make it possible for an integrator to override such a Factory
For example with Hibernate OGM we might want to override /
reconfigure the factories to use custom EntityEntry implementations -
requirements are not fully clear at this point but it seems likely.
The priority being to define the API as that would be a blocker for
5.0, we then have better choices to leave smarter and more advanced
EntityEntry implementations for the future; we'd still need to
implement at least the essential ones to make sure the API of the
EntityEntryFactory has all the context it needs.
Thanks,
Sanne
On 24 March 2015 at 09:27, John O'Hara <johara(a)redhat.com> wrote:
> Steve,
>
> Have you had a chance to look at this? Do you have any comments/observations?
>
> Thanks
>
> John
>
>
> On 17/03/15 09:24, John O'Hara wrote:
>
> Steve,
>
> I have been having a think about the EntityEntry interface, and have forked
> a branch here:
>
> https://github.com/johnaoahra80/hibernate-orm/tree/EntityEntryInterface
>
> I know it is nowhere near complete, but was this the sort of idea you had in
> mind?
>
> Thanks
>
> John
>
>
> On 13/03/15 09:44, John O'Hara wrote:
>
> EntityEntry retains a reference to a persistenceContext internally that
> org.hibernate.engine.spi.EntityEntry#setReadOnly makes calls to, is this
> where the session reference is kept? As
> org.hibernate.engine.spi.PersistenceContext is an interface could we have a
> different implementation for this use case? e.g. an
> ImmutablePersistenceContext that could be shared across sessions?
>
> For the bytecode enhancement, could we change the enhancer so that it adds
> an EntityEntry interface with javassist.ClassPool.makeInterface() as
> opposed to adding a class with javassist.ClassPool.makeClass()? I need to
> have a look at javassist to confirm what
> javassist.ClassPool.makeInterface() does.
>
> Thanks
>
> John
>
> On 12/03/15 18:52, Steve Ebersole wrote:
>
> It is possible. Although some of the changes are particularly painful.
> Most of EntityEntry, if it is an interface, can be made to work with your
> use case. org.hibernate.engine.spi.EntityEntry#setReadOnly I think is the
> one exception, because:
> 1) your use case needs it
> 2) it expects the Session to be available internally (it's not passed)
>
> The bigger thing I am worried about for you is the bytecode stuff, as that
> ties very tightly with EntityEntry.
>
SessionFactory building APIs
by Steve Ebersole
I had not heard anything back in regards to this, so I wanted to ask one
more time before I get ready to start cutting 5.0 pre-releases in a week or
2.
I'd love to hear feedback of any kind about the new APIs, but specific
things I know I personally question:
1) What do you think of the split in MetadataSources and MetadataBuilder?
Does the split make sense? Or does it make more sense to combine them into
one contract?
2) What do you think of all the overloaded methods named #with taking
different argument types, versus distinctly named methods? E.g.
MetadataBuilder#with(ImplicitNamingStrategy),
MetadataBuilder#with(PhysicalNamingStrategy), etc rather than
MetadataBuilder#withImplicitNamingStrategy(ImplicitNamingStrategy),
MetadataBuilder#withPhysicalNamingStrategy(PhysicalNamingStrategy)
Also, I am not so sure about the term "with" anymore. I had chosen that at
the time because I thought it flowed nicely with method chaining.
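For anyone wanting to compare the two naming styles from question (2) at the call site, here is a small self-contained sketch. ImplicitNamingSketch, PhysicalNamingSketch and BuilderSketch are hypothetical stand-ins with made-up single methods; they are not the real ImplicitNamingStrategy, PhysicalNamingStrategy or MetadataBuilder contracts, and only the method-naming pattern is taken from the question above.

```java
import java.util.Locale;

// Hypothetical stand-ins with invented methods; not the real Hibernate ORM contracts.
interface ImplicitNamingSketch {
    String tableNameFor(String entityName);
}

interface PhysicalNamingSketch {
    String toPhysicalName(String logicalName);
}

final class BuilderSketch {
    private ImplicitNamingSketch implicitNaming = entityName -> entityName;
    private PhysicalNamingSketch physicalNaming = logicalName -> logicalName;

    // Style A: a single overloaded name, resolved by argument type.
    BuilderSketch with(ImplicitNamingSketch strategy) {
        this.implicitNaming = strategy;
        return this;
    }

    BuilderSketch with(PhysicalNamingSketch strategy) {
        this.physicalNaming = strategy;
        return this;
    }

    // Style B: distinctly named methods, one per setting.
    BuilderSketch withImplicitNamingStrategy(ImplicitNamingSketch strategy) {
        this.implicitNaming = strategy;
        return this;
    }

    BuilderSketch withPhysicalNamingStrategy(PhysicalNamingSketch strategy) {
        this.physicalNaming = strategy;
        return this;
    }

    String resolveTableName(String entityName) {
        return physicalNaming.toPhysicalName(implicitNaming.tableNameFor(entityName));
    }
}

final class BuilderStyleDemo {
    public static void main(String[] args) {
        ImplicitNamingSketch implicit = entityName -> "T_" + entityName;
        PhysicalNamingSketch physical = logicalName -> logicalName.toLowerCase(Locale.ROOT);

        // Style A: the reader has to know the argument types to see what is being set.
        String viaOverloads = new BuilderSketch()
                .with(implicit)
                .with(physical)
                .resolveTableName("Order");

        // Style B: the method name says what is being set.
        String viaNamedMethods = new BuilderSketch()
                .withImplicitNamingStrategy(implicit)
                .withPhysicalNamingStrategy(physical)
                .resolveTableName("Order");

        System.out.println(viaOverloads.equals(viaNamedMethods));
    }
}
```

Both chains configure the builder identically in this sketch; the difference is only in how much the call site tells the reader without looking up the argument types.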
Date/Time Support and timezones
by Steve Ebersole
As I start work on supporting Java 8 Date/Time types, I wanted to get
everyone's opinion on handling OffsetDateTime, OffsetTime and ZonedDateTime
with regards to timezone. Each represents a date/time in a particular
timezone/offset (much like a Calendar). A few options:
1) Forego OffsetDateTime, OffsetTime and ZonedDateTime support and just
stick with LocalDateTime, LocalDate and LocalTime.
2) Use the timezone/offset to pass along to the driver (for proper
conversion); when reading back we'd have to read back based on the default
timezone. This is essentially the old strategy used in CalendarType which
I never really liked because it's not reflexive.
3) Break them into a tuple and store each piece. E.g., for
OffsetDateTime the Tuple is a LocalDateTime (the Timestamp) and a TZ
offset. So we'd store each individually in the database and be able to
rebuild them in a fully reflexive manner.
4) Handle them using UTC or GMT at the JDBC level. This is essentially the
same as (2).
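To illustrate the difference between options (3) and (4) for OffsetDateTime, here is a minimal sketch using only java.time. The class and method names (OffsetDateTimeStorageSketch, StoredTuple, toTuple, toUtcColumn and so on) are hypothetical, and the "columns" are plain Java values; no JDBC is involved, so this only shows which information survives each strategy.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class OffsetDateTimeStorageSketch {

    // Option 3: two stored values, the local timestamp and the offset.
    static final class StoredTuple {
        final LocalDateTime localTimestamp;
        final int offsetSeconds;

        StoredTuple(LocalDateTime localTimestamp, int offsetSeconds) {
            this.localTimestamp = localTimestamp;
            this.offsetSeconds = offsetSeconds;
        }
    }

    static StoredTuple toTuple(OffsetDateTime value) {
        return new StoredTuple(value.toLocalDateTime(), value.getOffset().getTotalSeconds());
    }

    static OffsetDateTime fromTuple(StoredTuple stored) {
        return OffsetDateTime.of(stored.localTimestamp,
                ZoneOffset.ofTotalSeconds(stored.offsetSeconds));
    }

    // Option 4: one stored value, normalized to UTC before writing.
    static LocalDateTime toUtcColumn(OffsetDateTime value) {
        return value.withOffsetSameInstant(ZoneOffset.UTC).toLocalDateTime();
    }

    static OffsetDateTime fromUtcColumn(LocalDateTime stored) {
        // The original offset is gone, so the value can only be rebuilt at UTC.
        return OffsetDateTime.of(stored, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        OffsetDateTime original =
                OffsetDateTime.of(2015, 3, 24, 9, 27, 0, 0, ZoneOffset.ofHours(2));

        OffsetDateTime viaTuple = fromTuple(toTuple(original));
        OffsetDateTime viaUtc = fromUtcColumn(toUtcColumn(original));

        System.out.println("tuple keeps the offset: " + viaTuple.equals(original));
        System.out.println("UTC keeps the instant:  " + viaUtc.isEqual(original));
        System.out.println("UTC keeps the offset:   " + viaUtc.equals(original));
    }
}
```

In this sketch the tuple approach rebuilds a value that is equal to the original including its offset, while the UTC approach preserves the instant but not the offset. A ZonedDateTime would presumably need a zone id rather than a numeric offset as its second piece, which this sketch does not cover.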