Lucene 5 is coming: pitfalls to consider
by Sanne Grinovero
Hi all,
the Hibernate Search branch upgrading to Apache Lucene 5.2.x is almost
ready. Alongside the many nice efficiency improvements, though, there
are some drawbacks to consider.
# API changes
The API changes are not too bad, and definitely an improvement. I'll
provide a detailed list as usual in the Hibernate Search migration
guide - for now, suffice it to say that it's an easy upgrade for
end users, as long as they were just creating Query instances and not
using the more powerful and complex APIs.
# Sorting
Sorting on a field will require an UninvertingReader to wrap the
cached IndexReaders, and the uninverting process is very inefficient.
On top of that, the result of the uninverting process is not
cacheable, so it will need to be repeated on each index, for each
query that is executed.
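For those unfamiliar with the new API, this is roughly what the
wrapping looks like at the Lucene level - a minimal sketch with a
made-up "title" field, not our actual integration code:

    import java.io.IOException;
    import java.util.Collections;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.uninverting.UninvertingReader;

    public class SortedSearchExample {
        // Sorts on "title" even though the index has no DocValues for
        // that field, paying the uninverting cost at query time.
        static TopDocs searchSortedByTitle(Directory dir) throws IOException {
            DirectoryReader reader = DirectoryReader.open(dir);
            // This wrap is the expensive, non-cacheable step described above:
            DirectoryReader uninverted = UninvertingReader.wrap(reader,
                    Collections.singletonMap("title", UninvertingReader.Type.SORTED));
            IndexSearcher searcher = new IndexSearcher(uninverted);
            return searcher.search(new MatchAllDocsQuery(), 10,
                    new Sort(new SortField("title", SortField.Type.STRING)));
        }
    }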
In short, I expect performance of sorted queries to be quite degraded
in our first milestone using Lucene 5, and we'll have to discuss how
to fix this.
Needless to say, fixing this is a blocking requirement before we can
consider the migration complete.
Sorting will not need an UninvertingReader if the target field has
been indexed as DocValues, but that implies:
- we'll need an explicit, upfront (indexing time) flag to be set
- we'll need to detect if the matching indexing options are
compatible with the runtime query to skip the uninverting process
This is mostly a job for Hibernate Search, but in terms of user
experience it means you have to mark fields for "sortability"
explicitly; will we need to extend the protobuf schema?
Please make sure we'll just have to hook into existing metadata; we
can't fix this after the API freeze.
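For context, this is what marking a field as "sortable" boils down to
at the Lucene level: the field is indexed twice, once inverted for
matching and once as DocValues for sorting. A sketch with made-up
field names, using plain Lucene APIs rather than our annotations:

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.SortedDocValuesField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.util.BytesRef;

    public class SortableIndexingExample {
        static void indexTitle(Directory dir, String title) throws IOException {
            try (IndexWriter writer = new IndexWriter(dir,
                    new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                // Inverted representation, used for matching:
                doc.add(new StringField("title", title, Field.Store.NO));
                // Columnar DocValues representation, used for sorting
                // without any UninvertingReader:
                doc.add(new SortedDocValuesField("title", new BytesRef(title)));
                writer.addDocument(doc);
            }
        }
    }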
# Filters
We had some clever bitset-level optimisations to merge multiple
Filter instances and save memory when caching them. I had to drop that
code: the new design no longer deals with in-heap structures but
iterates over off-heap chunks of data, so we now fall back on the more
traditional Lucene stack for filtering.
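To illustrate the "traditional Lucene stack" route: since Lucene 5.1 a
filter can be expressed as a non-scoring boolean clause, leaving
caching to Lucene's own query cache. A sketch - the field and value
are invented:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class FilteringExample {
        static Query withTenantFilter(Query userQuery) {
            BooleanQuery bq = new BooleanQuery();
            bq.add(userQuery, Occur.MUST);
            // FILTER restricts matches like MUST, but does not
            // contribute to scoring:
            bq.add(new TermQuery(new Term("tenant", "acme")), Occur.FILTER);
            return bq;
        }
    }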
I couldn't measure the performance impact yet; it's a significantly
different approach and while it sounds promising on paper, we'll need
some help testing this. The Lucene team can generally be trusted to go
in the better direction, but we'll have to verify if we're using it in
the best way.
# Analyzers
It is no longer possible to override the field->analyzer mapping at
runtime. We did expose this feature as a public API and I found a way
to still do it, but it comes with a performance price tag.
We'll soon deprecate this feature; if you can, start making sure
Infinispan doesn't need it, as at some point in the near future we'll
have to drop it, with no replacement.
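For clarity, the feature in question is the query-time override
exposed through the query builder DSL, along these lines (entity,
field and analyzer names are made up):

    import org.hibernate.search.FullTextSession;
    import org.hibernate.search.query.dsl.QueryBuilder;

    public class AnalyzerOverrideExample {
        static class Book { }

        static QueryBuilder builderWithOverride(FullTextSession session) {
            return session.getSearchFactory().buildQueryBuilder()
                    .forEntity(Book.class)
                    // Replaces the analyzer configured for "title" at
                    // query time; this is the capability that now
                    // carries a performance price tag:
                    .overridesForField("title", "my-custom-analyzer")
                    .get();
        }
    }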
# Index encoding
As usual, the index encoding evolves, and the easy solution is to
rebuild the index. Lucene 5 no longer ships with backwards-compatible
decoders, but these are available as separate dependencies. If you
feel the need to be able to read existing indexes, we should include
them.
(I'm including these as private dependencies in the Hibernate Search modules).
Thanks,
Sanne
Hidden failures in the testsuite
by Sanne Grinovero
Hi all,
I just updated my local master fork and started the testsuite, as I
sometimes do.
It's great to see that the build was successful, and no tests
*appeared* to have failed.
But! Lazily scrolling up in the console, I see lots of exceptions
which don't look intentional (I'm aware that some tests
intentionally create error conditions). Also, some tests are extremely
verbose, which might be the reason nobody noticed these.
Some examples:
- org.infinispan.it.compatibility.EmbeddedRestHotRodTest seems to log
TRACE to the console (and probably the whole module)
- CDI tests such as org.infinispan.cdi.InfinispanExtensionRemote seem
to fail in great number because of some ClassNotFoundException(s)
and/or ResourceLoadingException(s)
- OSGi integration tests seem to be all broken by some invalid
integration with Aries / Geronimo
- OSGi integration tests dump a lot of unnecessary information to the
build console
- the Infinispan Query tests also log lots of WARNs, around missing
configuration properties and, in some cases, concerning exceptions;
I'm pretty sure I had resolved those in the past, so it seems some
refactorings were done without considering the log output.
Please don't ignore the output; if it's too verbose to watch, that
needs to be resolved too.
I also monitor the "expected execution time" of some modules I'm
interested in; that's been useful in some cases to figure out that
there was a regression.
One big question: why is it that so many tests "appear to be good" but
are actually broken? I would like to understand that.
Thanks,
Sanne
Redis infinispan cache store
by Simon Paulger
Hi,
I'm interested in developing Infinispan integration with Redis for use in
JBoss. Before working on JBoss, I first need to add the capability to
Infinispan itself.
Is this an enhancement that the Infinispan community would be interested in?
Regards,
Simon
Blue-Green deployment scenario
by Christian Beikov
Hello,
I have been reading the rolling upgrade chapter[1] from the
documentation and I have some questions.
1. The documentation states that in the target cluster, every cache
that should be migrated should use a CLI cache loader pointing to
the source cluster (a sketch of such a configuration follows after
these questions). I suppose this can only be configured via XML,
but not via the CLI or JMX? That would be bad, because after a
node restart the cache loader would be enabled again.
2. What would the JMX URL look like if I wanted to connect to a secured
Wildfly over HTTP? I was thinking of
jmx:http-remoting-jmx://USER:PASSWORD@HOST:PORT/CACHEMANAGER/CACHE
3. What do I need to do to roll back to the source cluster after
switching a few nodes to the target cluster?
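Regarding question 1, this is how I understand the programmatic
equivalent would look, assuming the CLInterfaceLoaderConfigurationBuilder
from the infinispan-persistence-cli module (host, port and cache names
are placeholders):

    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.persistence.cli.configuration.CLInterfaceLoaderConfigurationBuilder;

    public class CliLoaderConfigExample {
        static Configuration targetCacheConfig() {
            ConfigurationBuilder builder = new ConfigurationBuilder();
            builder.persistence()
                   // Points the target cluster's cache at the source cluster:
                   .addStore(CLInterfaceLoaderConfigurationBuilder.class)
                   .connectionString("jmx://source-host:9999/SourceCacheManager/myCache");
            return builder.build();
        }
    }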
Thanks in advance!
Regards,
Christian
[1]
http://infinispan.org/docs/7.2.x/user_guide/user_guide.html#_rolling_upgr...
Early Access builds for JDK 8u66 b02 and JDK 9 b78 are available on java.net
by Rory O'Donnell
Hi Galder,
Early Access build for JDK 8u66 b02 <http://jdk8.java.net/download.html>
is available on java.net; a summary of changes is listed here:
<http://download.java.net/jdk8u66/changes/jdk8u66-b02.html?q=download/jdk8...>
Early Access build for JDK 9 b78 <https://jdk9.java.net/download/> is
available on java.net; a summary of changes is listed here:
<http://download.java.net/jdk9/changes/jdk9-b78.html?q=download/jdk9/chang...>
With respect to ongoing JDK 9 development, I'd like to draw your
attention to the following requests to provide
feedback on the relevant mailing lists.
*OpenJDK JarSigner API*
JDK 9 is more restrictive about calling sun.* public methods, but we know
there are users calling
sun.security.tools.jarsigner.Main to sign jar files. A new API has been
proposed
<http://mail.openjdk.java.net/pipermail/security-dev/2015-August/012636.html>
for this very purpose in OpenJDK.
Feedback on this API should be provided on the security-dev
<http://mail.openjdk.java.net/mailman/listinfo/security-dev> mailing list.
*RFC JEP: NIST SP 800-90A SecureRandom implementations*
Feedback on this draft JEP
<http://mail.openjdk.java.net/pipermail/security-dev/2015-August/012667.html>
should be provided on the security-dev
<http://mail.openjdk.java.net/mailman/listinfo/security-dev> mailing list.
*Public API for internal Swing classes*
According to JEP 200: The Modular JDK
<http://openjdk.java.net/jeps/200>, we expect that classes from internal
packages (like sun.swing) will not be
accessible. If you are using the internal Swing API and it is not
possible to replace it with a public API, please provide
feedback on the swing-dev
<http://mail.openjdk.java.net/mailman/listinfo/swing-dev> mailing list.
If you haven’t already subscribed to a list then please do so first,
otherwise your message will be discarded as spam.
Finally, videos of presentations from the JVM Language Summit have been
published at:
http://www.oracle.com/technetwork/java/javase/community/jlssessions-2015-...
Rgds, Rory
--
Rgds, Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland
Shared vs Non-Shared CacheStores
by Sanne Grinovero
I would like to propose a clear-cut separation between our shared and
non-shared CacheStores,
in all respects, such as:
- Configuration options
- Integration contracts (Split the CacheStore SPI)
- Implementations
- Terminology, to avoid any further confusion around valid
configurations and sensible architectures
We have loads of examples of users who got into trouble by configuring
one incorrectly, but there are also plenty of efficiency improvements
we could take advantage of by clearly splitting the integration points
and the implementations into two categories.
Not least, it's a very common and dangerous pitfall to assume that
Infinispan is able to restore a consistent state after a DIST cluster
which passivated into non-shared CacheStore instances has been stopped,
or even a REPL cluster whose nodes don't shut down all at the exact
same time (and "exact same time" is a strange concept, at the least..).
We need to clarify the different options, tradeoffs and their
consequences.. to users and to ourselves, as a clearly defined use case
will avoid bugs and simplify implementations.
# The purpose of each
I think people should use a non-shared (local?) CacheStore for
the sole purpose of expanding the storage capacity of each single
node.. be it because you don't have enough memory at all, or because
you prefer some extra safety margin, either because your estimates are
complex or because we live in a real world where the hashing function
might not be perfect in practice. I hope we all agree that Infinispan
should be able to take such situations with, at worst, a graceful
performance degradation, rather than complaining by sending OOMs to
the admin and setting the service on strike.
A Shared CacheStore is useful for very different purposes; primarily
to implement a Cache on some other service - for example your (single,
shared) RDBMS, a slow (or expensive) web service your organization has
to call frequently, etc.. It's also useful as a write-through cache
on a similar service, maybe internal but not able to handle the high
variation of load spikes which Infinispan can handle better.
Finally, a great use case is to have a consistent backup of all your
data-grid content, possibly in some "reference" form such as JPA-mapped
entities.
# Benefits of a Non-Shared
A non-shared CacheStore implementor should be able to take advantage
of *its purpose*; among the big benefits I see:
- Exclusive usage -> locking of a specific entry can be handled at
the data container level, which can simplify quite some internal code.
- Reliability -> since a clustered node needs to wipe its state at
reboot (after a crash), it's much simpler to code any such CacheStore
to avoid any form of disk sync or persistence guarantees.
- Encoding format -> this can be controlled entirely by Infinispan,
with no need to take factors like rolling-upgrade-compatible encodings
into account. JBoss Marshalling would be good enough, or some
implementations might not need to serialize at all.
Our non-shared CacheStore implementation(s) could take advantage of
lower-level, more complex code optimisations and interfaces, as users
would rarely want to customize one of these, while the use case of
mapping data to a shared service needs a more user-friendly SPI, so as
to keep it simple to plug in custom stores: custom data formats, custom
connectors, and some help in implementing concurrency correctly.
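To make the proposed split more tangible, the two contracts could look
something like this (purely illustrative, not actual Infinispan
interfaces):

    // Purely illustrative sketch, not the real SPI.
    interface LocalCacheStore<K> {
        // Exclusive to one node: may assume single-writer semantics
        // and an Infinispan-controlled binary encoding.
        void write(K key, byte[] encodedValue);
        byte[] load(K key);
        void clear(); // wiped at (re)boot after a crash
    }

    interface SharedCacheStore<K, V> {
        // Backed by an external shared service (RDBMS, web service, ...):
        // must use a stable, user-visible data format and tolerate
        // concurrent access from multiple nodes.
        void write(K key, V value);
        V load(K key);
    }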
Proper transaction integration for the CacheStore has been on our
wishlist for some time too. I suspect that accepting that we have been
mixing up two different things under the same name would make it
simpler to implement further improvements such as transactions: the
way to do such a thing is very different in each of these use cases,
so it would help to implement it on one subset first - or perhaps only
there, if it turns out there's no need for such things in the context
of the local-only, dedicated "swapfile".
# Mixed types should be killed
I'm aware that some of our current implementations _could_ work both
as shared and non-shared, for example the JDBC CacheStore, the
JPACacheStore or the Remote CacheStore.. but in most cases it doesn't
make much sense. Why would you ever want to use the JPACacheStore if
not to share data with a _shared_ database?
We should take such options away, and by doing so focus on the use
cases which actually matter and simplify the implementations and
improve the configuration validations.
If ever a compelling storage technology is identified which we'd like to
offer as an option for both shared and non-shared use, I would still
recommend making two different implementations, as there certainly are
different requirements and assumptions when coding such a thing.
Not least, I would very much like to see a default local CacheStore:
picking one for local "emergency swapping" should be a no-brainer for
users; we could set one up by default and not bother newcomers with
complex choices.
If we simplify its requirements, it should be easy to write one on
standard Java NIO2 APIs and get rid of the complexities of maintaining
the native integration with things like LevelDB, not least the
inefficiency of making such native calls from Java.
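To show how little such a local "swapfile" store would need, here's a
minimal sketch on plain NIO2 - not the actual SPI, with hash
collisions and error handling ignored for brevity:

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class LocalSwapStoreSketch {
        private final Path dir;

        public LocalSwapStoreSketch(Path dir) throws IOException {
            this.dir = dir;
            Files.createDirectories(dir);
            // Exclusive, non-shared usage: state from a previous run is
            // invalid after a restart, so simply wipe the directory.
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                for (Path p : stream) {
                    Files.delete(p);
                }
            }
        }

        private Path fileFor(String key) {
            // Hex-encoded hash keeps file names filesystem-safe;
            // a real store would have to handle collisions.
            return dir.resolve(Integer.toHexString(key.hashCode()) + ".bin");
        }

        public void write(String key, byte[] value) throws IOException {
            // No fsync: losing entries on a crash is acceptable here.
            Files.write(fileFor(key), value);
        }

        public byte[] read(String key) throws IOException {
            Path f = fileFor(key);
            return Files.exists(f) ? Files.readAllBytes(f) : null;
        }
    }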
Then, as a second step, we should attack the other use case: backups.
From a *purpose-driven perspective* I'd then see us revive the Cassandra
integration, obviously as a shared-only option.
Cheers,
Sanne
JCache integration with Wildfly provided configuration
by Christian Beikov
Hello,
I am using Infinispan 7.2.3.Final within Wildfly 9.0.1 and I would like
to use the JCache integration, but I'm struggling a bit.
I configured the JGroups subsystem in the standalone.xml of my Wildfly
installation to enable clustering of Infinispan caches. That works as
expected, but I wasn't sure how to have my caches clustered too. I
thought of some possible solutions, but neither is really what I am
looking for.
1. Put the cache container configuration into standalone.xml
2. Copy the JGroups configuration and create a new transport in a
custom infinispan configuration
When doing 1. I can't really use the JCache integration because there is
no way to tell the caching provider that I want a CacheManager for a
specific cache container. If you would recommend doing 1. then it would
be nice if the caching provider would not only accept file URIs, but
also something like JNDI names. That way, I could reference
existing cache containers, which at least solves the problem with the
JCache integration. Still, I would prefer option 2. because I wouldn't
have to change the standalone.xml every time I add a cache.
When doing 2. I can use the Infinispan configuration file as the URI
when creating the cache manager, so the JCache integration works without
a problem (see the sketch below). The only thing bothering me is that I
have to copy the JGroups configuration to have a separate transport for
my application's cache container. I can't seem to reference the
transport that I configured in the standalone.xml, nor does it default
to that. I would really like to reuse the JGroups channel that is
already established.
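For reference, this is the bootstrap style I mean for option 2 (file
and cache names are just examples):

    import java.net.URI;
    import javax.cache.Cache;
    import javax.cache.CacheManager;
    import javax.cache.Caching;

    public class JCacheBootstrapExample {
        public static void main(String[] args) {
            // The URI points at an Infinispan XML configuration file,
            // which today must duplicate the JGroups transport settings:
            CacheManager cacheManager = Caching.getCachingProvider().getCacheManager(
                    URI.create("infinispan-jcache.xml"),
                    JCacheBootstrapExample.class.getClassLoader());
            Cache<String, String> cache = cacheManager.getCache("myCache");
            cache.put("hello", "world");
        }
    }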
What I would like to know is whether there is a way to make use
of the JGroups configuration I did in the standalone.xml. If there
isn't, what should I do when I want to cluster my caches? Just go with
option 1?
Regards,
Christian Beikov