After some performance tests on Infinispan we concluded that the way
Hibernate Search writes to the index is not as efficient as we'd like.
While these results came from extensive Infinispan testing, they
actually highlight a performance issue which isn't strictly related
to Infinispan storage, and could affect any kind of index storage
which we support.
Surprisingly this wasn't too bad for normal disk based FSDirectory,
especially as we expect users to go for the NRT backend, but as you
might know an NRT backend is not suited for clustered deployments (be
it using a master/slave approach or the Infinispan index, or other
more experimental shared index strategies like NFS based sharing).
So, while NRT performs great, the "normal" backend actually doesn't
perform very well because of the high frequency of commits.
Surprisingly, even an ASYNC backend performs the same number of
commits: enabling the async workers of Hibernate Search decouples the
latency of the main thread from the speed of the backend, but under a
sustained write load the overall throughput that the backend is able
to deliver does not change.
A much better design is to allow the ASYNC backends to _not_ flush at
each commit, but to flush periodically. The overall throughput
increase is estimated to be 2 to 3 orders of magnitude, with details
depending on the kind of storage you're using.
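
To make the difference concrete, here is a minimal sketch of the two
commit strategies. It is not the code from the pull request: every
type below (CommittableIndex, CountingIndex, CommitPerWriteBackend,
PeriodicCommitBackend) is a hypothetical name made up for
illustration, and the "index" only counts calls instead of touching
Lucene.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an index writer; not a Hibernate Search or Lucene type.
interface CommittableIndex {
    void apply(String work); // buffer a change: assumed cheap
    void commit();           // flush buffered changes to storage: assumed expensive
}

// Fake index which only counts calls, so the two strategies can be compared.
class CountingIndex implements CommittableIndex {
    final AtomicInteger applied = new AtomicInteger();
    final AtomicInteger commits = new AtomicInteger();

    @Override public void apply(String work) { applied.incrementAndGet(); }
    @Override public void commit() { commits.incrementAndGet(); }
}

// The behaviour described above: every write is followed by a commit.
class CommitPerWriteBackend {
    private final CommittableIndex index;

    CommitPerWriteBackend(CommittableIndex index) { this.index = index; }

    void write(String work) {
        index.apply(work);
        index.commit();
    }
}

// The proposed behaviour: writes only mark the index as dirty,
// and a background task commits at a fixed interval.
class PeriodicCommitBackend implements AutoCloseable {
    private final CommittableIndex index;
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PeriodicCommitBackend(CommittableIndex index, long intervalMillis) {
        this.index = index;
        scheduler.scheduleWithFixedDelay(
                this::flushIfDirty, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    void write(String work) {
        index.apply(work);
        dirty.set(true);
    }

    // Commits at most once for all the writes seen since the previous flush.
    void flushIfDirty() {
        if (dirty.compareAndSet(true, false)) {
            index.commit();
        }
    }

    // Stops the periodic task and commits whatever is still pending.
    @Override public void close() {
        scheduler.shutdown();
        flushIfDirty();
    }
}
```

With this sketch, a burst of 1000 writes costs 1000 commits on the
first backend, and a single commit on the second one if they all land
within one interval. A real implementation would also need to
coordinate the flush with in-flight writes and handle commit
failures, both of which are left out here.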
This idea is being proposed as:
Gustavo sent a first pull request for this already; some preliminary
results can be found in the pull request description:
Note that his tests focus on Infinispan storage but the code is
unrelated to Infinispan and will benefit all users independently from
the storage technology.
This patch will expose a new configuration property which will allow
users to specify how often they need the indexes to be committed; in
other words, you'll be able to specify that Async is acceptable for
your application, but only as long as the "staleness" of your index
doesn't exceed a set time threshold.
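
As an illustration of how such a threshold could be read from the
configuration, here is a small sketch. The property name
"example.index.max_staleness_ms" and the StalenessConfig class are
placeholders invented for this example; the actual property name is
whatever the patch ends up defining.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hibernate Search.
class StalenessConfig {
    // Placeholder property name, for illustration only.
    static final String MAX_STALENESS_MS = "example.index.max_staleness_ms";

    // Returns the configured staleness threshold in milliseconds,
    // or defaultMillis when the property is not set.
    static long maxStalenessMillis(Properties props, long defaultMillis) {
        String raw = props.getProperty(MAX_STALENESS_MS);
        if (raw == null) {
            return defaultMillis;
        }
        long value = Long.parseLong(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(
                    MAX_STALENESS_MS + " must be a positive number of milliseconds");
        }
        return value;
    }
}
```

The returned value would then be used as the interval of the periodic
commit, so that no write stays uncommitted for longer than the
threshold, plus the time the commit itself takes.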
Now the best part of this is that we can also propose a solution for
the SYNC backends which require an illusion of transactional
behaviour. Essentially we're aiming at implementing a synchronous
backend which has performance close to the capabilities of NRT, but
without the drawbacks so that it can safely be used on shared indexes.
This idea is a bit more complex to implement; I've attempted to describe it on: