On May 19, 2009, at 02:44, Ray Hilton wrote:
On a related note, I like the idea that each node builds and stores
its own indices for data it currently hosts. It would also be nice
if these indices could be further partitioned (in our case, we would
like to do this per-day, other situations maybe based on a modulus
of the hash code) to reduce the size of the indices. Replicated
indices are OK if they don't change that often, but ours will be
updated almost constantly, and so you either need to nominate one
node to do the work, or have some distributed locking in place,
which would probably kill performance. Building an index locally
also provides protection against index corruption by having
redundant copies around the grid. We are intending to do
deduplication and score normalization ourselves.
You can partition in Hibernate Search by using what is called sharding
(that would cover your per-day partition). The drawback of distributed
indices, though, is that you need to read all of them to answer a
query. I'm not sure it's a good fit; at the very least, the index
layout is unlikely to map one-to-one onto where the data is stored.
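
To make the trade-off concrete, here is a rough sketch in plain Java. It
does not use the Hibernate Search API; the class and method names
(ShardSketch, shardForDay, shardForHash, search) and the "index-" naming
are made up for illustration. It shows the two partitioning schemes you
mention (per-day and hash modulus) and why a query has to visit every
shard when the term alone does not tell you which shard holds the hits.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory stand-in for a set of index shards.
class ShardSketch {

    // Hypothetical per-day scheme: one shard name per calendar day.
    // LocalDate.toString() yields ISO-8601, e.g. "2009-05-19".
    static String shardForDay(LocalDate day) {
        return "index-" + day;
    }

    // Hypothetical hash-modulus scheme. floorMod keeps the result in
    // [0, shardCount) even when hashCode() is negative.
    static int shardForHash(Object key, int shardCount) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    // A query that is not restricted to a day must read every shard
    // and merge the results.
    static List<String> search(Map<String, List<String>> shards, String term) {
        List<String> hits = new ArrayList<>();
        for (List<String> docs : shards.values()) {
            for (String doc : docs) {
                if (doc.contains(term)) {
                    hits.add(doc);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> shards = new TreeMap<>();
        shards.computeIfAbsent(shardForDay(LocalDate.of(2009, 5, 18)),
                k -> new ArrayList<>()).add("grid node joined");
        shards.computeIfAbsent(shardForDay(LocalDate.of(2009, 5, 19)),
                k -> new ArrayList<>()).add("grid index rebuilt");

        System.out.println(search(shards, "grid"));
        System.out.println(shardForHash("some-key", 4));
    }
}
```

Deduplication and score normalization across shards would sit on top of
the merge step in search(); they are left out here.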