Yoann Rodière updated an issue

Change By:	Yoann Rodière

Currently, when we have to query a given entity with dynamic sharding enabled, we determine the index managers to query by a simple request to the DynamicIndexShardingStrategy, which basically delegates to a ShardIdentifierProvider. This class internally holds a list of all the shards that are relevant for a given entity.

This is nice because it allows different entities to share the same "logical" index, but have a different ShardIdentifierProvider instance, meaning you can have some shards that are exclusive to some entities and won't even be queried for other entities.

The problem is that this prevents Hibernate Search from ever adding a shard independently of the ShardIdentifierProvider. Being able to do so is necessary, for instance, with JMS/JGroups, where we have *multiple* instances of a ShardIdentifierProvider over multiple servers, each with its own state. We would need to replicate this state, creating new shards on the master when we detect they are needed on the slave. See HSEARCH-1886 for an example of the issues that arise if we don't.

To be more concrete, the slave calls {{org.hibernate.search.store.IndexShardingStrategy.getIndexManagerForAddition(Class<?>, Serializable, String, Document)}} on its own sharding strategy, which adds a new shard to the list of available shards, and returns the index manager. Then we know which index manager to target, and we send its name along with a list of works to the master. On the master, ideally we'd want to retrieve the index manager, and apply the works directly. But we have to call {{getIndexManagerForAddition(Class<?>, Serializable, String, Document)}} for each and every work, because it is the only way to update the internal state of the DynamicIndexShardingStrategy/ShardIdentifierProvider.

A possibly better solution would be to hold the state of the ShardIdentifierProvider (a set of strings representing shard identifiers) *outside* of the ShardIdentifierProvider. We would have to change its methods, though:

* {{void initialize(Properties properties, BuildContext buildContext);}} would now return a Set<String> (the initial state)
* {{getShardIdentifier}}, {{getShardIdentifierForDeletion}} and {{getShardIdentifierForQuery}} would all take a Set<String> as a parameter (the current state, unmodifiable)
* {{getAllShardIdentifiers}} would not be needed anymore
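
To make the proposed contract more concrete, here is a minimal sketch of what a stateless provider could look like. It is an illustration only: {{StatelessShardIdentifierProvider}} and {{FirstLetterShardProvider}} are hypothetical names, and the signatures are simplified (no {{BuildContext}}, {{Document}} or filter parameters), so they do not match the real interface.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

// Hypothetical, simplified contract: the provider holds no state and
// receives the currently known shard identifiers on every call.
interface StatelessShardIdentifierProvider {
    // Returns the initial state instead of storing it internally.
    Set<String> initialize(Properties properties);

    // May return an identifier that is not in knownShards yet;
    // registering it is the caller's job.
    String getShardIdentifier(Class<?> entityType, Serializable id, Set<String> knownShards);

    Set<String> getShardIdentifierForDeletion(Class<?> entityType, Serializable id, Set<String> knownShards);

    Set<String> getShardIdentifierForQuery(Set<String> knownShards);
}

// Example implementation (an assumption, not an existing class):
// one shard per lower-cased first character of the id.
final class FirstLetterShardProvider implements StatelessShardIdentifierProvider {
    @Override
    public Set<String> initialize(Properties properties) {
        return Collections.emptySet(); // no shard exists at startup
    }

    @Override
    public String getShardIdentifier(Class<?> entityType, Serializable id, Set<String> knownShards) {
        String text = String.valueOf(id);
        return text.isEmpty() ? "_" : text.substring(0, 1).toLowerCase(Locale.ROOT);
    }

    @Override
    public Set<String> getShardIdentifierForDeletion(Class<?> entityType, Serializable id, Set<String> knownShards) {
        String shard = getShardIdentifier(entityType, id, knownShards);
        // Nothing to delete from a shard that was never created.
        return knownShards.contains(shard)
                ? Collections.singleton(shard)
                : Collections.<String>emptySet();
    }

    @Override
    public Set<String> getShardIdentifierForQuery(Set<String> knownShards) {
        return knownShards; // query every known shard
    }
}
```

With this shape, {{getAllShardIdentifiers}} has nothing left to return, since whoever calls the provider already owns the set.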

The state could be stored in DynamicShardingStrategy for example (which is an implementation class). Then we could add a {{getOrCreateIndexManager(String shardIdentifier)}} method to this class, which would update the state independently from the ShardIdentifierProvider.
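
A rough sketch of such a state holder follows. {{ShardStateHolder}} and {{FakeIndexManager}} are hypothetical stand-ins for the strategy class and the real index manager, and the {{indexNameBase + "." + shardIdentifier}} naming is an assumption made for the example.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a real index manager.
final class FakeIndexManager {
    final String indexName;

    FakeIndexManager(String indexName) {
        this.indexName = indexName;
    }
}

// Hypothetical holder for the shard state, kept outside of the provider.
final class ShardStateHolder {
    private final String indexNameBase;
    private final ConcurrentMap<String, FakeIndexManager> managersByShard =
            new ConcurrentHashMap<>();

    ShardStateHolder(String indexNameBase) {
        this.indexNameBase = indexNameBase;
    }

    // Registers the shard if it is unknown, without involving the provider.
    // The master could call this directly with the identifier sent by a slave.
    FakeIndexManager getOrCreateIndexManager(String shardIdentifier) {
        return managersByShard.computeIfAbsent(
                shardIdentifier,
                id -> new FakeIndexManager(indexNameBase + "." + id)); // naming is an assumption
    }

    // Unmodifiable view of the current state, suitable for passing to the provider.
    Set<String> knownShardIdentifiers() {
        return Collections.unmodifiableSet(managersByShard.keySet());
    }
}
```

Using {{computeIfAbsent}} on a concurrent map keeps shard creation atomic, so two threads asking for the same new shard get the same manager.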

Note that for this change to be useful for JMS/JGroups, we would also need to change the way we group tasks for these: we would need to group the works by entityType + shardIdentifier, instead of by indexNameBase + shardIdentifier as we do currently. That's because the DynamicShardingStrategies are defined per entity, not per index.
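
The regrouping could look roughly like the sketch below. {{Work}}, {{GroupKey}} and {{WorkGrouper}} are hypothetical types invented for the example; the real work and backend classes are more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, stripped-down unit of work.
final class Work {
    final Class<?> entityType;
    final String shardIdentifier;
    final String description;

    Work(Class<?> entityType, String shardIdentifier, String description) {
        this.entityType = entityType;
        this.shardIdentifier = shardIdentifier;
        this.description = description;
    }
}

// Grouping key: entityType + shardIdentifier (not indexNameBase + shardIdentifier).
final class GroupKey {
    final Class<?> entityType;
    final String shardIdentifier;

    GroupKey(Class<?> entityType, String shardIdentifier) {
        this.entityType = entityType;
        this.shardIdentifier = shardIdentifier;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof GroupKey)) {
            return false;
        }
        GroupKey that = (GroupKey) other;
        return entityType.equals(that.entityType)
                && shardIdentifier.equals(that.shardIdentifier);
    }

    @Override
    public int hashCode() {
        return Objects.hash(entityType, shardIdentifier);
    }
}

final class WorkGrouper {
    // Groups works so that each group maps to exactly one per-entity strategy.
    static Map<GroupKey, List<Work>> group(List<Work> works) {
        Map<GroupKey, List<Work>> groups = new LinkedHashMap<>();
        for (Work work : works) {
            GroupKey key = new GroupKey(work.entityType, work.shardIdentifier);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(work);
        }
        return groups;
    }
}
```

Two entities sharing the same logical index and the same shard identifier end up in separate groups here, so the master can resolve each group against the right per-entity strategy.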
