Currently, when we have to query a given entity with dynamic sharding enabled, we determine the index managers to query by a simple request to the DynamicIndexShardingStrategy, which basically delegates to a ShardIdentifierProvider. This class internally holds a list of all the shards that are relevant for a given entity. This is nice because it allows different entities to share the same "logical" index, but have a different ShardIdentifierProvider instance, meaning you can have some shards that are exclusive to some entities and won't even be queried for other entities. The problem is, this prevents Hibernate Search to ever add a shard independently from the ShardIdentifierProvider. This is necessary for instance with JMS/JGroups, where we have multiple instances of a ShardIdentifierProvider over multiple servers, each with its own state. We would need to replicate this state, creating new shards on the master when we detect they are needed on the slave. To be more concrete, the slave calls org.hibernate.search.store.IndexShardingStrategy.getIndexManagerForAddition(Class<?>, Serializable, String, Document) on its own sharding strategy, which adds a new shard to the list of available shards, and returns the index manager. Then we know which index manager to target, and we send its name along with a list of works to the master. On the master, ideally we'd want to retrieve the index manager, and apply the works directly. But we have to call getIndexManagerForAddition(Class<?>, Serializable, String, Document) for each and every work, because it is the only way to update the internal state of the DynamicIndexShardingStrategy/ShardIdentifierProvider. A perhaps better solution would be to hold the state of the ShardIdentifierProvider (a set of strings representing shard identifiers) out of the ShardIdentifierProvider. We would have to change its methods, though:
- void initialize(Properties properties, BuildContext buildContext); would now return a Set<String> (the initial state)
- getShardIdentifier, getShardIdentifierForDeletion and getShardIdentifierForQuery would all take a Set<String> as a parameter (the current state, unmodifiable)
- getAllShardIdentifiers would not be needed anymore
The state could be stored in DynamicShardingStrategy for example (which is an implementation class). Then we could add an getOrCreateIndexManager(String shardIdentifier) method to this class, which would update the state independently from the ShardIdentifierProvider. Note that for this change to be useful for JMS/JGroups, we would also need to change the way we group tasks for these: we would need to group the works by entityType + shardIdentifier, instead of indexNameBase + shardIdentifier currently. That's because the DynamicShardingStrategies are defined per entity, not per index. |