Currently, when a routing key is specified in a search query, we take care of targeting only the shards that can actually contain documents with the given routing keys.
However, since a shard may contain documents with different routing keys, it is possible that some matching documents found in these shards actually used a different routing key.
The only reason we don't currently apply a filter automatically is performance: users defining routing keys are likely to already filter their results based on an indexed field with the same value as the routing key.
However, I don't think it would be very expensive to also create an indexed meta-field holding the routing key, and to automatically add a filter on that field for all search queries that define routing keys explicitly. The field already exists in ES ({{_routing}}), and it's indexed. For Lucene, we would need to add it.
Off the top of my head, here are the changes we would need. They're actually quite reasonable:
* For the Lucene backend, we'd need to index the routing key: currently it's just used for routing, not indexed.
* For the Lucene and Elasticsearch backends, we'd need to automatically add a filter to the query when routing keys are specified.
* -For the Lucene and Elasticsearch backends, we'd need to offer a way to retrieve the routing key of a particular search hit... maybe? Not sure you need this.- => No
* -For the Lucene backend, we may want to introduce a new (default) sharding strategy where routing keys are enabled but only used as discriminators, not for actual sharding.- => Not necessary, the default sharding strategy works just fine for that.
* In mapper APIs, we'll need to add a way to purge specific routing keys (e.g. {{SearchWorkspace.purge( "key1", "key2" )}})
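
To make the difference concrete, here is a rough, backend-agnostic sketch of what the automatic filter would change. Nothing in it is actual Hibernate Search, Lucene or Elasticsearch code: the {{Doc}} class, the {{shardFor}} rule (hash of the routing key modulo the shard count) and both search methods are made up for illustration, and the real sharding strategy may well differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class RoutingFilterSketch {

    // Hypothetical document model: an id plus the routing key used at indexing time.
    static final class Doc {
        final String id;
        final String routingKey;

        Doc(String id, String routingKey) {
            this.id = id;
            this.routingKey = routingKey;
        }
    }

    // Assumed sharding rule (illustration only): hash of the routing key modulo the shard count.
    static int shardFor(String routingKey, int shardCount) {
        return Math.floorMod(routingKey.hashCode(), shardCount);
    }

    // Current behavior: only restrict the search to the shards the routing keys map to.
    static List<String> searchTargetingShardsOnly(List<Doc> docs, Set<String> routingKeys, int shardCount) {
        Set<Integer> targetShards = routingKeys.stream()
                .map(key -> shardFor(key, shardCount))
                .collect(Collectors.toSet());
        return docs.stream()
                .filter(doc -> targetShards.contains(shardFor(doc.routingKey, shardCount)))
                .map(doc -> doc.id)
                .collect(Collectors.toList());
    }

    // Proposed behavior: same shard targeting, plus a filter on the indexed routing key.
    static List<String> searchWithRoutingFilter(List<Doc> docs, Set<String> routingKeys, int shardCount) {
        Set<Integer> targetShards = routingKeys.stream()
                .map(key -> shardFor(key, shardCount))
                .collect(Collectors.toSet());
        return docs.stream()
                .filter(doc -> targetShards.contains(shardFor(doc.routingKey, shardCount)))
                .filter(doc -> routingKeys.contains(doc.routingKey))
                .map(doc -> doc.id)
                .collect(Collectors.toList());
    }

    // "key1" and "key3" have hash codes of the same parity, so with 2 shards they share a shard.
    static List<Doc> sampleDocs() {
        return Arrays.asList(new Doc("a", "key1"), new Doc("b", "key2"), new Doc("c", "key3"));
    }

    public static void main(String[] args) {
        Set<String> keys = new HashSet<>(Arrays.asList("key1"));
        System.out.println(searchTargetingShardsOnly(sampleDocs(), keys, 2)); // [a, c]
        System.out.println(searchWithRoutingFilter(sampleDocs(), keys, 2));   // [a]
    }
}
```

With two shards, searching on {{key1}} alone also returns the document indexed with {{key3}}, because both keys land on the same shard; the extra filter on the routing key drops it.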