On 10 Aug 2015, at 23:18, Sanne Grinovero <sanne(a)hibernate.org> wrote:
On 7 August 2015 at 13:31, Gustavo Fernandes <gustavo(a)infinispan.org> wrote:
> On Fri, Aug 7, 2015 at 1:14 PM, Sanne Grinovero <sanne(a)hibernate.org> wrote:
>>
>> A quick update on some more exploration on this:
>> it turns out that sorting on a NumericField which also uses an
>> "indexNullAs" token makes the UninvertingReader approach throw an
>> exception.
>> My two conclusions:
>> - we need to move away from supporting those tokens in NumericField,
>> especially as a stricter schema is coming in next-gen Lucene
>> - this is yet another reason to clearly separate fields meant for
>> searching from fields meant for sorting
>
>
> One possible workaround is to enforce that the indexNullAs value matches
> the underlying field type; at the moment it is always a string.
Interesting idea, but the user would need to provide which "value"
he's ok to give up, as he would need to pick a number to be treated as
NaN.
Since the indexNullAs parameter requires a string, would we expect
people to write a number in there?
I’m not sure whether uninverting a field containing a mix of string and numeric
values will be supported again in Lucene. Maybe it will [1], maybe not. In the
meantime, the indexNullAs value would at least need to be parseable to a number
by the underlying field bridge. Many use cases allow NaN to be a number itself
without problems.
[1] https://issues.apache.org/jira/browse/SOLR-7521
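As an illustration of the "parseable to a number" requirement, here is a minimal sketch in plain Java. It does not use Hibernate Search or Lucene; `NumericNullMarker`, `NumericType` and `parse` are hypothetical names made up for this example, and the real field bridge may validate the token differently.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not Hibernate Search or Lucene API. */
final class NumericNullMarker {

    /** Numeric field types covered by this sketch (hypothetical enum). */
    enum NumericType { INT, LONG, FLOAT, DOUBLE }

    private NumericNullMarker() {}

    /**
     * Tries to read the configured indexNullAs string as a value of the
     * field's numeric type. Returns an empty Optional when the token is
     * missing or cannot be parsed, so the caller can reject the mapping
     * instead of indexing a string token into a numeric field.
     */
    static Optional<Number> parse(String indexNullAs, NumericType type) {
        if (indexNullAs == null) {
            return Optional.empty();
        }
        String token = indexNullAs.trim();
        try {
            Number value;
            switch (type) {
                case INT:
                    value = Integer.parseInt(token);
                    break;
                case LONG:
                    value = Long.parseLong(token);
                    break;
                case FLOAT:
                    value = Float.parseFloat(token);
                    break;
                default:
                    // DOUBLE: Double.parseDouble also accepts the literal "NaN"
                    value = Double.parseDouble(token);
                    break;
            }
            return Optional.of(value);
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

With this sketch a token such as "_null_" is rejected for every numeric type, "-1" is accepted for an int field, and "NaN" is accepted only for float and double fields, which matches the point that NaN is a usable marker only where the type can represent it.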
> Another possibility is to model "null" as a field not present in the Lucene
> document, instead of a marker token.
Thanks! I've run some tests with this and have a patch working based
on this idea; I was sceptical initially because of scoring, but it
seems it's doable without needing to score all documents on this
negation. I'll send a PR soon; basically with this approach the
indexNullAs attribute will be ignored, but that's ok I think, as this
solution is better and has no drawback (nor does it need any user
configuration).
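The "null as a missing field" idea can be sketched with a toy document model in plain Java. This is not the patch mentioned above and uses no Lucene classes: a document is just a map, `MissingFieldSketch` and its methods are hypothetical names, and placing documents without the field last when sorting is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model for illustration only; a "document" is a map of numeric fields. */
final class MissingFieldSketch {

    private MissingFieldSketch() {}

    /** Builds a one-field document; a null value adds no field at all. */
    static Map<String, Long> document(String field, Long value) {
        Map<String, Long> doc = new HashMap<>();
        if (value != null) {
            doc.put(field, value);
        }
        return doc;
    }

    /** "Is null" becomes the negation of "the field exists". */
    static List<Map<String, Long>> whereNull(List<Map<String, Long>> docs, String field) {
        List<Map<String, Long>> hits = new ArrayList<>();
        for (Map<String, Long> doc : docs) {
            if (!doc.containsKey(field)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Sorts ascending on the field; documents without it go last (an assumption). */
    static List<Map<String, Long>> sortedBy(List<Map<String, Long>> docs, String field) {
        List<Map<String, Long>> copy = new ArrayList<>(docs);
        copy.sort(Comparator.comparing(
                (Map<String, Long> doc) -> doc.get(field),
                Comparator.nullsLast(Comparator.<Long>naturalOrder())));
        return copy;
    }
}
```

Because every value that is actually stored has the field's own type, the sort never meets a string token, and no marker value has to be given up by the user.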
+1
Gustavo