some inline comments:
2008/4/22, Hardy Ferentschik <hibernate(a)ferentschik.de>:
On Tue, 22 Apr 2008 03:29:18 +0200, Sanne Grinovero
<sanne.grinovero(a)gmail.com> wrote:
> A new proposal:
> I got inspired by the "3VL" considerations described in Emmanuel's
> link to wikipedia, and think backwards compatibility is nice:
> add a "@IndexNullMarker" on the property, this will add an additional
> Field to the index for null values:
>
Hmm, interesting idea. It addresses one of the biggest concerns I have
with this null marker thing, namely ambiguities. But what would querying
look like in this case? Wouldn't it become harder? Whenever you want to use
this feature you would have to combine two fields - foo and fooIsNull -
within a boolean query to get the expected result. Something like this:
"foo:bar OR fooIsNull:true".
Yes, right, but I don't see how this is worse than searching for
"foo:bar OR foo:NULL-KEYWORD"; it is just less ambiguous.
If you only want to search for null fields, "fooIsNull:true" is enough.
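
To make the two query shapes concrete, here is a minimal sketch in plain Java. It only models the idea and does not use the Lucene or Hibernate Search API. The class name NullMarkerSketch and its methods are hypothetical; the "IsNull" suffix is taken from the fooIsNull example above, and the sketch assumes a null value writes only the marker field in place of the value field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of the proposal; all names here are hypothetical.
final class NullMarkerSketch {

    // Assumed naming convention for the extra field ("foo" -> "fooIsNull").
    static String markerFieldName(String property) {
        return property + "IsNull";
    }

    // Fields that would be written to the index for one property value:
    // a null value yields only the marker field, any other value yields
    // only the regular field.
    static Map<String, String> fieldsFor(String property, String value) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (value == null) {
            fields.put(markerFieldName(property), "true");
        } else {
            fields.put(property, value);
        }
        return fields;
    }

    // Query string matching documents with the given value or a null value.
    static String valueOrNullQuery(String property, String value) {
        return property + ":" + value + " OR " + markerFieldName(property) + ":true";
    }

    // Query string matching only documents where the property was null.
    static String nullOnlyQuery(String property) {
        return markerFieldName(property) + ":true";
    }
}
```

With this model a real value "NULL-KEYWORD" in foo can never collide with the marker, since the marker lives in its own field.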
Of course it would also mean that the index size grows since we are adding
more fields.
And the bigger the index, ...
Interesting, I wasn't expecting the index to grow, as I remove a Field
and replace it with another. I've made a test for this: on 10,000,000
docs, 50% having a random text value (chosen from 800 constants to
limit total string tokens) and 50% nulls, the index size grows by 3.5%
compared to no null values (same docs and 800 consts).
I wasn't expecting any growth above a few bytes; anyway, I think 3.5% is
quite good.
> The Field and StringBridge API would remain as-is;
> If you prefer not to add an additional annotation, @IndexNullMarker
> could be dropped if you think adding this field is acceptable for all
> fields.
>
I think it should stay an optional and explicit feature. Adding one
additional field for each indexed property does not seem justified,
especially since we agree that the best solution would be to re-think your
design and come up with a proper non-null default. So by offering this
feature we might end up encouraging people to stick with their less optimal
design ;-)
Absolutely agree, default is disabled. But I hope it will be possible
to log an explicit error if someone tries to use the field (without
enabling this feature) with the "NullQuery" Emmanuel proposed.
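
A rough sketch of that check, in plain Java: the NullQueryGuard class is hypothetical and is not the "NullQuery" Emmanuel proposed, whose API is not shown in this thread. It assumes the set of properties with the marker enabled is known at query-building time, and it throws instead of logging so that the mistake cannot go unnoticed.

```java
import java.util.Set;

// Hypothetical guard; not part of Hibernate Search or Lucene.
final class NullQueryGuard {

    // Properties for which the null marker has been explicitly enabled.
    private final Set<String> markedProperties;

    NullQueryGuard(Set<String> markedProperties) {
        this.markedProperties = Set.copyOf(markedProperties);
    }

    // Builds the null-only query string, or fails with an explicit error
    // when the marker was never enabled for the property.
    String nullQueryFor(String property) {
        if (!markedProperties.contains(property)) {
            throw new IllegalArgumentException(
                "Property '" + property + "' is not indexed with a null marker; "
                    + "enable the feature before querying for null values");
        }
        return property + "IsNull:true";
    }
}
```

Without such a check, a null query on an unmarked property would simply return no hits, which is easy to mistake for "no null values".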
Cheers,
Hardy
Regards,
Sanne