[hibernate-dev] [SEARCH] Remove support for (or deprecate) date/calendar resolution?
Yoann Rodiere
yoann at hibernate.org
Thu Jan 5 11:20:56 EST 2017
Hi,
That's a catchy subject; now I'm sure I've got Guillaume's (angry) attention :)
Sanne wanted us to think about feature removal... Here's a proposal.
The feature I'd like to deprecate/remove: automatic truncation of
date/calendars to a given resolution. It's used like this:
@Field
@CalendarBridge(resolution = Resolution.DAY)
private Calendar dateSent;
I've been working on date-related tests for the past few hours (well,
months, too), and I'm becoming more and more convinced that this resolution
thing is the wrong solution to an admittedly common issue. The other
solution, the right one in my opinion, is to use range queries.
When we truncate a date to a given resolution, we must use a particular
timezone, obviously: the same instant falls on different days (or hours)
depending on the timezone, so truncating to the day makes no sense unless
we choose one.
Then the truncated date will match only if we search for dates whose
truncation is identical in this timezone.
And guess what? In the current implementation, we use... GMT. Always. See
how org.apache.lucene.document.DateTools.round(long, Resolution) works: it
gets the calendar used for truncation from TL_CAL, which uses the GMT
timezone.
What this means is that truncation currently produces rather unexpected
results as soon as an application uses a timezone other than GMT.
So it's likely to be considered broken:
- for multi-timezone applications
- for single-timezone applications using a timezone other than GMT
For instance, let's index 2016-01-01T00:00:00+01:00 and query
2016-01-01T12:00:00+01:00 with a DAY resolution:
- We'll index 2015-12-31 (2016-01-01T00:00:00+01:00 is
2015-12-31T23:00:00+00:00, which once truncated is 2015-12-31+00:00)
- We'll query 2016-01-01 (2016-01-01T12:00:00+01:00 is
2016-01-01T11:00:00+00:00, which once truncated is 2016-01-01+00:00)
That won't match, but from the point of view of a single-timezone user it
should.
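The example above can be reproduced in plain Java. This is a minimal sketch,
not Lucene's code: the hypothetical roundToDay helper only assumes that DAY
rounding means clearing the time fields of a calendar bound to some
timezone, which is what a GMT-bound calendar such as TL_CAL would do.

```java
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

public class DayRoundingExample {
    // Hypothetical helper: rounds epoch millis down to midnight, where "midnight"
    // is computed in the given timezone. With GMT, this approximates (assumption)
    // what DAY rounding with a GMT-bound calendar does; it is not Lucene's code.
    static long roundToDay(long epochMillis, TimeZone zone) {
        Calendar cal = Calendar.getInstance(zone, Locale.ROOT);
        cal.setTimeInMillis(epochMillis);
        cal.set(Calendar.HOUR_OF_DAY, 0);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTimeInMillis();
    }

    // 2016-01-01T00:00:00+01:00, i.e. 2015-12-31T23:00:00Z
    static final long INDEXED = 1451602800000L;
    // 2016-01-01T12:00:00+01:00, i.e. 2016-01-01T11:00:00Z
    static final long QUERIED = 1451646000000L;

    // The query matches only if both values truncate to the same day in that zone.
    static boolean matches(TimeZone zone) {
        return roundToDay(INDEXED, zone) == roundToDay(QUERIED, zone);
    }

    public static void main(String[] args) {
        System.out.println("GMT:    " + matches(TimeZone.getTimeZone("GMT")));       // false
        System.out.println("+01:00: " + matches(TimeZone.getTimeZone("GMT+01:00"))); // true
    }
}
```

Truncating in the user's own timezone (+01:00) makes both values land on
2016-01-01 again, which is the "customize the timezone" alternative
mentioned further down.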
Of course I'm exaggerating a bit, and the issue probably hasn't been
noticed by many, because:
- there is no problem with the MINUTE, SECOND and MILLISECOND
resolutions, which are not affected by timezones (current UTC offsets are
all whole minutes)
- there is no problem with the HOUR resolution in most timezones (there
is a problem, however, with offsets that are not whole hours, such as -11:30)
- the problem is less likely to be detected when the user timezone is
close to GMT, because then only date/times close to midnight (i.e. outside
business hours) will be affected.
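The HOUR case can be illustrated the same way with java.time, using the
real-world +05:30 offset (India) instead of -11:30. A minimal sketch; the
helpers truncateAt and sameBucket are hypothetical and only compare
"truncate in UTC" against "truncate in the user's offset".

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

public class HourResolutionExample {
    // Hypothetical helper: truncates the instant to `unit`, computing the
    // truncation in the given offset.
    static OffsetDateTime truncateAt(OffsetDateTime t, ZoneOffset offset, ChronoUnit unit) {
        return t.withOffsetSameInstant(offset).truncatedTo(unit);
    }

    // Indexed and queried values "match" if they truncate to the same instant.
    static boolean sameBucket(OffsetDateTime indexed, OffsetDateTime queried,
                              ZoneOffset offset, ChronoUnit unit) {
        return truncateAt(indexed, offset, unit).toInstant()
                .equals(truncateAt(queried, offset, unit).toInstant());
    }

    public static void main(String[] args) {
        ZoneOffset india = ZoneOffset.of("+05:30");
        OffsetDateTime indexed = OffsetDateTime.parse("2016-01-01T10:00:00+05:30");
        OffsetDateTime queried = OffsetDateTime.parse("2016-01-01T10:45:00+05:30");
        // Same local hour (10:xx), but 04:30Z and 05:15Z are different UTC hours.
        System.out.println(sameBucket(indexed, queried, ZoneOffset.UTC, ChronoUnit.HOURS)); // false
        System.out.println(sameBucket(indexed, queried, india, ChronoUnit.HOURS));          // true
        // MINUTE resolution: a whole-minute offset shifts both sides identically.
        OffsetDateTime a = OffsetDateTime.parse("2016-01-01T10:00:20+05:30");
        OffsetDateTime b = OffsetDateTime.parse("2016-01-01T10:00:50+05:30");
        System.out.println(sameBucket(a, b, ZoneOffset.UTC, ChronoUnit.MINUTES));           // true
    }
}
```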
But my point is: the current solution produces counter-intuitive, sometimes
even useless results.
It was useful when dates were indexed as strings, but now that we (mostly)
index them as numeric (long) values, it no longer seems relevant. Even the
DateTools javadoc hints that DateTools (including "round") are not relevant
for numeric dates:
> Another approach is {@link NumericUtils}, which provides
> a sortable binary representation (prefix encoded) of numeric values, which
> date/time are.
> For indexing a {@link Date} or {@link Calendar}, just get the unix timestamp as
> <code>long</code> using {@link Date#getTime} or {@link Calendar#getTimeInMillis} and
> index this as a numeric value with {@link LongField}
> and use {@link NumericRangeQuery} to query it.
What I'd like to propose is to deprecate or remove date/calendar
resolution, or limit it to STRING encoding, and instead encourage users to
query their dates with range queries whenever they don't want exact matches.
Of course we could also implement a way for users to customize the timezone
to be used when truncating. But range queries have other advantages:
- they fit the multi-timezone use case
- they allow querying the same field at multiple resolutions
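To make the range-query alternative concrete, here is a minimal sketch. It
assumes a java.util.TreeSet of epoch millis as a stand-in for a numeric
(long) index field, and subSet as a stand-in for a numeric range query; it
does not use Lucene's or Hibernate Search's API. The values are indexed
exactly, and the timezone and resolution are chosen only at query time.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.NavigableSet;
import java.util.TreeSet;

public class RangeQueryExample {
    // Stand-in for a range query: counts indexed values in [from, to).
    static int countInRange(NavigableSet<Long> index, long from, long to) {
        return index.subSet(from, true, to, false).size();
    }

    // DAY resolution, in the caller's timezone: [start of day, start of next day).
    static int countOnDay(NavigableSet<Long> index, LocalDate day, ZoneId zone) {
        long from = day.atStartOfDay(zone).toInstant().toEpochMilli();
        long to = day.plusDays(1).atStartOfDay(zone).toInstant().toEpochMilli();
        return countInRange(index, from, to);
    }

    // MONTH resolution on the very same field: [first of month, first of next month).
    static int countInMonth(NavigableSet<Long> index, int year, int month, ZoneId zone) {
        LocalDate first = LocalDate.of(year, month, 1);
        long from = first.atStartOfDay(zone).toInstant().toEpochMilli();
        long to = first.plusMonths(1).atStartOfDay(zone).toInstant().toEpochMilli();
        return countInRange(index, from, to);
    }

    // The two dates from the DAY example, indexed as exact epoch millis (no truncation).
    static NavigableSet<Long> sampleIndex() {
        NavigableSet<Long> index = new TreeSet<>();
        index.add(Instant.parse("2015-12-31T23:00:00Z").toEpochMilli()); // 2016-01-01T00:00+01:00
        index.add(Instant.parse("2016-01-01T11:00:00Z").toEpochMilli()); // 2016-01-01T12:00+01:00
        return index;
    }

    public static void main(String[] args) {
        NavigableSet<Long> index = sampleIndex();
        ZoneId paris = ZoneId.of("Europe/Paris"); // +01:00 in January
        LocalDate newYear = LocalDate.of(2016, 1, 1);
        System.out.println(countOnDay(index, newYear, paris));          // 2
        System.out.println(countOnDay(index, newYear, ZoneOffset.UTC)); // 1
        System.out.println(countInMonth(index, 2016, 1, paris));        // 2
    }
}
```

A user in Paris and a user in UTC each get the answer that is correct for
their own calendar day, from the same indexed field.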
At the very least, I'd like us to agree that in the future, we will only
implement automatic truncation with date/time types where the timezone is
either irrelevant or included (for instance ZonedDateTime or LocalDate or
LocalTime, but *not* Instant).
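The distinction between these types shows up directly in java.time. A
minimal sketch (the dayStart overloads are hypothetical helpers): a
ZonedDateTime is truncated in its own zone, a LocalDate is already a day,
but an Instant has no zone, so Instant.truncatedTo(ChronoUnit.DAYS)
implicitly truncates in UTC, i.e. the same problem as today.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

public class TypeTruncationExample {
    // ZonedDateTime: the zone travels with the value, so the day is unambiguous.
    static ZonedDateTime dayStart(ZonedDateTime t) {
        return t.truncatedTo(ChronoUnit.DAYS);
    }

    // Instant: no zone attached, so truncating to DAYS means UTC midnight.
    static Instant dayStart(Instant t) {
        return t.truncatedTo(ChronoUnit.DAYS);
    }

    public static void main(String[] args) {
        ZonedDateTime zdt = ZonedDateTime.parse("2016-01-01T00:30:00+01:00[Europe/Paris]");
        System.out.println(dayStart(zdt));             // 2016-01-01T00:00+01:00[Europe/Paris]
        System.out.println(zdt.toLocalDate());         // 2016-01-01, already a day
        System.out.println(dayStart(zdt.toInstant())); // 2015-12-31T00:00:00Z
        // LocalTime: no date and no zone involved, truncation is well-defined.
        System.out.println(LocalTime.parse("10:45").truncatedTo(ChronoUnit.HOURS)); // 10:00
    }
}
```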
Relevant ticket: https://hibernate.atlassian.net/browse/HSEARCH-2378.
What do you think?
Yoann Rodière <yoann at hibernate.org>
Hibernate NoORM Team