I wanted to start a discussion about this issue.
It's about stored field retrieval. When searching, Elasticsearch can return
field values in two different ways:
* through the "_source" attribute, which basically provides a
copy-paste of the JSON that was submitted when indexing
* or through the "fields" attribute, which only works for stored
fields and provides the actual value that Elasticsearch stored
The main difference really boils down to formatting. With the "_source"
attribute, there's no formatting involved: you get exactly what was
originally submitted. With the "fields" attribute, the value is formatted
according to the first format in the mapping's format list.
The thing is, Elasticsearch allows admins to set multiple formats for a
given field. This won't change the output format, but will allow using any
one of these formats when submitting information. Since these "extra"
formats probably aren't understood by Hibernate Search, this means that
using the "_source" attribute to retrieve field values becomes unreliable
as soon as someone else adds/changes documents in Elasticsearch...
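
To make the difference concrete, here is a minimal sketch in plain java.time. It does not call Elasticsearch at all; it only models the behavior described above. The class name, the method names and the two patterns are made up for illustration: I'm assuming a field whose format list accepts both "uuuu-MM-dd" and "dd/MM/uuuu", the first one being the output format.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

public class MultiFormatDateField {

    // Hypothetical format list for one field: the first entry is the output
    // format, the others are "extra" formats accepted on input only.
    static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ofPattern("uuuu-MM-dd"),
            DateTimeFormatter.ofPattern("dd/MM/uuuu"));

    // Accepts a value written in any of the configured formats.
    static LocalDate parse(String submitted) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return LocalDate.parse(submitted, format);
            } catch (DateTimeParseException e) {
                // This format does not match; try the next one.
            }
        }
        throw new IllegalArgumentException("No configured format matches: " + submitted);
    }

    // Models "_source": the raw text is returned exactly as it was submitted.
    static String fromSource(String submitted) {
        return submitted;
    }

    // Models "fields": the value is rendered again using the first format.
    static String fromFields(String submitted) {
        return FORMATS.get(0).format(parse(submitted));
    }

    public static void main(String[] args) {
        // A document indexed by some other client, using the extra format.
        String submitted = "25/12/2016";
        System.out.println(fromSource(submitted)); // prints 25/12/2016
        System.out.println(fromFields(submitted)); // prints 2016-12-25
    }
}
```

A reader that only understands the first format can consume the second result, but not the first one.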
So we have two solutions:
1. Either we only use the "fields" attribute to retrieve field values, and
we force users to have the output format set to something HSearch will
understand, but allow extra input formats.
2. Or we use the "_source" attribute to retrieve field values, and then we
force both the output and input formats on users, and do not allow extra formats.
I'd be in favor of 1, which seems more rational to me. It only has one
downside: if we go with this approach, Calendar values (and
ZonedDateTime, ZonedTime, etc.) will have to be stored as String, not as
Date, since Elasticsearch doesn't store the timezone, just the UTC
timestamp. We're currently working around this by inspecting the "_source",
which contains the original timezone (since it's just the raw, originally
submitted JSON).
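
The timezone loss can be sketched the same way, again in plain java.time and without any Elasticsearch call. The class and method names are hypothetical, and the "stored as Date" case is modeled as a round trip through epoch milliseconds, which is an assumption about what "just the UTC timestamp" amounts to.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class ZoneLossSketch {

    // Models storage as a date: only the UTC instant survives the round trip.
    static ZonedDateTime roundTripAsDate(ZonedDateTime value) {
        long epochMillis = value.toInstant().toEpochMilli();
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
    }

    // Models storage as a string: the ISO text carries the zone along.
    static ZonedDateTime roundTripAsString(ZonedDateTime value) {
        String stored = DateTimeFormatter.ISO_ZONED_DATE_TIME.format(value);
        return ZonedDateTime.parse(stored, DateTimeFormatter.ISO_ZONED_DATE_TIME);
    }

    public static void main(String[] args) {
        ZonedDateTime paris = ZonedDateTime.of(
                2016, 12, 25, 10, 0, 0, 0, ZoneId.of("Europe/Paris"));
        // Same instant as the original, but the zone is now UTC.
        System.out.println(roundTripAsDate(paris));
        // Same instant and same zone as the original.
        System.out.println(roundTripAsString(paris));
    }
}
```

Both round trips keep the instant; only the string one keeps the zone.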
What do you think?
Yoann Rodière <yoann(a)hibernate.org>
Hibernate NoORM Team