[hibernate-dev] [Search] ElasticSearch - Progress on faceting
Gunnar Morling
gunnar at hibernate.org
Thu Feb 4 13:01:23 EST 2016
Hi Guillaume,
Awesome, great to see you making progress here! I need to take a
closer look, but some quick comments inline.
@Sanne, this reminds me of that pending PR of mine; Any chance you
spend some cycles reviewing and merging it? Guillaume's work is based
on it and it'll also serve as base of my own future work. Thanks!
2016-02-04 16:37 GMT+01:00 Guillaume Smet <guillaume.smet at gmail.com>:
> Hi,
>
> So the good news: most of the faceting tests now pass with ES.
>
> I rebased the patch onto master.
>
> https://github.com/gsmet/hibernate-search/commits/elastic-facet2
> and for now only one commit:
> https://github.com/gsmet/hibernate-search/commit/657ecfe684e6b4e952d11274523a4c5683e0c7eb
>
> It's not a PR yet as it's not feature complete and there are still (a few)
> tests failing.
>
> General infrastructure
> ================
>
> - I introduced a JsonBuilder to make the gson API chainable: I think it's
> more readable this way. I haven't updated everything to use it (especially
> the Lucene Query -> Json part is not ported yet). Should I continue with
> it? Please, please, say yes :).
It's a great example of how subjective readability is, because I don't
find it necessarily better than the previous version, it's just
different :) An actual disadvantage is the higher number of object
allocations due to the builders.
In the end I don't really care about it that much; how the JSON is
created is more a technical detail which we may change at any time. So
if you feel strongly about it, feel free to do it. But I would not
expose this builder and the usage of GSON through the public API so we
actually have the freedom of future changes.
> - All the API is now based on JsonObject so you don't have to play with
> String -> Json -> String -> Json manipulation
> ---> would appreciate feedback on these 2 changes so that I can make
> everything consistent
+1 very good that you avoided some forth-and-back cycles, that has
been bugging me, too.
> - I introduced a ToElasticSearch class which translates Lucene/HSearch
> objects to Json. I moved the existing query translator there.
> - I'm not sure building the exact same document we build for Lucene is the
> way to go. I'm wondering if we should have a distinct document builder for
> ES. I'll see how it works for facets not being fields.
Yes, keep in mind that the route through creating a Lucene Document
object is just temporary hack to have a PoC of integrating ES. For HS
6 we'll look into refactoring the required pieces and avoid that
intermediary step. Not sure how much sense it makes to spend cycles on
this right now, it'll be larger task for sure.
> - I made a few fixes here and there: especially the indexing didn't support
> multiple values per field. I made something complicated, maybe we could
> just index a JsonArray even for single values? see
> https://github.com/gsmet/hibernate-search/commit/cb88b5790996442e137fab4aec330be04dfef587#diff-e832854d983fba756218944587fd5c57R234
That's cool. Actually I had prepared that very thing locally, too, but
your impl is even nicer :) I wouldn't use arrays for single values,
the upgrade from single value to array if needed seems nicer.
Btw. that's one of the cases where going through the Lucene document
is limiting us. When having element collections of component types, we
cannot assemble the JSON array in such way that's ensured that each
array element represents one "row" from the collection. This is
because we don't know which of the Lucene fields belong together.
That'll change with the larger re-work in 6.
Also the case of root.components[0].foo, root.components[0].bar,
root.components[1].foo, root.components[1].bar where the array would
have to be at the level of "components" and not at the leave level.
>
> Faceting
> =======
>
> - The querying/extraction should be OK
> - ES doesn't support range bounds inclusion/exclusion for range faceting so
> I used the query faceting to build the range faceting
> - I have to work on the mapping as having multiple facets on the same field
> is not supported yet: currently I use the indexed field to build the
> facets, not specific fields
> - I have some additional work to do on indexing related to the above matter
>
> Various issues
> ===========
>
> - A few tests don't pass:
> . ElasticSearch returns facets results even if the field doesn't exist: not
> sure we can do anything about that
> . A test in the WebShopTest is failing as I don't support yet faceting
> based on a non existing field
> . CollectionUpdateEventTest is failing: this one is weird
> - I had to work around an issue with the EsDateBridge and faceting (marked
> as XXX in DocumentBuilderIndexedEntity): Dates are considered to be Long
> and for ES, we want them to be Strings. Would appreciate a comment on that.
Couldn't you do the conversion in the ElasticSearchIndexWorkVisitor?
>
> Comments and thoughts appreciated.
>
> --
> Guillaume
> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev
More information about the hibernate-dev
mailing list