[hibernate-dev] [HSEARCH] Geospatial indexing and queries

Sanne Grinovero sanne at hibernate.org
Mon Dec 5 11:01:35 EST 2011


Thanks! Looks like a great start.

Some questions/considerations:
# In case of multi-level grids, I'd assume that the option is
specified only on the mapping, but doesn't need to be specified at
Query time?
# Not using anything fancy from the Lucene side, correct? Just exact
term matches with boolean operators?
# Who are those geosuckers?
# I would consider it very useful to provide scoring according to
distance; shouldn't be hard to make a custom Similarity which does
that, might be much harder to create one which is actually fast
enough, but I believe there are several examples in the Lucene
community.
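
On the scoring point, here is a minimal sketch of the arithmetic such a
scorer could use: a great-circle (haversine) distance and a linear falloff
from 1 at the center to 0 at the edge of the circle. It is plain Java and
is not wired into a Lucene Similarity; the class and method names are
hypothetical, and the linear falloff is only one possible choice.

```java
// Hypothetical sketch: none of these names come from Hibernate Search or Lucene.
final class ProximityScoreSketch {

    private static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** Great-circle distance in km between two points given in degrees (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding errors before asin.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Linear falloff: 1.0 at the center, 0.0 at the edge of the circle and beyond. */
    static double score(double distanceKm, double radiusKm) {
        if (radiusKm <= 0 || distanceKm >= radiusKm) {
            return 0.0;
        }
        return 1.0 - Math.max(0.0, distanceKm) / radiusKm;
    }

    public static void main(String[] args) {
        // Two arbitrary points in Paris, scored against a 2 km circle.
        double d = distanceKm(48.8566, 2.3522, 48.8606, 2.3376);
        System.out.printf("distance=%.3f km, score=%.3f%n", d, score(d, 2.0));
    }
}
```

The expensive part in practice would be getting at each document's
coordinates at scoring time, which this sketch does not address.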

To answer your question on "centeredOn", I'd avoid a strong dependency
on JTS. The "Object" option you propose could be an interface, and we
create a factory class which returns them; users should then be able
to use the JTS enabled factory or a dumb one.
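
A rough sketch of what I mean, assuming the interface is the `Coordinates`
one from your mapping examples. The factory name and its validation are
made up for illustration; a JTS-backed factory would sit in a separate,
optional module and return the same interface.

```java
// The interface shape follows the proposal's Coordinates examples.
interface Coordinates {
    double getLatitude();
    double getLongitude();
}

// Hypothetical "dumb" factory with no dependency on JTS.
final class SimpleCoordinatesFactory {

    private SimpleCoordinatesFactory() {
    }

    /** Builds an immutable Coordinates from plain doubles (degrees). */
    static Coordinates of(final double latitude, final double longitude) {
        // Negated range checks so that NaN is rejected as well.
        if (!(latitude >= -90.0 && latitude <= 90.0)) {
            throw new IllegalArgumentException("latitude out of range: " + latitude);
        }
        if (!(longitude >= -180.0 && longitude <= 180.0)) {
            throw new IllegalArgumentException("longitude out of range: " + longitude);
        }
        return new Coordinates() {
            public double getLatitude() { return latitude; }
            public double getLongitude() { return longitude; }
        };
    }

    public static void main(String[] args) {
        Coordinates paris = of(48.8566, 2.3522);
        System.out.println(paris.getLatitude() + ", " + paris.getLongitude());
    }
}
```

With this, the DSL would only need a centeredOn(Coordinates) variant and
would never reference a JTS type directly.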

Sanne

On 5 December 2011 15:21, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> Nicolas and I have made good progress on Geospatial queries for Hibernate Search.
>
> # Geospatial indexing and queries
>
> Our goal is to give a reasonable but pragmatic answer to geoloc queries. We do not try to implement the most obscure geo-projection favored by ancient Greeks, nor do we try to find matching elements within a triangular-shaped donut on Mars' surface, etc. We have purposely limited the current implementation to:
>
> - find matching elements in a circle (we plan to extend this to matching elements in a rectangle if there is popular demand, but in our opinion this would not be useful and could even be misleading)
> - use the internationally accepted geo projection as it is i18n neutral and not centered on one particular country. We can plan on opening to other projections if the need arises (esp. if data points are provided in different projections).
>
> We made sure to expose as few gory details as possible.
>
> That being said, here is more information, along with some questions.
>
> The JIRA is https://hibernate.onjira.com/browse/HSEARCH-923
> The branch is https://github.com/emmanuelbernard/hibernate-search/tree/HSEARCH-923
>
> ## How is geoloc data exposed to the domain model?
>
> We plan on supporting three approaches:
>
> ### Special interface and embeddable object
>
> Using a specific interface as the property return type: `o.h.s.spatial.Coordinates`
>
>    @Indexed
>    public class Address {
>        @Field String city;
>        @Spatial Coordinates location = new Coordinates() {
>            public double getLatitude() { ... }
>            public double getLongitude() { ... }
>        };
>    }
>
> ### Special interface implemented by the entity
>
> Using a specific interface implemented by the entity: `o.h.s.spatial.Coordinates`
>
>    @Indexed @Spatial
>    public class Address implements Coordinates {
>        @Field String city;
>
>        public double getLatitude() { ... }
>        public double getLongitude() { ... }
>    }
>
> ### Use JTS's Point type
>
> Use `Point` as the spatial property type.
>
> ### (maybe) `double` hosted by two unrelated properties
>
> The problem is to find a nice way to bind these properties to the spatial data.
>
> ## How is geoloc data indexed
>
> There will be two strategies
>
> - index latitude and longitude as is and do an AND query matching both. This typically works nicely for small datasets.
> - index latitude and longitude as matching a 15-level grid (from most generic to most specific). This typically works nicely for big datasets.
>
> ## Query DSL
>
> We have worked on adding a fluent spatial API to the current query DSL. Before we go on implementing it, we would like your feedback. Some points remain open.
>
> ### General overview
>
>    builder.spatial()
>        .scoreByProximity() //not implemented yet
>        .onField("coord")
>            .boostedTo(2)
>        .within(2).km()
>            .of( coordinates )
>        .createQuery();
>
> ### onField
>
> onField is not a good name. It has a slightly different meaning than when it's used in range().onField(). We need to find a better name:
>
> - onField
> - onGrid
> - onCoordinates
> - onLocation
>
> This really represents the metadata set where the location will be stored. In the boolean approach, we store latitude and longitude. In the grid approach, we store latitude,
> longitude and the set of grids the coordinates belong to.
>
> .onField() does accept a field name which can be the `Coordinates` property or the virtual field used by the class-level bridge (if lat and long are top level properties).
>
> When latitude and longitude are independent properties, we would use
>
>    builder.spatial()
>        .onLatitudeField("lat")
>        .andLongitudeField("long")
>
> ### Surface checked
>
> #### Option 1: centeredOn
>
>    .centeredOn(double, double)
>    //or
>    .centeredOn()
>      .latitude(double)
>      .longitude(double)
>    //or
>    .centeredOn(SpatialIndexable)
>    .centeredOn(JTS.Point) // hard dependency on JTS even for non spatial users :(
>    .centeredOn(Object) //? to avoid JTS dep
>
> - Should we have a version accepting Object?
> - What is best, centeredOn(double, double) or centeredOn().latitude(double).longitude(double)?
>
> #### Option 2: in / within
>
>
>    //query within circle
>    b.spatial()
>        .onField("coord")
>        .within(2).km()
>        .of(SpatialIndexable)
>
>        .within(2).km()
>        .of()
>            .latitude()
>            .longitude()
>         .createQuery()
>
>   //or with a different unit handling
>
>    //query within circle
>    b.spatial()
>        .onField("coord")
>        .within(2, Unit.km)
>        .of(SpatialIndexable)
>
>        .within(2, Unit.km)
>        .of()
>            .latitude()
>            .longitude()
>         .createQuery()
>
> My reason to support units is that (a) it's explicit and (b) when those geosuckers improve, we could support time units like mins or hours. Note that this is a very hard problem to crack, and solutions are resource intensive and not very accurate. Nobody really does it correctly, certainly not Google.
>
>
> We could support rectangles / boxes if really needed
>
>    //query in box
>    b.spatial()
>        .onField("coord")
>        .inBox()
>            .from()
>            .to()
>         .createQuery();
>
>    //more formal but more correct wrt projection
>    b.spatial()
>        .onField("coord")
>        .inBox()
>            .withUpperLeft()
>            .withLowerRight()
>         .createQuery();
>
>
> Please give us your feedback.
>
> ## TODOs
>
> - Implement fluent DSL
> - Implement Special interface implemented by the entity
> - Implement use of JTS's Point type
> - Implement bridge that supports indexing for boolean queries based on lat and long instead of the grid.
> - Implement @Spatial as a marker annotation for the spatial bridge
> - Implement variable score based on proximity
>  Today we use a constant score, i.e. in = 1, out = 0. We can think about a score that goes from 1 to 0 based on the distance from the center relative to the circle radius.
>  We can imagine queries that should return close elements above far elements.
>  Note we might need a score going from 1 to .5 or some other value. Need to think about that.
> - Write how-to focused doc
> - Write doc on perf comparing grid vs boolean queries
> - Convert to JBoss logging
> - Add unit test using faceting + spatial queries
>
> Emmanuel
> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev



