[hibernate-dev] [HSEARCH] Geospatial indexing and queries

Emmanuel Bernard emmanuel at hibernate.org
Mon Dec 5 10:21:38 EST 2011


Nicolas and I have made good progress on Geospatial queries for Hibernate Search.

# Geospatial indexing and queries

Our goal is to give a reasonable but pragmatic answer to geoloc queries. We do not try to implement the most obscure geo-projection favored by ancient Greeks, nor do we try to find matching elements within a triangular-shaped donut on Mars' surface. We have purposely limited the current implementation to:

- find matching elements in a circle (we plan to extend this to matching elements in a rectangle if popular demand arises, but in our opinion this will not be useful, or will even be misleading)
- use the internationally accepted geo projection, as it is i18n neutral and not centered on one particular country. We can plan on opening up to other projections if the need arises (especially if data points are provided in different projections).

We made sure to expose as few gory details as possible.

That being said, here is more information, along with some questions.

The JIRA is https://hibernate.onjira.com/browse/HSEARCH-923
The branch is https://github.com/emmanuelbernard/hibernate-search/tree/HSEARCH-923

## How is geoloc data exposed to the domain model?

We plan on supporting three approaches, and possibly a fourth:

### Special interface and embeddable object 

Using a specific interface as the property return type: `o.h.s.spatial.Coordinates`

    @Indexed
    public class Address {
        @Field String city;
        @Spatial Coordinates location = new Coordinates() {
            public double getLatitude() { ... }
            public double getLongitude() { ... }
        };
    }
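
To make the embeddable-object approach more concrete, here is a minimal sketch of a reusable value object that could be returned by such a property. The `Coordinates` interface below is only a stand-in for the proposed `o.h.s.spatial.Coordinates` (two getters, as in the example above), and `FixedCoordinates` with its range checks is a hypothetical name and design, not something present in the branch.

```java
/** Stand-in for the proposed o.h.s.spatial.Coordinates interface (assumed to expose only these two getters). */
interface Coordinates {
    double getLatitude();
    double getLongitude();
}

/** Hypothetical immutable value object; not part of Hibernate Search. */
final class FixedCoordinates implements Coordinates {
    private final double latitude;
    private final double longitude;

    FixedCoordinates(double latitude, double longitude) {
        // The negated form also rejects NaN, which fails every comparison.
        if (!(latitude >= -90.0 && latitude <= 90.0)) {
            throw new IllegalArgumentException("latitude out of range: " + latitude);
        }
        if (!(longitude >= -180.0 && longitude <= 180.0)) {
            throw new IllegalArgumentException("longitude out of range: " + longitude);
        }
        this.latitude = latitude;
        this.longitude = longitude;
    }

    @Override
    public double getLatitude() {
        return latitude;
    }

    @Override
    public double getLongitude() {
        return longitude;
    }
}
```

With such a class, the anonymous class in the entity could be replaced by a plain field initialised with `new FixedCoordinates(lat, lon)`.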

### Special interface implemented by the entity

Using a specific interface implemented by the entity: `o.h.s.spatial.Coordinates`

    @Indexed @Spatial
    public class Address implements Coordinates {
        @Field String city;

        public double getLatitude() { ... }
        public double getLongitude() { ... }
    }

### Use JTS's Point type

Use `Point` as the spatial property type.

### (maybe) `double` hosted by two unrelated properties

The problem is to find a nice way to bind these properties to the spatial data.

## How is geoloc data indexed?

There will be two strategies:

- index latitude and longitude as is and do an AND query matching both. This typically works nicely for small datasets.
- index latitude and longitude as matching a 15-level grid (from most generic to most specific). This typically works nicely for big datasets.
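
The two strategies can be illustrated with a rough sketch. This is not the code from the HSEARCH-923 branch: the class `SpatialIndexSketch`, its method names, the cell id format and the choice to double the grid resolution at each level are all assumptions made for illustration. `boundingBox` computes the latitude and longitude ranges that an AND query would match, and `gridCells` computes one cell id per level, from most generic to most specific.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: names and the grid layout are assumptions, not Hibernate Search code. */
final class SpatialIndexSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius
    static final int GRID_LEVELS = 15;

    /**
     * Boolean strategy: ranges enclosing a circle, as {minLat, maxLat, minLon, maxLon}.
     * Simplification: poles and the 180th meridian are not handled.
     */
    static double[] boundingBox(double lat, double lon, double radiusKm) {
        double latDelta = Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        // A degree of longitude shrinks with the cosine of the latitude.
        double lonDelta = Math.toDegrees(
                radiusKm / (EARTH_RADIUS_KM * Math.cos(Math.toRadians(lat))));
        return new double[] { lat - latDelta, lat + latDelta, lon - lonDelta, lon + lonDelta };
    }

    /** Grid strategy: one "level:x,y" cell id per level, coarsest first. */
    static List<String> gridCells(double lat, double lon) {
        List<String> cells = new ArrayList<>();
        for (int level = 0; level < GRID_LEVELS; level++) {
            long cellsPerAxis = 1L << level; // assumption: resolution doubles per level
            long x = cellIndex(lon, -180.0, 360.0, cellsPerAxis);
            long y = cellIndex(lat, -90.0, 180.0, cellsPerAxis);
            cells.add(level + ":" + x + "," + y);
        }
        return cells;
    }

    private static long cellIndex(double value, double min, double span, long cellsPerAxis) {
        long index = (long) Math.floor((value - min) / span * cellsPerAxis);
        // Clamp so that the upper bound (e.g. latitude 90) falls in the last cell.
        return Math.min(Math.max(index, 0L), cellsPerAxis - 1);
    }
}
```

In this sketch a grid query only has to match the cell ids covering the search circle at a suitable level, whereas the boolean query always evaluates two numeric ranges, which is one way to read why the grid suits larger datasets.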

## Query DSL

We have worked on adding a fluent spatial API to the current query DSL. Before we go on to implement it, we would like your feedback. Some points remain open.

### General overview

    builder.spatial()
        .scoreByProximity() //not implemented yet
        .onField("coord")
            .boostedTo(2)
        .within(2).km()
            .of( coordinates )
        .createQuery();

### onField

onField is not a good name. It has a slightly different meaning than when it's used in range().onField(). We need to find a better name:

- onField
- onGrid
- onCoordinates
- onLocation

This really represents the metadata set where the location will be stored. In the boolean approach, we store latitude and longitude. In the grid approach, we store latitude, longitude and the set of grid cells the coordinates belong to.

.onField() accepts a field name, which can be the `Coordinates` property or the virtual field used by the class-level bridge (if lat and long are top-level properties).

When latitude and longitude are independent properties, we would use

    builder.spatial()
        .onLatitudeField("lat")
        .andLongitudeField("long")

### Surface checked

#### Option 1: centeredOn

    .centeredOn(double, double)
    //or
    .centeredOn()
      .latitude(double)
      .longitude(double)
    //or
    .centeredOn(SpatialIndexable)
    .centeredOn(JTS.Point) // hard dependency on JTS even for non spatial users :(
    .centeredOn(Object) //? to avoid JTS dep

- Should we have a version accepting Object?
- What is best, centeredOn(double, double) or centeredOn().latitude(double).longitude(double)?

#### Option 2: in / within


    //query within circle
    b.spatial()
        .onField("coord")
        .within(2).km()
        .of(SpatialIndexable)
        //or
        .within(2).km()
        .of()
            .latitude()
            .longitude()
        .createQuery();

Or, with a different unit handling:

    //query within circle
    b.spatial()
        .onField("coord")
        .within(2, Unit.km)
        .of(SpatialIndexable)
        //or
        .within(2, Unit.km)
        .of()
            .latitude()
            .longitude()
        .createQuery();

My reason to support units is that a. it's explicit and b. when those geosuckers improve, we could support time units like mins or hours. Note, that's a very hard problem to crack and solutions are resource intensive and not very accurate. None really do it correctly, not Google for sure.
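
As a sketch of what explicit units buy us, the check behind something like `within(2, Unit.km).of(center)` could look as follows. Everything here is an assumption for illustration: the `Unit` enum only mirrors the `Unit.km` constant used above (the `mi` constant is added as an example), `CircleSketch` is a made-up class, and the haversine great-circle formula is one common way to measure the distance, not necessarily what the branch uses.

```java
/** Hypothetical unit enum mirroring the Unit.km constant of the DSL proposal. */
enum Unit {
    km(1.0),
    mi(1.609344); // international mile, added here only as an example

    private final double kilometers;

    Unit(double kilometers) {
        this.kilometers = kilometers;
    }

    /** Converts a value expressed in this unit to kilometers. */
    double toKilometers(double value) {
        return value * kilometers;
    }
}

/** Illustrative only: not Hibernate Search code. */
final class CircleSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** Great-circle distance between two points, using the haversine formula. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    /** True when (lat, lon) lies inside the circle of the given radius around the center. */
    static boolean within(double radius, Unit unit,
                          double centerLat, double centerLon, double lat, double lon) {
        return distanceKm(centerLat, centerLon, lat, lon) <= unit.toKilometers(radius);
    }
}
```

Because the radius is always converted to kilometers in one place, adding another distance unit only means adding an enum constant. Time-based units would need a very different, and much harder, computation.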


We could support rectangles / boxes if really needed:

    //query in box
    b.spatial()
        .onField("coord")
        .inBox()
            .from()
            .to()
         .createQuery();

    //more formal but more correct wrt projection
    b.spatial()
        .onField("coord")
        .inBox()
            .withUpperLeft()
            .withLowerRight()
         .createQuery();


Please give us your feedback.

## TODOs

- Implement fluent DSL
- Implement the special interface implemented by the entity
- Implement the use of JTS's Point type
- Implement bridge that supports indexing for boolean queries based on lat and long instead of the grid.
- Implement @Spatial as a marker annotation for the spatial bridge
- Implement variable score based on proximity
  Today we use a constant score, i.e. in = 1, out = 0. We can think about a score that goes from 1 to 0 based on the distance from the center relative to the circle size.
  We can imagine queries that should return close elements above far elements.
  Note we might need a score going from 1 to .5 or some other value. Need to think about that.
- Write how-to focused doc
- Write doc on perf comparing grid vs boolean queries
- Convert to JBoss logging
- Add unit test using faceting + spatial queries
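
For the variable score item in the list above, one possible shape is a linear falloff. This is only a sketch under assumptions: `ProximityScoreSketch`, the linear curve and the `minInsideScore` parameter (0 for a 1 to 0 range, 0.5 for a 1 to .5 range) are hypothetical and nothing has been decided yet.

```java
/** Illustrative only: one possible proximity score, not an agreed design. */
final class ProximityScoreSketch {

    /**
     * Returns 1 at the center, decreasing linearly to minInsideScore at the
     * edge of the circle, and 0 for anything outside the circle.
     */
    static float score(double distanceKm, double radiusKm, float minInsideScore) {
        if (radiusKm <= 0) {
            throw new IllegalArgumentException("radius must be positive: " + radiusKm);
        }
        if (distanceKm > radiusKm) {
            return 0f; // outside the circle, same as today's constant score
        }
        double ratio = Math.max(distanceKm, 0.0) / radiusKm; // 0 at center, 1 at edge
        return (float) (1.0 - ratio * (1.0 - minInsideScore));
    }
}
```

With `minInsideScore` set to 0, an element halfway between the center and the edge scores 0.5. With 0.5, every element inside the circle stays clearly above everything outside it.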

Emmanuel

