[hibernate-dev] Coordinates storage in Lucene index for spatial functionality

Nicolas Helleringer nicolas.helleringer at gmail.com
Mon May 7 09:41:32 EDT 2012


Here are some results:

Mean time with Grid: 4.9297471630769225 ms
  Average number of docs fetched: 2416.373846153846
Mean time with Grid + Distance filter: 6.48634534 ms
  Average number of docs fetched: 425.84
Mean time with DoubleRange: 15.39593650051282 ms
  Average number of docs fetched: 542.72
Mean time with DoubleRange + Distance filter: 21.158394677435897 ms
  Average number of docs fetched: 425.8779487179487
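
To get at the number of POIs dropped in memory by the distance filter, the averages above can be subtracted. This is only a sketch of that arithmetic: it assumes the runs with and without the filter are comparable, and the class and method names are made up for illustration.

```java
final class FilteredDocs {
    // Average number of docs removed in memory by the distance filter:
    // docs fetched by the first stage minus docs left after the filter.
    static double filteredOut(double fetchedBefore, double fetchedAfter) {
        return fetchedBefore - fetchedAfter;
    }

    public static void main(String[] args) {
        // Figures copied from the results above.
        double grid = filteredOut(2416.373846153846, 425.84);
        double range = filteredOut(542.72, 425.8779487179487);
        System.out.printf("Grid: about %.0f docs dropped per query%n", grid);
        System.out.printf("DoubleRange: about %.0f docs dropped per query%n", range);
    }
}
```

With these figures the filter drops about 1,991 documents per query after the grid stage and about 117 after the double range stage.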

It seems odd that with the distance filter the two results are not the same.
I shall investigate that.

Niko

2012/5/7 Emmanuel Bernard <emmanuel at hibernate.org>

> Do you know the average number of POIs that were filtered in memory by the
> DistanceFilter during these runs?
>
> Emmanuel
>
> On 7 mai 2012, at 10:31, Nicolas Helleringer wrote:
>
> Hi all,
>
> I have done a radians patch/branch and run some benchmarks on the French
> GeoNames database.
>
> Each benchmark run consists of 2k calls.
>
> Radians:
> run 1
> Mean time with Grid : 4.808043092820513 ms
> Mean time with Grid + Distance filter : 6.571108878461538 ms
> Mean time with DoubleRange : 14.62661525128205 ms
> Mean time with DoubleRange + Distance filter : 20.143597923076925 ms
>
> run 2
> Mean time with Grid : 5.290368523076923 ms
> Mean time with Grid + Distance filter : 6.706567517435897 ms
> Mean time with DoubleRange : 14.878960702564102 ms
> Mean time with DoubleRange + Distance filter : 20.75806591948718 ms
>
> Degrees:
> run 1
> Mean time with Grid : 5.101956610769231 ms
> Mean time with Grid + Distance filter : 6.548685109230769 ms
> Mean time with DoubleRange : 14.767478146153845 ms
> Mean time with DoubleRange + Distance filter : 20.668063972820512 ms
>
> run 2
> Mean time with Grid : 4.683360031282051 ms
> Mean time with Grid + Distance filter : 6.7065247435897435 ms
> Mean time with DoubleRange : 14.617140157948716 ms
> Mean time with DoubleRange + Distance filter : 20.074868595897435 ms
>
> The radian branch is here for review :
> https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923-RADIANS
>
> While moving from degrees to radians I noticed that the DSL still needs
> some work.
> I shall focus on that now.
>
> Niko
>
> 2012/5/3 Sanne Grinovero <sanne at hibernate.org>
>
>>
>> On May 3, 2012 10:10 AM, "Emmanuel Bernard" <emmanuel at hibernate.org> wrote:
>> >
>> > How come the DistanceFilter has to compute the distance for the whole
>> > corpus?
>>
>> You're right that it's not always the case, but it's possible. If there
>> are more filters enabled and they are executed first, our filter will need
>> to do the math only on the documents matched by the previous filters, but
>> if there are no other constraints or filters our DistanceFilter might need
>> to process all documents in all segments. This also happens when a limit is
>> enabled on the collector (although limited to the current index segment)
>> if the filter needs to be cached, as it then needs to evaluate each
>> document in the segment.
>>
>> In our case this DistanceFilter is only applied after a RangeQuery has been
>> applied on both longitude and latitude, so I'm not sure if this is a big
>> problem; personally I was just wondering, and I'd be fine with keeping this
>> as a possible future improvement. But if we go for a separate issue, let's
>> keep in mind that the index format would not be backwards compatible.
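
The effect of filter ordering described here can be illustrated with a small counting sketch. It is not Hibernate Search or Lucene code: the two predicates stand in for the range query and the distance computation, and all names and sample coordinates are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

final class FilterOrder {
    // Counts how many times the expensive predicate is evaluated when a
    // cheap pre-filter runs first. Both predicates are placeholders.
    static <T> int expensiveCalls(List<T> docs, Predicate<T> cheap, Predicate<T> expensive) {
        int calls = 0;
        for (T doc : docs) {
            if (!cheap.test(doc)) {
                continue; // rejected by the cheap stage, no distance math needed
            }
            calls++;
            expensive.test(doc);
        }
        return calls;
    }

    public static void main(String[] args) {
        // Hypothetical documents: {latitude, longitude} in decimal degrees.
        List<double[]> docs = List.of(
                new double[] {48.85, 2.35},
                new double[] {45.76, 4.84},
                new double[] {43.30, 5.37},
                new double[] {48.86, 2.29});
        // Cheap stage: a latitude/longitude bounding box (made-up bounds).
        Predicate<double[]> box =
                d -> d[0] >= 48.5 && d[0] <= 49.2 && d[1] >= 2.0 && d[1] <= 2.7;
        // Stand-in for the exact distance computation.
        Predicate<double[]> exact = d -> true;
        System.out.println("with range pre-filter: " + expensiveCalls(docs, box, exact));
        System.out.println("without pre-filter:    " + expensiveCalls(docs, d -> true, exact));
    }
}
```

With a pre-filter, the expensive check only runs on the candidates that survive it; without one, it runs on every document.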
>>
>>
>> > By the way, the actual storage (say via Hibernate ORM, or Infinispan)
>> > does not need to store radians, so we don't need to do a conversion when
>> > reading an entity.
>>
>> Right, another reason to index only in whatever format makes querying
>> more efficient.
>>
>> -- Sanne
>>
>>
>> >
>> > On 3 mai 2012, at 10:45, Sanne Grinovero wrote:
>> >
>> > > The reason for my comment is that the code is doing a conversion to
>> > > radians in the DistanceFilter, which needs to be extremely efficient
>> > > as it's not only applied on the resultset but potentially on the whole
>> > > corpus of all Documents in the index.
>> > > So even if it's true that conversion would be needed on the final
>> > > results, we always expect people to retrieve only a limited amount of
>> > > entities (like with pagination), while the index might need to perform
>> > > this computation millions of times per query.
>> > >
>> > > If I look at the complexity of Point.getDistanceTo(double, double), I
>> > > get a feeling that that method will hardly provide speedy queries
>> > > because of the complex computations in it - this is just speculation
>> > > at this point of course, to be sure we'd need to compare them with a
>> > > large enough dataset, but it seems quite obvious that storing
>> > > normalized radians should be more efficient as it would avoid a good
>> > > deal of math to be executed on each Document in the index.
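
To make the argument concrete, here is a minimal sketch of a great-circle distance using the haversine formula. It is not the actual Point.getDistanceTo implementation: the formula choice, the mean Earth radius and all names are assumptions for illustration. The degree-based variant pays for extra Math.toRadians calls on every evaluation, which the radian-based one avoids if the index already stores radians.

```java
final class GreatCircle {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius, approximate

    // Haversine distance between two points whose coordinates are in radians.
    static double distanceRadians(double lat1, double lon1, double lat2, double lon2) {
        double sinLat = Math.sin((lat2 - lat1) / 2);
        double sinLon = Math.sin((lon2 - lon1) / 2);
        double a = sinLat * sinLat + Math.cos(lat1) * Math.cos(lat2) * sinLon * sinLon;
        // Clamp to 1.0 to guard against rounding just above the asin domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Same distance from decimal degrees: converts all four values first.
    static double distanceDegrees(double lat1, double lon1, double lat2, double lon2) {
        return distanceRadians(Math.toRadians(lat1), Math.toRadians(lon1),
                Math.toRadians(lat2), Math.toRadians(lon2));
    }

    public static void main(String[] args) {
        // Paris and Lyon, approximate coordinates in decimal degrees.
        System.out.println(distanceDegrees(48.8566, 2.3522, 45.7640, 4.8357) + " km");
    }
}
```

Whether the saved conversions matter in practice is what a benchmark on a large enough dataset would have to show.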
>> > >
>> > > Also if we assume people might want to use radians in their user data
>> > > (I know some who definitely would never touch decimals for such a use
>> > > case), there would be no need at all to convert the end result.
>> > >
>> > > Some more thoughts inline:
>> > >
>> > > On 3 May 2012 09:12, Nicolas Helleringer <nicolas.helleringer at gmail.com> wrote:
>> > >> Hi all,
>> > >>
>> > >> Sanne and I have been wondering about the way the spatial
>> > >> branch/module/functionality for Hibernate Search shall store its
>> > >> coordinates in the Lucene index.
>> > >>
>> > >> Today it is implemented with decimal degrees for:
>> > >> - easy debugging/readability
>> > >> - ease of conversion on storage, as we want to accept mainly decimal
>> > >>   degrees from users' data
>> > >
>> > > Valid points, but consider that "storage" is going to be way slower
>> > > anyway, and typically you'll process a Document to evaluate it for a
>> > > hit many many orders of magnitude more frequently than the times you
>> > > store it.
>> > >
>> > >>
>> > >> Sanne pointed out that when the search is done there are quite a few
>> > >> conversions to radians for distance calculation, and suggested that we
>> > >> might store the coordinates directly in radians.
>> > >>
>> > >> I have tried a patch to implement this and as I was coding it I felt
>> > >> that the code was less readable, mainly in the coordinate
>> > >> normalisation, and that there were as many conversions as before.
>> > >> The conversions had simply moved from search time to the import/export
>> > >> of coordinates between the spatial module scope and the user scope.
>> > >
>> > > I'm sure the amount of points in the code in which they are converted
>> > > won't change. I'm concerned about the cardinality of the collections
>> > > on which it's applied ;)
>> > > "Less readable" isn't nice, but we can work on that I guess?
>> > >
>> > >>
>> > >> What the docs do not say (yet) is that we expect WGS 84 (a
>> > >> coordinate system) decimal degree coordinates as input, as these are
>> > >> quite a de facto standard (GPS devices output them this way).
>> > >
>> > > How does it affect this?
>> > >
>> > >>
>> > >> Today it is not the purpose of the Hibernate Search spatial initiative
>> > >> to handle projections. There are open source libs that handle that
>> > >> very well on the user side (Proj4j).
>> > >>
>> > >> So the question is: shall we store radians or decimal degrees?
>> > >>
>> > >> Niko
>> > >>
>> > >> P.S.: Hope it is clear. If not, ask for more.
>> > >
>> > > Thanks!
>> > > Sanne
>> > > _______________________________________________
>> > > hibernate-dev mailing list
>> > > hibernate-dev at lists.jboss.org
>> > > https://lists.jboss.org/mailman/listinfo/hibernate-dev
>> >
>>
>
>
>