Hi all,
I have written a radians patch/branch and run some benchmarks on the GeoNames
French database.
Each benchmark run makes 2,000 calls.
Radians:
run 1
Mean time with Grid : 4.808043092820513 ms
Mean time with Grid + Distance filter : 6.571108878461538 ms
Mean time with DoubleRange : 14.62661525128205 ms
Mean time with DoubleRange + Distance filter : 20.143597923076925 ms
run 2
Mean time with Grid : 5.290368523076923 ms
Mean time with Grid + Distance filter : 6.706567517435897 ms
Mean time with DoubleRange : 14.878960702564102 ms
Mean time with DoubleRange + Distance filter : 20.75806591948718 ms
Degrees:
run 1
Mean time with Grid : 5.101956610769231 ms
Mean time with Grid + Distance filter : 6.548685109230769 ms
Mean time with DoubleRange : 14.767478146153845 ms
Mean time with DoubleRange + Distance filter : 20.668063972820512 ms
run 2
Mean time with Grid : 4.683360031282051 ms
Mean time with Grid + Distance filter : 6.7065247435897435 ms
Mean time with DoubleRange : 14.617140157948716 ms
Mean time with DoubleRange + Distance filter : 20.074868595897435 ms
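To make the figures above easier to compare, here is a minimal sketch that averages the two runs of each configuration and prints the relative difference between radians and degrees. The class and method names (BenchmarkSummary, mean, relativeDiffPercent) are made up for this illustration and are not part of the patch; only the timings are taken from the runs above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BenchmarkSummary {

    // Mean of the two runs of one configuration, in ms.
    static double mean(double run1, double run2) {
        return (run1 + run2) / 2.0;
    }

    // Relative difference of radians versus degrees, in percent.
    // A negative value means the radians variant was faster.
    static double relativeDiffPercent(double radiansMs, double degreesMs) {
        return (radiansMs - degreesMs) / degreesMs * 100.0;
    }

    public static void main(String[] args) {
        // Each array: {radians run 1, radians run 2, degrees run 1, degrees run 2}, in ms.
        Map<String, double[]> results = new LinkedHashMap<>();
        results.put("Grid", new double[] {
                4.808043092820513, 5.290368523076923,
                5.101956610769231, 4.683360031282051});
        results.put("Grid + Distance filter", new double[] {
                6.571108878461538, 6.706567517435897,
                6.548685109230769, 6.7065247435897435});
        results.put("DoubleRange", new double[] {
                14.62661525128205, 14.878960702564102,
                14.767478146153845, 14.617140157948716});
        results.put("DoubleRange + Distance filter", new double[] {
                20.143597923076925, 20.75806591948718,
                20.668063972820512, 20.074868595897435});

        for (Map.Entry<String, double[]> e : results.entrySet()) {
            double[] r = e.getValue();
            double radians = mean(r[0], r[1]);
            double degrees = mean(r[2], r[3]);
            System.out.printf("%-30s radians %.3f ms, degrees %.3f ms, diff %+.2f%%%n",
                    e.getKey(), radians, degrees, relativeDiffPercent(radians, degrees));
        }
    }
}
```

With only two runs per configuration this is a rough indication at best. On these numbers the gap between radians and degrees appears to stay within the spread between run 1 and run 2 of the same configuration.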
The radian branch is here for review :
While moving from degrees to radians I noticed that the DSL still needs some
work.
I will focus on that now.
Niko
2012/5/3 Sanne Grinovero <sanne(a)hibernate.org>
On May 3, 2012 10:10 AM, "Emmanuel Bernard" <emmanuel(a)hibernate.org>
wrote:
>
> How come the DistanceFilter has to compute the distance for the whole
> corpus?
You're right that it's not always the case, but it is possible. If other
filters are enabled and executed first, our filter only needs to do the math
on the documents matched by those previous filters; but if there are no other
constraints or filters, our DistanceFilter might need to process all
documents in all segments. This also happens when a limit is enabled on the
collector (although then limited to the current index segment) if the filter
needs to be cached, as it then needs to evaluate each document in the
segment.
In our case this DistanceFilter is only applied after a RangeQuery has been
applied on both longitude and latitude, so I'm not sure this is a big
problem. Personally I was just wondering, and I'd be fine with keeping this
as a possible future improvement; but if we go for a separate issue, let's
keep in mind that the index format would not be backwards compatible.
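The order of operations described here (a cheap range check on latitude and longitude first, then the exact distance only on what is left) can be sketched as follows. This is an illustration in plain Java under stated assumptions, not the actual DistanceFilter or RangeQuery code: the Doc class, the method names, and the spherical Earth radius are all hypothetical, and the sketch ignores the date line.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoStageSpatialFilter {

    // Mean Earth radius in km; a spherical approximation assumed for this sketch.
    static final double EARTH_RADIUS_KM = 6371.0;

    // Hypothetical stand-in for an indexed document (coordinates in decimal degrees).
    static final class Doc {
        final String id;
        final double latDeg;
        final double lonDeg;

        Doc(String id, double latDeg, double lonDeg) {
            this.id = id;
            this.latDeg = latDeg;
            this.lonDeg = lonDeg;
        }
    }

    // Great-circle distance (haversine formula), inputs in decimal degrees.
    static double distanceKm(double lat1Deg, double lon1Deg, double lat2Deg, double lon2Deg) {
        double lat1 = Math.toRadians(lat1Deg);
        double lat2 = Math.toRadians(lat2Deg);
        double sinHalfDLat = Math.sin((lat2 - lat1) / 2);
        double sinHalfDLon = Math.sin(Math.toRadians(lon2Deg - lon1Deg) / 2);
        double a = sinHalfDLat * sinHalfDLat
                + Math.cos(lat1) * Math.cos(lat2) * sinHalfDLon * sinHalfDLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Stage 1: range check on latitude and longitude (a bounding box around the circle).
    static List<Doc> boundingBoxCandidates(List<Doc> corpus, double centerLatDeg,
                                           double centerLonDeg, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // search radius as an angle, in radians
        double dLatDeg = Math.toDegrees(angular);
        double ratio = Math.sin(angular) / Math.cos(Math.toRadians(centerLatDeg));
        // Near the poles the circle spans every longitude.
        double dLonDeg = ratio >= 1.0 ? 180.0 : Math.toDegrees(Math.asin(ratio));
        List<Doc> candidates = new ArrayList<>();
        for (Doc d : corpus) {
            if (Math.abs(d.latDeg - centerLatDeg) <= dLatDeg
                    && Math.abs(d.lonDeg - centerLonDeg) <= dLonDeg) {
                candidates.add(d);
            }
        }
        return candidates;
    }

    // Stage 2: exact distance, computed only on the stage 1 survivors.
    static List<Doc> search(List<Doc> corpus, double centerLatDeg,
                            double centerLonDeg, double radiusKm) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : boundingBoxCandidates(corpus, centerLatDeg, centerLonDeg, radiusKm)) {
            if (distanceKm(centerLatDeg, centerLonDeg, d.latDeg, d.lonDeg) <= radiusKm) {
                hits.add(d);
            }
        }
        return hits;
    }
}
```

The trigonometry in distanceKm only runs on documents that fall inside the box, so its cost depends on how selective the range check is rather than on the size of the corpus.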
> By the way the actual storage (say via Hibernate ORM, or Infinispan)
> does not need to store radians, so we don't need to do a conversion when
> reading an entity.
Right, another reason to index only in whatever format makes querying more
efficient.
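As a rough sketch of that split: convert once when a value goes into the index, compute distances directly on the indexed radians, and convert back only for what is handed to the user. All names here (RadianCoordinates, toIndexedValue, fromIndexedValue) and the Earth radius constant are assumptions made for the illustration; this is not the code of the spatial module.

```java
public class RadianCoordinates {

    // Mean Earth radius in km; a spherical approximation assumed for this sketch.
    static final double EARTH_RADIUS_KM = 6371.0;

    // Index time: one conversion per stored coordinate.
    static double toIndexedValue(double degrees) {
        return Math.toRadians(degrees);
    }

    // Result time: convert back only for values that are actually returned.
    static double fromIndexedValue(double radians) {
        return Math.toDegrees(radians);
    }

    // Haversine distance on values that are already in radians:
    // no unit conversion is needed per document at query time.
    static double distanceKmFromRadians(double lat1, double lon1, double lat2, double lon2) {
        double sinHalfDLat = Math.sin((lat2 - lat1) / 2);
        double sinHalfDLon = Math.sin((lon2 - lon1) / 2);
        double a = sinHalfDLat * sinHalfDLat
                + Math.cos(lat1) * Math.cos(lat2) * sinHalfDLon * sinHalfDLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Same distance on decimal degrees: four conversions on every call.
    static double distanceKmFromDegrees(double lat1Deg, double lon1Deg,
                                        double lat2Deg, double lon2Deg) {
        return distanceKmFromRadians(
                Math.toRadians(lat1Deg), Math.toRadians(lon1Deg),
                Math.toRadians(lat2Deg), Math.toRadians(lon2Deg));
    }
}
```

Both variants return the same distance; the only difference is where the conversions happen. Whether moving them out of the query path is measurable in practice is something only a benchmark can tell.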
-- Sanne
>
> On 3 mai 2012, at 10:45, Sanne Grinovero wrote:
>
> > The reason for my comment is that the code is doing a conversion to
> > radians in the DistanceFilter, which needs to be extremely efficient
> > as it's not only applied on the resultset but potentially on the whole
> > corpus of all Documents in the index.
> > So even if it's true that conversion would be needed on the final
> > results, we always expect people to retrieve only a limited amount of
> > entities (like with pagination), while the index might need to perform
> > this computation millions of times per query.
> >
> > If I look at the complexity of Point.getDistanceTo(double, double), I
> > get a feeling that that method will hardly provide speedy queries
> > because of the complex computations in it - this is just speculation
> > at this point of course, to be sure we'd need to compare them with a
> > large enough dataset, but it seems quite obvious that storing
> > normalized radians should be more efficient as it would avoid a good
> > deal of math to be executed on each Document in the index.
> >
> > Also if we assume people might want to use radians in their user data
> > (I know some who definitely would never touch decimals for such a use
> > case), there would be no need at all to convert the end result.
> >
> > Some more thoughts inline:
> >
> > On 3 May 2012 09:12, Nicolas Helleringer <
nicolas.helleringer(a)gmail.com> wrote:
> >> Hi all,
> >>
> >> Sanne and I have been wondering about the way the spatial
> >> branch/module/functionality for Hibernate Search shall store its
> >> coordinates in the Lucene index.
> >>
> >> Today it is implemented with decimal degrees for:
> >> - easy debugging/readability
> >> - ease of conversion on storage, as we mainly want to accept decimal
> >> degrees from users' data
> >
> > Valid points, but consider that "storage" is going to be way slower
> > anyway, and typically you'll process a Document to evaluate it for a
> > hit many many orders of magnitude more frequently than the times you
> > store it.
> >
> >>
> >> Sanne pointed out that when the search is done there are quite a few
> >> conversions to radians for distance calculation, and suggested that we
> >> may store the coordinates directly in their radian form.
> >>
> >> I have tried a patch to implement this, and as I was coding it I felt
> >> that the code was less readable, mainly in the coordinate normalisation,
> >> and that there were as many conversions as before.
> >> Conversions had moved from search to the import/export of coordinates
> >> in and out of the spatial module scope to user scope.
> >
> > I'm sure the amount of points in the code in which they are converted
> > won't change. I'm concerned about the cardinality of the collections
> > on which it's applied ;)
> > "Less readable" isn't nice, but we can work on that I guess?
> >
> >>
> >> What the docs do not say (yet) is that we expect WGS 84 (a coordinate
> >> system) decimal degree coordinates as input, as these are quite a de
> >> facto standard (GPS devices output them this way).
> >
> > How does it affect this?
> >
> >>
> >> Today it is not the purpose of the Hibernate Search spatial initiative
> >> to handle projections. There are open source libs that handle that very
> >> well on the user side (Proj4j).
> >>
> >> So the question is: shall we store radians or decimal degrees?
> >>
> >> Niko
> >>
> >> P.S : Hope it is clear. If not ask for more.
> >
> > Thanks!
> > Sanne
> > _______________________________________________
> > hibernate-dev mailing list
> > hibernate-dev(a)lists.jboss.org
> >
https://lists.jboss.org/mailman/listinfo/hibernate-dev
>