Maybe even simpler: set a constant as the seed of your random generator.
That should provide a reproducible sequence of values.
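
A minimal sketch of the idea, assuming the benchmark draws its query points from java.util.Random. The class name, the seed value, and the randomPoints helper are hypothetical and not taken from the actual benchmark code:

```java
import java.util.Arrays;
import java.util.Random;

public class SeededBenchmarkPoints {
    // Hypothetical fixed seed: any constant works, as long as it stays the same between runs.
    public static final long SEED = 42L;

    // Returns `count` pairs of {latitude, longitude} in decimal degrees.
    public static double[][] randomPoints(long seed, int count) {
        Random random = new Random(seed); // same seed gives the same sequence of values
        double[][] points = new double[count][2];
        for (int i = 0; i < count; i++) {
            points[i][0] = -90.0 + 180.0 * random.nextDouble();  // latitude in [-90, 90)
            points[i][1] = -180.0 + 360.0 * random.nextDouble(); // longitude in [-180, 180)
        }
        return points;
    }

    public static void main(String[] args) {
        double[][] first = randomPoints(SEED, 2000);
        double[][] second = randomPoints(SEED, 2000);
        // Both runs see exactly the same query points.
        System.out.println(Arrays.deepEquals(first, second)); // prints "true"
    }
}
```

With the same points fed to every variant (Grid, DoubleRange, with and without the distance filter), the average number of docs fetched becomes directly comparable between runs.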
>>
>> On 11 May 2012 08:40, Nicolas Helleringer <nicolas.helleringer(a)gmail.com> wrote:
>> > There, back and again ...
>> >
>> > After fixing a bug in grid search, here are some updated results on 2k
>> > calls:
>> >
>> > Degrees :
>> > Mean time with Grid : 4.4897266425641025 ms. Average number of docs fetched : 2506.96
>> > Mean time with Grid + Distance filter : 6.4930799487179485 ms. Average number of docs fetched : 425.33435897435896
>> > Mean time with DoubleRange : 14.430638703076923 ms. Average number of docs fetched : 542.0410256410256
>> > Mean time with DoubleRange + Distance filter : 20.483300545128206 ms. Average number of docs fetched : 425.33435897435896
>> >
>> > Radians :
>> > Mean time with Grid : 5.650845744102564 ms. Average number of docs fetched : 5074.830769230769
>> > Mean time with Grid + Distance filter : 8.627138825128204 ms. Average number of docs fetched : 426.7902564102564
>> > Mean time with DoubleRange : 15.337755502564102 ms. Average number of docs fetched : 1087.705641025641
>> > Mean time with DoubleRange + Distance filter : 20.82852138769231 ms. Average number of docs fetched : 426.7902564102564
>> >
>> > The next thing I cannot explain yet is the mismatch in distance filter
>> > overhead: it is smaller on grid search, which has more docs to test,
>> > than on DoubleRange.
>> >
>> > Niko
>> >
>> >
>> > 2012/5/7 Nicolas Helleringer <nicolas.helleringer(a)gmail.com>
>> >>
>> >> Here are some results :
>> >>
>> >> Mean time with Grid : 4.9297471630769225 ms. Average number of docs fetched : 2416.373846153846
>> >> Mean time with Grid + Distance filter : 6.48634534 ms. Average number of docs fetched : 425.84
>> >> Mean time with DoubleRange : 15.39593650051282 ms. Average number of docs fetched : 542.72
>> >> Mean time with DoubleRange + Distance filter : 21.158394677435897 ms. Average number of docs fetched : 425.8779487179487
>> >>
>> >> It seems odd that with the distance filter the two results are not
>> >> the same.
>> >> I shall investigate that.
>> >>
>> >> Niko
>> >>
>> >> 2012/5/7 Emmanuel Bernard <emmanuel(a)hibernate.org>
>> >>>
>> >>> Do you know the average number of POIs that were filtered in memory
>> >>> by the DistanceFilter during these runs?
>> >>>
>> >>> Emmanuel
>> >>>
>> >>> On 7 mai 2012, at 10:31, Nicolas Helleringer wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> I have done a radian patch/branch and run some benchmarks on the
>> >>> French GeoNames database.
>> >>>
>> >>> Each benchmark run is 2k calls.
>> >>>
>> >>> Radians:
>> >>> run 1
>> >>> Mean time with Grid : 4.808043092820513 ms
>> >>> Mean time with Grid + Distance filter : 6.571108878461538 ms
>> >>> Mean time with DoubleRange : 14.62661525128205 ms
>> >>> Mean time with DoubleRange + Distance filter : 20.143597923076925 ms
>> >>>
>> >>> run 2
>> >>> Mean time with Grid : 5.290368523076923 ms
>> >>> Mean time with Grid + Distance filter : 6.706567517435897 ms
>> >>> Mean time with DoubleRange : 14.878960702564102 ms
>> >>> Mean time with DoubleRange + Distance filter : 20.75806591948718 ms
>> >>>
>> >>> Degrees:
>> >>> run 1
>> >>> Mean time with Grid : 5.101956610769231 ms
>> >>> Mean time with Grid + Distance filter : 6.548685109230769 ms
>> >>> Mean time with DoubleRange : 14.767478146153845 ms
>> >>> Mean time with DoubleRange + Distance filter : 20.668063972820512 ms
>> >>>
>> >>> run 2
>> >>> Mean time with Grid : 4.683360031282051 ms
>> >>> Mean time with Grid + Distance filter : 6.7065247435897435 ms
>> >>> Mean time with DoubleRange : 14.617140157948716 ms
>> >>> Mean time with DoubleRange + Distance filter : 20.074868595897435 ms
>> >>>
>> >>> The radian branch is here for review:
>> >>> https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923-R...
>> >>>
>> >>> While moving from degrees to radians, I saw that the DSL still needs
>> >>> some work.
>> >>> I shall focus on that now.
>> >>>
>> >>> Niko
>> >>>
>> >>> 2012/5/3 Sanne Grinovero <sanne(a)hibernate.org>
>> >>>>
>> >>>>
>> >>>> On May 3, 2012 10:10 AM, "Emmanuel Bernard" <emmanuel(a)hibernate.org> wrote:
>> >>>> >
>> >>>> > How come the DistanceFilter has to compute the distance for the
>> >>>> > whole corpus?
>> >>>>
>> >>>> You're right that it's not always the case, but it's possible. If
>> >>>> there are more filters enabled and they are executed first, our
>> >>>> filter will need to do the math only on the documents matched by the
>> >>>> previous filters, but if there are no other constraints or filters
>> >>>> our DistanceFilter might need to process all documents in all
>> >>>> segments. This also happens when a limit is enabled on the collector
>> >>>> - although limited to the current index segment - when the filter
>> >>>> needs to be cached, as it needs to evaluate each document in the
>> >>>> segment.
>> >>>>
>> >>>> In our case this DistanceFilter is only applied after RangeQuery
>> >>>> was applied on both longitude and latitude, so I'm not sure if this
>> >>>> is a big problem; personally I was just wondering, but I'd be fine
>> >>>> with keeping this as a possible future improvement - but if we go
>> >>>> for a separate issue, let's keep in mind that the index format
>> >>>> would not be backwards compatible.
>> >>>>
>> >>>>
>> >>>>
>> >>>> > By the way the actual storage (say via Hibernate ORM, or
>> >>>> > Infinispan) does not need to store in radian, so we don't need to
>> >>>> > do a conversion when reading an entity.
>> >>>>
>> >>>> Right, another reason to index only in whatever format makes
>> >>>> querying more efficient.
>> >>>>
>> >>>> -- Sanne
>> >>>>
>> >>>>
>> >>>> >
>> >>>> > On 3 mai 2012, at 10:45, Sanne Grinovero wrote:
>> >>>> >
>> >>>> > > The reason for my comment is that the code is doing a conversion
>> >>>> > > to radians in the DistanceFilter, which needs to be extremely
>> >>>> > > efficient as it's not only applied on the resultset but
>> >>>> > > potentially on the whole corpus of all Documents in the index.
>> >>>> > > So even if it's true that conversion would be needed on the
>> >>>> > > final results, we always expect people to retrieve only a
>> >>>> > > limited amount of entities (like with pagination), while the
>> >>>> > > index might need to perform this computation millions of times
>> >>>> > > per query.
>> >>>> > >
>> >>>> > > If I look at the complexity of Point.getDistanceTo(double,
>> >>>> > > double), I get a feeling that that method will hardly provide
>> >>>> > > speedy queries because of the complex computations in it - this
>> >>>> > > is just speculation at this point of course; to be sure we'd
>> >>>> > > need to compare them with a large enough dataset, but it seems
>> >>>> > > quite obvious that storing normalized radians should be more
>> >>>> > > efficient, as it would avoid a good deal of math to be executed
>> >>>> > > on each Document in the index.
>> >>>> > >
>> >>>> > > Also if we assume people might want to use radians in their user
>> >>>> > > data (I know some who definitely would never touch decimals for
>> >>>> > > such a use case), there would be no need at all to convert the
>> >>>> > > end result.
>> >>>> > >
>> >>>> > > Some more thoughts inline:
>> >>>> > >
>> >>>> > > On 3 May 2012 09:12, Nicolas Helleringer
>> >>>> > > <nicolas.helleringer(a)gmail.com> wrote:
>> >>>> > >> Hi all,
>> >>>> > >>
>> >>>> > >> Sanne and I have been wondering about the way the spatial
>> >>>> > >> branch/module/functionality for Hibernate Search shall store its
>> >>>> > >> coordinates in the Lucene index.
>> >>>> > >>
>> >>>> > >> Today it is implemented with decimal degrees for:
>> >>>> > >> - easy debugging/readability
>> >>>> > >> - ease of conversion on storage, as we want to accept mainly
>> >>>> > >>   decimal degrees from users' data
>> >>>> > >
>> >>>> > > Valid points, but consider that "storage" is going to be way
>> >>>> > > slower anyway, and typically you'll process a Document to
>> >>>> > > evaluate it for a hit many many orders of magnitude more
>> >>>> > > frequently than the times you store it.
>> >>>> > >
>> >>>> > >>
>> >>>> > >> Sanne pointed out that when the search is done there are quite a
>> >>>> > >> few conversions to radians for distance calculation, and
>> >>>> > >> suggested that we may store coordinates directly in their
>> >>>> > >> radians form.
>> >>>> > >>
>> >>>> > >> I have tried a patch to implement this, and as I was coding it I
>> >>>> > >> felt that the code was less readable, mainly in the coordinates
>> >>>> > >> normalisation, and that there were as many conversions as before.
>> >>>> > >> Conversions had moved from search to import / export of
>> >>>> > >> coordinates in and out of the spatial module scope to user scope.
>> >>>> > >
>> >>>> > > I'm sure the amount of points in the code in which they are
>> >>>> > > converted won't change. I'm concerned about the cardinality of
>> >>>> > > the collections on which it's applied ;)
>> >>>> > > "Less readable" isn't nice, but we can work on that I guess?
>> >>>> > >
>> >>>> > >>
>> >>>> > >> What the docs do not tell (yet) is that we are expecting WGS 84
>> >>>> > >> (this is a coordinate system) decimal degree coordinates as
>> >>>> > >> input, as these are quite a de facto standard (GPS outputs this
>> >>>> > >> way).
>> >>>> > >
>> >>>> > > How does it affect this?
>> >>>> > >
>> >>>> > >>
>> >>>> > >> Today it is not the purpose of the Hibernate Search spatial
>> >>>> > >> initiative to handle projections. There are open source libs
>> >>>> > >> that handle that very well on the user side (Proj4j).
>> >>>> > >>
>> >>>> > >> So. The question is: shall we store as radians or decimal
>> >>>> > >> degrees?
>> >>>> > >>
>> >>>> > >> Niko
>> >>>> > >>
>> >>>> > >> P.S : Hope it is clear. If not ask for more.
>> >>>> > >
>> >>>> > > Thanks!
>> >>>> > > Sanne
>> >>>> > > _______________________________________________
>> >>>> > > hibernate-dev mailing list
>> >>>> > > hibernate-dev(a)lists.jboss.org
>> >>>> > > https://lists.jboss.org/mailman/listinfo/hibernate-dev
>> >>>> >
>> >>>
>> >>>
>> >>>
>> >>
>> >
>
>