On a side note, what do you need System.exit(0); for? You should
close the SessionFactory.
Because I'm better with geo/data than with code =)
Thanks for pointing me in the right direction.
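A minimal sketch of the "close it instead of calling System.exit(0)" idea. One common reason a program seems to need System.exit(0) is that a resource owning non-daemon threads was never closed; whether that was the case in the bench is not stated in this thread. FakeFactory is a hypothetical stand-in used only for illustration, not the Hibernate SessionFactory API.

```java
class CleanShutdownSketch {
    // Hypothetical stand-in for a factory that owns threads or connections.
    static class FakeFactory {
        private boolean closed = false;

        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Runs the work and always closes the factory, even if the work throws.
    static FakeFactory runBenchmark(Runnable work) {
        FakeFactory factory = new FakeFactory();
        try {
            work.run();
        } finally {
            factory.close();
        }
        return factory;
    }

    public static void main(String[] args) {
        FakeFactory factory = runBenchmark(() -> System.out.println("running queries..."));
        System.out.println("closed: " + factory.isClosed());
        // No System.exit(0) here: main simply returns once everything is closed.
    }
}
```

With the cleanup in a finally block, the program can end by returning from main.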
The last series of numbers is from a 50k-call run in radian mode that lasted
45 minutes.
For each center the bench runs the 4 request modes, ending up at ~45 ms per
loop.
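A quick back-of-the-envelope check of these figures, using the four mean times reported for the "Radians 50k calls" run quoted in this thread. It assumes that "50k calls" means 50,000 loop iterations, each issuing the four request modes once; the class and method names are made up for the example.

```java
class BenchTimeCheck {
    // Mean times in ms from the "Radians 50k calls" results in this thread:
    // Grid, Grid + Distance filter, DoubleRange, DoubleRange + Distance filter.
    static final double[] MEAN_MS = {
        4.440979528643216,
        6.722681398331658,
        14.532376860201005,
        20.21980649284422
    };

    // Time spent in queries for one loop iteration (all four modes).
    static double perLoopMs() {
        double sum = 0.0;
        for (double mean : MEAN_MS) {
            sum += mean;
        }
        return sum;
    }

    // Assumption: one iteration per "call", so 50k calls = 50,000 iterations.
    static double totalMinutes(int iterations) {
        return perLoopMs() * iterations / 1000.0 / 60.0;
    }

    public static void main(String[] args) {
        System.out.printf("per loop: %.2f ms%n", perLoopMs());
        System.out.printf("query time for 50k loops: %.1f min%n", totalMinutes(50_000));
    }
}
```

Under that assumption the queries alone account for roughly 38 of the 45 minutes; the remainder is not itemized in the thread.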
Niko
Cheers,
Sanne
On 15 May 2012 14:04, Nicolas Helleringer <nicolas.helleringer(a)gmail.com>
wrote:
> I did the seed on the random generator.
>
> Here are some results:
>
> Degrees 2K calls
> Mean time with Grid : 4.769457488717949 ms. Average number of docs fetched : 2524.982564102564
> Mean time with Grid + Distance filter : 6.501712946153845 ms. Average number of docs fetched : 426.1876923076923
> Mean time with DoubleRange : 14.336663392307692 ms. Average number of docs fetched : 543.6035897435897
> Mean time with DoubleRange + Distance filter : 19.7123163574359 ms. Average number of docs fetched : 426.1876923076923
>
> Radians 2K calls
> Mean time with Grid : 4.430686068205128 ms. Average number of docs fetched : 2524.982564102564
> Mean time with Grid + Distance filter : 6.717519717948718 ms. Average number of docs fetched : 426.1876923076923
> Mean time with DoubleRange : 14.35186034 ms. Average number of docs fetched : 543.6035897435897
> Mean time with DoubleRange + Distance filter : 20.073972284102563 ms. Average number of docs fetched : 426.1876923076923
>
> Radians 50k calls
> Mean time with Grid : 4.440979528643216 ms. Average number of docs fetched : 2459.169386934673
> Mean time with Grid + Distance filter : 6.722681398331658 ms. Average number of docs fetched : 416.2335879396985
> Mean time with DoubleRange : 14.532376860201005 ms. Average number of docs fetched : 530.2923618090452
> Mean time with DoubleRange + Distance filter : 20.21980649284422 ms. Average number of docs fetched : 416.2335879396985
>
> On the random part, you can see from the average number of docs on
> the 2k-call runs that the seed did its work: the requests are the same.
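A minimal sketch of the fixed-seed idea, assuming the bench draws its search centers from java.util.Random. The class name, the seed value and the whole-globe coordinate ranges are illustrative assumptions, not taken from the bench code.

```java
import java.util.Arrays;
import java.util.Random;

class SeededCentersSketch {
    // Draws n pseudo-random (latitude, longitude) centers in decimal degrees.
    // The same seed always yields the same sequence of centers.
    static double[][] centers(long seed, int n) {
        Random random = new Random(seed);
        double[][] result = new double[n][2];
        for (int i = 0; i < n; i++) {
            result[i][0] = -90.0 + 180.0 * random.nextDouble();   // latitude
            result[i][1] = -180.0 + 360.0 * random.nextDouble();  // longitude
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] firstRun = centers(42L, 5);
        double[][] secondRun = centers(42L, 5);
        System.out.println("same requests: " + Arrays.deepEquals(firstRun, secondRun));
    }
}
```

Because both runs replay the same centers, identical "docs fetched" averages across two runs are what one would expect.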
>
> As you can see, there is not much of a difference between the 2k and 50k
> runs.
>
> What I have also investigated is the overhead of the distance filter over
> the double range approach. I fear that wrapping the lat,long range query
> in a QueryWrapperFilter is costly, but I cannot prove it yet.
>
> Back to the main question : does radian storage give better performance ?
> I cannot say with my test environment. It seems pretty close to me.
> Maybe someone could run the bench in a different environment.
>
> Niko
>
> PS : both branches are up to date on my GitHub :
> https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923 &
> https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923-R...
>
> 2012/5/14 Nicolas Helleringer <nicolas.helleringer(a)gmail.com>
>>>
>>> maybe even simpler set a constant as the seed of your random
>>> generator: should provide a reproducible sequence of values.
>>
>> /facepalm
>> I should have guessed that :s
>>
>> Niko
>>
>>>
>>> >>
>>> >> On 11 May 2012 08:40, Nicolas Helleringer
>>> >> <nicolas.helleringer(a)gmail.com>
>>> >> wrote:
>>> >> > There, back and again ...
>>> >> >
>>> >> > After fixing a bug in grid search, here are some updated results
>>> >> > on 2k calls
>>> >> >
>>> >> > Degrees :
>>> >> > Mean time with Grid : 4.4897266425641025 ms. Average number of docs fetched : 2506.96
>>> >> > Mean time with Grid + Distance filter : 6.4930799487179485 ms. Average number of docs fetched : 425.33435897435896
>>> >> > Mean time with DoubleRange : 14.430638703076923 ms. Average number of docs fetched : 542.0410256410256
>>> >> > Mean time with DoubleRange + Distance filter : 20.483300545128206 ms. Average number of docs fetched : 425.33435897435896
>>> >> >
>>> >> > Radians :
>>> >> > Mean time with Grid : 5.650845744102564 ms. Average number of docs fetched : 5074.830769230769
>>> >> > Mean time with Grid + Distance filter : 8.627138825128204 ms. Average number of docs fetched : 426.7902564102564
>>> >> > Mean time with DoubleRange : 15.337755502564102 ms. Average number of docs fetched : 1087.705641025641
>>> >> > Mean time with DoubleRange + Distance filter : 20.82852138769231 ms. Average number of docs fetched : 426.7902564102564
>>> >> >
>>> >> > The next thing I cannot explain yet is the distance filter overhead
>>> >> > mismatch : it is lower on grid search, which has more docs to test,
>>> >> > than on DoubleRange.
>>> >> >
>>> >> > Niko
>>> >> >
>>> >> >
>>> >> > 2012/5/7 Nicolas Helleringer <nicolas.helleringer(a)gmail.com>
>>> >> >>
>>> >> >> Here are some results :
>>> >> >>
>>> >> >> Mean time with Grid : 4.9297471630769225 ms. Average number of docs fetched : 2416.373846153846
>>> >> >> Mean time with Grid + Distance filter : 6.48634534 ms. Average number of docs fetched : 425.84
>>> >> >> Mean time with DoubleRange : 15.39593650051282 ms. Average number of docs fetched : 542.72
>>> >> >> Mean time with DoubleRange + Distance filter : 21.158394677435897 ms. Average number of docs fetched : 425.8779487179487
>>> >> >>
>>> >> >> Sounds weird that with the distance filter the two results are not
>>> >> >> the same.
>>> >> >> I shall investigate that.
>>> >> >>
>>> >> >> Niko
>>> >> >>
>>> >> >> 2012/5/7 Emmanuel Bernard <emmanuel(a)hibernate.org>
>>> >> >>>
>>> >> >>> Do you know the average amount of POI that were filtered in memory
>>> >> >>> by the DistanceFilter during these runs?
>>> >> >>>
>>> >> >>> Emmanuel
>>> >> >>>
>>> >> >>> On 7 May 2012, at 10:31, Nicolas Helleringer wrote:
>>> >> >>>
>>> >> >>> Hi all,
>>> >> >>>
>>> >> >>> I have done a radian patch/branch and run some benchmarks on the
>>> >> >>> French GeoNames database.
>>> >> >>>
>>> >> >>> Benches are 2k calls per run.
>>> >> >>>
>>> >> >>> Radians:
>>> >> >>> run 1
>>> >> >>> Mean time with Grid : 4.808043092820513 ms
>>> >> >>> Mean time with Grid + Distance filter : 6.571108878461538 ms
>>> >> >>> Mean time with DoubleRange : 14.62661525128205 ms
>>> >> >>> Mean time with DoubleRange + Distance filter : 20.143597923076925 ms
>>> >> >>>
>>> >> >>> run 2
>>> >> >>> Mean time with Grid : 5.290368523076923 ms
>>> >> >>> Mean time with Grid + Distance filter : 6.706567517435897 ms
>>> >> >>> Mean time with DoubleRange : 14.878960702564102 ms
>>> >> >>> Mean time with DoubleRange + Distance filter : 20.75806591948718 ms
>>> >> >>>
>>> >> >>> Degrees:
>>> >> >>> run 1
>>> >> >>> Mean time with Grid : 5.101956610769231 ms
>>> >> >>> Mean time with Grid + Distance filter : 6.548685109230769 ms
>>> >> >>> Mean time with DoubleRange : 14.767478146153845 ms
>>> >> >>> Mean time with DoubleRange + Distance filter : 20.668063972820512 ms
>>> >> >>>
>>> >> >>> run 2
>>> >> >>> Mean time with Grid : 4.683360031282051 ms
>>> >> >>> Mean time with Grid + Distance filter : 6.7065247435897435 ms
>>> >> >>> Mean time with DoubleRange : 14.617140157948716 ms
>>> >> >>> Mean time with DoubleRange + Distance filter : 20.074868595897435 ms
>>> >> >>>
>>> >> >>> The radian branch is here for review :
>>> >> >>> https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923-R...
>>> >> >>>
>>> >> >>> While moving from degrees to radians I have seen that the DSL still
>>> >> >>> needs some work.
>>> >> >>> I shall focus on that now.
>>> >> >>>
>>> >> >>> Niko
>>> >> >>>
>>> >> >>> 2012/5/3 Sanne Grinovero <sanne(a)hibernate.org>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> On May 3, 2012 10:10 AM, "Emmanuel Bernard" <emmanuel(a)hibernate.org> wrote:
>>> >> >>>> >
>>> >> >>>> > How come the DistanceFilter has to compute the distance for the
>>> >> >>>> > whole corpus?
>>> >> >>>>
>>> >> >>>> You're right in that it's not always the case, but it's possible.
>>> >> >>>> If there are more filters enabled and they are executed first, our
>>> >> >>>> filter will need to do the math only on the documents matched by
>>> >> >>>> the previous filters, but if there are no other constraints or
>>> >> >>>> filters our DistanceFilter might need to process all documents in
>>> >> >>>> all segments. This also happens when a limit is enabled on the
>>> >> >>>> collector - although limited to the current index segment - when
>>> >> >>>> the filter needs to be cached, as it needs to evaluate each
>>> >> >>>> document in the segment.
>>> >> >>>>
>>> >> >>>> In our case this DistanceFilter is only applied after RangeQuery
>>> >> >>>> was applied on both longitude and latitude, so I'm not sure if this
>>> >> >>>> is a big problem; personally I was just wondering but I'd be fine
>>> >> >>>> with keeping this as a possible future improvement - but if we go
>>> >> >>>> for a separate issue, let's keep in mind that the index format
>>> >> >>>> would not be backwards compatible.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> > By the way the actual storage (say via Hibernate ORM, or
>>> >> >>>> > Infinispan) does not need to store in radian, so we don't need to
>>> >> >>>> > do a conversion when reading an entity.
>>> >> >>>>
>>> >> >>>> Right, another reason to index only in whatever format makes
>>> >> >>>> querying more efficient.
>>> >> >>>>
>>> >> >>>> -- Sanne
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> >
>>> >> >>>> > On 3 May 2012, at 10:45, Sanne Grinovero wrote:
>>> >> >>>> >
>>> >> >>>> > > The reason for my comment is that the code is doing a conversion
>>> >> >>>> > > to radians in the DistanceFilter, which needs to be extremely
>>> >> >>>> > > efficient as it's not only applied on the resultset but
>>> >> >>>> > > potentially on the whole corpus of all Documents in the index.
>>> >> >>>> > > So even if it's true that conversion would be needed on the
>>> >> >>>> > > final results, we always expect people to retrieve only a
>>> >> >>>> > > limited amount of entities (like with pagination), while the
>>> >> >>>> > > index might need to perform this computation millions of times
>>> >> >>>> > > per query.
>>> >> >>>> > >
>>> >> >>>> > > If I look at the complexity of Point.getDistanceTo(double, double),
>>> >> >>>> > > I get a feeling that that method will hardly provide speedy
>>> >> >>>> > > queries because of the complex computations in it - this is just
>>> >> >>>> > > speculation at this point of course, to be sure we'd need to
>>> >> >>>> > > compare them with a large enough dataset, but it seems quite
>>> >> >>>> > > obvious that storing normalized radians should be more efficient
>>> >> >>>> > > as it would avoid a good deal of math to be executed on each
>>> >> >>>> > > Document in the index.
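To make the "math per Document" point concrete, here is a minimal sketch of a great-circle distance using the haversine formula. This is an assumption for illustration: the thread does not say which formula Point.getDistanceTo uses, and the class name, the 6371 km Earth radius and the sample coordinates are all made up for the example. The only difference between the two entry points is the four Math.toRadians calls that degree storage requires per evaluated Document.

```java
class HaversineSketch {
    // Mean Earth radius in km (illustrative value).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Inputs stored in decimal degrees: four conversions before the trigonometry.
    static double distanceFromDegrees(double lat1Deg, double lon1Deg,
                                      double lat2Deg, double lon2Deg) {
        return distanceFromRadians(
            Math.toRadians(lat1Deg), Math.toRadians(lon1Deg),
            Math.toRadians(lat2Deg), Math.toRadians(lon2Deg));
    }

    // Inputs already stored in radians: no conversion step needed.
    static double distanceFromRadians(double lat1, double lon1,
                                      double lat2, double lon2) {
        double sinHalfDLat = Math.sin((lat2 - lat1) / 2.0);
        double sinHalfDLon = Math.sin((lon2 - lon1) / 2.0);
        double a = sinHalfDLat * sinHalfDLat
            + Math.cos(lat1) * Math.cos(lat2) * sinHalfDLon * sinHalfDLon;
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2.0 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // Approximate coordinates of Paris and Lyon in decimal degrees.
        double km = distanceFromDegrees(48.8566, 2.3522, 45.7640, 4.8357);
        System.out.printf("Paris - Lyon: %.1f km%n", km);
    }
}
```

Both variants still pay for the sines, cosines, square root and arcsine; radian storage only removes the conversion step, which fits the small differences in the benchmark figures reported later in the thread.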
>>> >> >>>> > >
>>> >> >>>> > > Also if we assume people might want to use radians in their user
>>> >> >>>> > > data (I know some who definitely would never touch decimals for
>>> >> >>>> > > such a use case), there would be no need at all to convert the
>>> >> >>>> > > end result.
>>> >> >>>> > >
>>> >> >>>> > > Some more thoughts inline:
>>> >> >>>> > >
>>> >> >>>> > > On 3 May 2012 09:12, Nicolas Helleringer <nicolas.helleringer(a)gmail.com> wrote:
>>> >> >>>> > >> Hi all,
>>> >> >>>> > >>
>>> >> >>>> > >> Sanne and I have been wondering about the way the spatial
>>> >> >>>> > >> branch/module/functionality for Hibernate Search shall store
>>> >> >>>> > >> its coordinates in the Lucene index.
>>> >> >>>> > >>
>>> >> >>>> > >> Today it is implemented with decimal degrees for :
>>> >> >>>> > >> - easy debugging/readability
>>> >> >>>> > >> - ease of conversion on storage as we want to accept mainly
>>> >> >>>> > >>   decimal degrees from users' data
>>> >> >>>> > >
>>> >> >>>> > > Valid points, but consider that "storage" is going to be way
>>> >> >>>> > > slower anyway, and typically you'll process a Document to
>>> >> >>>> > > evaluate it for a hit many many orders of magnitude more
>>> >> >>>> > > frequently than the times you store it.
>>> >> >>>> > >
>>> >> >>>> > >>
>>> >> >>>> > >> Sanne pointed out that when the search is done there are quite
>>> >> >>>> > >> a few conversions to radians for distance calculation and
>>> >> >>>> > >> suggested that we may store coordinates directly in their
>>> >> >>>> > >> radian form.
>>> >> >>>> > >>
>>> >> >>>> > >> I have tried a patch to implement this and as I was coding it I
>>> >> >>>> > >> felt that the code was less readable, mainly in the coordinates
>>> >> >>>> > >> normalisation, and that there were as many conversions as
>>> >> >>>> > >> before.
>>> >> >>>> > >> Conversions had moved from search to import / export of
>>> >> >>>> > >> coordinates in and out of the spatial module scope to user
>>> >> >>>> > >> scope.
>>> >> >>>> > >
>>> >> >>>> > > I'm sure the number of points in the code in which they are
>>> >> >>>> > > converted won't change. I'm concerned about the cardinality of
>>> >> >>>> > > the collections on which it's applied ;)
>>> >> >>>> > > "Less readable" isn't nice, but we can work on that I guess?
>>> >> >>>> > >
>>> >> >>>> > >>
>>> >> >>>> > >> What the docs do not tell (yet) is that we are expecting WGS 84
>>> >> >>>> > >> (this is a coordinate system) decimal degree coordinates as
>>> >> >>>> > >> input, as these are quite a de facto standard (GPS output this
>>> >> >>>> > >> way).
>>> >> >>>> > >
>>> >> >>>> > > How does it affect this?
>>> >> >>>> > >
>>> >> >>>> > >>
>>> >> >>>> > >> Today it is not the purpose of the Hibernate Search spatial
>>> >> >>>> > >> initiative to handle projections. There are open source libs
>>> >> >>>> > >> that handle that very well on the user side (Proj4j).
>>> >> >>>> > >>
>>> >> >>>> > >> So. The question is : shall we store as radians or decimal
>>> >> >>>> > >> degrees ?
>>> >> >>>> > >>
>>> >> >>>> > >> Niko
>>> >> >>>> > >>
>>> >> >>>> > >> P.S : Hope it is clear. If not ask for more.
>>> >> >>>> > >
>>> >> >>>> > > Thanks!
>>> >> >>>> > > Sanne
>>> >> >>>> > > _______________________________________________
>>> >> >>>> > > hibernate-dev mailing list
>>> >> >>>> > > hibernate-dev(a)lists.jboss.org
>>> >> >>>> > > https://lists.jboss.org/mailman/listinfo/hibernate-dev