Memory consumption
by Andrej Golovnin
Hi all,
I deployed our application today for the first time
on JBoss 7 with the latest Hibernate and Hibernate Search versions,
and I was totally shocked when I saw the memory consumption.
I have created two screenshots from the memory dumps; they are
attached to this mail, please take a look at them. The screenshot
names contain the version numbers of the Hibernate and Hibernate Search
frameworks used. Both memory dumps were created for one and the same
application version. I can't believe that the latest versions require nearly
twice as much memory as the old versions to perform the same tasks.
What happened?
Is this a known problem? Do you have any plans to change this situation,
e.g. to work on reducing memory consumption? Or have you perhaps
already started working on this problem?
We have been running our application with a 768MB max heap size while
serving hundreds of users. With the same memory settings I can now
barely deploy the application. :-(
Best regards
Andrej Golovnin
12 years, 7 months
[OGM] Neo4J support (was: Transaction-aware)
by Pawel Kozlowski
Hi!
I have been looking into Hibernate OGM for a few weeks already and I really
like the project: I think it might help people familiar with JPA to
dive into NoSQL.
Personally I was exploring OGM as a JPA interface to store objects in
git (JGit), but I'm still waiting for permission from work to push
the code, so I can't publish it yet :-(
Anyway, I'm also looking into Neo4J for other use-case and OGM could
be quite a nice fit for Neo4J. I've mentioned it on the list already:
On Mon, Apr 2, 2012 at 5:14 PM, Sanne Grinovero <sanne(a)hibernate.org> wrote:
> Ok JTA should be easy with Neo4J. I'm not aware of anybody already
> working on it, feel free to take the lead on this!
and it looks like no one else is working on this topic, so I would be
happy to give it a try. How would one go about it in practice?
I guess I should start by opening a New Feature JIRA entry, forking
the OGM repo on GitHub, hacking around and discussing things on
this list, but if there are any particular practicalities to know
before starting, I would be glad to hear about them.
Cheers,
Pawel
Re: [hibernate-dev] Memory consumption
by Steve Ebersole
And I am certainly fine with making this a setting that chooses between the
three approaches:
1) Current approach using multiple loaders
2) Single loader approach using nulls to "fill out" slots
3) Single loader approach re-using keys to "fill out" slots
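For illustration, the padding in options 2 and 3 could be sketched like this (a hypothetical helper, not the actual Hibernate loader code): a partial batch of ids is filled up to the loader's fixed size, either with nulls or by repeating the last real key.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchPadding {

    // Pad a partial batch of ids up to the fixed batch size expected by a
    // single shared loader. With reuseLast = false the free slots are filled
    // with null (approach 2); with reuseLast = true they repeat the last
    // real key (approach 3), so the restriction becomes e.g.
    // (id = 4 OR id = 5 OR id = 5 OR id = 5 OR id = 5).
    public static List<Long> pad(List<Long> ids, int batchSize, boolean reuseLast) {
        List<Long> padded = new ArrayList<>(ids);
        Long filler = reuseLast ? ids.get(ids.size() - 1) : null;
        while (padded.size() < batchSize) {
            padded.add(filler);
        }
        return padded;
    }

    public static void main(String[] args) {
        System.out.println(pad(List.of(4L, 5L), 5, true));   // [4, 5, 5, 5, 5]
        System.out.println(pad(List.of(4L, 5L), 5, false));  // [4, 5, null, null, null]
    }
}
```

Either way a single prepared statement can serve every batch size, which is what saves the memory spent on per-size loaders.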
On 05/17/2012 10:42 AM, Steve Ebersole wrote:
> Besides, how is the plan any different for:
>
> ((id = null) OR (id = null) OR (id = null) OR (id = 4) OR (id = 5))
>
>
> On Thu 17 May 2012 10:40:57 AM CDT, Steve Ebersole wrote:
>> It's an index scan, totally trivial (unless their index hashing algos
>> are awful as well).
>>
>>
>> On Thu 17 May 2012 08:49:06 AM CDT, Guillaume Smet wrote:
>>> On Thu, May 17, 2012 at 3:02 PM, Steve Ebersole<steve(a)hibernate.org>
>>> wrote:
>>>> But just because you have 100 values in no way indicates how many
>>>> rows will be returned. And I personally know of no optimizers that
>>>> make such an assumption; it's a totally worthless assumption.
>>>
>>> That's why I put "simplified" in front of it.
>>>
>>> The number of clauses is taken into account to calculate the
>>> estimation of the number of rows returned and depending on the plan
>>> chosen to get the results, this might lead to results you don't expect
>>> and don't want.
>>>
>>> See the below example on PostgreSQL (which is far from having the
>>> worst optimizer/planner of the market):
>>> sitra=# explain select id from objettouristique where id=4 or id=4 or id=4 or id=4 or id=5;
>>>                                        QUERY PLAN
>>> ------------------------------------------------------------------------------------------
>>>  Bitmap Heap Scan on objettouristique  (cost=1.80..2.41 rows=5 width=8)
>>>    Recheck Cond: ((id = 4) OR (id = 4) OR (id = 4) OR (id = 4) OR (id = 5))
>>>    ->  BitmapOr  (cost=1.80..1.80 rows=5 width=0)
>>>          ->  Bitmap Index Scan on objettouristique_pkey  (cost=0.00..0.36 rows=1 width=0)
>>>                Index Cond: (id = 4)
>>>          ->  Bitmap Index Scan on objettouristique_pkey  (cost=0.00..0.36 rows=1 width=0)
>>>                Index Cond: (id = 4)
>>>          ->  Bitmap Index Scan on objettouristique_pkey  (cost=0.00..0.36 rows=1 width=0)
>>>                Index Cond: (id = 4)
>>>          ->  Bitmap Index Scan on objettouristique_pkey  (cost=0.00..0.36 rows=1 width=0)
>>>                Index Cond: (id = 4)
>>>          ->  Bitmap Index Scan on objettouristique_pkey  (cost=0.00..0.36 rows=1 width=0)
>>>                Index Cond: (id = 5)
>>>
>>> It sure looks stupid, but the opinion of the PostgreSQL hackers is that
>>> it's stupid to add the same clause several times: the cost of checking
>>> all the conditions can be high and is wasted most of the time.
>>>
>>
>> --
>> steve(a)hibernate.org
>> http://hibernate.org
>
> --
> steve(a)hibernate.org
> http://hibernate.org
Re: [hibernate-dev] Memory consumption
by Guillaume Smet
Hi Steve,
On the list again, my bad...
On Thu, May 17, 2012 at 1:07 PM, Steve Ebersole <steve(a)hibernate.org> wrote:
> I am not following what you are saying about "reusing keys" and optimizers.
> Nor how it might lead to bad execution plans. Can you elaborate on your
> thoughts here?
If you have a lot of (OR primaryKey=value) clauses with the same
value and the optimizer doesn't deduplicate the clauses, the
statistics used by the planner to build the query plan may be wrong
and lead to bad query plans.
At least some of the optimizers out there don't remove identical
values, as that's usually a lot of cycles wasted for nothing: nobody
writes queries like that.
Simplified example: if you have 100 (OR primaryKey=1) clauses, the
planner might estimate that 100 rows will be returned, which is
plain wrong in this case. That's not important if the query is simple
and you don't have a lot of joins, but in the case of a
complex object hierarchy with a lot of left joins it can lead to
really bad execution plans.
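The implied mitigation on the application side can be sketched as follows (a hypothetical helper, not Hibernate code): deduplicate the keys before generating the restriction, since the database won't do it for us.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.stream.Collectors;

public class KeyDedup {

    // Collapse duplicate keys before generating the restriction, since most
    // planners do not deduplicate (id = ? OR id = ? OR ...) clauses and may
    // misestimate the row count from the number of clauses.
    public static String buildRestriction(String column, List<Long> ids) {
        return new LinkedHashSet<>(ids).stream()
                .map(id -> column + " = " + id)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        // Repeated keys collapse to one clause per distinct id.
        System.out.println(buildRestriction("id", List.of(4L, 4L, 4L, 4L, 5L)));
        // -> (id = 4 OR id = 5)
    }
}
```

Of course this trades away the fixed statement shape that key re-use was meant to preserve, which is exactly the tension being discussed.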
--
Guillaume
JPA merge question about detached entity with lazy fields that haven't been fetched...
by Scott Marlow
According to the JPA 2.0 specification, section 3.2.7.1, when merging a
detached entity with lazy associations that haven't been fetched:
"
The persistence provider must not merge fields marked LAZY that have not
been fetched: it must ignore such fields when merging.
"
After merging an entity, will accessing one of the entity's LAZY fields
(that was previously not fetched) result in a lazy initialization error,
or will we try to fetch it from the database?
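To make the scenario concrete, here is a toy model of what 3.2.7.1 mandates (not Hibernate's actual implementation, and the class and field names are made up): only state that was actually fetched overwrites the managed copy, while unfetched LAZY fields are ignored by the merge.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyMergeSketch {

    // Toy detached entity: a field counts as "fetched" only if it is
    // present in this map; unfetched LAZY fields simply never appear.
    static class Detached {
        final Map<String, Object> fetched = new HashMap<>();
    }

    // Per JPA 2.0 section 3.2.7.1, merge must ignore LAZY fields that were
    // never fetched: only fetched state overwrites the managed state.
    static Map<String, Object> merge(Map<String, Object> managed, Detached d) {
        Map<String, Object> result = new HashMap<>(managed);
        result.putAll(d.fetched);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> managed = new HashMap<>();
        managed.put("name", "old");
        managed.put("lazyOrders", "db-state");

        Detached d = new Detached();
        d.fetched.put("name", "new"); // lazyOrders was never fetched

        Map<String, Object> merged = merge(managed, d);
        System.out.println(merged.get("name"));       // new
        System.out.println(merged.get("lazyOrders")); // db-state
    }
}
```

The open question above is precisely what happens when `lazyOrders` is then accessed on the merged, managed instance: a lazy fetch, or an initialization error.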
Scott
Re: [hibernate-dev] Coordinates storage in Lucene index for spatial functionality
by Sanne Grinovero
Hi Nicolas, thanks, that looks better.
Could you now change it to do longer runs? If I think of 2K invocations,
each taking 5ms, that's no more than 10 seconds.. It took me several
minutes to load all the data needed for the test, then just some
seconds to run the tests.. not worth the load time ;)
To measure aspects such as performance loss due to garbage generation
you'd need a steady run of at least 45 minutes, and then look at
the average over the second half of the test run (and
not call System.out every 4 milliseconds).
On a side note, what do you need System.exit(0); for? You should
close the SessionFactory instead.
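The methodology suggested here can be sketched roughly as follows (hypothetical scaffolding, not the actual benchmark code): a fixed seed makes the "random" query sequence reproducible across branches, and only the second half of a long run is averaged, so warm-up and initial garbage generation are excluded.

```java
import java.util.Arrays;
import java.util.Random;

public class SteadyStateBench {

    // Fixed seed -> the same "random" query sequence on every run, so two
    // branches can be benchmarked against identical requests.
    static long[] querySequence(long seed, int n) {
        Random r = new Random(seed);
        long[] seq = new long[n];
        for (int i = 0; i < n; i++) {
            seq[i] = r.nextLong();
        }
        return seq;
    }

    // Average only the second half of the samples; the first half is
    // treated as warm-up (JIT compilation, cache filling, etc.).
    static double secondHalfMean(double[] samplesMs) {
        int start = samplesMs.length / 2;
        double sum = 0;
        for (int i = start; i < samplesMs.length; i++) {
            sum += samplesMs[i];
        }
        return sum / (samplesMs.length - start);
    }

    public static void main(String[] args) {
        // Reproducible: same seed gives the same sequence.
        System.out.println(Arrays.equals(
                querySequence(42L, 5), querySequence(42L, 5))); // true
        // Warm-up samples (100ms) are excluded from the reported mean.
        System.out.println(secondHalfMean(new double[] {100, 100, 4, 6})); // 5.0
    }
}
```

Note there is no printing inside the measured loop; results are collected into an array and summarized once at the end.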
Cheers,
Sanne
On 15 May 2012 14:04, Nicolas Helleringer <nicolas.helleringer(a)gmail.com> wrote:
> I did the seed on the random generator.
>
> Here are some results:
>
> Degrees 2K calls
> Mean time with Grid : 4.769457488717949 ms. Average number of docs fetched
> : 2524.982564102564
> Mean time with Grid + Distance filter : 6.501712946153845 ms. Average number
> of docs fetched : 426.1876923076923
> Mean time with DoubleRange : 14.336663392307692 ms. Average number of docs
> fetched : 543.6035897435897
> Mean time with DoubleRange + Distance filter : 19.7123163574359 ms. Average
> number of docs fetched : 426.1876923076923
>
> Radians 2K calls
> Mean time with Grid : 4.430686068205128 ms. Average number of docs fetched
> : 2524.982564102564
> Mean time with Grid + Distance filter : 6.717519717948718 ms. Average number
> of docs fetched : 426.1876923076923
> Mean time with DoubleRange : 14.35186034 ms. Average number of docs fetched
> : 543.6035897435897
> Mean time with DoubleRange + Distance filter : 20.073972284102563 ms.
> Average number of docs fetched : 426.1876923076923
>
> Radians 50k calls
> Mean time with Grid : 4.440979528643216 ms. Average number of docs fetched
> : 2459.169386934673
> Mean time with Grid + Distance filter : 6.722681398331658 ms. Average number
> of docs fetched : 416.2335879396985
> Mean time with DoubleRange : 14.532376860201005 ms. Average number of docs
> fetched : 530.2923618090452
> Mean time with DoubleRange + Distance filter : 20.21980649284422 ms. Average
> number of docs fetched : 416.2335879396985
>
> On the random part: you can see by looking at the average number of docs on
> the 2k calls that the seed did its work, the requests are the same.
>
> As you can see there is not much of a difference between the 2k and 50k call runs.
>
> What I have also investigated is the overhead of the distance filter over the
> double range approach. I do fear that the wrapping
> of the lat,long range query in a QueryWrapperFilter is costly, but I cannot
> prove it yet.
>
> Back to the main question: does radian storage give better performance? I
> cannot say with my test env. It seems pretty close to me.
> Maybe someone can manage to launch the bench in a different environment.
>
> Niko
>
> PS : both branches are up to date in my github
> : https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923 & https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923-RADIANS
>
> 2012/5/14 Nicolas Helleringer <nicolas.helleringer(a)gmail.com>
>>>
>>> maybe even simpler set a constant as the seed of your random
>>> generator: should provide a reproducible sequence of values.
>>
>> /facepalm
>> I should have guessed that :s
>>
>> Niko
>>
>>>
>>> >>
>>> >> On 11 May 2012 08:40, Nicolas Helleringer
>>> >> <nicolas.helleringer(a)gmail.com>
>>> >> wrote:
>>> >> > There, back and again ...
>>> >> >
>>> >> > After fixing a bug in grid search here are some updated results on
>>> >> > 2k
>>> >> > calls
>>> >> >
>>> >> > Degrees :
>>> >> > Mean time with Grid : 4.4897266425641025 ms. Average number of docs
>>> >> > fetched
>>> >> > : 2506.96
>>> >> > Mean time with Grid + Distance filter : 6.4930799487179485 ms.
>>> >> > Average
>>> >> > number of docs fetched : 425.33435897435896
>>> >> > Mean time with DoubleRange : 14.430638703076923 ms. Average number
>>> >> > of
>>> >> > docs
>>> >> > fetched : 542.0410256410256
>>> >> > Mean time with DoubleRange + Distance filter : 20.483300545128206
>>> >> > ms.
>>> >> > Average number of docs fetched : 425.33435897435896
>>> >> >
>>> >> > Radians :
>>> >> > Mean time with Grid : 5.650845744102564 ms. Average number of docs
>>> >> > fetched
>>> >> > : 5074.830769230769
>>> >> > Mean time with Grid + Distance filter : 8.627138825128204 ms.
>>> >> > Average
>>> >> > number
>>> >> > of docs fetched : 426.7902564102564
>>> >> > Mean time with DoubleRange : 15.337755502564102 ms. Average number
>>> >> > of
>>> >> > docs
>>> >> > fetched : 1087.705641025641
>>> >> > Mean time with DoubleRange + Distance filter : 20.82852138769231 ms.
>>> >> > Average
>>> >> > number of docs fetched : 426.7902564102564
>>> >> >
>>> >> > Next thing I do not explain yet is the distance filter overhead
>>> >> > mismatch
>>> >> > :
>>> >> > It is less on grid search with more docs to test than on
>>> >> > DoubleRange.
>>> >> >
>>> >> > Niko
>>> >> >
>>> >> >
>>> >> > 2012/5/7 Nicolas Helleringer <nicolas.helleringer(a)gmail.com>
>>> >> >>
>>> >> >> Here are some results :
>>> >> >>
>>> >> >> Mean time with Grid : 4.9297471630769225 ms. Average number of docs
>>> >> >> fetched : 2416.373846153846
>>> >> >> Mean time with Grid + Distance filter : 6.48634534 ms. Average
>>> >> >> number
>>> >> >> of
>>> >> >> docs fetched : 425.84
>>> >> >> Mean time with DoubleRange : 15.39593650051282 ms. Average number
>>> >> >> of
>>> >> >> docs
>>> >> >> fetched : 542.72
>>> >> >> Mean time with DoubleRange + Distance filter : 21.158394677435897
>>> >> >> ms.
>>> >> >> Average number of docs fetched : 425.8779487179487
>>> >> >>
>>> >> >> Sounds weird that with the distance filter the two results are not the
>>> >> >> same.
>>> >> >> I shall investigate that.
>>> >> >>
>>> >> >> Niko
>>> >> >>
>>> >> >> 2012/5/7 Emmanuel Bernard <emmanuel(a)hibernate.org>
>>> >> >>>
>>> >> >>> Do you know the average number of POIs that were filtered in memory
>>> >> >>> by
>>> >> >>> the DistanceFilter during these runs?
>>> >> >>>
>>> >> >>> Emmanuel
>>> >> >>>
>>> >> >>> On 7 mai 2012, at 10:31, Nicolas Helleringer wrote:
>>> >> >>>
>>> >> >>> Hi all,
>>> >> >>>
>>> >> >>> I have done a radian patch/branch and some benchmarks on geonames
>>> >> >>> french
>>> >> >>> database.
>>> >> >>>
>>> >> >>> Benchs are on 2k calls each run.
>>> >> >>>
>>> >> >>> Radians:
>>> >> >>> run 1
>>> >> >>> Mean time with Grid : 4.808043092820513 ms
>>> >> >>> Mean time with Grid + Distance filter : 6.571108878461538 ms
>>> >> >>> Mean time with DoubleRange : 14.62661525128205 ms
>>> >> >>> Mean time with DoubleRange + Distance filter : 20.143597923076925
>>> >> >>> ms
>>> >> >>>
>>> >> >>> run 2
>>> >> >>> Mean time with Grid : 5.290368523076923 ms
>>> >> >>> Mean time with Grid + Distance filter : 6.706567517435897 ms
>>> >> >>> Mean time with DoubleRange : 14.878960702564102 ms
>>> >> >>> Mean time with DoubleRange + Distance filter : 20.75806591948718
>>> >> >>> ms
>>> >> >>>
>>> >> >>> Degrees:
>>> >> >>> run 1
>>> >> >>> Mean time with Grid : 5.101956610769231 ms
>>> >> >>> Mean time with Grid + Distance filter : 6.548685109230769 ms
>>> >> >>> Mean time with DoubleRange : 14.767478146153845 ms
>>> >> >>> Mean time with DoubleRange + Distance filter : 20.668063972820512
>>> >> >>> ms
>>> >> >>>
>>> >> >>> run 2
>>> >> >>> Mean time with Grid : 4.683360031282051 ms
>>> >> >>> Mean time with Grid + Distance filter : 6.7065247435897435 ms
>>> >> >>> Mean time with DoubleRange : 14.617140157948716 ms
>>> >> >>> Mean time with DoubleRange + Distance filter : 20.074868595897435
>>> >> >>> ms
>>> >> >>>
>>> >> >>> The radian branch is here for review
>>> >> >>>
>>> >> >>>
>>> >> >>> : https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923-RADIANS
>>> >> >>>
>>> >> >>> While moving from degrees to radians I have seen that the DSL
>>> >> >>> still
>>> >> >>> needs some work.
>>> >> >>> I shall focus on that now.
>>> >> >>>
>>> >> >>> Niko
>>> >> >>>
>>> >> >>> 2012/5/3 Sanne Grinovero <sanne(a)hibernate.org>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> On May 3, 2012 10:10 AM, "Emmanuel Bernard"
>>> >> >>>> <emmanuel(a)hibernate.org>
>>> >> >>>> wrote:
>>> >> >>>> >
>>> >> >>>> > How comes the DistanceFilter has to compute the distance for
>>> >> >>>> > the
>>> >> >>>> > whole
>>> >> >>>> > corpus?
>>> >> >>>>
>>> >> >>>> You're right in that's not always the case, but it's possible. If
>>> >> >>>> there
>>> >> >>>> are more filters enabled and they are executed first, our filter
>>> >> >>>> will
>>> >> >>>> need
>>> >> >>>> to do the math only on the matched documents by the previous
>>> >> >>>> filters,
>>> >> >>>> but if
>>> >> >>>> there are no other constraints or filters our DistanceFilter
>>> >> >>>> might
>>> >> >>>> need to
>>> >> >>>> process all documents in all segments. This happens also when a
>>> >> >>>> limit
>>> >> >>>> is
>>> >> >>>> enabled on the collector - although limited to the current index
>>> >> >>>> segment -
>>> >> >>>> when the filter needs to be cached as it needs to evaluate each
>>> >> >>>> document in
>>> >> >>>> the segment.
>>> >> >>>>
>>> >> >>>> In our case this DistanceFilter is only applied after RangeQuery
>>> >> >>>> was
>>> >> >>>> applied on both longitude and latitude, so I'm not sure if this
>>> >> >>>> is a
>>> >> >>>> big
>>> >> >>>> problem; personally I was just wondering but I'd be fine in
>>> >> >>>> keeping
>>> >> >>>> this as
>>> >> >>>> a possible future improvement - but if we go for a separate
>>> >> >>>> issue,
>>> >> >>>> let's
>>> >> >>>> keep in mind that the index format would not be backwards
>>> >> >>>> compatible.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> > By the way the actual storage (say via Hibernate ORM, or
>>> >> >>>> > Infinispan)
>>> >> >>>> > does not need to store in radian, so we don't need to do a
>>> >> >>>> > conversion when
>>> >> >>>> > reading an entity.
>>> >> >>>>
>>> >> >>>> Right, another reason to index only in whatever format makes
>>> >> >>>> querying
>>> >> >>>> more efficient.
>>> >> >>>>
>>> >> >>>> -- Sanne
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> >
>>> >> >>>> > On 3 mai 2012, at 10:45, Sanne Grinovero wrote:
>>> >> >>>> >
>>> >> >>>> > > The reason for my comment is that the code is doing a
>>> >> >>>> > > conversion
>>> >> >>>> > > to
>>> >> >>>> > > radians in the DistanceFilter, which needs to be extremely
>>> >> >>>> > > efficient
>>> >> >>>> > > as it's not only applied on the resultset but potentially on
>>> >> >>>> > > the
>>> >> >>>> > > whole
>>> >> >>>> > > corpus of all Documents in the index.
>>> >> >>>> > > So even if it's true that conversion would be needed on the
>>> >> >>>> > > final
>>> >> >>>> > > results, we always expect people to retrieve only a limited
>>> >> >>>> > > amount
>>> >> >>>> > > of
>>> >> >>>> > > entities (like with pagination), while the index might need
>>> >> >>>> > > to
>>> >> >>>> > > perform
>>> >> >>>> > > this computation millions of times per query.
>>> >> >>>> > >
>>> >> >>>> > > If I look at the complexity of Point.getDistanceTo(double,
>>> >> >>>> > > double),
>>> >> >>>> > > I
>>> >> >>>> > > get a feeling that that method will hardly provide speedy
>>> >> >>>> > > queries
>>> >> >>>> > > because of the complex computations in it - this is just
>>> >> >>>> > > speculation
>>> >> >>>> > > at this point of course, to be sure we'd need to compare them
>>> >> >>>> > > with a
>>> >> >>>> > > large enough dataset, but it seems quite obvious that storing
>>> >> >>>> > > normalized radians should be more efficient as it would avoid
>>> >> >>>> > > a
>>> >> >>>> > > good
>>> >> >>>> > > deal of math to be executed on each Document in the index.
>>> >> >>>> > >
>>> >> >>>> > > Also if we assume people might want to use radians in their
>>> >> >>>> > > user
>>> >> >>>> > > data
>>> >> >>>> > > (I know some who definitely would never touch decimals for
>>> >> >>>> > > such a
>>> >> >>>> > > use
>>> >> >>>> > > case), there would be no need at all to convert the end
>>> >> >>>> > > result.
>>> >> >>>> > >
>>> >> >>>> > > Some more thoughts inline:
>>> >> >>>> > >
>>> >> >>>> > > On 3 May 2012 09:12, Nicolas Helleringer
>>> >> >>>> > > <nicolas.helleringer(a)gmail.com> wrote:
>>> >> >>>> > >> Hi all,
>>> >> >>>> > >>
>>> >> >>>> > >> Sanne and I have been wondering about the way the spatial
>>> >> >>>> > >> branch/module/functionality for Hibernate Search shall store
>>> >> >>>> > >> its
>>> >> >>>> > >> coordinates in the Lucene index.
>>> >> >>>> > >>
>>> >> >>>> > >> Today it is implemented with decimal degree for :
>>> >> >>>> > >> - easy debugging/readability
>>> >> >>>> > >> - ease of conversion on storage as we want to accept mainly
>>> >> >>>> > >> decimal
>>> >> >>>> > >> degree
>>> >> >>>> > >> from users data
>>> >> >>>> > >
>>> >> >>>> > > Valid points, but consider that "storage" is going to be way
>>> >> >>>> > > slower
>>> >> >>>> > > anyway, and typically you'll process a Document to evaluate
>>> >> >>>> > > it
>>> >> >>>> > > for a
>>> >> >>>> > > hit many many orders of magnitude more frequently than the
>>> >> >>>> > > times
>>> >> >>>> > > you
>>> >> >>>> > > store it.
>>> >> >>>> > >
>>> >> >>>> > >>
>>> >> >>>> > >> Sanne pointed out that when the search is done there is
>>> >> >>>> > >> quite a
>>> >> >>>> > >> few
>>> >> >>>> > >> conversion to radians for distance calculation and suggested
>>> >> >>>> > >> that
>>> >> >>>> > >> we may
>>> >> >>>> > >> store directly coordinates under their radians form.
>>> >> >>>> > >>
>>> >> >>>> > >> I have tried a patch to implement this and as I was coding
>>> >> >>>> > >> it I
>>> >> >>>> > >> feel that
>>> >> >>>> > >> the code was less readable, in the coordinates normalisation
>>> >> >>>> > >> mainly
>>> >> >>>> > >> and
>>> >> >>>> > >> that there was as many conversion as before.
>>> >> >>>> > >> Conversions had moved from search to import / export of
>>> >> >>>> > >> coordinates
>>> >> >>>> > >> in and
>>> >> >>>> > >> out the spatial module scope to user scope.
>>> >> >>>> > >
>>> >> >>>> > > I'm sure the amount of points in the code in which they are
>>> >> >>>> > > converted
>>> >> >>>> > > won't change. I'm concerned about the cardinality of the
>>> >> >>>> > > collections
>>> >> >>>> > > on which it's applied ;)
>>> >> >>>> > > "Less readable" isn't nice, but we can work on that I guess?
>>> >> >>>> > >
>>> >> >>>> > >>
>>> >> >>>> > >> What the docs does not tell (yet), is that we are waiting
>>> >> >>>> > >> for
>>> >> >>>> > >> WGS
>>> >> >>>> > >> 84 (this
>>> >> >>>> > >> is a coordinate system) decimal degree coordinates input, as
>>> >> >>>> > >> these
>>> >> >>>> > >> are
>>> >> >>>> > >> quite a de facto standard (GPS output this way).
>>> >> >>>> > >
>>> >> >>>> > > How does it affect this?
>>> >> >>>> > >
>>> >> >>>> > >>
>>> >> >>>> > >> Today this is not the purpose of Hibernate Search spatial
>>> >> >>>> > >> initiative to
>>> >> >>>> > >> handle projections. There are opensource libs to handle that
>>> >> >>>> > >> on
>>> >> >>>> > >> user side
>>> >> >>>> > >> very well (Proj4j)
>>> >> >>>> > >>
>>> >> >>>> > >> So. The question is : shall we store as radians or decimal
>>> >> >>>> > >> degree ?
>>> >> >>>> > >>
>>> >> >>>> > >> Niko
>>> >> >>>> > >>
>>> >> >>>> > >> P.S : Hope it is clear. If not ask for more.
>>> >> >>>> > >
>>> >> >>>> > > Thanks!
>>> >> >>>> > > Sanne
>>> >> >>>> > > _______________________________________________
>>> >> >>>> > > hibernate-dev mailing list
>>> >> >>>> > > hibernate-dev(a)lists.jboss.org
>>> >> >>>> > > https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>> >> >>>> >
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>
>>> >> >
>>> >
>>> >
>>
>>
>
Coordinates storage in Lucene index for spatial functionality
by Nicolas Helleringer
Hi all,
Sanne and I have been wondering about the way the spatial
branch/module/functionality for Hibernate Search shall store its
coordinates in the Lucene index.
Today it is implemented with decimal degrees for:
- easy debugging/readability
- ease of conversion on storage, as we want to accept mainly decimal
degrees from users' data
Sanne pointed out that when the search is done there are quite a few
conversions to radians for distance calculation, and suggested that we
might store the coordinates directly in their radian form.
I have tried a patch to implement this, and as I was coding it I felt that
the code was less readable (mainly in the coordinates normalisation) and
that there were as many conversions as before.
The conversions had merely moved from search time to the import/export of
coordinates into and out of the spatial module scope, i.e. to user scope.
What the docs do not tell (yet) is that we expect WGS 84 (this
is a coordinate system) decimal degree coordinates as input, as these are
quite a de facto standard (GPS devices output this way).
Handling projections is not currently a goal of the Hibernate Search
spatial initiative; there are open-source libs that handle that very well
on the user side (Proj4j).
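To make the trade-off concrete, here is a sketch of the per-document conversion cost being discussed (hypothetical code using a plain haversine formula, not the actual Point.getDistanceTo implementation): with degrees stored in the index, every distance evaluation pays four extra Math.toRadians calls.

```java
public class RadiansVsDegrees {

    static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine distance with coordinates already stored in radians:
    // no conversion left in the per-document hot path.
    static double distanceRadians(double lat1, double lon1, double lat2, double lon2) {
        double dLat = lat2 - lat1;
        double dLon = lon2 - lon1;
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(lat1) * Math.cos(lat2)
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Same distance from stored decimal degrees: four extra Math.toRadians
    // calls for every document the filter evaluates.
    static double distanceDegrees(double lat1, double lon1, double lat2, double lon2) {
        return distanceRadians(Math.toRadians(lat1), Math.toRadians(lon1),
                               Math.toRadians(lat2), Math.toRadians(lon2));
    }

    public static void main(String[] args) {
        // Paris -> Lyon, roughly 390 km great-circle distance.
        double km = distanceDegrees(48.8566, 2.3522, 45.7640, 4.8357);
        System.out.println(km > 380 && km < 410); // true
    }
}
```

Storing radians moves the conversion to index time, once per document, instead of once per document per query in the DistanceFilter.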
So, the question is: shall we store radians or decimal degrees?
Niko
P.S.: I hope it is clear. If not, ask for more.