[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-886) Provide the ability to configure specific paths to index within @IndexEmbedded as an alternative to depth

Monday, 19 September 2011

    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-886?pag... ]

Zach Kurey commented on HSEARCH-886:
------------------------------------

Hey Davide.  

Not a problem.  I just freed up some time to work on this and was quite happy to get an
email yesterday letting me know it was mostly done already.  Thanks!  = )

I gave your changes a try today and saw a 5x improvement in indexing speed on some
problematic entities.  The load on the system doing the indexing was also much lower.
Before, the CPU was pinned from traversing cycles and, in general, from indexing more
than was needed.

The only thing missing was path support in 'checkForFields(..)' as well, not just in
'checkForField(..)', in AbstractDocumentBuilder.

I did run into an issue where I configured an entity with a non-existent path, and it
would have failed silently.  The code only checks whether the path currently being traversed
is one of the specified paths.  Nothing checks up front that the paths are valid in the
first place.  Adding some validation there would probably eliminate a lot of configuration
mistakes and confusion.
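Here is a minimal sketch of the kind of up-front check I mean, in plain Java.  The metadata model (a map from each type name to its indexable properties, each pointing at the embedded type it leads to, or null for a plain @Field) and the class name PathValidator are hypothetical and not Hibernate Search API.  The idea is to resolve every configured path one segment at a time at bootstrap and fail fast on anything that doesn't resolve.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bootstrap-time check for configured @IndexEmbedded paths.
// Assumed model: type name -> (property name -> embedded type name, or null for a plain @Field).
public final class PathValidator {

    private final Map<String, Map<String, String>> properties;

    public PathValidator(Map<String, Map<String, String>> properties) {
        this.properties = properties;
    }

    /** Returns every configured path that does not resolve against the given root type. */
    public List<String> invalidPaths(String rootType, String... paths) {
        List<String> invalid = new ArrayList<>();
        for (String path : paths) {
            if (!resolves(rootType, path)) {
                invalid.add(path);
            }
        }
        return invalid;
    }

    /** Fails fast instead of silently indexing nothing for a mistyped path. */
    public void validate(String rootType, String... paths) {
        List<String> invalid = invalidPaths(rootType, paths);
        if (!invalid.isEmpty()) {
            throw new IllegalArgumentException("Unknown paths on " + rootType + ": " + invalid);
        }
    }

    private boolean resolves(String type, String path) {
        String current = type;
        for (String segment : path.split("\\.")) {
            if (current == null) {
                return false; // previous segment was a plain field, nothing to navigate into
            }
            Map<String, String> props = properties.get(current);
            if (props == null || !props.containsKey(segment)) {
                return false; // no such property on this type
            }
            current = props.get(segment);
        }
        return true;
    }

    public static void main(String[] args) {
        // The C/D model from the issue description: C has d (-> D) and foo; D has one and two.
        Map<String, String> c = new HashMap<>();
        c.put("d", "D");
        c.put("foo", null);
        Map<String, String> d = new HashMap<>();
        d.put("one", null);
        d.put("two", null);
        Map<String, Map<String, String>> model = new HashMap<>();
        model.put("C", c);
        model.put("D", d);

        PathValidator validator = new PathValidator(model);
        System.out.println(validator.invalidPaths("C", "d.one", "d.three", "foo.bar"));
        // prints [d.three, foo.bar]
    }
}
```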

I also saw your question about whether depth should automatically be set to 0 if it
isn't specified but paths are.  Having unlimited depth together with paths does seem
incorrect, so I'd lean towards yes.
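Something like the following would capture that rule.  It is a hypothetical sketch, not Hibernate Search code, and it assumes the annotation's default depth is Integer.MAX_VALUE, so that value is read as "depth was not specified":

```java
// Hypothetical resolution of the effective @IndexEmbedded depth when paths are given.
// Assumption: Integer.MAX_VALUE is the annotation default and therefore means "not specified".
public final class EffectiveDepth {

    public static final int UNSPECIFIED = Integer.MAX_VALUE;

    private EffectiveDepth() {
    }

    /**
     * If paths are configured but depth was left at its default, index only those paths
     * (depth 0); otherwise keep whatever depth was declared.
     */
    public static int resolve(int declaredDepth, String[] paths) {
        if (declaredDepth == UNSPECIFIED && paths.length > 0) {
            return 0;
        }
        return declaredDepth;
    }

    public static void main(String[] args) {
        System.out.println(resolve(UNSPECIFIED, new String[] {"a.b.c.d.e"}));     // prints 0
        System.out.println(resolve(3, new String[] {"a.b.c.d.e"}));               // prints 3
        System.out.println(resolve(UNSPECIFIED, new String[0]) == UNSPECIFIED);  // prints true
    }
}
```

The catch with a sentinel default is that an explicit depth=Integer.MAX_VALUE can't be told apart from an omitted depth; a dedicated default such as a negative value would avoid that ambiguity.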

...
 Provide the ability to configure specific paths to index within @IndexEmbedded as an alternative to depth

---------------------------------------------------------------------------------------------------------

                 Key: HSEARCH-886
                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-886
             Project: Hibernate Search
          Issue Type: New Feature
          Components: engine, mapping
            Reporter: Zach Kurey
            Assignee: Davide D'Alto
             Fix For: 4.0

 Frequently it's desirable to index a particular embedded type differently depending on the
use case of the referencing type that is the primary subject being indexed.  Additionally,
depth in general causes many more paths to be included in a document than a particular
index needs.  This makes tuning indexing to eliminate problem paths difficult, and
sometimes impossible if a particular object model reuses a lot of types.
 The proposal/improvement has already been discussed in more depth here:
http://www.mail-archive.com/hibernate-dev@lists.jboss.org/msg06548.html, and what follows
reflects some of that discussion.
 As an example of how specific paths could be configured for indexing:
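 import java.util.Collection;
 import org.hibernate.search.annotations.*;

 @Indexed
 class A {
     // 'paths' is the proposed attribute: with depth=0, index only the listed paths
     @IndexEmbedded(depth = 0, paths = {"d.one", "d.two"})
     private C see;
 }
 @Indexed
 class B {
     @IndexEmbedded(depth = 0, paths = {"foo"})
     private C see;
 }
 // C is not @Indexed itself; it is only reached through A.see and B.see
 class C {
     @IndexEmbedded
     private Collection<D> d;
     @Field
     private int foo;
 }
 class D {
     @Field
     int one;
     @Field
     int two;
 }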
 Index A would contain: d.one and d.two
 Index B would contain: foo, but would NOT contain anything from path 'd'.
 Perhaps indexing path 'd' has a performance impact that is desirable to eliminate
for B, but acceptable or necessary for A.  This ability would also help eliminate the
bloat of unnecessary fields in Lucene documents, which may not itself be a performance
problem but leaves a lot of things to rule out when tracking down indexing issues (either
performance or content).
 Lastly, to be clear, the above proposal (which Sanne actually came up with in the email
thread) does not conflict with depth.  Here are some further examples of how depth may
interact with explicit paths:
 @IndexEmbedded(depth=3, paths={"a.b.c.d.e"})
 Says to index all paths up to depth 3, but additionally index path 'a.b.c.d.e'.
 @IndexEmbedded(depth=0, paths={"a.b.c.d.e"})
 Says to index only path 'a.b.c.d.e'.
 @IndexEmbedded(paths={"a.b.c.d.e"})
 Default behavior: depth is unlimited, so specifying 'a.b.c.d.e' is redundant in this case.
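 To make the interaction concrete, here is a hypothetical predicate (PathFilter is an illustrative name, not part of Hibernate Search) that a document builder could consult while walking an embedded graph.  It assumes depth counts path segments below the @IndexEmbedded property, so depth=1 indexes 'foo' but not 'd.one', and depth=0 indexes nothing that isn't explicitly listed.  Traversal must still continue into 'd' whenever a configured path runs through it, even though 'd' itself is not a field to index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical depth + explicit-paths filter, following the examples in this issue.
// Assumption: depth counts path segments below the @IndexEmbedded property.
public final class PathFilter {

    private final int depth;
    private final Set<String> paths; // explicit paths, relative to the embedded root

    public PathFilter(int depth, String... paths) {
        this.depth = depth;
        this.paths = new HashSet<>(Arrays.asList(paths));
    }

    private static int segments(String path) {
        return path.split("\\.").length;
    }

    /** Should the @Field at this relative path be written to the document? */
    public boolean includeField(String path) {
        return segments(path) <= depth || paths.contains(path);
    }

    /** Should the builder descend into the embedded object at this relative path? */
    public boolean traverse(String path) {
        if (segments(path) < depth) {
            return true; // fields one level deeper are still within depth
        }
        String prefix = path + ".";
        for (String p : paths) {
            if (p.startsWith(prefix)) {
                return true; // an explicit path continues below this object
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PathFilter a = new PathFilter(0, "d.one", "d.two"); // class A
        PathFilter b = new PathFilter(0, "foo");            // class B
        System.out.println(a.traverse("d") + " " + a.includeField("d.one") + " " + a.includeField("foo"));
        // prints true true false
        System.out.println(b.includeField("foo") + " " + b.traverse("d"));
        // prints true false
    }
}
```

 With depth=Integer.MAX_VALUE every field passes the depth check on its own, which is why an explicit path adds nothing in the last example.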

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
