[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-886) Provide the ability to configure specific paths to index within @IndexEmbedded as an alternative to depth

Zach Kurey (JIRA) noreply at atlassian.com
Mon Sep 19 22:56:38 EDT 2011


    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=43643#comment-43643 ] 

Zach Kurey commented on HSEARCH-886:
------------------------------------

Hey Davide.  

Not a problem.  I just freed up some time to work on this and was quite happy to get an email yesterday letting me know it was mostly done already.  Thanks!  = )

I gave your changes a try today and saw a 5x improvement in indexing speed on some problematic entities.  Load on the system doing the indexing was also much lower than before, when the CPU was pinned by traversing cycles and, in general, by indexing more than was needed.  

The only thing that was missing was to add path support to 'checkForFields(..)', as opposed to just 'checkForField(..)' in AbstractDocumentBuilder. 

I did have an issue where I mistakenly configured an entity with a non-existent path, and it would have failed silently.  The code just checks whether the current path being traversed is one of the specified paths, but no upfront checking is done to ensure that the paths are valid in the first place.  Adding some validation there would probably eliminate a lot of configuration mistakes and confusion. 
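
As a rough illustration of the upfront validation suggested above, the following sketch resolves each configured path against the embedded type by reflection and reports the ones that do not exist. Everything in it is an assumption made for illustration: the class name PathValidator, the method findInvalidPaths, and the simplified copies of C and D are hypothetical, and only fields are inspected (annotated getters are ignored). It is not how Hibernate Search itself does or would do this.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class PathValidator {

    // Hypothetical, simplified copies of the model from the issue description.
    static class D { int one; int two; }
    static class C { Collection<D> d; int foo; }

    /** Returns the configured paths that cannot be resolved starting from root. */
    static List<String> findInvalidPaths(Class<?> root, String... paths) {
        List<String> invalid = new ArrayList<>();
        for (String path : paths) {
            if (!resolves(root, path)) {
                invalid.add(path);
            }
        }
        return invalid;
    }

    // Walks the dotted path one segment at a time, following field types.
    private static boolean resolves(Class<?> root, String path) {
        Class<?> current = root;
        for (String segment : path.split("\\.")) {
            Field field = findField(current, segment);
            if (field == null) {
                return false;
            }
            current = elementType(field);
        }
        return true;
    }

    // Looks for a declared field in the type or any of its superclasses.
    private static Field findField(Class<?> type, String name) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(name);
            } catch (NoSuchFieldException e) {
                // not declared here, try the superclass
            }
        }
        return null;
    }

    // For Collection<X> fields, continue the walk in X; otherwise in the field type.
    private static Class<?> elementType(Field field) {
        Type generic = field.getGenericType();
        if (Collection.class.isAssignableFrom(field.getType())
                && generic instanceof ParameterizedType) {
            Type arg = ((ParameterizedType) generic).getActualTypeArguments()[0];
            if (arg instanceof Class) {
                return (Class<?>) arg;
            }
        }
        return field.getType();
    }

    public static void main(String[] args) {
        // "d.three" and "bar" do not exist on C, so they are reported.
        System.out.println(findInvalidPaths(C.class, "d.one", "d.two", "foo", "d.three", "bar"));
    }
}
```

Running a check like this once, when the mapping is built, would turn a mistyped path into an immediate error instead of a silently empty index field.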

I also saw your question about whether or not depth should be automatically set to 0 if it isn't specified but paths are.  It does seem incorrect to have unlimited depth and paths together, so I'd lean towards yes.  
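
To spell out how depth and paths could combine under that suggestion, here is a minimal sketch. It assumes that the depth of a path equals its number of dot-separated segments, that an explicitly listed path is always indexed, and that an unspecified depth falls back to 0 when paths are given and to unlimited otherwise. The class EmbeddedPathFilter, the UNSPECIFIED sentinel, and the method isIndexed are hypothetical names, not part of Hibernate Search.

```java
import java.util.Arrays;
import java.util.List;

public class EmbeddedPathFilter {

    // Hypothetical sentinel meaning "depth was not set on the annotation".
    static final int UNSPECIFIED = -1;

    private final int maxDepth;
    private final List<String> paths;

    EmbeddedPathFilter(int depth, String... paths) {
        this.paths = Arrays.asList(paths);
        if (depth == UNSPECIFIED) {
            // Assumed rule: explicit paths imply depth 0, otherwise depth is unlimited.
            this.maxDepth = this.paths.isEmpty() ? Integer.MAX_VALUE : 0;
        } else {
            this.maxDepth = depth;
        }
    }

    /** True if the dotted path is within the depth limit or explicitly listed. */
    boolean isIndexed(String path) {
        int depth = path.split("\\.").length;
        return depth <= maxDepth || paths.contains(path);
    }

    public static void main(String[] args) {
        EmbeddedPathFilter both = new EmbeddedPathFilter(3, "a.b.c.d.e");
        System.out.println(both.isIndexed("a.b.c"));      // within depth 3
        System.out.println(both.isIndexed("a.b.c.d"));    // beyond depth, not listed
        System.out.println(both.isIndexed("a.b.c.d.e"));  // beyond depth, but listed

        EmbeddedPathFilter pathsOnly = new EmbeddedPathFilter(UNSPECIFIED, "a.b.c.d.e");
        System.out.println(pathsOnly.isIndexed("a"));     // depth defaults to 0 here
    }
}
```

With this rule, specifying only paths behaves the same as specifying depth=0 together with those paths, so the redundant "unlimited depth plus paths" combination never arises.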


> Provide the ability to configure specific paths to index within @IndexEmbedded as an alternative to depth
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HSEARCH-886
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-886
>             Project: Hibernate Search
>          Issue Type: New Feature
>          Components: engine, mapping
>            Reporter: Zach Kurey
>            Assignee: Davide D'Alto
>             Fix For: 4.0
>
>
> Frequently it's desirable to index a particular embedded type differently depending on the use case of the referencing type that is the primary subject being indexed.  Additionally, depth in general causes many more paths to be included in a document than necessary for a particular index.  This makes tuning of indexing to eliminate problem paths difficult, and sometimes impossible if a particular object model re-uses a lot of types.  
> The proposal/improvement has already been discussed more in depth here:  http://www.mail-archive.com/hibernate-dev@lists.jboss.org/msg06548.html, and what follows reflects some of that discussion.  
> As an example of how specific paths could be configured for indexing:
> @Indexed
> class A{
>    @IndexEmbedded(
>        depth=0,
>        @IndexPaths(paths={"d.one", "d.two"})
>     )
>    private C see;
> }
> @Indexed
> class B{
>    @IndexEmbedded(
>        depth=0,
>        @IndexPaths(paths={"foo"})
>    )
>    private C see;
> }
> class C{
>    @IndexEmbedded
>    private Collection<D> d;
>    @Field
>    private int foo;
> }
> class D{
>    @Field
>    int one;
>    @Field
>    int two;
> }
> Index A would contain:  d.one, and d.two
> Index B would contain:  foo, but would NOT contain anything from path 'd'.
> Perhaps indexing path 'd' has a performance impact that is desirable to eliminate for B, but acceptable or necessary for A.  This ability would also help to eliminate the bloat of unnecessary fields in Lucene documents, which may not itself be a performance problem but leaves a lot of things to rule out when tracking down indexing issues (whether performance or content).
> Lastly, to be clear, the above proposal (which Sanne really came up with in the email thread) does not conflict with depth.  Here are some further examples of how depth may interact with explicit paths:
> @IndexEmbedded(depth=3, paths={"a.b.c.d.e"})
> Says to index all paths up to depth 3, but additionally index path 'a.b.c.d.e'.
> @IndexEmbedded(depth=0, paths={"a.b.c.d.e"})
> Says to only index path 'a.b.c.d.e'
> @IndexEmbedded( paths={"a.b.c.d.e"})
> Default behavior: depth is unlimited, so specifying a.b.c.d.e is redundant in this case.  
