Zach Kurey commented on HSEARCH-886:
------------------------------------
Hey Davide.
Not a problem. I just freed up some time to work on this and was quite happy to get an
email yesterday letting me know it was mostly done already. Thanks! =)
I gave your changes a try today and saw a 5x improvement in indexing speed on some
problematic entities. The load on the machine doing the indexing was also much lower.
Previously the CPU was pinned from traversing cycles and, more generally, from indexing
more than was needed.
The only thing missing was path support in 'checkForFields(..)' in AbstractDocumentBuilder;
at the moment only 'checkForField(..)' handles paths.
I did run into one issue: I mistakenly configured an entity with a path that doesn't exist,
and it would have failed silently. The code only checks whether the path currently being
traversed is among the specified paths; nothing checks up front that the specified paths
are valid in the first place. Adding validation there would probably prevent a lot of
configuration mistakes and confusion.
I also saw your question about whether depth should automatically be set to 0 when paths
are specified but depth isn't. Combining unlimited depth with paths does seem wrong, so
I'd lean towards yes.
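The defaulting rule being discussed can be sketched as a small resolution step. This is only an illustration: the names `EffectiveDepth` and `effectiveDepth(..)` are hypothetical, and "depth not specified" is modeled as `null` here, whereas a real annotation attribute cannot default to null and would need a sentinel value instead.

```java
// Hypothetical sketch: when paths are given but depth is not, use depth 0
// instead of the unlimited default.
public class EffectiveDepth {

    // Stand-in for the unlimited default depth.
    static final int UNLIMITED = Integer.MAX_VALUE;

    /**
     * @param explicitDepth depth set by the user, or null if not specified
     * @param paths         configured paths; may be empty
     */
    static int effectiveDepth(Integer explicitDepth, String... paths) {
        if (explicitDepth != null) {
            return explicitDepth;              // an explicit depth always wins
        }
        return paths.length > 0 ? 0 : UNLIMITED; // paths alone imply depth 0
    }

    public static void main(String[] args) {
        System.out.println(effectiveDepth(null, "a.b.c.d.e")); // 0
        System.out.println(effectiveDepth(3, "a.b.c.d.e"));    // 3
        System.out.println(effectiveDepth(null) == UNLIMITED); // true
    }
}
```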
Provide the ability to configure specific paths to index within
@IndexEmbedded as an alternative to depth
---------------------------------------------------------------------------------------------------------
Key: HSEARCH-886
URL:
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-886
Project: Hibernate Search
Issue Type: New Feature
Components: engine, mapping
Reporter: Zach Kurey
Assignee: Davide D'Alto
Fix For: 4.0
Frequently it's desirable to index a particular embedded type differently depending on the
use case of the referencing type, i.e. the primary entity being indexed. Additionally,
depth in general causes many more paths to be included in a document than a particular
index needs. This makes it difficult, and sometimes impossible, to tune indexing to
eliminate problem paths, especially when an object model reuses a lot of types.
The proposal/improvement has already been discussed more in depth here:
http://www.mail-archive.com/hibernate-dev@lists.jboss.org/msg06548.html, and what follows
reflects some of that discussion.
As an example of how specific paths could be configured for indexing:
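import java.util.Collection;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;

// 'paths' is the attribute proposed in this issue; it does not exist yet.
@Indexed
class A {
    @IndexedEmbedded(depth = 0, paths = {"d.one", "d.two"})
    private C see;
}

@Indexed
class B {
    @IndexedEmbedded(depth = 0, paths = {"foo"})
    private C see;
}

class C {
    @IndexedEmbedded
    private Collection<D> d;

    @Field
    private int foo;
}

class D {
    @Field
    int one;

    @Field
    int two;
}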
Index A would contain d.one and d.two.
Index B would contain foo, but nothing from path 'd'.
Perhaps indexing path 'd' has a performance cost that is worth eliminating for B, but
acceptable or even necessary for A. This ability would also help eliminate the bloat of
unnecessary fields in Lucene documents. That bloat may not be a performance problem in
itself, but it leaves a lot of things to rule out when tracking down indexing issues
(whether of performance or of content).
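The key performance point is that an embedded property is only entered when some configured path goes through it, so B never touches 'd' at all. The following sketch simulates that traversal over the C/D model. It is hypothetical throughout: the class `PathFilteredIndexing` and its methods are illustrative names, the mapping metadata is stood in for by nested maps (a `null` value marks a `@Field`, a nested map marks an embedded type), and it assumes every configured path ends at a leaf `@Field`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of path-restricted traversal over a toy mapping model.
public class PathFilteredIndexing {

    static Map<String, Object> typeD() {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("one", null); // @Field int one
        d.put("two", null); // @Field int two
        return d;
    }

    static Map<String, Object> typeC() {
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("d", typeD()); // @IndexedEmbedded Collection<D> d
        c.put("foo", null);  // @Field int foo
        return c;
    }

    /** Returns the leaf field paths that would be indexed for the given path set. */
    static List<String> indexedFields(Map<String, Object> type, Set<String> paths) {
        List<String> out = new ArrayList<>();
        walk(type, "", paths, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void walk(Map<String, Object> type, String prefix,
                             Set<String> paths, List<String> out) {
        for (Map.Entry<String, Object> e : type.entrySet()) {
            String path = prefix + e.getKey();
            if (e.getValue() == null) {
                if (paths.contains(path)) {
                    out.add(path); // leaf field explicitly requested
                }
            } else if (isOnSomePath(path, paths)) {
                // descend only if a configured path passes through this embedded property
                walk((Map<String, Object>) e.getValue(), path + ".", paths, out);
            }
        }
    }

    private static boolean isOnSomePath(String path, Set<String> paths) {
        for (String p : paths) {
            if (p.startsWith(path + ".")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(indexedFields(typeC(), Set.of("d.one", "d.two"))); // [d.one, d.two]
        System.out.println(indexedFields(typeC(), Set.of("foo")));            // [foo]
    }
}
```

For B, the check on 'd' fails before its elements are loaded, which is where the savings on expensive or cyclic branches would come from.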
Lastly, to be clear: the above proposal (which Sanne actually came up with in the email
thread) does not conflict with depth. Here are some further examples of how depth could
interact with explicit paths:
@IndexedEmbedded(depth = 3, paths = {"a.b.c.d.e"})
Indexes all paths up to depth 3, and additionally indexes path 'a.b.c.d.e'.
@IndexedEmbedded(depth = 0, paths = {"a.b.c.d.e"})
Indexes only path 'a.b.c.d.e'.
@IndexedEmbedded(paths = {"a.b.c.d.e"})
Default behavior: depth is unlimited, so specifying 'a.b.c.d.e' is redundant.
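The three examples boil down to one inclusion rule: a field is indexed if it lies within the depth limit or is explicitly listed. The sketch below encodes that rule. The names `DepthAndPaths`, `levelOf(..)` and `isIndexed(..)` are hypothetical, and it assumes a path with k dot-separated segments sits k levels of embedding deep (so 'foo' is level 1 and 'd.one' is level 2), which means depth 0 on its own indexes nothing.

```java
import java.util.Set;

// Hypothetical sketch of the depth/paths interaction described in the examples.
public class DepthAndPaths {

    // Stand-in for the unlimited default depth.
    static final int UNLIMITED = Integer.MAX_VALUE;

    // Assumption: k dot-separated segments = k levels of embedding.
    static int levelOf(String path) {
        return path.split("\\.").length;
    }

    // Indexed if within the depth limit OR explicitly listed in 'paths'.
    static boolean isIndexed(String path, int depth, Set<String> paths) {
        return levelOf(path) <= depth || paths.contains(path);
    }

    public static void main(String[] args) {
        Set<String> paths = Set.of("a.b.c.d.e");
        // depth = 3: shallow paths plus the explicit deep path
        System.out.println(isIndexed("a.b.c", 3, paths));           // true
        System.out.println(isIndexed("a.b.c.d", 3, paths));         // false
        System.out.println(isIndexed("a.b.c.d.e", 3, paths));       // true
        // depth = 0: only the explicit path
        System.out.println(isIndexed("a.b.c", 0, paths));           // false
        System.out.println(isIndexed("a.b.c.d.e", 0, paths));       // true
        // unlimited depth: everything is indexed, so the explicit path adds nothing
        System.out.println(isIndexed("a.b.c.d", UNLIMITED, paths)); // true
    }
}
```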