[hibernate-dev] [Search] proposing an alternative to depth in @IndexedEmbedded

Zach Kurey pushedbytime at gmail.com
Wed Aug 24 15:28:24 EDT 2011


On Aug 24, 2011, at 8:26 AM, Sanne Grinovero wrote:

> This complicates things. First of all it means that the "subPaths"
> property should now be named "includeSubPaths" instead, as opposing to
> "excludeSubPaths". 

Yes, if 'excludeSubPaths' is provided, then 'subPaths' should be renamed to 'includeSubPaths', for the sake of cleanliness and symmetry.

> Also with such names I would expect the additional
> paths to work *in addition to* normal depth.

I think I wasn't exact enough.  I would expect 'includeSubPaths' to be incompatible with both 'depth' and 'excludeSubPaths'.  However, I would expect 'depth' and 'excludeSubPaths' to be compatible with each other.  That combination basically says: index using the default approach, stop at the max depth, but skip the paths specified in 'excludeSubPaths'.

Given:

class C {
    @IndexedEmbedded
    private Collection<D> d;
    @Field
    private int foo;
}

Illegal configuration: can't specify depth and includeSubPaths simultaneously:

class A {
    @IndexedEmbedded(
        includeSubPaths = {"d.one", "d.two"}, depth = 5
    )
    private C see;
}

Illegal configuration: specifying both includeSubPaths and excludeSubPaths is nonsense, since any path not listed in includeSubPaths won't be indexed anyway:

class A {
    @IndexedEmbedded(
        includeSubPaths = {"d.one", "d.two"}, excludeSubPaths = {"d.three"}
    )
    private C see;
}

Valid configuration: excludes indexing of d.  Maybe D leads to cycles, or expensive nested joins, and it isn't used when searching index A, so we want to exclude it.

class A {
    @IndexedEmbedded( depth = 5, excludeSubPaths = {"d"} )
    private C see;
}
Also, what validly constitutes a path is different for 'excludeSubPaths'.  Any node along a path can be a termination point, where the user says they don't want indexing to go any further down that path; that node may or may not be a leaf.  Entries in 'includeSubPaths', by contrast, must all be leaf nodes.

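
To make that difference concrete, here is a minimal sketch of the two matching rules.  SubPathRules and its methods are hypothetical names used only for illustration; they are not part of Hibernate Search, and the dot-separated path syntax is assumed from the examples above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not Hibernate Search API: decides how a dotted
// property path is treated under the proposed attribute semantics.
final class SubPathRules {

    // excludeSubPaths: an entry may name any node on a path; that node and
    // everything below it is pruned. Matching stops at a '.' boundary so
    // that "d" does not accidentally exclude "dd.one".
    static boolean isExcluded(String path, String... excludeSubPaths) {
        for (String excluded : excludeSubPaths) {
            if (path.equals(excluded) || path.startsWith(excluded + ".")) {
                return true;
            }
        }
        return false;
    }

    // includeSubPaths: entries name leaf fields, so only an exact match
    // counts; listing "d.one" does not pull in "d" or "d.two".
    static boolean isIncluded(String path, String... includeSubPaths) {
        Set<String> leaves = new HashSet<>(Arrays.asList(includeSubPaths));
        return leaves.contains(path);
    }
}
```

With excludeSubPaths={"d"}, both "d" and "d.one" are excluded while "foo" is not; with includeSubPaths={"d.one", "d.two"}, only those two exact paths are included.
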
> So to implement your original suggestion we should have thought of a
> mapping algorithm which would use either the _depth_ approach or the
> _subPaths_ approach, but you say that in practice you would apply them
> both?
> In this case if I wanted to use the subPaths strategy only I should
> use depth=0 and then add what I want to add? Just checking if we're on
> the same page.

No, that wasn't what I meant.  I'd expect the annotation processing to basically look like:

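IndexedEmbedded embeddedConfig = (IndexedEmbedded) node.getAnnotation(IndexedEmbedded.class);

// Annotation attributes are never null: an unset array attribute is empty, and an
// unset depth keeps its default (assumed here to be Integer.MAX_VALUE).
boolean hasIncludes = embeddedConfig.includeSubPaths().length > 0;
boolean hasExcludes = embeddedConfig.excludeSubPaths().length > 0;
boolean hasDepth = embeddedConfig.depth() != Integer.MAX_VALUE;

if (hasIncludes && (hasDepth || hasExcludes)) {
  throw new IllegalArgumentException("Invalid configuration: cannot specify includeSubPaths together with depth or excludeSubPaths");
}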

Hopefully it would be understood that if only 'includeSubPaths' is provided, then the default depth is irrelevant: the depth is expressed explicitly by each path.
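
As a rough illustration of "depth expressed per path": if depth=1 means the associated element is indexed but not its own embedded elements, then a path such as "d.one" implies a depth of 2.  The sketch below rests on that assumption; ImpliedDepth is a made-up name, not an existing class.

```java
// Hypothetical helper, not Hibernate Search API. Assumes a dot-separated
// path and that each segment costs one level of depth, counting the
// annotated association itself as level 1.
final class ImpliedDepth {

    // Depth needed to reach a single included leaf path.
    static int of(String includeSubPath) {
        return includeSubPath.split("\\.").length;
    }

    // Deepest level needed across all included paths (0 if none are given).
    static int max(String... includeSubPaths) {
        int deepest = 0;
        for (String path : includeSubPaths) {
            deepest = Math.max(deepest, of(path));
        }
        return deepest;
    }
}
```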

> 
> Do you have a great example to support the more complex option? We
> have to start somewhere, but the property names should be final and
> the meaning should not change in future if we then want to add the
> exclusions in future.

I think the complex option you thought I was implying was a mixed-bag approach, which I'm not advocating.  My only purpose in suggesting the 'exclude' option is that if I have 100 properties I want to index for a particular entity, then listing 100 properties explicitly in 'includeSubPaths' could be laborious (and some might think messy).  Those 100 properties could be directly on the entity, or they could be reached through associated entities.  However, to reach those 100 properties via 'depth' I might end up with 1000 values indexed (mostly waste, and potentially costly).  In that case maybe the 900 other values come from a particular unused path, or from a path I can prune a bit via 'excludeSubPaths'.
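
A small sketch of what "default depth plus excludes" could look like as a graph walk.  TypeNode and PrunedWalk are hypothetical stand-ins for the real metadata model, which I'm not trying to reproduce here; the point is only that an excluded node cuts off its whole subtree, while depth bounds everything else (including cycles).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an entity type: plain fields plus embedded associations.
final class TypeNode {
    final List<String> fields = new ArrayList<>();
    final Map<String, TypeNode> embedded = new LinkedHashMap<>();

    TypeNode field(String... names) { fields.addAll(Arrays.asList(names)); return this; }
    TypeNode embed(String name, TypeNode target) { embedded.put(name, target); return this; }
}

final class PrunedWalk {

    // Collects the field paths reachable within 'depth' levels of the
    // embedded type 'root', skipping any excluded field or subtree.
    static List<String> indexedPaths(TypeNode root, int depth, List<String> excludeSubPaths) {
        List<String> out = new ArrayList<>();
        walk(root, "", depth, excludeSubPaths, out);
        return out;
    }

    private static void walk(TypeNode node, String prefix, int remaining,
                             List<String> excludes, List<String> out) {
        if (remaining <= 0) {
            return; // max depth reached; this also terminates cycles
        }
        for (String field : node.fields) {
            String path = prefix + field;
            if (!excludes.contains(path)) {
                out.add(path);
            }
        }
        for (Map.Entry<String, TypeNode> e : node.embedded.entrySet()) {
            String path = prefix + e.getKey();
            if (excludes.contains(path)) {
                continue; // prune: nothing below this node is visited
            }
            walk(e.getValue(), path + ".", remaining - 1, excludes, out);
        }
    }
}
```

Modelling C from the earlier example (field foo, embedded d) and walking it with depth 5 and excludeSubPaths={"d"} yields only "foo"; without the exclude, the fields of D are collected as well.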

Overall I think the options of: default approach, default + excludeSubPaths, or includeSubPaths (but no default depth or excludes), give users 3 good options for how they want to go about indexing, and they can choose the least painful approach for their particular use case.  Most cases are going to be simple: for a particular entity only a very limited subset of properties is needed for search, and I'd probably go with 'includeSubPaths' most of the time in our particular object model.
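
Put as code, the three options and the one illegal combination might be resolved along these lines.  EmbeddedMode and EmbeddedModeResolver are invented names for this sketch, and 'depthSet' stands for "the user gave an explicit depth", however that ends up being detected.

```java
// Hypothetical names; sketches the three proposed ways to configure indexing.
enum EmbeddedMode { DEFAULT_DEPTH, DEPTH_WITH_EXCLUDES, INCLUDE_ONLY }

final class EmbeddedModeResolver {

    // Picks the mode implied by the attributes, rejecting includeSubPaths
    // combined with either an explicit depth or excludeSubPaths.
    static EmbeddedMode resolve(boolean depthSet, String[] includeSubPaths, String[] excludeSubPaths) {
        boolean hasIncludes = includeSubPaths.length > 0;
        boolean hasExcludes = excludeSubPaths.length > 0;
        if (hasIncludes && (depthSet || hasExcludes)) {
            throw new IllegalArgumentException(
                "includeSubPaths cannot be combined with depth or excludeSubPaths");
        }
        if (hasIncludes) {
            return EmbeddedMode.INCLUDE_ONLY;
        }
        return hasExcludes ? EmbeddedMode.DEPTH_WITH_EXCLUDES : EmbeddedMode.DEFAULT_DEPTH;
    }
}
```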

Hope that clarifies things?

Zach
