Re: [hibernate-dev] DocumentBuilder refactoring in Hibernate Search: how to deal (internally) with metadata

Friday, 31 May 2013

Hi Hardy,

great proposal for the meta-data API. I've added some comments inline.

--Gunnar

2013/5/30 Hardy Ferentschik <hardy(a)hibernate.org>

...
 Gee, that's an email ;-)
 Before getting too much into it I think it would be useful to talk about
 what I am actually doing.
 I am trying to expose a meta data API for Search which allows users to
 determine which entities are
 indexed and which fields are available for each entity. I am trying to take
 an approach similar to
 Bean Validation where all metadata is exposed via descriptors. The entry
 point into the API is the
 SearchFactory. I am basically thinking about something like this (feedback
 welcome):

 /**
  * Top level descriptor of the metadata API. Giving access to the indexing
 information for a single entity.
  *
  * @author Hardy Ferentschik
  */
 public interface IndexedEntityDescriptor {

I find the name "IndexedEntityDescriptor" in conjunction with isIndexed()
potentially returning "false" a bit confusing. Maybe just
EntityDescriptor? Or SearchableEntityDescriptor?

...
         /**
          * @return Returns {@code true} if the entity for this descriptor
 is indexed, {@code false} otherwise
          */
         boolean isIndexed();

Maybe return an enum if this can potentially be more than a simple yes/no?
I don't know how likely that is, but an enum would leave room for evolution.

...
         /**
          * @return Returns the class boost value, 1 being the default.
          */
         float getClassBoost();

         /**
          * @return Returns the names of the indexes instances of the
 entity are indexed into. Generally this will
          *         be just one index, however, when sharding is applied
 multiple indexes per entity can be used.
          */
         Set<String> getIndexNames();

Would something like Set<IndexDescriptor> getIndexes() make sense?

...
         /**
          * @return Returns a set of {@code FieldDescriptor}s for the
 indexed fields of the entity.
          */
         // TODO does this include the id field descriptor or should that
 be a separate descriptor?

At least for my case I think it would be easier if this contained all field
descriptors so I can handle them uniformly. Maybe FieldDescriptor#isId() or
if there are more id specific things something like this could be added:

    if ( fieldDescriptor.getType() == DescriptorType.ID ) {
        fieldDescriptor.as( IdDescriptor.class ).somethingIdSpecific();
    }
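
To make the idea a bit more concrete, here is a minimal sketch of how such an unwrapping could look. All names beyond FieldDescriptor (DescriptorType, IdDescriptor, as(), isProvidedId()) are assumptions made up for illustration, not an existing or agreed API, and the default method is only used to keep the sketch short:

```java
// Hypothetical sketch: DescriptorType, IdDescriptor and as() are illustrative names only.
enum DescriptorType { BASIC, ID }

interface FieldDescriptor {
    String getName();

    DescriptorType getType();

    // Narrows this descriptor to a more specific view, failing fast on a mismatch.
    default <T extends FieldDescriptor> T as(Class<T> type) {
        if ( !type.isInstance( this ) ) {
            throw new ClassCastException( getName() + " is not a " + type.getSimpleName() );
        }
        return type.cast( this );
    }
}

interface IdDescriptor extends FieldDescriptor {
    // Placeholder for "something id specific".
    boolean isProvidedId();
}

final class DescriptorDemo {
    // Handles all descriptors uniformly and unwraps only when the type says ID.
    static String describe(FieldDescriptor fd) {
        if ( fd.getType() == DescriptorType.ID ) {
            return fd.getName() + " (id, provided=" + fd.as( IdDescriptor.class ).isProvidedId() + ")";
        }
        return fd.getName();
    }
}
```

The point is that callers iterate over one set of descriptors and only pay for the cast where they need id-specific details.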

...
         // TODO should OBJECT_CLASS be considered?
         Set<FieldDescriptor> getIndexedFields();

Could you also add FieldDescriptor getIndexedField(String fieldName)?

...
 }

 /**
  * Metadata related to a single indexed field.
  *
  * @author Hardy Ferentschik
  */
 public interface FieldDescriptor {
         /**
          * Returns the Lucene {@code Document} field name for this indexed
 property.
          *
          * @return Returns the field name for this index property
          */
         String getFieldName();

I'd call it just "getName()", not repeating the type's name.

...

         /**
          * @return Returns an {@code Analyze} enum instance defining the
 type of analyzing applied to
          *         this field.
          */
         Analyze getAnalyzeType();

         /**
          * @return Returns a {@code Store} enum instance defining whether
 the index value is stored in the index itself.
          */
         Store getStoreType();

         /**
          * @return Returns a {@code TermVector} enum instance defining
 whether and how term vectors are stored for this
          *         field
          */
         TermVector getTermVectorType();

         /**
          * @return Returns a {@code Norms} enum instance defining whether
 and how norms are stored for this
          *         field
          */
         Norms getNormType();

         /**
          * @return Returns the boost value for this field. 1 being the
 default value.
          */
         float getBoost();

         /**
          * @return Returns the string used to index {@code null} values.
 {@code null} in case null values are not indexed.
          */
         String nullIndexedAs();

         /**
          * @return Returns the field bridge instance used to convert the
 property value into a string based field value
          */
         FieldBridge getFieldBridge();

         /**
          * @return Returns the analyzer used for this field, {@code null}
 if the field is not analyzed
          */
         Analyzer getAnalyzer();
 }

 On top of this I am planning to add (addressing HSEARCH-903):

 public interface FieldNameReportingBridge {
         Iterable<String> getGeneratedFieldNames(String baseFieldName);
 }

Wouldn't a Set be better? Returning Iterable makes it harder for users (e.g. no
contains()) and also hides set vs. list semantics.
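
Roughly what I have in mind, as a sketch only: the interface below is the Set-returning variant I'm suggesting (not what was proposed), and CoordinatesBridge with its ".latitude"/".longitude" suffixes is a made-up example bridge:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Variant of the proposed interface returning a Set instead of an Iterable (a suggestion, not the agreed contract).
interface FieldNameReportingBridge {
    Set<String> getGeneratedFieldNames(String baseFieldName);
}

// Hypothetical custom bridge which writes two Lucene fields per property.
final class CoordinatesBridge implements FieldNameReportingBridge {
    @Override
    public Set<String> getGeneratedFieldNames(String baseFieldName) {
        Set<String> names = new LinkedHashSet<String>();
        names.add( baseFieldName + ".latitude" );
        names.add( baseFieldName + ".longitude" );
        // Read-only view, so callers cannot alter the reported names.
        return Collections.unmodifiableSet( names );
    }
}
```

A caller can then simply ask getGeneratedFieldNames( "location" ).contains( "location.latitude" ) instead of iterating.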

...
 The latter I need to allow custom bridges to report which fields they
 add.
 Most of the information I need to implement all this is in
 AbstractDocumentBuilder.PropertiesMetadata. The plan so far
 was to extract the information from there and, while working on this, make
 PropertiesMetadata a proper object (instead of the
 parallel arrays thingy).

+1
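
Just to illustrate the direction (a sketch under assumptions, not the actual PropertiesMetadata layout): FieldMetadata and its three attributes are invented here, the real class tracks more per-field state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value object replacing one "row" of the parallel arrays.
final class FieldMetadata {
    final String name;
    final boolean stored;
    final float boost;

    FieldMetadata(String name, boolean stored, float boost) {
        this.name = name;
        this.stored = stored;
        this.boost = boost;
    }
}

final class PropertiesMetadataSketch {
    // Converts index-aligned arrays into a read-only list of per-field objects.
    static List<FieldMetadata> fromParallelArrays(String[] names, boolean[] stored, float[] boosts) {
        if ( names.length != stored.length || names.length != boosts.length ) {
            throw new IllegalArgumentException( "parallel arrays must have equal length" );
        }
        List<FieldMetadata> fields = new ArrayList<FieldMetadata>( names.length );
        for ( int i = 0; i < names.length; i++ ) {
            fields.add( new FieldMetadata( names[i], stored[i], boosts[i] ) );
        }
        return Collections.unmodifiableList( fields );
    }
}
```

With one object per field, the descriptors of the public API could be derived from it directly instead of correlating indexes across several arrays.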

...
 Maybe some other minor refactorings along the way. I was not going to
 touch the processing of annotations
 for now. As discussed, for that we would need yet another level of
 abstraction (similar to EntitySource in ORM or BeanConfiguration
 in HV). Something which can be populated by either annotation processing
 (be it Jandex or reflection) or by the programmatic API.
 Different story though.

 From what I can tell I don't need a Visitor pattern for what I have planned
 to do so far. If you think I am on the wrong track let me know
 and let me see the light.

 One thing I was wondering about after your email, however, was whether the
 API needs to provide information about which field/getter/class
 is responsible for creating a given Lucene Document Field. Do we have a
 use case for that?

 On 29 Jan 2013, at 6:39 PM, Sanne Grinovero <sanne(a)hibernate.org> wrote:

 > We're starting a series of refactorings in Hibernate Search to improve
 > how we handle the entity mapping to the index; to summarize goals:
 >
 > 1# Expose the Metadata as API
 >
 > We need to expose it because:
 > a - OGM needs to be able to read this metadata to produce appropriate
 queries

 @gunnar, does the API above address your needs?

Yes, from what I'm aware of atm, I think so.

...

 >  Personally I think we end up needing this just as an SPI: that might
 > be good for cases {a,b}, and I have an alternative proposal for {c}
 > described below.

 -1, why SPI? I think this is a very general purpose API useful for any
 users.
 For example, you could imagine building an auto-suggesting query field
 which
 makes suggestions on which fields you can search on (a little like the
 Jira queries).
 In this case you could get the available fields via this API. Just to
 mention one use case.
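
A rough sketch of that use case, assuming the field names have already been collected from the descriptors (e.g. one name per FieldDescriptor); FieldSuggester and suggest() are hypothetical helpers, not part of the proposal:

```java
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: suggests searchable field names matching what the user has typed so far.
final class FieldSuggester {
    static SortedSet<String> suggest(Collection<String> indexedFieldNames, String prefix) {
        SortedSet<String> matches = new TreeSet<String>();
        for ( String name : indexedFieldNames ) {
            if ( name.startsWith( prefix ) ) {
                matches.add( name );
            }
        }
        // Sorted alphabetically so the suggestions are stable between calls.
        return matches;
    }
}
```

Nothing here needs more than read access to the field names, which is an argument for exposing the metadata to regular users and not only to integrators.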

 >  However we expose it, I think we agree this should be a read-only
 > structure built as a second phase after the model is consumed from
 > (annotations / programmatic API / jandex / auto-generated by OGM).

 +1

 > It
 > would also be good to keep it "minimal" in terms of memory cost, so to
 > either:
 > - drop references to the source structure
 > - not holding on it at all, building the Metadata on demand (!)
 > (Assuming we can build it from a more obscure internal representation
 > I'll describe next).

 Given that I am going to build it from required runtime information it
 could for sure
 be lazily loaded. However, right now I think I will just go for the
 straightforward approach.

 > 3# MutableSearchFactory
 >
 > Let's not forget we also have a MutableSearchFactory to maintain: new
 > entities could be added at any time so if we drop the original
 > metadata we need to be able to build a new (read-only) one from the
 > current state.

 Good point

 > Things we wanted but were too hard to do so far:
 > - Separate annotation reading from Document building. Separate
 > validity checks too.

 +1 See above. I want to address this in another issue. We will need
 another intermediate
 model for that. With this in place we can remove commons-annotations and
 easily
 consume a Jandex index as well.

 > - It checks for JPA @Id using reflection as it might not be available
 > -> pluggable?

 Not sure what you mean here. That's just a very specific JPA/ORM based use
 case.

 > - LuceneOptionsImpl are built at runtime each time we need one ->
 > reuse them, coupling them to their field

 +1

 >  - We need a reliable way to track which field names are created, and
 > from which bridge they are originating (including custom bridges:
 > HSEARCH-904)

 See above and the FieldNameReportingBridge I am suggesting

 > == Solution ? ==
 >
 > Now let's assume that we can build this as a recursive structure which
 > accepts a generic visitor. …

 That's where you lose me. I think I am a little like Emmanuel here. Where
 does a
 Visitor pattern help here?

 --Hardy

 _______________________________________________
 hibernate-dev mailing list
 hibernate-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/hibernate-dev
