[hibernate-dev] Hibernate Search Metadata API: Numeric and other special Field types in Hibernate Search

Hardy Ferentschik hardy at hibernate.org
Tue Jul 16 05:18:44 EDT 2013

To give some more context. The split of FieldSettingsDescriptor and FieldDescriptor is driven by the 
discussion we had regarding HSEARCH-904 and the extension of the bridge interface, allowing it
to report the fields it creates. Custom bridges need to create or at least provide the information for the
metadata. The idea is that they can create FieldSettingsDescriptors. The reasons of the split if that
some information a bridge does not have access to, so it cannot and should not create it. 

Regarding FieldDescriptor#Type, my gut feeling is also that we should get rid of  
FieldSettingsDescriptor#isNumeric and create instead FieldDescriptor#Type.NUMERIC. This also
implies though that the Type enum and its getter should move into FieldSettingsDescriptor as well.
I guess that makes sense. 

The one thing I am wondering about is, is whether we we are not starting to mix different type concepts. 
Type.ID is not really a Lucene specific field encoding. It just says that this field has a special meaning for
Hibernate Search, as it is the unique document id. NUMERIC, however, is a type of Lucene encoding and
maybe there will be more. In this context, I was ordering whether SPATIAL should be another enum type.
Initially I also thought that the Type enum could be used as well to express the opaqueness idea Emmanuel
mentioned. An emum type of OPAQUE would then mean that this field gets generated by the bridge, but is
opaque to the application/user. Something like this would for sure overload the current meaning of the Type enum. 

So maybe we need multiple enum types, but that of course increases the complexity of the API and 
already the FieldSettingsDescriptor and FieldDescriptor split is on a first glance hard to understand. 

Leaves the problem of additional properties based on a specific field type, e.g. precisionStep.
I go with Gunnar and Emmanuel on this one preferring the unwrapping approach. The question is just 
wether we want to do it already now. How likely is it that we get other types?

To sum up, here is what I think we need to decide on.

Regarding isNumeric
1) Move  FieldDescriptor#Type into FieldSettingsDescriptor and add a NUMERIC type (leaving us with ID, BASIC, NUMERIC)
2) Create a new enum (called Type or maybe better Encoding) and have NUMERIC hosted there, together with BASIC ;-)

Regarding precisionsStep
1) Leave it as is under the assumption that there won't be many (if any) new type/encoding specific properties
2) Create FieldSettingsDescriptor subtypes like NumericFieldSettingsDescriptor and use the unwrap approach to
host additional properties. 



On 16 Jan 2013, at 9:27 AM, Gunnar Morling <gunnar at hibernate.org> wrote:

> Hi,
>> I won't mention my favorite Vattern. I've considered adding subtypes
> but not liking it as their usage would not be clear from the API.
> How would you use your vavorite Vattern without subtypes? And which other
> option would you prefer then?
> I think the field-type specific information can either be on
> a) FieldSettingsDescriptor itself (as is today)
> b) specific subtypes of FSD
> c) specific delegates of FSD, safely accessible via a type parameter of FSD
> a) would IMO be the simplest but could lead to a proliferation of
> attributes on FSD; So as you say it depends on the number of specific
> attributes whether its feasible or not. But even if sticking to this
> approach, we might consider to replace boolean isNumeric() with FieldType
> getFieldType(). This would avoid adding a new isXy() method for each
> specific type and be used like so:
>    if ( desc.fieldType = NUMERIC) {
>        doSomething( desc.precisionStep() );
>    else if ( desc.fieldType = FOO ) {
>        doSomething( desc.fooAttrib() );
>    }
> For b), you need a way to narrow down to the subtype, either via Visitor or
> some kind of cast. I still find this pattern as used in BV reads quite
> nicely:
>    Object result = null;
>    if ( desc.fieldType = NUMERIC) {
>        result = doSomething( desc.as( NumericDescriptor.class).precisionStep()
> );
>    else if ( desc.fieldType = FOO ) {
>        result doSomething( desc.as( FooDescriptor.class).fooAttrib() );
>    }
> In particular, as() would only accept subtypes of Descriptor and be thus a
> bit safer than a plain downcast.
> Btw., the annotation processing API (as e.g. used by the Meta model
> generator or the AP in Hibernate Validator), offers both ways for that
> purpose, i.e. a visitor approach and, getKind()  + downcast. Having worked
> with both, I find the usually simpler to use.
> For a comparison, the last example would look like this with a visitor
> design similar to the annotation processing API (the type parameters are
> for parameter and return type passed to/retrieved from the visitor):
>    Object result = desc.accept(
>        new FieldDescriptorVisitor<Void, Object>() {
>            @Override
>            Object visitAsNumber(NumberDescriptor descriptor, Void p) {
>                return doSomething( descriptor.precisionStep() );
>            }
>            @Override
>            Object visitAsFoo(FooDescriptor descriptor, Void p) {
>                return doSomething( descriptor.fooAttrib() );
>            }
>        }
>    );
> Personally, I find this reads and writes not as nice as the other approach.
> Regarding c), one could think of something like this:
>    class FieldSettingsDescriptor<T extends DescriptorSpecifics> {
>        public T getSpecifics();
>        ...
>    }
> and
>    FieldSettingsDescriptor<NumberSpecifics> numberDescriptor = ...;
>    doSomething( numberDescriptor.getSpecifics().precisionStep() );
> But the question is how one would obtain a properly typed descriptor. E.g.
> from a collection with mixed fields, one would only get
> FieldSettingsDescriptor<?>, making this quite pointless.
> I think, I'd like the getType()/as() approach best. Or do you have yet
> another approach in mind?
> --Gunnar
> 2013/7/15 Sanne Grinovero <sanne at hibernate.org>
>> The new FieldSettingsDescriptor [1] has a couple of methods meant for
>> Numeric fields:
>> /**
>> * @return the numeric precision step in case this field is indexed as
>> a numeric value. If the field is not numeric
>> *         {@code null} is returned.
>> */
>> Integer precisionStep();
>> /**
>> * @return {@code true} if this field is indexed as numeric field,
>> {@code false} otherwise
>> *
>> * @see #precisionStep()
>> */
>> boolean isNumeric();
>> Today we have specific support for the
>> org.apache.lucene.document.NumericField type from Lucene, so these are
>> reasonable (and needed to build queries) but this specific kind is
>> being replaced by a more general purpose encoding so that you don't
>> have "just" NumericField but can have a wide range of special fields.
>> So today for simplicity it would make sense to expose these methods
>> directly on the FieldSettingsDescriptor as it makes sense for our
>> users, but then also the #isNumeric() is needed as not all fields are
>> numeric: we're having these extra methods to accommodate for the needs
>> of some special cases.
>> Considering that we might get more "special cases" with Lucene4, and
>> that probably they will have different options, would we be able to
>> both decouple from these specific options and also expose the needed
>> precisionStep ?
>> I won't mention my favorite Vattern. I've considered adding subtypes
>> but not liking it as their usage would not be clear from the API.
>> Cheers,
>> Sanne
>> 1 - as merged two minutes ago
>> _______________________________________________
>> hibernate-dev mailing list
>> hibernate-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev

More information about the hibernate-dev mailing list