[hibernate-dev] Hibernate Search Metadata API: Numeric and other special Field types in Hibernate Search

Tue Jul 16 09:06:08 EDT 2013

I agree with the unwrap proposal, but I could not get it to work if we
involve Enums. It seems like the user needs to know how to cast, which
is quite ugly.
I don't like the Visitor approach in this case as it requires the user
code to implement a method per type, which means we would break
backwards compatibility any time we where adding a new custom
"encoding" of fields.

On Hardy's further comments:

I was also wondering if we should not get rid of ID. As you say it's
not special to the user; we might also want to add
org.hibernate.search.ProjectionConstants.OBJECT_CLASS .. but this is
assuming we want to expose the Lucene structure accurately rather than
our property metadata.

On OPAQUE : didn't you say that the bridges should provide the
metadata relating to the fields they plan to add? In such a case I
would then expect to be able to inspect the fields.

On 16 July 2013 10:19, Hardy Ferentschik <hardy at hibernate.org> wrote:
> Ahh, and I forgot to ask Gunnar whether he thinks the API fits the OGM use case now?
>
> --Hardy
>
> On 16 Jan 2013, at 11:18 AM, Hardy Ferentschik <hardy at hibernate.org> wrote:
>
>> To give some more context. The split of FieldSettingsDescriptor and FieldDescriptor is driven by the
>> discussion we had regarding HSEARCH-904 and the extension of the bridge interface, allowing it
>> to report the fields it creates. Custom bridges need to create or at least provide the information for the
>> metadata. The idea is that they can create FieldSettingsDescriptors. The reasons of the split if that
>> some information a bridge does not have access to, so it cannot and should not create it.
>>
>> Regarding FieldDescriptor#Type, my gut feeling is also that we should get rid of
>> FieldSettingsDescriptor#isNumeric and create instead FieldDescriptor#Type.NUMERIC. This also
>> implies though that the Type enum and its getter should move into FieldSettingsDescriptor as well.
>> I guess that makes sense.
>>
>> The one thing I am wondering about is, is whether we we are not starting to mix different type concepts.
>> Type.ID is not really a Lucene specific field encoding. It just says that this field has a special meaning for
>> Hibernate Search, as it is the unique document id. NUMERIC, however, is a type of Lucene encoding and
>> maybe there will be more. In this context, I was ordering whether SPATIAL should be another enum type.
>> Initially I also thought that the Type enum could be used as well to express the opaqueness idea Emmanuel
>> mentioned. An emum type of OPAQUE would then mean that this field gets generated by the bridge, but is
>> opaque to the application/user. Something like this would for sure overload the current meaning of the Type enum.
>>
>> So maybe we need multiple enum types, but that of course increases the complexity of the API and
>> already the FieldSettingsDescriptor and FieldDescriptor split is on a first glance hard to understand.
>>
>> Leaves the problem of additional properties based on a specific field type, e.g. precisionStep.
>> I go with Gunnar and Emmanuel on this one preferring the unwrapping approach. The question is just
>> wether we want to do it already now. How likely is it that we get other types?
>>
>> To sum up, here is what I think we need to decide on.
>>
>> Regarding isNumeric
>> 1) Move  FieldDescriptor#Type into FieldSettingsDescriptor and add a NUMERIC type (leaving us with ID, BASIC, NUMERIC)
>> 2) Create a new enum (called Type or maybe better Encoding) and have NUMERIC hosted there, together with BASIC ;-)
>>
>> Regarding precisionsStep
>> 1) Leave it as is under the assumption that there won't be many (if any) new type/encoding specific properties
>> 2) Create FieldSettingsDescriptor subtypes like NumericFieldSettingsDescriptor and use the unwrap approach to
>> host additional properties.
>>
>> Thoughts?
>>
>> --Hardy
>>
>>
>>
>> On 16 Jan 2013, at 9:27 AM, Gunnar Morling <gunnar at hibernate.org> wrote:
>>
>>> Hi,
>>>
>>>> I won't mention my favorite Vattern. I've considered adding subtypes
>>> but not liking it as their usage would not be clear from the API.
>>>
>>> How would you use your vavorite Vattern without subtypes? And which other
>>> option would you prefer then?
>>>
>>> I think the field-type specific information can either be on
>>>
>>> a) FieldSettingsDescriptor itself (as is today)
>>> b) specific subtypes of FSD
>>> c) specific delegates of FSD, safely accessible via a type parameter of FSD
>>>
>>> a) would IMO be the simplest but could lead to a proliferation of
>>> attributes on FSD; So as you say it depends on the number of specific
>>> attributes whether its feasible or not. But even if sticking to this
>>> approach, we might consider to replace boolean isNumeric() with FieldType
>>> getFieldType(). This would avoid adding a new isXy() method for each
>>> specific type and be used like so:
>>>
>>>   if ( desc.fieldType = NUMERIC) {
>>>       doSomething( desc.precisionStep() );
>>>   else if ( desc.fieldType = FOO ) {
>>>       doSomething( desc.fooAttrib() );
>>>   }
>>>
>>> For b), you need a way to narrow down to the subtype, either via Visitor or
>>> some kind of cast. I still find this pattern as used in BV reads quite
>>> nicely:
>>>
>>>   Object result = null;
>>>   if ( desc.fieldType = NUMERIC) {
>>>       result = doSomething( desc.as( NumericDescriptor.class).precisionStep()
>>> );
>>>   else if ( desc.fieldType = FOO ) {
>>>       result doSomething( desc.as( FooDescriptor.class).fooAttrib() );
>>>   }
>>>
>>> In particular, as() would only accept subtypes of Descriptor and be thus a
>>> bit safer than a plain downcast.
>>>
>>> Btw., the annotation processing API (as e.g. used by the Meta model
>>> generator or the AP in Hibernate Validator), offers both ways for that
>>> purpose, i.e. a visitor approach and, getKind()  + downcast. Having worked
>>> with both, I find the usually simpler to use.
>>>
>>> For a comparison, the last example would look like this with a visitor
>>> design similar to the annotation processing API (the type parameters are
>>> for parameter and return type passed to/retrieved from the visitor):
>>>
>>>   Object result = desc.accept(
>>>       new FieldDescriptorVisitor<Void, Object>() {
>>>
>>>           @Override
>>>           Object visitAsNumber(NumberDescriptor descriptor, Void p) {
>>>               return doSomething( descriptor.precisionStep() );
>>>           }
>>>
>>>           @Override
>>>           Object visitAsFoo(FooDescriptor descriptor, Void p) {
>>>               return doSomething( descriptor.fooAttrib() );
>>>           }
>>>       }
>>>   );
>>>
>>> Personally, I find this reads and writes not as nice as the other approach.
>>>
>>> Regarding c), one could think of something like this:
>>>
>>>   class FieldSettingsDescriptor<T extends DescriptorSpecifics> {
>>>       public T getSpecifics();
>>>       ...
>>>   }
>>>
>>> and
>>>
>>>   FieldSettingsDescriptor<NumberSpecifics> numberDescriptor = ...;
>>>   doSomething( numberDescriptor.getSpecifics().precisionStep() );
>>>
>>> But the question is how one would obtain a properly typed descriptor. E.g.
>>> from a collection with mixed fields, one would only get
>>> FieldSettingsDescriptor<?>, making this quite pointless.
>>>
>>> I think, I'd like the getType()/as() approach best. Or do you have yet
>>> another approach in mind?
>>>
>>> --Gunnar
>>>
>>>
>>>
>>> 2013/7/15 Sanne Grinovero <sanne at hibernate.org>
>>>
>>>> The new FieldSettingsDescriptor [1] has a couple of methods meant for
>>>> Numeric fields:
>>>>
>>>> /**
>>>> * @return the numeric precision step in case this field is indexed as
>>>> a numeric value. If the field is not numeric
>>>> *         {@code null} is returned.
>>>> */
>>>> Integer precisionStep();
>>>>
>>>> /**
>>>> * @return {@code true} if this field is indexed as numeric field,
>>>> {@code false} otherwise
>>>> *
>>>> * @see #precisionStep()
>>>> */
>>>> boolean isNumeric();
>>>>
>>>> Today we have specific support for the
>>>> org.apache.lucene.document.NumericField type from Lucene, so these are
>>>> reasonable (and needed to build queries) but this specific kind is
>>>> being replaced by a more general purpose encoding so that you don't
>>>> have "just" NumericField but can have a wide range of special fields.
>>>>
>>>> So today for simplicity it would make sense to expose these methods
>>>> directly on the FieldSettingsDescriptor as it makes sense for our
>>>> users, but then also the #isNumeric() is needed as not all fields are
>>>> numeric: we're having these extra methods to accommodate for the needs
>>>> of some special cases.
>>>>
>>>> Considering that we might get more "special cases" with Lucene4, and
>>>> that probably they will have different options, would we be able to
>>>> both decouple from these specific options and also expose the needed
>>>> precisionStep ?
>>>>
>>>> I won't mention my favorite Vattern. I've considered adding subtypes
>>>> but not liking it as their usage would not be clear from the API.
>>>>
>>>> Cheers,
>>>> Sanne
>>>>
>>>> 1 - as merged two minutes ago
>>>> _______________________________________________
>>>> hibernate-dev mailing list
>>>> hibernate-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>>>
>>> _______________________________________________
>>> hibernate-dev mailing list
>>> hibernate-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>
>
>
> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev