Hi Hardy,
great proposal for the meta-data API. I've added some comments inline.
--Gunnar
2013/5/30 Hardy Ferentschik <hardy(a)hibernate.org>
Gee, that's an email ;-)
Before getting too much into it I think it would be useful to talk about
what I am actually doing.
I am trying to expose a meta data API for Search which allows users to
determine which entities are
indexed and which fields are available for each entity. I am trying to do
a similar approach to
Bean Validation where all metadata is exposed via descriptors. The entry
point into the API is the
SearchFactory. I am basically thinking about something like this (feedback
welcome):
/**
* Top-level descriptor of the metadata API, giving access to the indexing
* information for a single entity.
*
* @author Hardy Ferentschik
*/
public interface IndexedEntityDescriptor {
I find the name "IndexedEntityDescriptor" in conjunction with isIndexed()
potentially returning "false" a bit confusing. Maybe just
EntityDescriptor? Or SearchableEntityDescriptor?
/**
* @return Returns {@code true} if the entity for this descriptor
* is indexed, {@code false} otherwise
*/
boolean isIndexed();
Maybe return an enum if this can potentially be more than a simple yes/no?
I don't know how likely that is, but an enum would leave room for evolution.
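To illustrate what I mean, here is a rough sketch. IndexingStatus, its constants and EntityDescriptorSketch are names I made up for the example; none of them is part of your proposal:

```java
// Hypothetical sketch: the enum, its constants and the interface name are
// invented for illustration only.
enum IndexingStatus {
    INDEXED,
    NOT_INDEXED
    // further states could be added later without changing the method signature
}

interface EntityDescriptorSketch {
    IndexingStatus getIndexingStatus();

    // Convenience method so the simple yes/no question stays easy to ask.
    default boolean isIndexed() {
        return getIndexingStatus() == IndexingStatus.INDEXED;
    }
}
```

Callers who only care about yes/no would keep using isIndexed(), while getIndexingStatus() could grow new values later.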
/**
* @return Returns the class boost value, 1 being the default.
*/
float getClassBoost();
/**
* @return Returns the names of the indexes instances of the
* entity are indexed into. Generally this will
* be just one index, however, when sharding is applied
* multiple indexes per entity can be used.
*/
Set<String> getIndexNames();
Would something like Set<IndexDescriptor> getIndexes() make sense?
/**
* @return Returns a set of {@code FieldDescriptor}s for the
* indexed fields of the entity.
*/
// TODO does this include the id field descriptor or should that
// be a separate descriptor?
At least for my case I think it would be easier if this contained all field
descriptors so I can handle them uniformly. Maybe FieldDescriptor#isId() or
if there are more id specific things something like this could be added:
if ( fieldDescriptor.getType() == DescriptorType.ID ) {
    fieldDescriptor.as( IdDescriptor.class ).somethingIdSpecific();
}
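Spelled out a bit more, the narrowing idea could look roughly like this. Everything here is an assumption for the sake of the example: DescriptorType, the as() method, the *Sketch types, and isDocumentId() as a stand-in for "something id specific":

```java
// Hypothetical sketch: all type and method names below are assumptions.
enum DescriptorType { FIELD, ID }

interface FieldDescriptorSketch {
    String getName();

    DescriptorType getType();

    // Narrows this descriptor to a more specific descriptor type.
    <T extends FieldDescriptorSketch> T as(Class<T> type);
}

interface IdDescriptorSketch extends FieldDescriptorSketch {
    boolean isDocumentId(); // stand-in for "something id specific"
}

class SimpleIdDescriptor implements IdDescriptorSketch {
    private final String name;

    SimpleIdDescriptor(String name) {
        this.name = name;
    }

    public String getName() { return name; }

    public DescriptorType getType() { return DescriptorType.ID; }

    public boolean isDocumentId() { return true; }

    public <T extends FieldDescriptorSketch> T as(Class<T> type) {
        // Fail with a clear message instead of an unchecked cast further down the line.
        if (!type.isInstance(this)) {
            throw new ClassCastException(getClass().getName() + " is not a " + type.getName());
        }
        return type.cast(this);
    }
}
```

That way getIndexedFields() can return all descriptors uniformly, and only code which cares about ids needs to narrow.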
// TODO should OBJECT_CLASS be considered?
Set<FieldDescriptor> getIndexedFields();
Could you also add FieldDescriptor getIndexedField(String fieldName)?
}
/**
* Metadata related to a single indexed field.
*
* @author Hardy Ferentschik
*/
public interface FieldDescriptor {
/**
* Returns the Lucene {@code Document} field name for this indexed
* property.
*
* @return Returns the field name for this index property
*/
String getFieldName();
I'd call it just "getName()", not repeating the type's name.
/**
* @return Returns an {@code Analyze} enum instance defining the
* type of analyzing applied to this field.
*/
Analyze getAnalyzeType();
/**
* @return Returns a {@code Store} enum instance defining whether
* the index value is stored in the index itself.
*/
Store getStoreType();
/**
* @return Returns a {@code TermVector} enum instance defining
* whether and how term vectors are stored for this field
*/
TermVector getTermVectorType();
/**
* @return Returns a {@code Norms} enum instance defining whether
* and how norms are stored for this field
*/
Norms getNormType();
/**
* @return Returns the boost value for this field. 1 being the
* default value.
*/
float getBoost();
/**
* @return Returns the string used to index {@code null} values.
* {@code null} in case null values are not indexed.
*/
String nullIndexedAs();
/**
* @return Returns the field bridge instance used to convert the
* property value into a string based field value
*/
FieldBridge getFieldBridge();
/**
* @return Returns the analyzer used for this field, {@code null}
* if the field is not analyzed
*/
Analyzer getAnalyzer();
}
On top of this I am planning to add (addressing HSEARCH-903):
public interface FieldNameReportingBridge {
Iterable<String> getGeneratedFieldNames(String baseFieldName);
}
Wouldn't a Set be better? Returning Iterable makes it harder for users (e.g. no
contains()) and also hides set vs. list semantics.
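A quick sketch of the Set-based variant, just to show the difference on the caller side. The interface is a hypothetical variation of the one proposed above, and the date-splitting bridge with its ".year"/".month"/".day" suffixes is made up:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical variant of the proposed contract, returning a Set instead of an Iterable.
interface FieldNameReportingBridgeSketch {
    Set<String> getGeneratedFieldNames(String baseFieldName);
}

// Made-up bridge which would index a date as three separate fields.
class DateSplitBridgeSketch implements FieldNameReportingBridgeSketch {
    public Set<String> getGeneratedFieldNames(String baseFieldName) {
        // LinkedHashSet keeps a stable iteration order while still ruling out duplicates.
        Set<String> names = new LinkedHashSet<>();
        names.add(baseFieldName + ".year");
        names.add(baseFieldName + ".month");
        names.add(baseFieldName + ".day");
        return Collections.unmodifiableSet(names);
    }
}
```

A caller can then simply ask getGeneratedFieldNames( "birthday" ).contains( "birthday.year" ) instead of iterating.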
The latter I need to allow custom bridges to report which fields they
add.
Most of the information I need to implement all this is in
AbstractDocumentBuilder.PropertiesMetadata. The plan so far
was to extract the information from there and, while working on this, make
PropertiesMetadata a proper object (instead of the
parallel arrays thingy).
+1
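Roughly what I'd picture for getting rid of the parallel arrays, purely as an illustration. I haven't checked which members PropertiesMetadata actually holds, so the two lists and all type names below are invented:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// "Before": values for one field are spread over lists correlated only by index.
// The two lists are hypothetical stand-ins, not the actual PropertiesMetadata members.
class ParallelListsMetadata {
    final List<String> fieldNames = new ArrayList<>();
    final List<Float> fieldBoosts = new ArrayList<>();
}

// "After": one immutable object per field keeps related values together.
final class FieldMetadataSketch {
    private final String name;
    private final float boost;

    FieldMetadataSketch(String name, float boost) {
        this.name = name;
        this.boost = boost;
    }

    String getName() { return name; }

    float getBoost() { return boost; }
}

final class PropertiesMetadataSketch {
    private final List<FieldMetadataSketch> fields;

    private PropertiesMetadataSketch(List<FieldMetadataSketch> fields) {
        this.fields = Collections.unmodifiableList(fields);
    }

    // Converts the index-correlated lists into a list of field objects.
    static PropertiesMetadataSketch from(ParallelListsMetadata old) {
        List<FieldMetadataSketch> fields = new ArrayList<>();
        for (int i = 0; i < old.fieldNames.size(); i++) {
            fields.add(new FieldMetadataSketch(old.fieldNames.get(i), old.fieldBoosts.get(i)));
        }
        return new PropertiesMetadataSketch(fields);
    }

    List<FieldMetadataSketch> getFields() { return fields; }
}
```

The descriptors of the metadata API could then be created more or less directly from such per-field objects.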
Maybe some other minor refactorings along the way. I was not going to
touch the processing of annotations for now. As discussed, there we
would need yet another level of abstraction (similar to EntitySource in ORM
or BeanConfiguration in HV). Something which can be populated by either
annotation processing (be it Jandex or reflection) or by the programmatic API.
Different story though.
From what I can tell I don't need a Visitor pattern for what I have planned
to do so far. If you think I am on the wrong track let me know
and let me see the light.
One thing I was wondering about after your email, however, was whether the
API needs to provide information about which field/getter/class
is responsible for creating a given Lucene Document Field. Do we have a
use case for that?
On 29 Jan 2013, at 6:39 PM, Sanne Grinovero <sanne(a)hibernate.org> wrote:
> We're starting a series of refactorings in Hibernate Search to improve
> how we handle the entity mapping to the index; to summarize goals:
>
> 1# Expose the Metadata as API
>
> We need to expose it because:
> a - OGM needs to be able to read this metadata to produce appropriate
queries
@gunnar, does the API above address your needs?
Yes, from what I'm aware of atm, I think so.
> Personally I think we end up needing this just as an SPI: that might
> be good for cases {a,b}, and I have an alternative proposal for {c}
> described below.
-1, why an SPI? I think this is a very general-purpose API, useful for any
user.
For example, you could imagine building an auto-suggesting query field which
makes suggestions on which fields you can search on (a little like the
Jira queries).
In this case you could get the available fields via this API. Just to
mention one use case.
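To make the use case a bit more concrete, a minimal sketch of such a suggester. FieldSuggesterSketch is a made-up name, and the collection of field names stands in for whatever the field descriptors would report:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the proposed API.
class FieldSuggesterSketch {

    // Returns the field names starting with the given prefix, sorted alphabetically.
    // fieldNames stands in for the names obtained from the field descriptors.
    static List<String> suggest(Collection<String> fieldNames, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String name : fieldNames) {
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        Collections.sort(matches);
        return matches;
    }
}
```

With the fields "title", "tags" and "author" and the prefix "t" this would suggest "tags" and "title".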
> However we expose it, I think we agree this should be a read-only
> structure built as a second phase after the model is consumed from
> (annotations / programmatic API / jandex / auto-generated by OGM).
+1
> It
> would also be good to keep it "minimal" in terms of memory cost, so to
> either:
> - drop references to the source structure
> - not holding on it at all, building the Metadata on demand (!)
> (Assuming we can build it from a more obscure internal representation
> I'll describe next).
Given that I am going to build it from required runtime information it
could for sure
be lazily loaded. However, right now I think I will just go for the
straightforward approach.
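Should lazy loading become necessary later on, it could be as simple as the following sketch. LazyDescriptorCache and the builder function are assumptions for illustration; the builder stands in for whatever code creates a descriptor from the runtime information:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache which builds the descriptor for an entity type on first access.
class LazyDescriptorCache<D> {
    private final ConcurrentMap<Class<?>, D> cache = new ConcurrentHashMap<>();
    private final Function<Class<?>, D> builder;

    LazyDescriptorCache(Function<Class<?>, D> builder) {
        this.builder = builder;
    }

    // Builds the descriptor the first time a type is requested, then reuses it.
    D getDescriptor(Class<?> entityType) {
        return cache.computeIfAbsent(entityType, builder);
    }
}
```

Nothing is built for entity types nobody asks about, which would also keep the memory cost down.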
> 3# MutableSearchFactory
>
> Let's not forget we also have a MutableSearchFactory to maintain: new
> entities could be added at any time so if we drop the original
> metadata we need to be able to build a new (read-only) one from the
> current state.
Good point
> Things we wanted but were too hard to do so far:
> - Separate annotation reading from Document building. Separate
> validity checks too.
+1 See above. I want to address this in another issue. We will need
another intermediate model for that. With this in place we can remove
commons-annotations and easily consume a Jandex index as well.
> - It checks for JPA @Id using reflection as it might not be available
> -> pluggable?
Not sure what you mean here. That's just a very specific JPA/ORM based use
case.
> - LuceneOptionsImpl are built at runtime each time we need one ->
> reuse them, coupling them to their field
+1
> - We need a reliable way to track which field names are created, and
> from which bridge they are originating (including custom bridges:
> HSEARCH-904)
See above and the FieldNameReportingBridge I am suggesting
> == Solution ? ==
>
> Now let's assume that we can build this as a recursive structure which
> accepts a generic visitor. …
That's where you lose me. I think I am a little like Emmanuel here. Where
does a
Visitor pattern help here?
--Hardy