[hibernate-dev] API changes in FieldBridge

Wed Sep 7 13:35:32 EDT 2011

Summary: we won't use the annotations, but we will add additional
Interfaces in future, which will be optional.
The only change needed for 4.0 to make this possible later is to make
the FieldBridge implementations stateful: they will need an initialize
method
to receive the field name and LuceneOptions. These same parameters are
currently passed to the set(...) method and will be removed from
there.

Why, and future plan:
Edited IRC log, I actually tried to shorten it a bit, taking out the
fun parts..:)

<sannegrinovero> FIELD_PREFIXES might make sense too :) a
FieldSelector can make good use of that.
<sannegrinovero> RE "No, the mapping annotation accepts StringBridge /
FieldBridge and their TwoWay counterparts."
<sannegrinovero> really didn't know that, I'm glad you had time to
inspect my rants. I still have something to learn from Search.
<emmanuel> :)
<emmanuel> I like the annotation approach too though I'm not fan of
the names yet
<hardy> talking about bridges, was there not an issue once to sort out
some of the inheritance used in these classes
<sannegrinovero> RE: "That made me realize. Most of our bridges are
stateless wrt the field name. ie the field name is passed as a
parameter and we use it."
<hardy> if I remember correctly there was some sort of "problem/issue"
<sannegrinovero> that leads me to think we should otherwise inject the
field name and options to the bridge, and have it statefull
<sannegrinovero> hardy, you mean with the inheritance of interfaces of
the FieldBridge & co, or the annotated entities ?
<hardy> of the FieldBridge
<hardy> I am pretty sure that used to be a Jira or a todo somewhere
<sannegrinovero> I don't remember. I might have brought up that we
have too many and that I get confused on them :)
<hardy> / FIXME rework the interface inheritance there are some common
concepts with StringBridge
<hardy> in TwoWayFieldBridge
* emmanuel has quit (Ping timeout: 246 seconds)
<sannegrinovero> right, that's the one I referred to, and why it
scared me a bit to add more variants to them.
<hardy> sure
<hardy> but as I said I was not talking about interface inhertiance
<hardy> my suggestion was a new independet interface
<hardy> maybe we should figure out what this comment actually means
and address the issue and/or remove the comment
<sannegrinovero> yes got that during lunch :) I'm not super happy but
it's clearly the best proposal so far.
<hardy> imo it is a good solution
<sannegrinovero> yes well as I said it just scares me a bit, mostly
because as I just have proven on the mailing list I'm not very
familiar with this area and gets me confused. might well be that if I
happen to code this I'll love it after the deep-dive
<sannegrinovero> hardy, did you find an issue for that FIXME you referred to?
<hardy> I haven't looked
<sannegrinovero> because an annotation can be added later, but
breaking the interfaces should be done asap if we in fact get a good
grasp on this
<sannegrinovero> and to answer on your last mail: we really need to
push out of the schedule just everything which is not needed for 4.0,
there's way too much already.
<sannegrinovero> so what I'm trying to do now is to clarify as much as
possible how exactly the API has to change, in such a way that we will
be able to add new cool features and additional optimisations in
future, we have so many ideas but they won't fit all.
<hardy> sure
<hardy> given our tight schedule i think it makes sense to not break
the interface
<hardy> we seem to have a backward compatible solution for a 4.x version
<sannegrinovero> of course if there is no need to break it we won't,
but if we later figure out we where needing it where' stuck again
<sannegrinovero> so for now my plan was that
<sannegrinovero> if we need to mandate a new method, we can as well
ignore it for 4.0 but we have to declare it in the interface
<sannegrinovero> just that this is incredibly hard to do in practice :)
<sannegrinovero> can we boil down a proposal to send to ML for when
emmanuel returns?
<hardy> so what's the options?
<hardy> #1 new interface a bridge can implement
<hardy> #2 a new @TargetedFields annotation
<hardy> is this what it boils down to?
<sannegrinovero> yes but I'm not sure if they are alternatives, it
looks more like 1# solves problem one and 2# solved problem 2 :)
<sannegrinovero> as how I had numbered them in the first email
<hardy> got you
<sannegrinovero> yes maybe it was wrong to push in so many issues in
the same thread, but they are all related
<hardy> of course
<sannegrinovero> on @TargetedFields, how do you see it relate the
actual field names to the name passed as a parameter? an annotation is
static ..
<sannegrinovero> looks like the @TargetedFields options will need a
placeholder token for the fieldname they are used on
<sannegrinovero> but this quite complicates the re-use of such an
instance on different properties, looks like we will need to make it a
stateful component.
<hardy> I don't follow
<sannegrinovero> example, assuming you have Search's src open:
<hardy> open now
<sannegrinovero> actually this will work better:
http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#example-field-bridge
<sannegrinovero> you see the "set(String name ... other parameters)"
<sannegrinovero> then the implementation forges the "[name].year",
"[name].month" fields
<hardy> yes
<sannegrinovero> so this implementation should be annotated as using
which fields?
<sannegrinovero> when annotating it with  @TargetedFields I won't know
the value of the name parameter
<sannegrinovero> it should be something like
<sannegrinovero> @TargetedFields(value=FIELD_LIST,
fieldList={"bornAt.year", ""bornAt.month"})
<sannegrinovero> but the "bornAt" is not known when you create the
DateSplitBridge implementation, or if you know it, then the bridge
won't be reusable
<hardy> ahh
<sannegrinovero> or it would need to have:
<sannegrinovero> @TargetedFields(value=FIELD_PREFIXES, fieldList="bornAt.*")
<sannegrinovero> which doesn't work
<sannegrinovero> did I clarify the problem?
<hardy> jupp
<sannegrinovero> so what I was saying above is that we could use a marker token
<sannegrinovero> as in practice:
<sannegrinovero> @TargetedFields(value=FIELD_LIST,
fieldList={"$FIELD$.year", "$FIELD$.month"})
<sannegrinovero> but it's just not very nice to have, and hard to
implement as we would have to do string replacements at each
invocation. better to pass $FIELD$ to the constructor and have this
thing statefull
<sannegrinovero> so since we will want to keep the no-arg constructor,
maybe we should add an initialize method, which sets some stuff:
<sannegrinovero> a) Field name
<sannegrinovero> b) Index options (Store, tokenized, etc..)
<sannegrinovero> etc.. basically passing the LuceneOptions to be
stored in a field.
<hardy> you mean on the bridge
<sannegrinovero> yes
<sannegrinovero> this breaks API, but it's not out of question now; at
this point we should strife for the best technical solution as we'll
have to live with it for some time
<hardy> well, an initialize method would work
<sannegrinovero> but then we still need a method returning us the
information, instead of the annotations.
<hardy> and the LuceneOption are fixed for a given bridge, right?
<sannegrinovero> yes the LuceneOption is the in-memory representation
of the options used to annotate the @Field.
<sannegrinovero> I see a benefit in the fact that we won't have to
pass-in the LuceneOption anymore, but write to the bridge instead. is
a good start to simplify the DocumentBuilder
<hardy> i agree
<hardy> sounds good
<hardy> let me think for a while. trying to answer another question in
parallel ;-)
* epbernard (~Adium at redhat/jboss/epbernard) has joined #hibernate-dev
<hardy> sannegrinovero: still there?
<sannegrinovero> got some time to think about it?
<hardy> not so much, but at least I could finish the other conversation
<hardy> regarding the FeildBrdige interface, the latest was we have
<hardy> FieldBridgeMeta initalize(String name. LuceneOptions)
<sannegrinovero> we where discussing the suitability of annotations vs
store some state in the instance
<hardy> void set(String name. Object value. Document document)
<hardy> right and then you suggested an initialize methoid
<sannegrinovero> yes but taking this path will also yet-again
introduce the need for getters to ask the entity about which fields
it's going to need when loading the document, or if these
optimisations should be disabled (possibly by having this method
missing)
<sannegrinovero> we could move the annotation down to this getter, and
have it "mark" which method should be used without defining it in an
interface
<sannegrinovero> so it could be:
<sannegrinovero> @TargetedFields(value=FIELD_PREFIXES) String
getPrefix() { return this.fieldName + "."; }
<sannegrinovero> or
<sannegrinovero> @TargetedFields(value=FIELD_LIST) Iterable<String>
listFieldsToLoad() { ... return a list; }
<sannegrinovero> but having no interface mandating the method names.
<sannegrinovero> yes that's #2
<hardy> and at loading optimizations we need to know which fields in a
Document news to be loaded to re-hydrate the entity
<hardy> that's problem #1
<sannegrinovero> yes ve're trying to create an Object / Index mapper
<sannegrinovero> :)
<sannegrinovero> and one important point is striking me:
<sannegrinovero> we should really stop assuming the Fields added to
documents are Strings
<sannegrinovero> and start abstracting to Fieldable
<hardy> right, that would solve some isses around numeric fields
<sannegrinovero> and also LazyFields, I've noticed two users asking
about them lately
<sannegrinovero> so.. did you manage to understand what's the purpose
of objectToString ?
<hardy> ok, did not come across these, but yes I see the point
<hardy> I think you are right that we could remove it
<hardy> FieldBridge defines the set method
<hardy> and TwoWayFieldBridge the get
<sannegrinovero> ok, that makes sense :)
<sannegrinovero> so let the rule "A" be the fact we work with Fieldables
<sannegrinovero> B: we favor one-way, all return transformations are
the "TwoWayX"
<sannegrinovero> C: we favor 1:1 relations between properties and Fields
<sannegrinovero> D: we need a way to define multiple Fieldables -> one object
<sannegrinovero> We agree on these right?
<hardy> why do we need the last point
<sannegrinovero> because in the current situation it's supported for a
TwoWayFieldBridge to "extract" the object out of the Document
<sannegrinovero> (current situation = Hibernate Search 3.4)
<sannegrinovero> and this is supposed to work on more than one field
<sannegrinovero> the whole discussion on #1 is about the fact that we
need to know which Fieldsables should be loaded out of the Document to
satisfy these TwoWayFieldBridge cases :)
<sannegrinovero> hardy to be straight to the point, the problem is in
: org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.checkAllowFieldSelection()
<sannegrinovero> just have a look into
org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.checkAllowFieldSelection()
<sannegrinovero> the problem is that we return false very often, we
need to be able to return true as often as possible
<hardy> right and here an additonal interface would help
<sannegrinovero> +1
<hardy> where we check whether the bridge implements it and if so it
returns the name of fieldables
<sannegrinovero> yes, of course it won't be super-easy as it would
change from a yes/no to a "which ones" answer, but still quite
straight forward to implement
<hardy> which brings me to @TargetedFields which was supposed to solve case #2
<hardy> the thing which confuses me now is that is also seems to make
use of fieldnames
<sannegrinovero> well as I said, the "which ones" answer can't be
answered without knowing on which field name the bridge is applied
<sannegrinovero> and keep in mind this additional complexity:
<sannegrinovero> currently that method checkAllowFieldSelection() is
global for the entity. It disables FieldSelection forever. On all
kinds of queries.
<sannegrinovero> while when we do a projection, the user selects some
properties to be loaded, not all of them
<hardy> but is projection not different?
<sannegrinovero> so we should be able to figure out at query time,
taken the property names, to answer both questions a) if a
FieldSelection will be allowed and b) which Fieldables will the SUBSET
of bridges need
<hardy> I mean there we can optimize based on the projected field names, right?
<sannegrinovero> exactly, just saying it won't be as simple as this
method which is currently a static boolean coupled to the type, it's
going to be dynamic according to the choosen projection (faceting?)
<hardy> in the projection case, can we not always use field selection
<hardy> the user tells us which fields to load
<sannegrinovero> of course not always, but the decision will be based
on a subset on all defined bridges, not all of them.
<sannegrinovero> and this optimisation can be huge, the difference
between using a FieldSelector and projecting on all values is not a
small one with big indexes
<sannegrinovero> we're on the same page about the why? and what we need right?
<hardy> getting there
<hardy> in the end we kept talking about case #1
<sannegrinovero> well both #1 and #2 are about the Bridges anyway
<hardy> just talking about case #2 now
<hardy> I am not so sure whether #2 is about the bridges
<sannegrinovero> mm why? can't we finish #1 ?
<hardy> I need to understand why #2 is about the bridges
<sannegrinovero> I'm assuming we're on the same page on what we need
for #1, but are we agreeing on the solution?
<sannegrinovero> ah ok
<sannegrinovero> well because the ClassBridge is a bridge too
<hardy> yes, we agree on what we need for #1
<sannegrinovero> and it will produce an ouput by "reading" the entity
<sannegrinovero> so from core we receive a list of which properties
are dirty, and which ones didn't change.
<hardy> but let's step back for a second
<sannegrinovero> sure, ask me whatever you want :)
<hardy> so independent from the bridges what we want to know is which
property of the entity affects the Document
<hardy> it is fair enough to assume that all properties annotated with
@Field do so
<sannegrinovero> exactly
<hardy> and the tricky one is the class bridge since we don't know what it reads
<sannegrinovero> so today we apply dirty-ness optimisation based on @Field
<sannegrinovero> exactly
<hardy> but why could not not have another marker annotation which
could be used to mark properties which are not annotated with @Field,
but are needed to the Document due to a class bridge
<sannegrinovero> technically, we're not totally safe today as even the
field bridge could "navigate" to the parent object and read unexpected
values, but that's not our problem :) and there's a global option to
disable it anyway.
<sannegrinovero> because of two reasons:
<sannegrinovero> 1) it depends on the Bridge implementation, it's
bound to a type and should be reusable
<sannegrinovero> 2) because an entity has different bridges and can
get messy with @IndexedEmbedded, different depths, etc..
<sannegrinovero> so to reuse the Bridge independenly from the type, it
should be defined on the bridge, or the bridge respponsibility to tell
<sannegrinovero>  Maybe now it's clear why I suggested to intercept
invocations to the getters of the entity?
<hardy> it made sense from the beginning as a potential solution anyway
<sannegrinovero> ah, just found a reason to NOT want the interception approach:
<hardy> that's good, because I am not a fan of it anyways
<sannegrinovero> a bridge will always initialize lazy collections if
we allow it to run first, to "listen" which getters it's going to
invoke
<sannegrinovero> so the interception would allow us to know if
indexing is needed or not, but won't prevent us to needlessly
initialize unloaded collections
<sannegrinovero> while when we know the property names which are
expected to be used by the Bridge, we can consider all non-initialized
collections as non-dirty.
<hardy> couldn't we solve #2 not similar to #2 with an additional
interface - one that does not return field names, but property names
<sannegrinovero> of course
<sannegrinovero> String[] is what we need@
<sannegrinovero>
org.hibernate.search.event.impl.FullTextIndexEventListener.getDirtyPropertyNames(PostUpdateEvent)
<hardy> and would that not be consistent and easy?
<sannegrinovero> (property names)
<sannegrinovero> yes :)
<sannegrinovero> from my initial email:
<sannegrinovero> #2 : Proposal: Iterable<String> getInputPropertyNames()
<hardy> so why this whole discussion around @TargetedFields etc. It
completely confused me :-)
<sannegrinovero> as usual we all toss many ideas on the table :)
<sannegrinovero> and it's a good thing
<hardy> of course
<sannegrinovero> I also mentioned this method:
<sannegrinovero> boolean allowSkipOnUnchangedFields()
<sannegrinovero> but this is because I was assuming I was going to add
the  Iterable<String> getInputPropertyNames() to the main interface,
so I was mandating it for all.
<hardy> exactly
<sannegrinovero> so is the plan now to add  Iterable<String>
getInputPropertyNames() only, in a new interface?
<hardy> and I am proposing separate interfaces
<hardy> +1 for separate interface
<sannegrinovero> "separate" you mean a subclass of the other
interface, or a top level one which just mentions this method only?
<hardy> top level
<sannegrinovero> great
<hardy> a trait
<sannegrinovero> so we can reuse the same trait for the implementors
of DynamicBoost, AnalyserDiscriminator, and whatever else will come,
correct?
<hardy> indeed
<hardy> are you working on this?
<sannegrinovero> nice, I think this settles #2 for the moment, we'll
send the log to emmanuel later to see if he has some energy left to
find what's wrong
<sannegrinovero> I think we're working on this right now :)
<hardy> what I meant is are you planing to implement it or have you
even started.
<hardy> Otherwise I might be able to have a go at it
<sannegrinovero> no I didn't start. In fact I think we shouldn't
untile 4.0.0.Final is published, since it seems we won't change any
API.
<sannegrinovero> your choice. It's always better to do these things
when we have a very clear idea, as we might have now, but really the
list of other issues is long, and we should check which ones can be
postponed and which ones can't.
<hardy> works for me
<sannegrinovero> so, let's get back to #1 ?
<hardy> I will have a look at the other Jira issues to see what's more important
<hardy> is this not settles as well then?
<sannegrinovero> yes I've created 4.0.0.Beta1 and Beta2, for super
urgent and less urgent, we have 4.0 as a general placeholder, and 4.1
for those which are definitely not so urgent.
<hardy> sweet
<sannegrinovero> "4.0 as a general placeholder" -- should read "stuff
which needs to be in 4.0 but is not a drama if we do it later in the
cycle"
<sannegrinovero> everything you spot not for 4.0 please get it out the way :)
<sannegrinovero> I'm not sure we decided the details for #1 ?
<sannegrinovero> Also with the initial proposal "Iterable<String>
getNeededFieldNames()" defined in an optional, new trait?
<sannegrinovero> I don't think, I really liked Emmanuel's idea to
alternatively provide a pattern for the needed fieldnames
<hardy> Also with the initial proposal "Iterable<String>
getNeededFieldNames()" defined in an optional, new trait? - that's
what I thought
<sannegrinovero> wait wait we're missing pieces we already discussed before
<sannegrinovero> for one, the instance need to be stateful and receive
the field name by an initialization
<sannegrinovero> or I won't be able to implement such a
getNeededFieldNames() method
<hardy> you could also pass the base field name to getNeededFieldNames()
<sannegrinovero> What about using the (stateful, initialized) bridge
instance as an implementor of org.apache.lucene.document.FieldSelector
?
<sannegrinovero> right, that's a simple option. But - generally -
doesn't it make more sense to store the LuceneOptions in the bridge?
<hardy> how do you make sure it is stored?
<sannegrinovero> sorry?
<hardy> FieldBridge is just an interface
<sannegrinovero> well our implementation would not be an interface of course :)
<hardy> say you add an intialize(String name, LuceneOption) and change
the set method
<sannegrinovero> yes
<hardy> I am thinking about custom bridges
<sannegrinovero> well custom bridges will have to keep both name and
LuceneOption as fields.
<hardy> right
<sannegrinovero> And then a subset of them could implement our magic
additional method to get improved performance
<sannegrinovero> This method could be either:
<sannegrinovero> a- Iterable<String> getNeededFieldNames()
<sannegrinovero> b- FieldSelectorResult accept(String fieldName);
<hardy> right
<sannegrinovero> MM no let me remove option b, since there are many
bridges involved (potentially), it would be very unsafe to expose
LOAD_AND_BREAK as a valid return, while we could figure that kind of
optimisations out with a)
<hardy> interesting idea. FieldSelectorResult is a Lucene class, right
<hardy> so let me sum this up
<hardy> for both cases #1 and #2 we introduce new interfaces. one
would return property names, the other field names
<sannegrinovero> yes but we should do it in two passes: collect all
rules together, then build an efficient "joint venture" FieldSelector
which applies the merged needs of each bridge
<hardy> for #2 that is enough
<sannegrinovero> right
<hardy> for #1 we also want to change the FieldBrdige contract an make
bridges stateful
<hardy> by adding an initialize method
<hardy> this will also make the set method a little simpler
<sannegrinovero> yes, so while we can wait with the "add a trait"
issue, we should make it stateful before 4.0
<hardy> #1 and #2 should be addressed in two separate issues
<hardy> and we should switch to the state fullness maybe even in a thirs issue
<sannegrinovero> actually more if we're interested in considering the
DynamicBosst & other tricky annotations
<hardy> and this should be done pretty much now
<sannegrinovero> yes
<hardy> are we still missing something?
<sannegrinovero> one last detail, maybe better to take decisions now
<sannegrinovero> since I removed my own proposal b-FieldSelectorResult
accept(String fieldName);
<sannegrinovero> that leaves a- Iterable<String> getNeededFieldNames()
<sannegrinovero> But I initially thought of b) as a way to implement
the pattern match strategy emmanuel suggested
<sannegrinovero> which is now left out
<sannegrinovero> unless those strings are interpreted as a
"startsWith" ? doesn't sound intuitive
<hardy> do you think this is a problem?
<sannegrinovero> well there are problem classes, like the map, for
which the FieldBridge might have added many fields all starting with a
common prefix, but having the exact field name unknown (like the key
conversion of the map)
<sannegrinovero> now assuming you have an entity with several maps,
and projecting one of them, you might end up loading much more fields
than needed
<hardy> instead of simple string we could return some object which
contains the string and how to treat it
<hardy> as in a prefix string or full match
<sannegrinovero> right
<hardy> that would solve also this usecase
<sannegrinovero> or we add this famed annotation on top to mark how
these strings should be interpreted
<hardy> or that
<sannegrinovero> ok, it seems there are enough solutions for that that
we can safely define which changes are next
<hardy> +1
<hardy> gtg now, dog needs to get out. if you come up with something
we forgot shoot me an email
<hardy> do you want to sort out Jira for that or shall I?
<sannegrinovero> I hope not :)
<sannegrinovero> sending this to the ML and Emmanuel, maybe he's
interested. tomorrow we'll open JIRAs.