Adding a discussion between Emmanuel Bernard and Hardy Ferentschik on IRC:

hardy: did you follow the metadata API changes?
[2:17pm] hardy: most is ready now. just needs to be merged in from my latest pull request
[2:18pm] hardy: https://github.com/hibernate/hibernate-search/pull/444
[2:18pm] jbossbot: git pull req [hibernate-search] (open) hferentschik HSEARCH-436 Part II - the public metadata api https://github.com/hibernate/hibernate-search/pull/444
[2:18pm] hardy: I would like to follow up with HSEARCH-904
[2:18pm] emmanuel: hardy: about the template, I was about to point you to http://design.jboss.org
[2:18pm] emmanuel: but there does not seem to be any generic template deck
[2:19pm] emmanuel: hardy: but I can send you the community deck I use
[2:19pm] hardy: ok
[2:19pm] emmanuel: hardy: about the metadata API, I have not followed
[2:19pm] hardy: no problem
[2:19pm] hardy: the questions I have are mainly unrelated
[2:19pm] gmorling_ joined the chat room.
[2:20pm] hardy: just one class might be relevant
[2:20pm] hardy: one sec
[2:20pm] hardy: https://github.com/hferentschik/hibernate-search/blob/3aa3bdf166d13bf3c9933b3b70c88f79af87bc72/engine/src/main/java/org/hibernate/search/metadata/FieldDescriptor.java
[2:21pm] jbossbot: git [hibernate-search] 3aa3bdf.. Hardy Ferentschik HSEARCH-1355 Renaming EntityIndexBinder into EntityIndexBinding...
[2:21pm] hardy: this is the public metadata interface for a single field
[2:21pm] hardy: thanks for the template 
[2:22pm] hardy: the FieldDescriptor is accessed via IndexedTypeDescriptor
[2:22pm] hardy: https://github.com/hferentschik/hibernate-search/tree/3aa3bdf166d13bf3c9933b3b70c88f79af87bc72/engine/src/main/java/org/hibernate/search/metadata
[2:22pm] hardy: so far, so good
[2:22pm] hardy: all is implemented
[2:22pm] gmorling left the chat room. (Ping timeout: 256 seconds)
[2:22pm] gmorling_ is now known as gmorling.
[2:22pm] hardy: what's missing is HSEARCH-904
[2:22pm] hardy: but I am not so sure how much this is needed still
[2:23pm] hardy: btw, just interrupt me if I "talk" too fast
[2:23pm] hardy: a lot of context
[2:23pm] emmanuel: IndexedTypeDescriptor is for what, a property?
[2:23pm] emmanuel: or a Java type?
[2:23pm] hardy: #904 is about the possibility of Bridges to report which fields they are adding
[2:24pm] hardy: for the whole type
[2:24pm] hardy: mind you these classes are the public interfaces
[2:24pm] hardy: there is a set of internal metadata classes
[2:25pm] hardy: which contain the runtime configured metadata
[2:25pm] hardy: basically the parallel array thing refactored
[2:26pm] hardy: https://github.com/hibernate/hibernate-search/tree/master/engine/src/main/java/org/hibernate/search/engine/metadata/impl
[2:27pm] emmanuel: ok
[2:27pm] hardy: right now the public API reflects what we can know
[2:28pm] hardy: obviously there could be e.g. class bridges which add other fields we don't know about
[2:28pm] hardy: that's where it ties into HSEARCH-904
[2:28pm] hardy: if the bridges would report which fields they are adding, it could also be exposed in the metadata api
[2:29pm] hardy: and then there is the optimisation point of view for this issue
[2:29pm] hardy: but tbh I am not so sure how useful this information really would be for us in terms of further optimizations
[2:29pm] emmanuel: BTW for me to understand why do you need a FieldDescriptor.isId
[2:30pm] emmanuel: it's not a notion present in Lucene AFAIR
[2:30pm] hardy: good point
[2:30pm] hardy: it is our document id
[2:30pm] hardy: I had a todo item on whether all fields should be returned
[2:30pm] hardy: including the id field
[2:31pm] hardy: sanne thought it would be good to return all
[2:31pm] emmanuel: I remember that todo / question
[2:31pm] hardy: right
[2:31pm] emmanuel: yes
[2:31pm] hardy: and the isId is a way to determine whether it is the document id
[2:31pm] hardy: but I see why this is confusing
[2:31pm] emmanuel: something surprises me a bit
[2:31pm] hardy: maybe it is not needed!?
[2:31pm] emmanuel: there is no notion of object and property then in this metamodel
[2:32pm] hardy: in the public api no
[2:32pm] emmanuel: ie you say I want index A
[2:32pm] emmanuel: and then you navigate the Lucene "schema"
[2:32pm] hardy: there was a todo for that as well
[2:32pm] hardy: I have the information
[2:32pm] emmanuel: but you don't publically link this schema to the object model
[2:32pm] emmanuel: ok
[2:32pm] hardy: the question is do we want to expose it
[2:33pm] hardy: the internal metamodel is "keyed" against properties
[2:33pm] emmanuel: is that useful still in this flat structure approach
[2:33pm] hardy: I guess it depends where we see the use cases for this public API
[2:33pm] emmanuel: I imagine it helps to write pure Lucene queries
[2:34pm] emmanuel: right
[2:34pm] hardy: for pure Lucene queries you are interested in actual field names
[2:34pm] emmanuel: it could very well be that we need both pure and the "keyed one" as you say
[2:34pm] hardy: it would also allow you to create some smart query parser which maybe suggests field names
[2:34pm] emmanuel: I know gmorling probably needs the keyed one
[2:34pm] emmanuel: when it does JP-QL to Lucene
[2:35pm] hardy: I guess each FieldDescriptor could have some sort of source
[2:35pm] hardy: or maybe SourceDescriptor
[2:35pm] emmanuel: hum not sure
[2:36pm] emmanuel: I mean you navigate the other way around
[2:36pm] hardy: do you?
[2:36pm] emmanuel: not from field to property when you build a query
[2:36pm] emmanuel: you want to target property User.name
[2:36pm] emmanuel: and from there you want to know what's available to you
[2:36pm] emmanuel: name, name_facet etc
[2:37pm] emmanuel: (as field)
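To make emmanuel's property-first navigation concrete (start from User.name, find the fields name and name_facet), here is a minimal Java sketch of such a "keyed" lookup. `PropertyFieldIndex`, `register` and `fieldsFor` are hypothetical names, not part of the Hibernate Search metadata API, and property paths are plain strings only to keep the sketch self-contained.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "keyed" view: property path -> Lucene field names produced from it.
// Not a Hibernate Search type; it only illustrates navigating property -> fields.
class PropertyFieldIndex {
    private final Map<String, List<String>> fieldsByProperty = new HashMap<>();

    // Records which fields a given property is indexed into.
    void register(String propertyPath, String... fieldNames) {
        fieldsByProperty.put(propertyPath, Collections.unmodifiableList(Arrays.asList(fieldNames)));
    }

    // Returns the fields a query builder may target for the property,
    // or an empty list if the property is not indexed at all.
    List<String> fieldsFor(String propertyPath) {
        return fieldsByProperty.getOrDefault(propertyPath, Collections.emptyList());
    }
}
```

A JP-QL to Lucene translator would call `fieldsFor("User.name")` and pick among the returned fields; the flat, document-centric API hardy describes answers the opposite question (given a field, where did it come from).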
[2:37pm] hardy: maybe
[2:37pm] hardy: this is definitely still open for discussion
[2:37pm] emmanuel: right
[2:38pm] emmanuel: it looks like as soon as sanne is back we should do an IRC meeting and discuss that
[2:38pm] emmanuel: mon or tues
[2:38pm] hardy: and how would we "reference" the properties? As java.lang.reflect.Member instances?
[2:38pm] hardy: +1
[2:38pm] emmanuel: hardy: ahh well that's the big question
[2:39pm] hardy: as you can see the internal metadata is basically structured the way you suggest
[2:39pm] emmanuel: I had in mind something like BV (Bean Validation) but I am biased
[2:39pm] hardy: but I had a more Document centric approach for the public api
[2:39pm] hardy: also in terms of say a Solr integration
[2:39pm] hardy: really interesting in this case are the actual fields
[2:40pm] emmanuel: depends what you call Solr integration
[2:40pm] emmanuel: ah well yes, in this case from field to "source" makes sense
[2:41pm] emmanuel: back to the original question, there are a few situations
[2:41pm] emmanuel: one where you generate a static set of fields but each one can have analyzer / store etc
[2:42pm] hardy: which is the "original" question?
[2:42pm] emmanuel: one where you generate the fields dynamically depending on the value or even the weather
[2:42pm] emmanuel: in this case listing them in the metamodel is doomed
[2:42pm] hardy: sure, if the fields are that dynamic
[2:42pm] emmanuel: a map where keys represent the field name is the example I have in mind
[2:42pm] hardy: but, often you know the field names
[2:43pm] emmanuel: yes in many cases it is the proposed field name indeed
[2:43pm] hardy: even if you add new ones
[2:43pm] hardy: you know it is for example
[2:43pm] emmanuel: a 1-1 binding between the fieldbridge and the Lucene field
[2:43pm] hardy: text_en, text_de, etc
[2:44pm] hardy: you cannot generate completely random names
[2:44pm] emmanuel: assuming you will never ever accept french, that's true 
[2:44pm] hardy: at some stage you need to target these fields in a search
[2:44pm] emmanuel: that's another category where you have a pattern
[2:44pm] emmanuel: text_*
[2:44pm] hardy: some sort of fixed list of patterns you need
[2:45pm] emmanuel: not sure we want to model this info though
[2:45pm] hardy: no, not me either
[2:45pm] emmanuel: anyways, it looks orthogonal enough to start the metamodel without it
[2:45pm] hardy: and yes, this would be a best effort approach to cover some of the cases which escape the current api
[2:45pm] hardy: +1
[2:46pm] hardy: there are a few more questions around this though
[2:46pm] hardy: e.g., the issue suggests something like
[2:46pm] hardy: public interface FieldNameReportingBridge {
[2:46pm] hardy: 	Iterable<String> getGeneratedFieldNames(String baseFieldName);
[2:46pm] hardy: }
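A minimal sketch of how a custom bridge might implement the interface proposed in HSEARCH-904. `LocalizedTextBridge` and its language list are hypothetical, chosen to match the text_en / text_de example from earlier in the discussion. It also shows the limitation hardy raises next: only names come back, with no information on whether a field is stored, indexed or analyzed.

```java
import java.util.ArrayList;
import java.util.List;

// Interface as suggested in the HSEARCH-904 issue.
interface FieldNameReportingBridge {
    Iterable<String> getGeneratedFieldNames(String baseFieldName);
}

// Hypothetical bridge writing one field per supported language,
// e.g. "text_en" and "text_de" for the base field name "text".
class LocalizedTextBridge implements FieldNameReportingBridge {
    private final List<String> languages;

    LocalizedTextBridge(List<String> languages) {
        this.languages = new ArrayList<>(languages);
    }

    @Override
    public Iterable<String> getGeneratedFieldNames(String baseFieldName) {
        List<String> names = new ArrayList<>();
        for (String language : languages) {
            // Only the name is reported; the field's Lucene options are lost here.
            names.add(baseFieldName + "_" + language);
        }
        return names;
    }
}
```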
[2:46pm] hardy: it only returns field names
[2:47pm] hardy: this is suboptimal for us
[2:47pm] emmanuel: yes I see that now
[2:47pm] emmanuel: you lose the type of field etc
[2:47pm] hardy: to properly report the metadata we would also need the Lucene options
[2:47pm] hardy: right
[2:47pm] hardy: first I thought I can use LuceneOptions
[2:47pm] hardy: but this is not possible
[2:47pm] emmanuel: you should update the issue with your input
[2:48pm] hardy: in fact LuceneOptions is a thorn in my eye anyway
[2:48pm] emmanuel: It's more that we did not think much about it
[2:48pm] hardy: will do
[2:48pm] hardy: it used to contain just options
[2:49pm] hardy: but it became then an actual interface with methods to implement, which is confusing given its name
[2:49pm] hardy: it would be nice to get rid of LuceneOptions in the bridge api
[2:49pm] hardy: that is probably only possible for Search 5, though
[2:50pm] hardy: it might also solve Sanne's issues that we create LuceneOptions for each field
[2:50pm] hardy: including a new Lucene Field instance
[2:50pm] hardy: according to Sanne we should reuse the Fieldable instance
[2:50pm] hardy: again, that's on a tangent
[2:50pm] emmanuel: but LuceneOptions is used
[2:51pm] emmanuel: you would replace it with what?
[2:51pm] emmanuel: the raw Lucene calls are insane
[2:51pm] emmanuel: ah
[2:51pm] emmanuel: right the object reuse mandated by Lucene next 
[2:51pm] hardy: well, first we could do the initialise-bridge approach
[2:51pm] hardy: where we pass in the actual options (might be LuceneOption)
[2:52pm] hardy: making FieldBridges in fact stateful
[2:52pm] hardy: and then we offer a helper class to handle the Lucene calls
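A rough sketch of the "initialise bridge" idea: the options are handed to the bridge once at bootstrap, which makes the bridge stateful and lets it reuse a single field object instead of allocating a new one per call. `FieldOptions`, `ReusableField` and `StatefulStringBridge` are hypothetical stand-ins, not Hibernate Search or Lucene types, and a `List` stands in for a Lucene `Document`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical options object, passed once instead of on every set() call.
final class FieldOptions {
    final boolean stored;
    final boolean analyzed;

    FieldOptions(boolean stored, boolean analyzed) {
        this.stored = stored;
        this.analyzed = analyzed;
    }
}

// Hypothetical mutable field whose value is replaced for each document.
final class ReusableField {
    final String name;
    final FieldOptions options;
    String value;

    ReusableField(String name, FieldOptions options) {
        this.name = name;
        this.options = options;
    }
}

// Stateful bridge: configured once, then reuses its cached field instance.
class StatefulStringBridge {
    private ReusableField field;

    void initialize(String fieldName, FieldOptions options) {
        this.field = new ReusableField(fieldName, options);
    }

    // Updates the cached field and adds it to the document. Because the same
    // instance is shared, this is not safe for documents built concurrently.
    void set(Object value, List<ReusableField> document) {
        if (value == null) {
            return;
        }
        field.value = value.toString();
        document.add(field);
    }
}
```

The catch with reuse is visible in a quick run: after indexing "Alice" and then "Bob", the first document's field also reads "Bob", so each document has to be consumed before the bridge is called again. That ordering constraint is one plausible reason the log ties this topic to the Lucene 4 way of creating fields.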
[2:52pm] emmanuel: but my list of fields might be dynamic
[2:52pm] emmanuel: so passing the Fieldable won't be enough for example
[2:52pm] hardy: you are adding fields via this helper
[2:52pm] emmanuel: I need to be able to create new ones in *some* cases
[2:53pm] hardy: or we provide an IndexContext
[2:53pm] hardy: sure
[2:53pm] hardy: I get that
[2:53pm] hardy: my problem is with the name of the class and that it combines two distinct things
[2:53pm] hardy: also if you look into the implementation, we could have provided the same functionality in a static helper class
[2:53pm] hardy: leaving LuceneOptions as it was
[2:54pm] emmanuel: yes it was for the sake of not breaking things
[2:54pm] hardy: right
[2:54pm] hardy: and it was suboptimal from the beginning
[2:54pm] hardy: I think it is time to rectify that
[2:55pm] hardy: and as said, using stateful bridges might help with this Lucene rubber-stamp approach
[2:55pm] hardy: and whether we offer a static helper class or an IndexContext which is passed to the set method is a different issue
[2:55pm] hardy: I think we need to also discuss this with Sanne
[2:55pm] emmanuel: it is very tied with the Lucene 4 migration
[2:56pm] hardy: he also has the best idea of where it is going in relation to Lucene 4
[2:56pm] hardy: +1
[2:56pm] emmanuel: and if we have to break things anyways, yes that looks like a reasonable option
[2:56pm] hardy: so, back to the issue
[2:56pm] emmanuel: as long as we offer some migration tips
[2:56pm] hardy: I cannot just return strings
[2:56pm] hardy: LuceneOptions is not viable either
[2:57pm] hardy: then I was wondering whether I could return FieldDescriptors
[2:57pm] hardy: you would do something like
[2:57pm] hardy: Set<FieldDescriptor> getGeneratedFields(FieldDescriptor baseFieldDescriptor);
[2:58pm] hardy: you pass the FieldDescriptor as you generate it just from annotations (with the base name and the appropriate options)
[2:59pm] hardy: for our built-in bridges and for many custom bridges you would then just stick this descriptor into a set and return it
[2:59pm] hardy: but you can also create new ones depending on what your bridge does
[2:59pm] emmanuel: hardy: you know what is sad. With a high enough API (say the action methods of LuceneOptions), we know what the bridge creates (name, type, whether it is stored etc)
[2:59pm] hardy: to generate new ones we could offer a FieldDescriptorFactory
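A minimal sketch of hardy's proposal: the bridge receives the annotation-derived descriptor and returns the full set of fields it writes. `FieldDescriptor` here is a simplified stand-in (name plus two options), not the real metadata interface; `FieldDescriptorFactory.derive`, `FieldReportingBridge`, `SimpleBridge` and `FacetingBridge` are hypothetical names for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Simplified stand-in for the metadata FieldDescriptor.
final class FieldDescriptor {
    final String name;
    final boolean stored;
    final boolean analyzed;

    FieldDescriptor(String name, boolean stored, boolean analyzed) {
        this.name = name;
        this.stored = stored;
        this.analyzed = analyzed;
    }

    // Value semantics so descriptors behave correctly inside a Set.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof FieldDescriptor)) {
            return false;
        }
        FieldDescriptor other = (FieldDescriptor) o;
        return name.equals(other.name) && stored == other.stored && analyzed == other.analyzed;
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, stored, analyzed);
    }
}

// Hypothetical factory deriving an extra field from the base descriptor.
final class FieldDescriptorFactory {
    static FieldDescriptor derive(FieldDescriptor base, String suffix, boolean analyzed) {
        return new FieldDescriptor(base.name + suffix, base.stored, analyzed);
    }
}

// The proposed method, shaped as a standalone interface.
interface FieldReportingBridge {
    Set<FieldDescriptor> getGeneratedFields(FieldDescriptor baseFieldDescriptor);
}

// Built-in style bridge: writes exactly the field described by the annotations.
class SimpleBridge implements FieldReportingBridge {
    @Override
    public Set<FieldDescriptor> getGeneratedFields(FieldDescriptor base) {
        return Collections.singleton(base);
    }
}

// Custom bridge that also writes an unanalyzed facet field next to the base field.
class FacetingBridge implements FieldReportingBridge {
    @Override
    public Set<FieldDescriptor> getGeneratedFields(FieldDescriptor base) {
        Set<FieldDescriptor> fields = new HashSet<>();
        fields.add(base);
        fields.add(FieldDescriptorFactory.derive(base, "_facet", false));
        return fields;
    }
}
```

Unlike a list of plain names, each returned descriptor keeps its options, which is what the metadata API needs to report the fields properly.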
[3:00pm] emmanuel: and I always found it sad that we had to ask this data again statically with getGeneratedFields
[3:00pm] emmanuel: but we do need this data outside the actual field creation
[3:01pm] emmanuel: so the extra method looks necessary
[3:01pm] hardy: right
[3:01pm] hardy: but what do you think about the FieldDescriptor thingy
[3:01pm] emmanuel: your proposal makes sense
[3:02pm] hardy: what feels strange to me is to use a metadata class here
[3:02pm] emmanuel: returning an empty set might mean dynamic
[3:02pm] hardy: right
[3:02pm] emmanuel: or maybe null whatever
[3:02pm] emmanuel: and the default impl would return a Set with baseFieldDescriptor
[3:02pm] hardy: right
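A small sketch of the convention just agreed: the default implementation returns a set containing only the base descriptor, and an empty set signals that the fields are dynamic and cannot be listed statically (emmanuel also mentions null as an alternative). It uses a Java 8 default method; `FieldReportingBridge`, `PlainBridge`, `MapBridge` and `FieldReporting.isStaticallyKnown` are hypothetical names, and `FieldDescriptor` is reduced to a name.

```java
import java.util.Collections;
import java.util.Set;

// Stand-in descriptor carrying only a field name.
final class FieldDescriptor {
    final String name;

    FieldDescriptor(String name) {
        this.name = name;
    }
}

interface FieldReportingBridge {
    // Default: the bridge writes just the field derived from the annotations.
    default Set<FieldDescriptor> getGeneratedFields(FieldDescriptor baseFieldDescriptor) {
        return Collections.singleton(baseFieldDescriptor);
    }
}

// Relies on the default implementation.
class PlainBridge implements FieldReportingBridge {
}

// Fields depend on the values (e.g. one field per map key), so nothing is reported.
class MapBridge implements FieldReportingBridge {
    @Override
    public Set<FieldDescriptor> getGeneratedFields(FieldDescriptor baseFieldDescriptor) {
        return Collections.emptySet();
    }
}

final class FieldReporting {
    // True when the metadata API can list this bridge's fields up front.
    static boolean isStaticallyKnown(FieldReportingBridge bridge, FieldDescriptor base) {
        return !bridge.getGeneratedFields(base).isEmpty();
    }
}
```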
[3:03pm] hardy: there is one thing which I am not so happy about, though
[3:03pm] hardy: right now, the FieldDescriptor contains getFieldBridge and getAnalyzer
[3:04pm] hardy: the rest of the information is based on the name and the lucene options
[3:04pm] hardy: not sure what to do with the other two
[3:04pm] hardy: and whether we really need them in the public API
[3:04pm] hardy: I guess getFieldBridge could return "this"
[3:05pm] hardy: if you really create new FieldDescriptors in your custom bridge, the bridge doing so is of course known
[3:05pm] emmanuel: today, can a FB (FieldBridge) use a custom analyzer?
[3:05pm] hardy: right analysers are the problem
[3:05pm] emmanuel: in a way you have two notions
[3:05pm] hardy: there is no link between them and the bridge
[3:05pm] emmanuel: FieldInfoDescriptor
[3:05pm] emmanuel: and FieldDescriptor
[3:06pm] hardy: even though we had feature requests to access analysers in field bridges
[3:06pm] emmanuel: the former does not have analyzer and fieldBridge
[3:06pm] hardy: right, that was a thought of mine as well
[3:06pm] hardy: one could split up FieldDescriptor
[3:06pm] hardy: into pure Lucene Document field info and the rest
[3:06pm] emmanuel: split or superclassing
[3:07pm] hardy: the field bridge method would then return a set of FieldInfoDescriptors
[3:07pm] hardy: sure
[3:07pm] hardy: actually the more I think about it, the more I like it
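One way the split could look, using the "superclassing" variant emmanuel mentions: a pure Lucene-level `FieldInfoDescriptor`, extended by a `FieldDescriptor` that adds the analyzer and field bridge. This is a sketch only; the method names are assumptions, and analyzer and bridge are modelled as plain names to keep it self-contained. `FieldInfoReportingBridge` and `SimpleFieldInfo` are hypothetical.

```java
import java.util.Set;

// Lucene Document level information only.
interface FieldInfoDescriptor {
    String getName();

    boolean isStored();

    boolean isAnalyzed();
}

// Hibernate Search level "rest", layered on top by extension.
interface FieldDescriptor extends FieldInfoDescriptor {
    String getAnalyzerName();

    String getFieldBridgeName();
}

// Bridges only report Lucene-level information, so they never need analyzer or bridge data.
interface FieldInfoReportingBridge {
    Set<FieldInfoDescriptor> getGeneratedFields(FieldInfoDescriptor baseField);
}

// Minimal immutable implementation of the Lucene-level part.
final class SimpleFieldInfo implements FieldInfoDescriptor {
    private final String name;
    private final boolean stored;
    private final boolean analyzed;

    SimpleFieldInfo(String name, boolean stored, boolean analyzed) {
        this.name = name;
        this.stored = stored;
        this.analyzed = analyzed;
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public boolean isStored() {
        return stored;
    }

    @Override
    public boolean isAnalyzed() {
        return analyzed;
    }
}
```

With this shape the open question about getFieldBridge and getAnalyzer disappears for bridge-generated fields: they are only required where a full `FieldDescriptor` is exposed.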
[3:08pm] hardy: hmm, I got some more ideas now
[3:08pm] hardy: to sum things up a little
[3:09pm] hardy: #1 we need to discuss whether the public metadata api should expose property data (aka which property creates the field)
[3:10pm] hardy: #2 if so, we need to decide how to add this information to the API. Use a PropertyDescriptor (having a name and access type I guess)? Where to add the info (as part of the FieldDescriptor or more type centric where you navigate type -> property -> field)
[3:11pm] hardy: #3 Regarding #904, returning just a set of field names is not sufficient. We really need also the Lucene specific options, aka a set of FieldDescriptors
[3:12pm] hardy: #4 FieldDescriptor should potentially be split up into FieldInfoDescriptor and "rest"
[3:13pm] emmanuel: back (got caught by the mkt)
[3:13pm] hardy: #5 LuceneOptions is sub-optimal and we might consider removing it for Search 5. Maybe making bridges stateful!? Need to discuss with Sanne regarding the new Lucene 4 way of creating fields
[3:14pm] emmanuel: good sum up
[3:14pm] emmanuel: about 5
[3:14pm] emmanuel: what's suboptimal really is the non action methods right?
[3:14pm] emmanuel: ie the state like Compress and co
[3:14pm] hardy: right, the mixing of the two
[3:15pm] emmanuel: then we agree
[3:15pm] hardy: if you remove these it is the name which bugs me
[3:15pm] emmanuel: and yes, 5b. consider making bridges stateful to reuse instances a la Lucene 4
[3:15pm] hardy: then we have two methods addFieldToDocument and addNumericFieldToDocument
[3:15pm] hardy: but the class is called LuceneOptions
[3:15pm] emmanuel: ok
[3:15pm] hardy: in this case we need at least a rename to IndexContext
[3:16pm] hardy: indexContext#addFieldToDocument makes so much more sense
[3:16pm] hardy: luceneOptions#addFieldToDocument just keeps you wondering
[3:16pm] hardy: how can options do anything
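A sketch of what the rename might look like once the option getters are removed and only the two action methods remain. The `IndexContext` signatures here are simplified assumptions (the existing `LuceneOptions` methods also take the Lucene `Document`), and `NameBridge` and `RecordingIndexContext` are hypothetical; the latter records calls instead of touching Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical IndexContext: only the action methods, no option getters.
interface IndexContext {
    void addFieldToDocument(String name, String value);

    void addNumericFieldToDocument(String name, Number value);
}

// A bridge written against the renamed type: "add a field via the index context".
class NameBridge {
    void set(String name, Object value, IndexContext indexContext) {
        if (value != null) {
            indexContext.addFieldToDocument(name, value.toString());
        }
    }
}

// Test double that records calls instead of building a Lucene Document.
class RecordingIndexContext implements IndexContext {
    final List<String> added = new ArrayList<>();

    @Override
    public void addFieldToDocument(String name, String value) {
        added.add(name + "=" + value);
    }

    @Override
    public void addNumericFieldToDocument(String name, Number value) {
        added.add(name + "#" + value);
    }
}
```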
[3:17pm] hardy: anyways, thanks for the discussion