I'm sorry, I now realize I made it hard to understand, as I didn't
actually explain how I mean this to be recursive.
First, note that another frequently requested feature is to be able to
add some fields to a Document *in addition* to what we would normally
do.
Today this flexibility is granted by defining a class-level bridge,
but doing so disables processing of @Field annotations, so people
can't decorate the default output and instead have to hardcode the
full transformation in their bridge.
How does a class bridge "disable processing of @Field annotations"?
To solve that - and the design mentioned above - I was thinking of
compositing bridges recursively to match the metadata.
"compositing bridges recursively to match the metadata"? Even by
replacing compositing with composing I am not quite sure what you are after.
To express that in pseudo-functional code, an entity:
    @ClassBridge(impl=CustomAnimals.class)
    @Indexed
    class Animal {
        @Id long id;
        @IndexedEmbedded Color skinColor;
        @Field String name;
    }
would generate a reusable transformation function which gets
associated with Animal.class
    class ObjectToDocument {
        Document transform(Entity e);
    }
[not exactly like that, bear with me a moment]
And how would the user hook in there?
IIRC some users just wanted the ability to get hold of the Lucene
document before it gets indexed.
I think we can make this happen without changing anything around the
bridges etc. We just need to hook somewhere into
DocumentBuilderIndexedEntity#getDocument.
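A minimal sketch of what such a hook could look like, to make the suggestion above concrete. Every name in it is hypothetical: DocumentInterceptor is not an existing Hibernate Search interface, SketchDocument stands in for the Lucene Document to keep the example dependency-free, and SketchDocumentBuilder only mimics the point where getDocument would invoke the callbacks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the Lucene Document (assumption, keeps the sketch self-contained).
final class SketchDocument {
    final Map<String, String> fields = new LinkedHashMap<>();
}

// Hypothetical callback: lets user code see the document before it is indexed.
interface DocumentInterceptor {
    void beforeIndexing(Object entity, SketchDocument document);
}

// Hypothetical stand-in for the builder; only shows where the hook would sit.
final class SketchDocumentBuilder {
    private final List<DocumentInterceptor> interceptors = new ArrayList<>();

    void register(DocumentInterceptor interceptor) {
        interceptors.add(interceptor);
    }

    SketchDocument getDocument(Object entity) {
        SketchDocument document = new SketchDocument();
        // Placeholder for the regular @Field / bridge processing.
        document.fields.put("name", String.valueOf(entity));
        // The hook: user code receives the fully built document and may add to it.
        for (DocumentInterceptor interceptor : interceptors) {
            interceptor.beforeIndexing(entity, document);
        }
        return document;
    }
}
```

With this shape the existing annotation processing stays untouched; a user would only register a callback that adds or inspects fields after the default transformation has run.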
I guess by now you start seeing the problem of defining the exact
signature of such composite transformation blocks:
I mentioned Visitor in my first email as I think it could help, but it
doesn't have to be strictly a Visitor.
Sorry, I still don't fully understand. That's not to say that I am
against a new way of getting from entity to document, but I don't see
which of the things you mentioned is not possible today.
The problem is that we will want to navigate the internal metadata for
different purposes, as I had outlined in the next paragraph; generally
a Visitor allows decoupling the metadata graph from the purpose,
while also preserving a good level of type safety and performance:
let's not forget this is one of the hottest areas of the Search
codebase (CPU-wise), and at the same time the place where we trigger
the most important optimisations, like opportunities to skip network
operations or disk IO.
Fair enough. However, imo the visitor pattern also adds quite some
complexity and only becomes useful where you have an object structure
with many different types. In our case the metadata for a single
indexed type is quite "simple".
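For reference, the Visitor idea under discussion could be sketched roughly as below. This is an illustration only: MetadataVisitor, TypeMetadataSketch and FieldNameCollector are invented names, not Hibernate Search classes, and the metadata model is reduced to plain fields plus embedded types. The point is that the traversal lives in the metadata graph, while each purpose (here: collecting field names) is a separate visitor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor over per-type metadata; one callback per node kind.
interface MetadataVisitor {
    void visitField(String fieldName);
    void visitEmbedded(String prefix, TypeMetadataSketch embedded);
}

// Hypothetical, heavily simplified metadata for one indexed type.
final class TypeMetadataSketch {
    private final List<String> fields = new ArrayList<>();
    private final List<String> embeddedPrefixes = new ArrayList<>();
    private final List<TypeMetadataSketch> embeddedTypes = new ArrayList<>();

    TypeMetadataSketch field(String name) {
        fields.add(name);
        return this;
    }

    TypeMetadataSketch embedded(String prefix, TypeMetadataSketch type) {
        embeddedPrefixes.add(prefix);
        embeddedTypes.add(type);
        return this;
    }

    // The graph knows how to walk itself; it does not know why it is walked.
    void accept(MetadataVisitor visitor) {
        for (String field : fields) {
            visitor.visitField(field);
        }
        for (int i = 0; i < embeddedTypes.size(); i++) {
            visitor.visitEmbedded(embeddedPrefixes.get(i), embeddedTypes.get(i));
        }
    }
}

// One purpose among several: collect fully qualified field names.
final class FieldNameCollector implements MetadataVisitor {
    private final String prefix;
    final List<String> names;

    FieldNameCollector() {
        this("", new ArrayList<>());
    }

    private FieldNameCollector(String prefix, List<String> names) {
        this.prefix = prefix;
        this.names = names;
    }

    @Override
    public void visitField(String fieldName) {
        names.add(prefix + fieldName);
    }

    @Override
    public void visitEmbedded(String embeddedPrefix, TypeMetadataSketch embedded) {
        // Recurse into the embedded type, extending the field name prefix.
        embedded.accept(new FieldNameCollector(prefix + embeddedPrefix + ".", names));
    }
}
```

Other purposes (building the Document, listing the properties that must be initialized) would be further MetadataVisitor implementations over the same graph. Whether that indirection pays off for metadata this simple is exactly the open question.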
>> - We need a reliable way to track which field names are created, and
>> from which bridge they are originating (including custom bridges:
>> HSEARCH-904)
I am working on this. As mentioned before, my idea for this was to add another
interface a bridge can implement. This interface reports the required meta information
(the fields which are getting added plus maybe their Lucene index settings). For the
built-in bridges we implement this interface. Implementors of custom bridges will need to
implement this interface.
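To illustrate the idea, a rough sketch of such an interface and of how the reported names could be tracked back to their bridge. The interface name FieldMetadataProvider, its method, the FieldOriginRegistry helper and the field names "habitat" and "diet" are all assumptions made up for this example; the Lucene index settings mentioned above are left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface a bridge could implement to report the fields it adds.
interface FieldMetadataProvider {
    List<String> createdFieldNames();
}

// Example custom class bridge (only the metadata side is shown);
// the field names are invented for illustration.
final class CustomAnimals implements FieldMetadataProvider {
    @Override
    public List<String> createdFieldNames() {
        return Arrays.asList("habitat", "diet");
    }
}

// Hypothetical registry: remembers which bridge each field name originates from.
final class FieldOriginRegistry {
    private final Map<String, String> originByField = new LinkedHashMap<>();

    void register(FieldMetadataProvider bridge) {
        for (String field : bridge.createdFieldNames()) {
            originByField.put(field, bridge.getClass().getSimpleName());
        }
    }

    Map<String, String> origins() {
        return Collections.unmodifiableMap(originByField);
    }
}
```

A bridge that does not implement the interface would simply contribute nothing to the registry, which is why custom bridge authors would have to opt in.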
>> - If we could know in advance which properties of the entities need
>> to be initialized for a complete Document to be created we could
>> generate more efficient queries at entity initialization time, or at
>> MassIndexing select time. I think users really would expect such a
>> clever integration with ORM (HSEARCH-1235)
But this is all a question of which metadata we collect, not of how we process it.
Seems in the list above I forgot my favorite one: dump the metadata as
simple text on bootup; this should greatly simplify query writing, and
removes the need to open the index with Luke to figure out the field
names / options.
+1, I like this idea. Maybe instead of dumping the internal metadata we
should dump the public metadata information once I am finished with
HSEARCH-436.
--Hardy