[hibernate-dev] The problem of cascading database reads

Fri Nov 23 04:44:39 EST 2012

Thanks Marc for this feedback.
That's a very long multi-proposal email :) We should probably split it
into several topics. Let me reply inline.

On Thu 2012-11-22 20:10, Marc Schipperheyn wrote:
> Network
>  @OneToMany
>  @IndexedEmbedded(includePaths={"id"})
>  List<User> users;
> 
> When a new User is added to the Network, all the existing Users have to be
> read from the database to recreate the Lucene document.
> 
> Another headache example is when a stored property that is used for
> selection purposes changes
> 
> LinkedInGroup
>  @Field(index=Index.YES)
>  boolean hidden;
>
>  @OneToMany
>  @ContainedIn
>  List<LinkedInGroupPost> posts;
>
>  @OneToMany
>  @IndexedEmbedded(includePaths={"id"})
>  Set<Tag> tags
>
> LinkedInGroupPost
>  @ManyToOne
>  @IndexedEmbedded(includePaths={"id","hidden"})
>  Group group;
>
> Assuming there can be hundreds of thousands of Posts, a change of hidden to
> true would trigger a read of all those records.

One approach that might work much better in this case is to use filters
rather than indexing hidden and using it as a restriction in the query.
I imagine hidden is not selective enough, which does not make for the
best use of an inverted index.
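
To make the idea concrete, here is a plain-Java sketch. It does not use
the Hibernate Search or Lucene API; the class and method names are made
up for illustration. It models a filter as a bit set of visible
document ids that is computed once and then intersected with the hits
of any query, instead of adding a restriction on hidden to every query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class HiddenFilterSketch {

    /** Builds the bit set once: bit i is set when document i is visible. */
    public static BitSet buildVisibleFilter(boolean[] hiddenByDocId) {
        BitSet visible = new BitSet(hiddenByDocId.length);
        for (int docId = 0; docId < hiddenByDocId.length; docId++) {
            if (!hiddenByDocId[docId]) {
                visible.set(docId);
            }
        }
        return visible;
    }

    /** Keeps only the hits whose bit is set in the (reusable) filter. */
    public static List<Integer> applyFilter(List<Integer> hits, BitSet filter) {
        List<Integer> kept = new ArrayList<Integer>();
        for (int docId : hits) {
            if (filter.get(docId)) {
                kept.add(docId);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical data: documents 1 and 3 are hidden.
        boolean[] hidden = {false, true, false, true};
        BitSet visible = buildVisibleFilter(hidden);
        System.out.println(applyFilter(Arrays.asList(0, 1, 2, 3), visible));
    }
}
```

The point of the design is that the bit set can be cached and shared
across queries, so the low-selectivity flag is evaluated once rather
than on every search.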

> * In the Network example, the includePaths only contains the id. Looking
> at my own work, I often find that IndexedEmbedded references just store
> the id and I believe we should think about optimizing this use case. In
> that case an optimized read from the database could be executed that just
> reads that value instead of initializing the entire entity.

I forgot what we said around includePaths and class level bridges but
that looks like a good idea. We might be able to look at the paths and
check if any of them contains an association. If not, we could use a
projection to query the meaningful data. That's not at all how Hibernate
Search works today, so I imagine that could be a significant amount of
work, but it does not look impossible.
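
A rough sketch of that check, in plain Java with no Hibernate
dependency. Everything here is hypothetical: the helper names, the set
of association property names passed in by the caller, and the shape of
the generated HQL-style string. It only shows the decision "no
association in the paths, so a projection is enough".

```java
import java.util.List;
import java.util.Set;

public class ProjectionPlanSketch {

    /**
     * True when no included path starts at an association property,
     * i.e. every path can be read as a plain column value.
     */
    public static boolean canUseProjection(List<String> includePaths,
                                           Set<String> associationProperties) {
        for (String path : includePaths) {
            String head = path.split("\\.")[0];
            if (associationProperties.contains(head)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Builds an HQL-style projection that reads only the included
     * properties of the collection elements for one owner.
     */
    public static String projectionQuery(String ownerEntity, String collection,
                                         List<String> includePaths) {
        StringBuilder select = new StringBuilder("select ");
        for (int i = 0; i < includePaths.size(); i++) {
            if (i > 0) {
                select.append(", ");
            }
            select.append("e.").append(includePaths.get(i));
        }
        select.append(" from ").append(ownerEntity).append(" o join o.")
              .append(collection).append(" e where o.id = :ownerId");
        return select.toString();
    }
}
```

For the Network example with includePaths={"id"}, this would produce
"select e.id from Network o join o.users e where o.id = :ownerId",
which reads the user ids without initializing the User entities.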

Can you open a JIRA issue for this?

> 
> This kind of "projection read" could be an optional setting even when
> includePaths contains non identifier values, assuming the developer knows
> which limitations this might entail (e.g. no FieldBridges, no Hibernate

Why do you say no FieldBridge?

> * Lucene Document update support is at an alpha stage right now
> LUCENE-3837. This effort could be supported by the Hibernate team or
> implemented at the earliest viable moment.

We are keeping an eye on it. Lucene 4 is a major departure from Lucene 3,
so the migration won't be easy and, worse, won't be fully transparent
for Hibernate Search users, unfortunately.

> * A kind of JoinFilter is conceivable where the join filter would be able
> to exclude results based on selection results from another index.
> E.g. one queries the LinkedInGroupPost but the JoinFilter reads
> group.id references from the Group index (just reading the ones needed
> and storing them during the read) and excludes LinkedInGroupPosts based
> on the value of "hidden". I wonder if this approach could be patterned
> or documented.

I am pretty sure filters are what you are looking for:
http://docs.jboss.org/hibernate/search/4.2/reference/en-US/html_single/#query-filter

> 
> * The documentation could contain some suggestions for dealing with the
> issue of cascading initialization and how to deal with this in a smart way.

Sure, let's identify what we consider smart and update the doc.
Can you create a JIRA issue for that?

> * In the tests I have done, saving a LinkedInPostGroup where the
> indexedEmbedded attributes (id,hidden) are *not* dirty, all the posts are
> reinitialized anyway. The reason for this is that with a Set<Tag> the set
> elements are deleted and reinserted on each save even when they haven't
> changed. 

Hum, I believe it's true for bag semantics but I'm surprised it's
true for Set. Besides, from what you are saying, you neither add nor
remove elements from the Set, you just change some non-id value.

> It looks like Hibernate Search is not optimized to deal with this
> "semi-dirty" situation (Hibernate ORM treats a field as dirty when it
> really isn't). Nothing really changed in the relevant values for the
> document but because Hibernate needs to reinsert the set, it thinks so. I
> wonder if this use case can or should be optimized. If not, documentation
> should warn against using Sets.

Can you create a minimal test case and open a JIRA issue / pull request?
This needs to be investigated.

> 
> * When a document is recreated because one attribute is changed leading to
> all sorts of cascading database reads I often wonder: why? The reason is
> that the Index segments cannot be recreated for the indexed attributes. So
> we need to read them again. But what if those attributes are actually
> Stored in the original document and not dirty? Why not just read these
> values straight from the document with a single read instead of executing
> a slew of database reads?

That might be true in some situations, but FieldBridges are not
guaranteed to be non-destructive in their stored data, so we cannot
necessarily generalize that.
We could explore this idea in a prototype. Again, can you open a JIRA
issue?
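
To illustrate what "destructive" means here, a small plain-Java sketch.
It does not implement the Hibernate Search FieldBridge interface; the
class and both methods are hypothetical. It mimics a bridge that stores
a timestamp at day resolution, so the value read back from the index is
not the value that was in the database.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

public class LossyBridgeSketch {

    /** Bridge-like conversion: keeps the date, drops the time of day. */
    public static String toStoredString(LocalDateTime value) {
        return value.toLocalDate().toString();
    }

    /** Best-effort reverse conversion: the time of day cannot be recovered. */
    public static LocalDateTime fromStoredString(String stored) {
        return LocalDate.parse(stored).atStartOfDay();
    }

    public static void main(String[] args) {
        LocalDateTime original = LocalDateTime.of(2012, 11, 23, 4, 44, 39);
        LocalDateTime roundTripped = fromStoredString(toStoredString(original));
        // The two values differ, so a document rebuilt from the stored
        // field alone could not feed a different bridge or resolution.
        System.out.println(original.equals(roundTripped));
    }
}
```

With such a bridge, rebuilding a document from stored fields only works
as long as nothing else needs the original value.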

Emmanuel