Hey Gail,
comments inline..
Am 15.03.2017 um 23:21 schrieb Gail Badner:
Hi Christian,
More comments below...
On Fri, Mar 3, 2017 at 4:38 PM, Christian Beikov
<christian.beikov(a)gmail.com <mailto:christian.beikov@gmail.com>> wrote:
Thanks for the comments, I have updated the PR.
I think that it is important to have a complete fix and I am
already working on that, but this issue was specifically about the
L2 cache, so I wanted to keep that stuff separate.
Here are some other polymorphism related issues:
*
https://hibernate.atlassian.net/browse/HHH-9845
<
https://hibernate.atlassian.net/browse/HHH-9845>
*
https://hibernate.atlassian.net/browse/HHH-11280
<
https://hibernate.atlassian.net/browse/HHH-11280>
*
https://hibernate.atlassian.net/browse/HHH-4742
<
https://hibernate.atlassian.net/browse/HHH-4742>
*
https://hibernate.atlassian.net/browse/HHH-2927
<
https://hibernate.atlassian.net/browse/HHH-2927>
Thanks for looking into this. It is definitely and area that
can/should be improved.
The fix provided by Josh Landin for HHH-11280 would break
reference equality and I think that is something we all want to
keep. Which brings us to the actual problems.
I agree. We definitely do not want to break reference equality.
1. Whenever we encounter an association that is of a polymorphic
entity type, we should actually internally, regardless of what the
user configures, always fetch that association with subtypes.
Otherwise we could run into that same problem again, that we
"pollute" the L1 cache with a proxy of the wrong type. Since we
want to keep reference equality, we can't replace that instance.
IIUC, this is only a problem if the association is with a polymorphic
entity Class that has a subclass. Since a fix would probably affect
performance, I think it would be best to only deal with this
particular case. Maybe you are saying the same thing; I just wanted to
clarify.
Yep saying the same thing.
What you are suggesting is to eagerly fetch the association. IIUC, a
workaround is to map such associations as eager. That way a proxy
would not be created, avoiding the problem altogether.
Correct.
IMO, if the entity is mapped as lazy, it really should be
lazily-loaded. That said, I think that it would be worthwhile for
Hibernate to determine the concrete subclass for the associated entity
and create the appropriate proxy. I can think of some alternatives,
but I think they would all affect performance. Maybe there should be a
property for indicating the strategy to use. Some strategies that come
to mind are:
1) by default, create an uninitialized proxy using the type indicated
in the mapping (as is done currently);
2) try loading from second-level cache (as done in your fix for
HHH-10162); if the entity is not found in the cache, then execute a
query to determine the associated entity's concrete subclass;
3) add a join to the original query to determine the associated
entity's concrete subclass, and create the appropriate uninitialized
proxy.
I roughly thought of the same strategies, but I'd like to propose a 1b)
strategy that would also check the L2 cache in case of a polymorphic
association. So maybe the config for whether a L2 lookup should be done
should be a "separate" and independent config.
You mentioned that you are working on something. Is tihs along the
lines of what you are thinking? Did you have other ideas?
Just what I wrote up so
far.
2. When using bytecode enhancement, the loading of *ToOne
instances could be deferred to the first access.
Yes, this is another workaround.
I wouldn't say workaround but performance
improvement :)
3. Actually it would also be possible if property access is used
without bytecode enhancement, but that would require to use a
special proxy for every object that might contain a
non-initialized polymorphic association. The speciality about it
is that it loads the association on first access.
I'm not sure I'm following this.
When querying for an entity A like
"FROM A", currently we always return
A instances although we could return special proxy A instances. The
special thing about the proxy is, that it has all state initialized
except for polymorphic associations. These associations are only loaded
from the DB when accessing them through their respective property
accessors. Note that this proxy is also something that we can use for
optional "toParent" one-to-one associations.
The only proxies that we have right now are for uninitialized
associations which trigger loading the full entity on non-id property
access. Either we extend the existing proxy implementation by the
explained functionality, or we create a separate proxy type.
4. Another thing that we should consider is "merging" of fetched
state into existing instances of the L1 cache. Imagine a user
loads an entity X and some of it's associations are lazy loaded.
Then the user sets of a query that would fetch that association.
Currently the associations will be left untouched because we skip
any further processing as soon as we encounter the instance for X
in the L1 cache. What we should actually do is, merge any fetched
state into the instance if possible.
+1; I think this is a good improvement.
Note that this improvement kind of requires
that there are no wrongly
typed proxy instance in the L1 cache. We could actually also "just"
merge the state that we can i.e. ignoring subtype state.
Since 1. will probably affect performance, it might only make
sense to do it along with 2. or 3.
Regardless of these improvements, I think we should apply the L2
cache fix and handle the rest later.
What do you say?
I think it would be better to come up with a hibernate property and
strategies, and allow an application to opt into a strategy, rather
than making a change that can affect performance by default.
I agree that users
should have the possibility to configure this. The
question is, should this be a global config or per-entity or maybe even
per-association? When all the performance optimizations are implemented
I see no reason for adding a per-entity or per-association config, a
global default should be enough. After all, the additional query for
loading the subtype will only happen when a user tries to get his hands
on the instance.
What do you say, how should we proceed? I'd start by creating/collecting
JIRAs for the issues and posting them here so you can track them.
Next I'd like to improve the proxy implementation as described in 3) as
that would improve performance in other scenarios too.
Thought? Feedback?