[hibernate-dev] Re: About HSEARCH-310

Emmanuel Bernard emmanuel at hibernate.org
Wed Jan 14 09:57:22 EST 2009


On  Jan 13, 2009, at 18:35, Sanne Grinovero wrote:

> thank you, you helped me a lot; still I would like to propose an
> evolution to cover all use cases correctly.
> I've already implemented and tested this, in roughly three steps:
>
> 1)
> when moving the cursor outside of the loaded window boundaries I
> extract the new EntityInfos from the index, then verify whether each
> matching entity is already contained in the persistence context: if it
> is, I discard its EntityInfo, and I "batch load" the entities
> identified by the remaining EntityInfos
> (I do this by building the EntityKey and checking
> session.getPersistenceContext().containsEntity( key ) ).

>
>
> 2)
> Immediately after batch loading, I evict them all right away and
> discard the references:
> this may look useless, but the goal is to put the data in the second
> level cache, useful later and possibly living in a more efficient
> data structure than a simple soft map.
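A self-contained sketch of steps 1 and 2, with a plain map standing in for the persistence context; the class and method names, and the batch-load/evict steps (reduced to comments), are hypothetical stand-ins, not the real Hibernate API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of steps 1 and 2: skip entities the persistence context already
// manages, batch-load the rest, then evict them right away so that only
// the 2nd level cache keeps the data. All names are illustrative.
class ScrollWindowSketch {
    final Map<Long, Object> persistenceContext = new HashMap<>(); // stand-in for the Session

    // returns the ids that actually get batch-loaded (and then evicted)
    List<Long> loadWindow(List<Long> idsInWindow) {
        List<Long> toBatchLoad = new ArrayList<>();
        for (Long id : idsInWindow) {
            // step 1: discard EntityInfos whose entity is already managed
            if (!persistenceContext.containsKey(id)) {
                toBatchLoad.add(id);
            }
        }
        // step 2: batch-load toBatchLoad, then evict immediately; nothing
        // new is left in persistenceContext, so the client sees no change.
        return toBatchLoad;
    }
}
```

The point of the filter is visible in the result: already-managed entities never enter the batch, so evicting the batch cannot detach anything the client attached itself.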

In a scrollable usage, the 2nd level cache is typically a bad idea
(i.e. we want to load more than what fits in memory). Plus, you don't
know if the entity will be marked for caching.

>
> Please note that I'm evicting only those objects which were not
> attached before the loading, so nothing changed in the persistence
> context from a client-code-point-of-view.
> The original implementation also did a one-by-one load on each entity
> as a last initialization step; this is not needed here, as I load
> each entity later on, when I'm sure the client actually needs that
> data.
>
> 3)
> After that, when get() is called, I use load( entityInfo ) to get a
> managed instance to return to the user.
>
> The end result is:
> * I don't put stuff in the persistence context the user doesn't expect

well, you do if the EVICT cascade policy is different from the EAGER
fetching policy

>
> * I don't evict stuff that was attached before

well, you do if the EVICT cascade policy is different from the EAGER
fetching policy

>
> * no memory leaks
>
> It works correctly even without a second level cache, but of course
> you only get good performance when using one.

I really don't like this idea. I would rather make the 80% scenario
work well (as described in the previous email) and not rely on the 2nd
level cache.
You can attach the patch in JIRA, but I think we should go for the
previous version.

>
>
> Do you think this looks reasonable? Besides the EntityKey creation,
> the code is much simplified and the line count is reduced.
> Would you prefer me to keep soft references to the loaded entities
> and somehow re-attach them to the session?
> I'd prefer to avoid the soft map for the entities: IMHO, adding
> another (inefficient) caching layer on top of object loading, which
> probably already has lots of other caching layers, just makes the
> code more complex.
>
> Sanne
>
> 2009/1/13 Emmanuel Bernard <emmanuel at hibernate.org>:
>> Clearly we cannot solve all use cases, so let's focus on the most
>> useful one.
>> People use a scrollable resultset to walk through all objects in a
>> window, one after the other, without skipping any.
>>
>> int i = 0;
>> while ( rs.next() ) {
>>     Address a = (Address) rs.get()[0];
>>     doSomething( a );
>>     if ( i++ % 100 == 0 ) s.clear();
>> }
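Made self-contained, the windowed-scroll pattern above can be checked to keep memory bounded; here a plain map stands in for the Session's persistence context and `clear()` for `Session.clear()`, and all names are stand-ins:

```java
import java.util.HashMap;
import java.util.Map;

// Model of the periodic-clear scroll pattern: each row "attaches" an
// entity to the context; clearing every clearEvery entities caps the
// context size regardless of how large the result set is.
class PeriodicClearSketch {
    static int maxContextSize(int totalResults, int clearEvery) {
        Map<Integer, String> context = new HashMap<>(); // stand-in for the Session
        int max = 0;
        int i = 0;
        for (int row = 0; row < totalResults; row++) {
            context.put(row, "entity-" + row);  // rs.get() attaches the entity
            max = Math.max(max, context.size());
            if (i++ % clearEvery == 0) {
                context.clear();                // s.clear() detaches everything
            }
        }
        return max;
    }
}
```

However many rows are scrolled, the context never grows past the clear interval, which is exactly why this is the 80% scenario worth optimizing for.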
>>
>> So we should consider that:
>> - eviction is done by the user
>> - eviction can happen behind the impl's back
>>
>> =>
>> place objects and EntityInfos in soft references (if we even need to
>> keep the EntityInfos; I don't remember).
>> When we hand out an entity, make sure session.contains(entity) ==
>> true; otherwise, discard the detached one and reload it.
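A self-contained sketch of that contains-check-then-reload idea: a SoftReference-backed cache, with a Set of ids standing in for `session.contains()` and a loader function standing in for the reload; the class and all names are hypothetical, not the real implementation:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Cache entities behind SoftReferences; on get(), hand out the cached
// instance only while the session still contains it. A cleared reference
// or a user-evicted entity triggers a reload, so the client always
// receives an attached instance.
class SoftScrollCache {
    final Map<Long, SoftReference<Object>> cache = new HashMap<>();

    Object get(long id, Set<Long> sessionContains, Function<Long, Object> reload) {
        SoftReference<Object> ref = cache.get(id);
        Object entity = (ref == null) ? null : ref.get();
        if (entity == null || !sessionContains.contains(id)) {
            entity = reload.apply(id);          // reload re-attaches it
            sessionContains.add(id);
            cache.put(id, new SoftReference<>(entity));
        }
        return entity;
    }
}
```

The soft references let the GC reclaim the window under memory pressure, while the contains check covers eviction happening behind the implementation's back.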
>>
>> That should help for most cases.
>>
>>
>> On  Jan 11, 2009, at 18:21, Sanne Grinovero wrote:
>>
>>> Hello,
>>> I need some help with fixing the memory leak of HSEARCH-310.
>>> In Search's ScrollableResultsImpl there currently is:
>>>
>>> * an internal cache of all loaded EntityInfo
>>> * an internal cache of all loaded managed Entities
>>>
>>> Neither cache is ever emptied during scrolling; the implementation
>>> loads a batch-sized list each time a not-yet-loaded entity is
>>> requested and adds the new objects to the caches.
>>>
>>> We had said to use some soft ReferenceMap to fix this, but actually
>>> we should evict the loaded entities that were not requested by the
>>> client code, or I'll still be leaking memory in Hibernate's
>>> persistence context.
>>>
>>> I expect the most frequent use case for a ScrollableResults to be
>>> iterating at will and periodically evicting objects in the client
>>> code; still, the client could skip some entities we loaded by
>>> batching.
>>> At first I thought of keeping track of which references were used
>>> and evicting the other ones, but this isn't a good solution: the
>>> client could be loading some entities by other means (e.g. another
>>> query) and won't be happy if I evict some random entities. I don't
>>> have enough information to know for sure what to evict and what is
>>> going to be used later.
>>>
>>> Also, a bug that follows from the current implementation: if you
>>> get an object, then evict it, and then call scrollable.get(), you
>>> don't get an attached object but the evicted instance again; as a
>>> client of the Hibernate library I expect all objects I ask for to
>>> be attached by default.
>>>
>>> My idea to fix this should be something like:
>>> 1) load the batch
>>> 2) evict them all
>>> 3) and then load one by one as requested, relying on the second
>>> level cache (I suppose point 1 should have warmed it)
>>>
>>> But this has other flaws, e.g. I should not evict (in step 2) those
>>> objects already existing in the persistence context.
>>> Ideally I should avoid preloading objects already present in the
>>> persistence context; I am considering looking in the
>>> session.persistenceContext.entitiesByKey map to batch load+evict
>>> only those entities not already managed.
>>>
>>> Is there some way to populate the second level cache while skipping
>>> the actual load into the persistence context?
>>> Should I keep my own "soft map cache" in the scroll implementation
>>> anyway, and somehow detach and re-attach entities as needed?
>>> Am I missing some trivially better solution, or is my solution not
>>> going to work at all?
>>>
>>> thanks,
>>> Sanne
>>
>>



