[hibernate-issues] [Hibernate-JIRA] Commented: (HHH-5300) Configurable QueryPlanCache reference counts (Manuel Dominguez Sarmiento)

Fri Jan 21 19:00:05 EST 2011

    [ http://opensource.atlassian.com/projects/hibernate/browse/HHH-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=39742#action_39742 ] 

Nick Hristov commented on HHH-5300:
-----------------------------------

We also have this problem. In our case this cache takes up a huge portion of server memory (about 77%), even though most of our queries are parameterized. With a very large heap, the servers go down for a while whenever all the soft references have to be cleaned up.

How does this happen? Well, several issues are present:

 - First, queries that take a collection as a parameter get expanded, and the resulting expanded query is the cache key. For example, if we have the following query:

SELECT {u.*} FROM users u WHERE userid IN (:userIds)

Hibernate will expand it into the following (if we pass a set with two elements):

SELECT {u.*} FROM users u WHERE userid IN (:userIds0_, :userIds1_)

Caching happens against the second query, so every distinct collection size produces its own cache entry.

 - The second, and in my opinion more severe, problem is that NativeSQLQuerySpecification uses the queryReturns as part of its hashCode() and equals() methods. Whether that should be so is something you need to decide. But the elements of the underlying array are NativeSQLQueryRootReturn objects, and that class does not implement equals() and hashCode(), which means that we get a cache miss every single time.
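
Both issues above can be reproduced outside Hibernate. The following is a minimal Java sketch, not Hibernate code: RootReturn and Spec are hypothetical stand-ins for NativeSQLQueryRootReturn and NativeSQLQuerySpecification, and expand() only imitates the parameter-list expansion (the exact generated parameter names are an assumption).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

final class PlanCacheKeyDemo {

    // Hypothetical stand-in for a query return descriptor. It deliberately
    // inherits identity-based equals()/hashCode() from Object.
    static final class RootReturn {
        final String alias;
        final String entityName;

        RootReturn(String alias, String entityName) {
            this.alias = alias;
            this.entityName = entityName;
        }
    }

    // Hypothetical stand-in for a cache key built from the SQL string
    // plus the array of query returns.
    static final class Spec {
        final String sql;
        final RootReturn[] returns;

        Spec(String sql, RootReturn... returns) {
            this.sql = sql;
            this.returns = returns;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Spec)) {
                return false;
            }
            Spec other = (Spec) o;
            // Arrays.equals delegates to RootReturn.equals, i.e. object identity.
            return sql.equals(other.sql) && Arrays.equals(returns, other.returns);
        }

        @Override
        public int hashCode() {
            return 31 * sql.hashCode() + Arrays.hashCode(returns);
        }
    }

    // Imitates the expansion of a collection-valued named parameter.
    static String expand(String sql, String name, int size) {
        StringJoiner joiner = new StringJoiner(", ");
        for (int i = 0; i < size; i++) {
            joiner.add(":" + name + i + "_");
        }
        return sql.replace(":" + name, joiner.toString());
    }

    // Issue 1: one distinct cache key per collection size from 1 to maxSize.
    static int distinctExpansions(String sql, String name, int maxSize) {
        Set<String> keys = new HashSet<>();
        for (int size = 1; size <= maxSize; size++) {
            keys.add(expand(sql, name, size));
        }
        return keys.size();
    }

    // Issue 2: logically identical lookups never hit, so the cache grows by
    // one entry per lookup.
    static int entriesAfterLookups(int lookups) {
        Map<Spec, String> cache = new HashMap<>();
        for (int i = 0; i < lookups; i++) {
            Spec key = new Spec("SELECT {u.*} FROM users u", new RootReturn("u", "User"));
            cache.putIfAbsent(key, "plan");
        }
        return cache.size();
    }

    public static void main(String[] args) {
        String sql = "SELECT {u.*} FROM users u WHERE userid IN (:userIds)";
        System.out.println(expand(sql, "userIds", 2));
        System.out.println(distinctExpansions(sql, "userIds", 5));
        System.out.println(entriesAfterLookups(3));
    }
}
```

In this sketch, giving RootReturn value-based equals() and hashCode() over alias and entityName would make repeated lookups hit the same entry. Whether that is the right fix for Hibernate itself is the open question raised above.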

> Configurable QueryPlanCache reference counts (Manuel Dominguez Sarmiento)
> -------------------------------------------------------------------------
>
>                 Key: HHH-5300
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HHH-5300
>             Project: Hibernate Core
>          Issue Type: Patch
>          Components: core
>    Affects Versions: 3.5.0-Final, 3.5.1, 3.5.2
>         Environment: N/A
>            Reporter: Manuel Dominguez Sarmiento
>            Assignee: Steve Ebersole
>             Fix For: 3.6.0.Beta3
>
>         Attachments: diffs.zip, Environment.java, LRUMap.java, QueryPlanCache.java, SessionFactoryImpl.java, SimpleMRUCache.java, SoftLimitMRUCache.java
>
>   Original Estimate: 2h
>          Time Spent: 53m
>  Remaining Estimate: 1h 7m
>
> Some of our production servers (Hibernate-based apps) have been hanging on full GC sporadically, most of the time after running normally for a few days, sometimes even over a week.
> We suspected a memory leak. We used the Eclipse MAT tool to analyze a live heap dump, and found that most of the heap was being used by QueryPlanCache, more specifically by the soft references held by SoftLimitMRUCache.
> We use very large heaps (up to 30 GB in some cases). Since memory is plentiful and the SoftLimitMRUCache is unbounded, the heap eventually fills up until a major stop-the-world GC is necessary to clear the SoftLimitMRUCache soft references. We performed several live tests configuring the Concurrent Mark Sweep (CMS) collector in order to avoid the full GC caused by concurrent mode failures. We played around with the following settings available in the Sun JVM:
> -XX:+CMSIncrementalMode
> -XX:+CMSIncrementalPacing
> -XX:CMSIncrementalDutyCycle=<PCT>
> -XX:CMSIncrementalDutyCycleMin=<PCT>
> -XX:CMSInitiatingOccupancyFraction=<PCT>
> -XX:CMSMarkStackSize=<SIZE>
> -XX:CMSMarkStackSizeMax=<SIZE>
> -XX:SoftRefLRUPolicyMSPerMB=<MSECS>
> -XX:+ParallelRefProcEnabled
> Most of these options helped somewhat by allowing soft-reference GC to be performed in parallel, ahead of time, before tipping the scales and requiring a full GC. However, this did not avoid all problems, and the servers still periodically hung on concurrent mode failures. These are high-load web servers which process hundreds of hits per second, so a full GC is disastrous, as garbage cannot be collected fast enough. A full GC would sometimes take over 15 minutes, and sometimes it would not even finish, requiring a manual app restart.
> Before anyone cries out "well, it's probably the application's fault, why do you have so many different queries? Aren't you using parameterized queries / PreparedStatements?" - the application does in fact produce many, many different queries, but most of them are not reused. Even if the QueryPlanCache is highly effective for the queries that do repeat, the vast majority of distinct queries are issued only once. Some use cases are the following:
> - Our system allows ad-hoc reporting and searching capabilities. Each query is typically issued once and never reused.
> - Many of our parameterized queries use IN clauses with variable-length collection/array parameters. I'm unsure whether this affects the cache hit ratio for HQL query plans, but it certainly affects native SQL queries, since different collection lengths mean different numbers of "?" in the query string, and thus many similar but slightly different queries polluting the corresponding plan cache.
> I'm sure there are other legitimate cases in which an unbounded query plan cache is also a problem, I'm just enumerating the ones we've run into.
> This issue is not new. The same problem is described, in one way or another, by HHH-2431, HHH-3191, and HHH-4627. I created a new issue because we've produced a working patch, and it will be more visible here than as a comment on one of the previous issues.
> The solution involves giving up the unbounded soft-reference-based cache. We introduced two new configuration options:
> - hibernate.query.plan_cache_max_strong_references -> defaults to 128
> - hibernate.query.plan_cache_max_soft_references -> defaults to 2048
> Entries are evicted using an LRU policy, or by memory pressure from the GC in the case of soft references. We used 2048 as a default for the soft size since it seems pretty reasonable, but of course it can be tuned to suit the user's needs. Users looking to emulate the previous behaviour (we don't see the point, but who knows ...) can set this option to Integer.MAX_VALUE.
> The patch introduces no new dependencies. It uses Apache Commons Collections' LRUMap (just as the released version does), and does away with the ReferenceMap (which does not support LRU eviction) in order to manage soft references manually on top of an LRUMap.
> We see this issue as a top priority and think the patch should be applied to the trunk ASAP. SoftLimitMRUCache has seen the most "radical" changes. SimpleMRUCache, QueryPlanCache and Environment only contain minor changes.
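
The bounded two-level cache described in the quoted issue can be sketched roughly as follows. This is not the attached patch: the class name BoundedPlanCache is hypothetical, and it swaps in java.util.LinkedHashMap in access order for the Commons Collections LRUMap that the patch uses, purely to keep the example free of dependencies. The two constructor arguments play the role of the proposed hibernate.query.plan_cache_max_strong_references and hibernate.query.plan_cache_max_soft_references settings.

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

final class BoundedPlanCache<K, V> {

    // Most recently used entries, held strongly so the GC cannot clear them.
    private final Map<K, V> strong;
    // Larger LRU window whose values may be cleared by the GC under memory pressure.
    private final Map<K, SoftReference<V>> soft;

    BoundedPlanCache(int maxStrong, int maxSoft) {
        this.strong = lru(maxStrong);
        this.soft = lru(maxSoft);
    }

    // Access-ordered LinkedHashMap that drops its eldest entry once it holds
    // more than max entries.
    private static <A, B> Map<A, B> lru(final int max) {
        return new LinkedHashMap<A, B>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<A, B> eldest) {
                return size() > max;
            }
        };
    }

    synchronized V get(K key) {
        V value = strong.get(key);
        if (value != null) {
            return value;
        }
        SoftReference<V> ref = soft.get(key);
        value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Either never cached, evicted by LRU, or cleared by the GC.
            soft.remove(key);
        } else {
            // Promote back into the strongly held window.
            strong.put(key, value);
        }
        return value;
    }

    synchronized void put(K key, V value) {
        strong.put(key, value);
        soft.put(key, new SoftReference<>(value));
    }

    synchronized int strongSize() {
        return strong.size();
    }

    synchronized int softSize() {
        return soft.size();
    }
}
```

With limits of 128 and 2048, as proposed in the issue, the number of retained entries stays bounded no matter how many one-off queries pass through, while the GC remains free to clear the soft window earlier if memory gets tight.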
