Manuel Dominguez Sarmiento commented on HHH-5300:
-------------------------------------------------
No problem, Steve. The attachment is a ZIP only because it bundles all the diffs with two
full source files: a new LRUMap, and SoftLimitMRUCache.java, which has changed so much
from the current version that a diff would not make much sense anyway. Packaging them
together also keeps the diffs from getting mixed up with the full source files I had
posted before.
Configurable strong and soft reference QueryPlanCache sizes
-----------------------------------------------------------
Key: HHH-5300
URL: http://opensource.atlassian.com/projects/hibernate/browse/HHH-5300
Project: Hibernate Core
Issue Type: Patch
Components: core
Affects Versions: 3.5.0-Final, 3.5.1, 3.5.2
Environment: N/A
Reporter: Manuel Dominguez Sarmiento
Fix For: 3.6.0.Beta3
Attachments: diffs.zip, Environment.java, LRUMap.java, QueryPlanCache.java,
SessionFactoryImpl.java, SimpleMRUCache.java, SoftLimitMRUCache.java
Original Estimate: 2h
Remaining Estimate: 2h
Some of our production servers (Hibernate-based apps) have been hanging on full GC
sporadically, most of the time after running normally for a few days, sometimes even over
a week.
We suspected a memory leak. We used the Eclipse MAT tool to analyze a live heap dump, and
found that most of the heap was being used by QueryPlanCache, more specifically by the
soft references held by SoftLimitMRUCache.
We use very large heaps (up to 30 GB in some cases). Since memory is plentiful and the
SoftLimitMRUCache is unbounded, the heap eventually fills up until a major stop-the-world
GC is necessary to clear the SoftLimitMRUCache soft references. We performed several
live tests configuring the Concurrent Mark Sweep (CMS) collector in order to avoid the
full GC caused by concurrent mode failures. We played around with the following settings
available in the Sun JVM:
-XX:+CMSIncrementalMode
-XX:+CMSIncrementalPacing
-XX:CMSIncrementalDutyCycle=<PCT>
-XX:CMSIncrementalDutyCycleMin=<PCT>
-XX:CMSInitiatingOccupancyFraction=<PCT>
-XX:CMSMarkStackSize=<SIZE>
-XX:CMSMarkStackSizeMax=<SIZE>
-XX:SoftRefLRUPolicyMSPerMB=<MSECS>
-XX:+ParallelRefProcEnabled
Most of these options helped somewhat by allowing soft references to be collected in
parallel and ahead of time, before the heap fills up and forces a full GC. However, this
did not avoid all problems, and the servers still periodically hang on concurrent mode
failures. These are high-load web servers which process hundreds of hits per second, so a
full GC is disastrous: garbage cannot be collected fast enough. A full GC would sometimes
take over 15 minutes, and sometimes it would not finish at all, requiring a manual app
restart.
Before anyone cries out "well, it's probably the application's fault, why do
you have so many different queries? Aren't you using parameterized queries /
PreparedStatements?" - the application does in fact produce many, many different
queries, and most of them are never reused. Even if the QueryPlanCache is highly effective
for the queries that do repeat, most distinct queries are issued only once. Some use cases
are the following:
- Our system allows ad-hoc reporting and searching capabilities. Each query is typically
issued once and never reused.
- Many of our parameterized queries use IN clauses with variable-length collection/array
parameters. I'm unsure whether this affects the cache hit ratio for HQL query plans,
but it certainly affects native SQL queries, since different collection lengths mean a
different number of "?" in the query string, and thus multiple similar, if slightly
different, queries polluting the corresponding plan cache.
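The IN-clause case can be illustrated with a small, self-contained sketch. The class
InClauseExpansion and its methods are hypothetical and not part of Hibernate; the sketch
assumes, as described above, that each collection element becomes one "?" in the native
SQL string and that distinct query strings end up as distinct plan cache entries.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, NOT Hibernate code: it only mimics how a collection-valued
// parameter is expanded into one "?" placeholder per element.
final class InClauseExpansion {

    // Builds a query string with parameterCount placeholders in the IN list.
    static String expand(String prefix, int parameterCount) {
        if (parameterCount < 1) {
            throw new IllegalArgumentException("parameterCount must be >= 1");
        }
        String placeholders = String.join(", ", Collections.nCopies(parameterCount, "?"));
        return prefix + " in (" + placeholders + ")";
    }

    // Collects the distinct query strings produced for collection sizes 1..maxSize.
    // Assumption: each distinct string would occupy its own plan cache entry.
    static Set<String> distinctQueries(String prefix, int maxSize) {
        Set<String> queries = new LinkedHashSet<>();
        for (int size = 1; size <= maxSize; size++) {
            queries.add(expand(prefix, size));
        }
        return queries;
    }
}
```

Under these assumptions, one logical query called with collections of 1 to 50 elements
yields 50 different strings, so a single call site can fill 50 cache slots.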
I'm sure there are other legitimate cases in which an unbounded query plan cache is
also a problem; I'm just enumerating the ones we've run into.
This issue is not new. The same problem is described, in one way or another, by HHH-2431,
HHH-3191, and HHH-4627. I created a new issue because we've produced a working patch, and
a separate issue makes it more visible than comments on the previous ones would.
The solution involves giving up the unbounded soft-reference-based cache. We introduced
two new configuration options:
- hibernate.query.plan_cache_max_strong_references -> defaults to 128
- hibernate.query.plan_cache_max_soft_references -> defaults to 2048
Entries are evicted using an LRU policy, or by memory pressure from the GC in the case of
soft references. We used 2048 as a default for the soft size since it seems pretty
reasonable, but of course it can be tuned to suit the user's needs. Users looking to
emulate the previous behaviour (we don't see the point, but who knows ...) can set this
option to Integer.MAX_VALUE.
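As a rough illustration of how the two options might be set and read, here is a minimal
sketch using plain java.util.Properties. The property names and defaults are the ones
listed above; the class PlanCacheSettings and its methods are hypothetical and are not
taken from the patch.

```java
import java.util.Properties;

// Hypothetical helper, NOT part of the patch: shows the two settings as plain properties.
final class PlanCacheSettings {

    // Property names as given in this issue.
    static final String MAX_STRONG = "hibernate.query.plan_cache_max_strong_references";
    static final String MAX_SOFT = "hibernate.query.plan_cache_max_soft_references";

    // Defaults as given in this issue.
    static final int DEFAULT_MAX_STRONG = 128;
    static final int DEFAULT_MAX_SOFT = 2048;

    // Builds a Properties object carrying explicit limits for both caches.
    static Properties build(int maxStrong, int maxSoft) {
        Properties props = new Properties();
        props.setProperty(MAX_STRONG, Integer.toString(maxStrong));
        props.setProperty(MAX_SOFT, Integer.toString(maxSoft));
        return props;
    }

    // Reads an integer setting, falling back to the given default when it is absent.
    static int read(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }
}
```

For example, PlanCacheSettings.build(128, Integer.MAX_VALUE) would correspond to the
"emulate previous behaviour" case mentioned above, with an effectively unbounded soft
reference limit.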
The patch introduces no new dependencies. It uses Apache Commons Collections' LRUMap
(just as the released version does), and does away with the ReferenceMap (which does not
support LRU eviction) in order to manage soft references manually on top of an LRUMap.
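The general idea of managing soft references manually on top of an LRU map can be
sketched as follows. This is not the patched SoftLimitMRUCache: the class name
BoundedSoftCache is hypothetical, and the sketch uses an access-ordered
java.util.LinkedHashMap instead of Commons Collections' LRUMap so that it stays
self-contained. It also assumes the soft limit is at least as large as the strong limit.

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only, NOT the patch: a cache with a bounded strong LRU tier and a
// bounded soft-reference LRU tier.
final class BoundedSoftCache<K, V> {
    private final Map<K, V> strong;                // pins the most recently used values
    private final Map<K, SoftReference<V>> soft;   // larger tier, clearable by the GC

    BoundedSoftCache(int maxStrong, int maxSoft) {
        if (maxStrong < 1 || maxSoft < maxStrong) {
            throw new IllegalArgumentException("need 1 <= maxStrong <= maxSoft");
        }
        this.strong = lruMap(maxStrong);
        this.soft = lruMap(maxSoft);
    }

    // Access-ordered LinkedHashMap that drops its eldest entry once the limit is exceeded.
    private static <A, B> Map<A, B> lruMap(final int limit) {
        return new LinkedHashMap<A, B>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<A, B> eldest) {
                return size() > limit;
            }
        };
    }

    synchronized void put(K key, V value) {
        soft.put(key, new SoftReference<V>(value));
        strong.put(key, value);
    }

    synchronized V get(K key) {
        SoftReference<V> ref = soft.get(key);      // also refreshes the soft LRU order
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            soft.remove(key);                      // entry missing or cleared by the GC
            return null;
        }
        strong.put(key, value);                    // re-pin as most recently used
        return value;
    }

    synchronized int strongSize() { return strong.size(); }

    synchronized int softSize() { return soft.size(); }
}
```

With this layout, both tiers are bounded by entry count, and values that have fallen out
of the strong tier remain available only until the GC decides to clear their soft
references.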
We see this issue as a top priority, and the patch should be applied to the trunk ASAP.
SoftLimitMRUCache has seen the most "radical" changes. SimpleMRUCache,
QueryPlanCache and Environment contain only minor changes.