Manuel Dominguez Sarmiento commented on HHH-5300:
-------------------------------------------------
We use SoftReferences pretty heavily in other caching scenarios (outside Hibernate) and
all I can say is that they are not reliable at all if you care about the precise moment
at which they are cleared. The JVM makes no hard guarantees and the only configurable
option (at least in HotSpot) is -XX:SoftRefLRUPolicyMSPerMB. Even though the clearing
policy is LRU and time-based, it is global to ALL SoftReferences in the JVM. If you only
use SoftReferences for one purpose then it doesn't really make any difference, but once
you have many different kinds of SoftReferences serving different purposes then memory
pressure from a completely different part of the application (or even another application
in the same JVM) might push out freshly put, recently used SoftReferences that might be
useful to Hibernate in the near term.
What I'm getting at is that the strong references are necessary (even if the number is
small, such as the current default of 128) in order to guarantee that the QueryPlanCache
will at least be minimally effective in all deployment scenarios. If you search around a
bit, you'll find plenty of deployments fixing several GC problems related to
SoftReference/WeakReference processing by using -XX:SoftRefLRUPolicyMSPerMB=0 or otherwise
very low values. This pretty much means that SoftReferences are cleared almost
immediately, defeating the purpose of using them at all.
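
To put rough numbers on this: HotSpot is usually described as keeping a softly reachable
object for about (free heap in MB) x SoftRefLRUPolicyMSPerMB milliseconds after its last
access. The sketch below assumes that heuristic, which is an approximation rather than a
guarantee; the class and method names are made up for illustration.

```java
// Illustrative only: models the commonly described HotSpot heuristic,
// not an actual JVM API.
final class SoftRefRetention {

    // Approximate time (ms) a soft reference survives after its last access,
    // assuming retention = free heap (MB) * SoftRefLRUPolicyMSPerMB.
    static long retentionMillis(long freeHeapMb, long softRefLruPolicyMsPerMb) {
        return freeHeapMb * softRefLruPolicyMsPerMb;
    }

    public static void main(String[] args) {
        // 1024 MB free with a setting of 1000 ms/MB: about 17 minutes.
        System.out.println(retentionMillis(1024, 1000) + " ms");
        // The same heap with a setting of 0: eligible for clearing right away.
        System.out.println(retentionMillis(1024, 0) + " ms");
    }
}
```

Under this model, a setting of 0 leaves no retention window at all regardless of heap
size, which matches the "cleared almost immediately" behaviour described above.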
So I see SoftReferences as a safety net to be used if available, but we should not rely on
them exclusively. I see the pain in maintaining two LRUMaps in parallel, but I don't see
a clear alternative for keeping the strong references around that would give any kind of
performance benefit over the current approach.
Configurable strong and soft reference QueryPlanCache sizes
-----------------------------------------------------------
Key: HHH-5300
URL: http://opensource.atlassian.com/projects/hibernate/browse/HHH-5300
Project: Hibernate Core
Issue Type: Patch
Components: core
Affects Versions: 3.5.0-Final, 3.5.1, 3.5.2
Environment: N/A
Reporter: Manuel Dominguez Sarmiento
Attachments: Environment.java, QueryPlanCache.java, SessionFactoryImpl.java,
SimpleMRUCache.java, SoftLimitMRUCache.java
Original Estimate: 2h
Remaining Estimate: 2h
Some of our production servers (Hibernate-based apps) have been hanging on full GC
sporadically, most of the time after running normally for a few days, sometimes even over
a week.
We suspected a memory leak. We used the Eclipse MAT tool to analyze a live heap dump, and
found that most of the heap was being used by QueryPlanCache, more specifically, by the
soft references held by SoftLimitMRUCache.
We use very large heaps (up to 30 GB in some cases). Since memory is plentiful and the
SoftLimitMRUCache is unbounded, the heap eventually fills up until a major stop-the-world
GC is necessary to clear the SoftLimitMRUCache soft references. We performed several
live tests configuring the Concurrent-Mark-Sweep (CMS) collector in order to avoid the
full GC caused by concurrent mode failures. We played around with the following settings
available in the Sun JVM:
-XX:+CMSIncrementalMode
-XX:+CMSIncrementalPacing
-XX:CMSIncrementalDutyCycle=<PCT>
-XX:CMSIncrementalDutyCycleMin=<PCT>
-XX:CMSInitiatingOccupancyFraction=<PCT>
-XX:CMSMarkStackSize=<SIZE>
-XX:CMSMarkStackSizeMax=<SIZE>
-XX:SoftRefLRUPolicyMSPerMB=<MSECS>
-XX:+ParallelRefProcEnabled
Most of these options helped somewhat by allowing soft-reference GC to be performed in
parallel, ahead of time, before tipping the scales and requiring a full GC. However, this
did not avoid all problems, and the servers still periodically hung upon concurrent mode
failures. These are high-load web servers which process hundreds of hits per second, so a
full GC is disastrous because garbage cannot be collected fast enough. A full GC would
sometimes take over 15 minutes; sometimes it would not finish at all, requiring a manual
app restart.
Before anyone cries out "well, it's probably the application's fault, why do
you have so many different queries? Aren't you using parameterized queries /
PreparedStatements?": the application does in fact produce a great many different
queries, but most of them are never reused. Even if the QueryPlanCache is highly
effective for the queries that do repeat, in absolute terms most distinct queries are
issued only once. Some use cases are the following:
- Our system allows ad-hoc reporting and searching capabilities. Each query is typically
issued once and never reused.
- Many of our parameterized queries use IN clauses with variable-length collection/array
parameters. I'm unsure whether this affects the cache hit ratio for HQL query plans,
but it certainly affects native SQL queries, since each collection length produces a
different number of "?" placeholders in the query string, and thus many similar but
slightly different queries polluting the corresponding plan cache.
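
As a rough illustration of the IN-clause case, the sketch below expands one logical query
for several collection lengths and counts the distinct query strings that result. The
helper class, the table name and the query text are hypothetical and not taken from
Hibernate; a plan cache keyed on the query string would see each variant as a separate
entry.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of Hibernate.
final class InClauseExpansion {

    // Appends "(?, ?, ...)" with one placeholder per collection element.
    static String expand(String prefix, int parameterCount) {
        String placeholders = String.join(", ", Collections.nCopies(parameterCount, "?"));
        return prefix + " (" + placeholders + ")";
    }

    // Collects the distinct query strings produced by the given collection lengths.
    static Set<String> distinctQueries(String prefix, int... lengths) {
        Set<String> queries = new LinkedHashSet<>();
        for (int length : lengths) {
            queries.add(expand(prefix, length));
        }
        return queries;
    }

    public static void main(String[] args) {
        // Four executions with lengths 1, 2, 3 and 2 yield three distinct strings.
        for (String q : distinctQueries("select * from orders where id in", 1, 2, 3, 2)) {
            System.out.println(q);
        }
    }
}
```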
I'm sure there are other legitimate cases in which an unbounded query plan cache is
also a problem; I'm just enumerating the ones we've run into.
This issue is not new. The same problem is described, in one way or another, by HHH-2431,
HHH-3191, and HHH-4627. I created a new issue because we've produced a working patch, which
is more visible here than it would be as a comment on one of the previous issues.
The solution involves giving up the unbounded soft-reference-based cache. We introduced
two new configuration options:
- hibernate.query.plan_cache_max_strong_references -> defaults to 128
- hibernate.query.plan_cache_max_soft_references -> defaults to 2048
Entries are evicted using an LRU policy, or by memory pressure from the GC in the case of
soft references. We used 2048 as a default for the soft size since it seems pretty
reasonable, but of course it can be tuned to suit the user's needs. Users looking to
emulate previous behaviour (we don't see the point, but who knows ...) can set this
option to Integer.MAX_VALUE.
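
For example, the two options could be collected into a java.util.Properties object like
this. The property names are the ones proposed by this patch; the helper class is made up
for illustration, and how the properties reach Hibernate (hibernate.properties, a
Configuration object, etc.) depends on the deployment.

```java
import java.util.Properties;

// Hypothetical helper; only the two property names come from the patch description.
final class PlanCacheSettings {

    static Properties planCacheProperties(int maxStrongReferences, int maxSoftReferences) {
        Properties props = new Properties();
        // Upper bound on strongly referenced query plans (patch default: 128).
        props.setProperty("hibernate.query.plan_cache_max_strong_references",
                Integer.toString(maxStrongReferences));
        // Upper bound on softly referenced query plans (patch default: 2048).
        props.setProperty("hibernate.query.plan_cache_max_soft_references",
                Integer.toString(maxSoftReferences));
        return props;
    }

    public static void main(String[] args) {
        // Prints both settings with the defaults proposed by the patch.
        planCacheProperties(128, 2048).list(System.out);
    }
}
```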
The patch introduces no new dependencies. It uses Apache Commons Collections' LRUMap
(just as the released version does), and does away with the ReferenceMap (which does not
support LRU eviction) in order to manage soft references manually on top of an LRUMap.
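
The general idea can be sketched as follows. This is NOT the attached SoftLimitMRUCache
code: it is a minimal stand-in that uses an access-ordered java.util.LinkedHashMap in
place of Commons Collections' LRUMap, and the class name, the coarse synchronization and
the promotion of soft hits into the strong tier are assumptions made for the example.

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative two-tier cache: a small strong LRU plus a larger, bounded soft LRU.
final class TwoTierLruCache<K, V> {
    private final Map<K, V> strong;
    private final Map<K, SoftReference<V>> soft;

    TwoTierLruCache(int maxStrong, int maxSoft) {
        this.strong = lru(maxStrong);
        this.soft = lru(maxSoft);
    }

    // Access-ordered LinkedHashMap that drops its eldest entry once full.
    private static <A, B> Map<A, B> lru(final int maxEntries) {
        return new LinkedHashMap<A, B>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<A, B> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized void put(K key, V value) {
        strong.put(key, value);
        soft.put(key, new SoftReference<>(value));
    }

    synchronized V get(K key) {
        V value = strong.get(key);
        if (value != null) {
            return value; // strong hit; the soft tier is left untouched
        }
        SoftReference<V> ref = soft.get(key);
        if (ref == null) {
            return null; // never cached, or evicted by the soft LRU bound
        }
        value = ref.get();
        if (value == null) {
            soft.remove(key); // referent was cleared by the GC; drop the stale entry
            return null;
        }
        strong.put(key, value); // promote a soft hit back into the strong tier
        return value;
    }

    synchronized int strongSize() { return strong.size(); }

    synchronized int softSize() { return soft.size(); }
}
```

Both tiers are bounded, so the soft tier can never grow past its configured size even
when the GC is slow to clear references.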
We see this issue as a top priority, and the patch should be applied to the trunk ASAP.
SoftLimitMRUCache has seen the most "radical" changes. SimpleMRUCache,
QueryPlanCache and Environment only contain minor changes.