The short version:
I claim that in Init.instance()
this line:
| Init init = (Init) Contexts.getApplicationContext().get(Init.class);
|
should be replaced with :
| Init init = (Init)
Contexts.getApplicationContext().get("org.jboss.seam.core.init")
|
because the current version calles Class.getAnnotation which in turns calls synchronized
method and this results in poor behavior under heavy load.
The long version:
First some history (and a similar previous problem):
Over the past few months I've been working with a team converting an existing webapp
of moderate complexity to a JSF / Seam stack w/ Facelets.
It's been in production now for about a month and a half.
Running on jboss 4.0.5.GA w/ seam 1.2.0.patch1 Sun JSF RI 1.2 b4 (?)
Before we put it in production had to do some performance tuning. What we started out
with was about 10% as fast as the original app , which wasn't acceptable, but
eventually we got to about 60%, which was fine.
One undesirable behavior was that under moderately heavy load, when requests were slowing
down from their normal ~300ms to 3-4 seconds, the distribution of request response time
got very wide with some requests taking 5 seconds while other, identical, requests were
timing out at 5 minutes.
With some testing we tracked this down to the fact that
org/apache/catalina/core/ApplicationContext.get() is being called something like 50K -
100K times per request in our app. Internally this call is synchronized.
The solution we tried and eventually adopted was to edit the tomcat ApplicationContext
code and replace the HashMap with a ConcurrentHashMap and to remove the synchronization
in the get().
This resulted in something like a 30-40% performance improvement (if memory serves), but
more importantly it significantly narrowed the response time curve. That is, the response
time for requests ,under load, was a lot more consistent.
(note 1: this makes sense since having lots of threads contending for a lock can lead to
starvation. In this case, the fact that any request has to "win" the lock 100K
times before it succeeds makes it more likely that starvation will occur despite any
anti-starvation attempts by the OS/VM)
(Note 2: This change not strictly speaking a safe thing to do, but it works for us.
(Tomcat 6 also takes this approach but presumably they've gone further in validating
that the rest of the app is safe with the concurrent hash map semantics)).
THE CURRENT ISSUE:
Basically, in production under heavy load, we saw the same distribution of request
response time. (measured by mod-log-firstbyte in apache). Bascially once the per-page
response got to ~7-10 seconds / page some requests start to take 100 - 300 seconds
(there's some timeout in the system that caps things at 300 seconds). The situation
then often spirals out of control with active connections climbing to the 100s and
inevitably resulting in an OOM.
This time a thread-dump of jboss showed that many threads were waiting in to lock a
specific lock in java.lang.Class.initAnnotationsIfNecessary().
Looking at the stack traces it became clear that the Class in question was Init.class and
that all these threads were calling Init.instance().
( Init.instance() is called ... everywhere... all the time).
The specific line that leads to the eventual lock contention is this one:
Init init = (Init) Contexts.getApplicationContext().get(Init.class);
Which internally leads to a call to get the value of the @Name annotation for the
Init.class so that it can then look up the instance by that name.
It is this lookup of the Name of the Init component that causes the contention.
Replacing the get(Init.class) with get("org.jboss.seam.core.init") bypasses this
lookup and also that synchronized method.
I've been running this modification on half of the servers for the past two days and
they are showing MUCH more linear performance under heavy load. Basically when subjected
to a stressor (GC pause, DB related pause, traffic spike etc ) the modified machines
return to normal much more quickly (say... 20 -30 seconds compared to 5-10 minutes) and
they haven't shown a tendency to spiral out of control towards OOM.
I generally like the compile time checking that one gets from using the get(Class)
version of this method, however in this specific case i think the loss of safety is worth
it given the deep loop nature of this call.
I'm curious as to what you guys think.
Thanks,
radu
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4042697#...
Reply to the post :
http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&a...