----- Original Message -----
From: "Jason Greene" <jason.greene(a)redhat.com>
To: "Andrig Miller" <anmiller(a)redhat.com>
Cc: "Bill Burke" <bburke(a)redhat.com>, wildfly-dev(a)lists.jboss.org
Sent: Wednesday, August 6, 2014 11:08:02 AM
Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default
On Aug 6, 2014, at 10:49 AM, Andrig Miller <anmiller(a)redhat.com> wrote:
>
>
> ----- Original Message -----
>> From: "Bill Burke" <bburke(a)redhat.com>
>> To: wildfly-dev(a)lists.jboss.org
>> Sent: Wednesday, August 6, 2014 9:30:06 AM
>> Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default
>>
>>
>>
>> On 8/6/2014 10:50 AM, Andrig Miller wrote:
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Radoslaw Rodak" <rodakr(a)gmx.ch>
>>>> To: wildfly-dev(a)lists.jboss.org
>>>> Sent: Tuesday, August 5, 2014 6:51:03 PM
>>>> Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default
>>>>
>>>>
>>>> Am 06.08.2014 um 00:36 schrieb Bill Burke <bburke(a)redhat.com>:
>>>>
>>>>>
>>>>>
>>>>> On 8/5/2014 3:54 PM, Andrig Miller wrote:
>>>>>>> It's a horrible theory. :) How many EJB instances of a given type
>>>>>>> are created per request? Generally only 1. 1 instance of one
>>>>>>> object of one type! My $5 bet is that if you went into EJB code
>>>>>>> and started counting how many object allocations were made per
>>>>>>> request, you'd lose count very quickly. Better yet, run a single
>>>>>>> remote EJB request through a perf tool and let it count the
>>>>>>> number of allocations for you. It will be greater than 1. :)
>>>>>>>
>>>>>>> Maybe the StrictMaxPool has an effect on performance because it
>>>>>>> creates a global synchronization bottleneck. Throughput is lower
>>>>>>> and you end up having fewer concurrent per-request objects being
>>>>>>> allocated and GC'd.
>>>>>>>
>>>>>>
>>>>>> The number per request, while relevant, is only part of the story.
>>>>>> The number of concurrent requests happening in the server dictates
>>>>>> the object allocation rate. Given enough concurrency, even a very
>>>>>> small number of object allocations per request can create an
>>>>>> object allocation rate that can no longer be sustained.
>>>>>>
>>>>>
>>>>> I'm saying that the number of concurrent requests might not
>>>>> dictate object allocation rate. There are probably a number of
>>>>> allocations that happen after the EJB instance is obtained, i.e.
>>>>> interception chains, contexts, etc. If StrictMaxPool blocks until
>>>>> a new instance is available, then there would be fewer allocations
>>>>> per request as blocking threads would be serialized.
>>>>>
>>>>
>>>> Scenario 1)
>>>> ------------------
>>>> Let's say we have a pool of 100 stateless EJBs and a constant load
>>>> of 50 requests per second, processed by 50 EJBs from the pool in one
>>>> second.
>>>> After 1000 seconds, how many new EJB instances will have been
>>>> created with a pool? Answer: 0 new EJBs, worst case 100 EJBs in the
>>>> pool. Of course object allocation is much higher, since 1 EJB call
>>>> leads to many objects from one EJB, but let's see the situation
>>>> without a pool.
>>>>
>>>> 50 requests/s * 1000 seconds = worst case 50,000 EJB instances on
>>>> the Java heap, where 1 EJB might have many objects, as long as
>>>> garbage collection has not been triggered. That sounds to me like
>>>> the JVM heap filling faster and GC running more often, probably
>>>> depending on the GC strategy.
>>>>
>>>> Scenario 2)
>>>> ------------------
>>>> Same as before, the load is still 50 requests per second, BUT each
>>>> EJB method call takes 10s.
>>>> After 10s we have 500 EJB instances without a pool, after 11s
>>>> 550 - 10 = 540 EJB instances, after 12s 580 EJBs ... after some time
>>>> very bad perf, full GC, and maybe OutOfMemory.
>>>>
>>>> So the performance advantage could also turn into a disadvantage :-)
>>>>
>>>>
>>>>> Whoever is investigating StrictMaxPool, or EJB pooling in general,
>>>>> should stop. It's pointless.
>>>>
>>>> Agree, pools are outdated, but something like a WorkManager for min
>>>> and max threads, or even better always no fewer than X idle threads,
>>>> would be useful :-)
>>>>
>>>> Radek
>>>>
>>>
>>> The scenarios above are what is outdated. Fifty requests per
>>> second isn't any load at all! We have hundreds of thousands of
>>> clients that we have to scale to, and lots more than 50 requests
>>> per second.
>>>
>> What you mean to say is that you need to scale to hundreds of
>> thousands of clients on meaningless no-op benchmarks. :) I do know
>> that the old SpecJ Java EE benchmarks artificially made EJB pooling
>> important, as process-intensive calculation results were cached in
>> these instances. But real-world apps don't use this
>> feature/anti-pattern.
>>
>
> I am not talking about a meaningless no-op benchmark, but a
> benchmark that does lots of work. We don't use meaningless no-op
> benchmarks on the performance team, with some exception for
> microbenchmarks that we have carefully crafted that model the
> interactions for a specific component within the context of how it
> is actually used for a real application.
>
>> Also, however crappy it was, I did implement an EJB container at one
>> time in my career. :) I know for a fact that there are a number of
>> per-request internal support objects that need to be allocated.
>> Let's count:
>>
>> * The argument array (for reflection)
>> * Each argument of the method call
>> * The response object
>> * Interceptor context object
>> * The interceptor context attribute map
>> * EJBContext
>> * Subject, Principal, role mappings
>> * Transaction context
>> * The message object(s) specific to the remote EJB protocol
>>
>> Starts to add up, huh? I'm probably missing a bunch more. Throw in
>> interaction with JPA and you end up with even more per-request
>> objects being allocated. You still believe pooling one EJB instance
>> matters?
>>
>
> See John O'Hara's post which shows our non-meaningless benchmark
> and the difference that pooling makes vs. non-pooling. It is a
> dramatic difference to say the least.
There is certainly a correlation identified between the results of
this benchmark and the use of pooling. However, the underlying cause
of the resulting difference is still unknown. If we knew
definitively how and why this happens, it would help in optimizing
this further. As an example, if it turned out to be some secondary
factor, like the throttling aspect of the pool, then eliminating
these allocations (and others) with a zero-tuning approach, like
thread local pooling, would offer little to no improvement. If we
discovered it is indeed extreme object allocation, and that it came
from thousands of nested calls in a request, then having a temporary
per-request thread local cache would dramatically improve the
results, and be cheap/quick to implement vs a full thread local
solution. If there is a bug in our code somewhere where under
certain situations we create hundreds of objects, when we should be
creating 10s, and the pool covers that up, fixing that bug and
removing the pool could lead to better results. If it turns out
there is only 3% extra churn but that extra churn causes a 10x perf
reduction in GC, then we had better understand those limits and
potentially work with the OpenJDK team in that area.
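
To make "thread local pooling" concrete, here is a minimal sketch of
the idea: one cached instance per worker thread, with no shared lock
or semaphore. The names ThreadLocalBeanCache and
ThreadLocalBeanCacheDemo are hypothetical and purely for illustration;
this is not code from the WildFly EJB container, and it ignores bean
lifecycle callbacks, reentrancy, and cleanup.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: one cached bean instance per worker thread.
final class ThreadLocalBeanCache<T> {
    private final ThreadLocal<T> perThread;

    ThreadLocalBeanCache(Supplier<T> factory) {
        // The factory runs at most once per thread, on that thread's first get().
        this.perThread = ThreadLocal.withInitial(factory);
    }

    // No semaphore and no shared queue: each thread only reads its own slot.
    T get() {
        return perThread.get();
    }
}

final class ThreadLocalBeanCacheDemo {
    // Performs `calls` lookups on each of `threads` threads and
    // returns how many instances the factory had to create in total.
    static int instancesCreated(int threads, int calls) {
        AtomicInteger created = new AtomicInteger();
        ThreadLocalBeanCache<Object> cache = new ThreadLocalBeanCache<>(() -> {
            created.incrementAndGet();
            return new Object();
        });
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int c = 0; c < calls; c++) {
                    cache.get();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return created.get();
    }

    public static void main(String[] args) {
        // 4 threads x 1000 lookups each creates 4 instances, one per thread.
        System.out.println(instancesCreated(4, 1000));
    }
}
```

Note that a cache like this saves the instance allocation but does no
throttling at all, which is why it would not help if the benefit of
the pool turned out to come from limiting concurrency.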
>
> This conversation is a perfect example of misinformation that
> causes us performance and scalability problems within our code
> bases.
It’s just a surprising result. The pool saves a few allocations, but
it also has the cost of concurrency usage which can trigger
blocking, additional barriers, and busy looping on CAS. You also
still have object churn in the underlying pool data structures that
occurs per invocation since every invocation is a check-out and a
check-in (requires a new node object instance), and if the semaphore
blocks you have additional allocation for the entry in the wait
queue. Factor in the remaining allocation savings relative to the
other allocations that are required for the invocation, and it
should be a very small percentage. For that very small percentage to
lead to a severalfold difference in performance hints, to me, at
other factors being involved.
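
The per-invocation costs described above can be sketched with a small
semaphore-bounded pool. BoundedBeanPool is a hypothetical name and the
code is an assumption about the general shape of such a pool, not the
actual StrictMaxPool source. It relies only on standard
java.util.concurrent classes.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch of a bounded instance pool.
final class BoundedBeanPool<T> {
    private final Semaphore permits;
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    BoundedBeanPool(int maxSize, Supplier<T> factory) {
        this.permits = new Semaphore(maxSize);
        this.factory = factory;
    }

    // Check-out: blocks when maxSize instances are already in use, so the
    // pool also throttles concurrency. A blocked thread is parked in the
    // semaphore's wait queue, which allocates a queue entry for it.
    T checkOut() {
        permits.acquireUninterruptibly();
        T bean = idle.poll(); // CAS on the head of the shared queue
        return bean != null ? bean : factory.get();
    }

    // Check-in: offer() allocates a new internal node on every call, so
    // each invocation still produces some garbage even with pooling on.
    void checkIn(T bean) {
        idle.offer(bean);
        permits.release();
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

With a pool like this, a sequence of checkOut()/checkIn() pairs on one
thread reuses a single bean instance, but every pair still touches two
shared structures (the semaphore and the queue) and allocates one
queue node.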
All logically thought through. At a 15% lower transaction rate than we are doing now, we
saw 4 Gigabytes per second of object allocation. We, with Sanne doing most of the work,
managed to get that down to 3 Gigabytes per second (I would have loved to get it to 2).
Much of that was Hibernate allocations, and of course that was with pooling on. We have
not spent the time to pinpoint the exact differences, memory and other, between having
pooling on vs. off. Our priority has been to continue to scale the workload and fix any
problems we see as a result. We have managed to increase the transaction rate another 15%
in the last couple of months, but still have another 17+% to go on a single JVM before we
start looking at two JVMs for the testing.
Once we get to our goal, I would love to put this on our list of tasks, so we can get the
specific facts, and instead of talking theory, we will know exactly what can and cannot be
done, and whether no pooling could ever match pooled.
Andy
--
Jason T. Greene
WildFly Lead / JBoss EAP Platform Architect
JBoss, a division of Red Hat