On Aug 6, 2014, at 10:49 AM, Andrig Miller <anmiller(a)redhat.com> wrote:
----- Original Message -----
> From: "Bill Burke" <bburke(a)redhat.com>
> To: wildfly-dev(a)lists.jboss.org
> Sent: Wednesday, August 6, 2014 9:30:06 AM
> Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default
>
>
>
> On 8/6/2014 10:50 AM, Andrig Miller wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Radoslaw Rodak" <rodakr(a)gmx.ch>
>>> To: wildfly-dev(a)lists.jboss.org
>>> Sent: Tuesday, August 5, 2014 6:51:03 PM
>>> Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default
>>>
>>>
>>> On 06.08.2014 at 00:36, Bill Burke <bburke(a)redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On 8/5/2014 3:54 PM, Andrig Miller wrote:
>>>>>> It's a horrible theory. :) How many EJB instances of a given type are
>>>>>> created per request? Generally only 1. 1 instance of one object of one
>>>>>> type! My $5 bet is that if you went into EJB code and started counting
>>>>>> how many object allocations were made per request, you'd lose count
>>>>>> very quickly. Better yet, run a single remote EJB request through a
>>>>>> perf tool and let it count the number of allocations for you. It will
>>>>>> be greater than 1. :)
>>>>>>
>>>>>> Maybe the StrictMaxPool has an effect on performance because it
>>>>>> creates a global synchronization bottleneck. Throughput is lower, and
>>>>>> you end up having fewer concurrent per-request objects being allocated
>>>>>> and GC'd.
>>>>>>
>>>>>
>>>>> The number per request, while relevant, is only part of the story.
>>>>> The number of concurrent requests happening in the server
>>>>> dictates the object allocation rate. Given enough concurrency,
>>>>> even a very small number of object allocations per request can
>>>>> create an object allocation rate that can no longer be sustained.
>>>>>
>>>>
>>>> I'm saying that the number of concurrent requests might not dictate
>>>> object allocation rate. There are probably a number of allocations that
>>>> happen after the EJB instance is obtained, e.g. interception chains,
>>>> contexts, etc. If StrictMaxPool blocks until a new instance is
>>>> available, then there would be fewer allocations per request, as
>>>> blocking threads would be serialized.
>>>>
>>>
>>> Scenario 1)
>>> ------------------
>>> Let's say we have a pool of 100 stateless EJBs and a constant load of
>>> 50 requests per second, processed by 50 EJBs from the pool within one
>>> second.
>>> After 1000 seconds, how many new EJB instances will have been created
>>> with a pool? Answer: 0 new EJBs, worst case 100 EJBs in the pool. Of
>>> course the object allocation is much higher, since one EJB call leads to
>>> many objects from one EJB, but let's look at the situation without a pool.
>>>
>>> 50 requests/s * 1000 seconds = worst case 50,000 EJB instances on the
>>> Java heap, where one EJB might have many objects, as long as garbage
>>> collection has not been triggered. That sounds to me like the JVM heap
>>> filling faster and GC running more often, probably depending on the GC
>>> strategy.
>>>
>>> Scenario 2)
>>> ------------------
>>> Same as before: the load is still 50 requests per second, BUT the EJB
>>> method call takes 10s.
>>> After 10s we have 500 EJB instances without a pool, after 11s 550 - 10
>>> = 540 EJB instances, after 12s 580 EJBs ... after some time very bad
>>> perf, full GC, and maybe OutOfMemory.
>>>
>>> So the performance advantage could also turn into a disadvantage :-)
>>>
>>>
>>>> Whoever is investigating StrictMaxPool, or EJB pooling in general,
>>>> should stop. It's pointless.
>>>
>>> Agreed, pools are outdated, but something like a WorkManager for min and
>>> max threads, or even better, for always keeping no fewer than X idle
>>> threads, would be useful :-)
>>>
>>> Radek
>>>
>>
>> The scenarios above are what is outdated. Fifty requests per
>> second isn't any load at all! We have hundreds of thousands of
>> clients that we have to scale to, and lots more than 50 requests
>> per second.
>>
> What you mean to say is that you need to scale to hundreds of thousands
> of clients on meaningless no-op benchmarks. :) I do know that the old
> SpecJ Java EE benchmarks artificially made EJB pooling important, as
> processing-intensive calculation results were cached in these instances.
> But real-world apps don't use this feature/anti-pattern.
>
I am not talking about a meaningless no-op benchmark, but a benchmark that does lots of
work. We don't use meaningless no-op benchmarks on the performance team, with some
exception for microbenchmarks that we have carefully crafted that model the interactions
for a specific component within the context of how it is actually used for a real
application.
> Also, however crappy it was, I did implement an EJB container at one
> time in my career. :) I know for a fact that there are a number of
> per-request internal support objects that need to be allocated.
> Let's count:
>
> * The argument array (for reflection)
> * Each argument of the method call
> * The response object
> * Interceptor context object
> * The interceptor context attribute map
> * EJBContext
> * Subject, Principal, role mappings
> * Transaction context
> * The message object(s) specific to the remote EJB protocol
>
> Starts to add up, huh? I'm probably missing a bunch more. Throw in
> interaction with JPA and you end up with even more per-request
> objects being allocated. You still believe pooling one EJB instance
> matters?
>
See John O'Hara's post which shows our non-meaningless benchmark and the
difference that pooling makes vs. non-pooling. It is a dramatic difference to say the
least.
There is certainly a correlation identified between the results of this benchmark and the
use of pooling. However, the underlying cause of the resulting difference is still unknown.
If we knew definitively how and why this happens, it would help in optimizing this further.
As an example, if it turned out to be some secondary factor, like the throttling aspect of
the pool, then eliminating these allocations (and others) with a zero-tuning approach
like thread-local pooling would offer little to no improvement. If we discovered it is
indeed extreme object allocation, and that it came from thousands of nested calls in a
request, then having a temporary per-request thread-local cache would dramatically improve
the results, and be cheap/quick to implement vs. a full thread-local solution. If there is
a bug in our code somewhere where under certain situations we create hundreds of objects
when we should be creating tens, and the pool covers that up, fixing that bug and removing
the pool could lead to better results. If it turns out there is only 3% extra churn but
that extra churn causes a 10x perf reduction in GC, then we had better understand those
limits and potentially work with the OpenJDK team in that area.
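
To make the "temporary per-request thread local cache" idea concrete, here is a rough
sketch of what it could look like. Everything in it is an assumption for illustration:
the class name RequestLocalInstanceCache and its acquire/release/endRequest methods are
hypothetical and do not correspond to any existing WildFly code. It only shows the shape
of the technique: instances released by a nested call are parked on the current thread
and reused by the next nested call, with no locks or CAS involved.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-request, per-thread instance cache (illustration only). */
final class RequestLocalInstanceCache {
    // One free list per bean type, private to the current thread, so no
    // synchronization is needed when taking or returning an instance.
    private static final ThreadLocal<Map<Class<?>, ArrayDeque<Object>>> FREE =
            ThreadLocal.withInitial(HashMap::new);

    /** Reuse an instance released earlier on this thread, else create one. */
    static <T> T acquire(Class<T> type, Supplier<T> factory) {
        ArrayDeque<Object> free = FREE.get().get(type);
        Object instance = (free == null) ? null : free.poll();
        return type.cast(instance != null ? instance : factory.get());
    }

    /** Park the instance on this thread for the next nested call. */
    static <T> void release(Class<T> type, T instance) {
        FREE.get().computeIfAbsent(type, k -> new ArrayDeque<>()).push(instance);
    }

    /** Drop everything once the outermost request completes. */
    static void endRequest() {
        FREE.get().clear();
    }

    public static void main(String[] args) {
        int[] created = {0};
        Supplier<StringBuilder> factory = () -> { created[0]++; return new StringBuilder(); };
        // Simulate many sequential nested calls within one request.
        for (int i = 0; i < 1000; i++) {
            StringBuilder bean = acquire(StringBuilder.class, factory);
            release(StringBuilder.class, bean);
        }
        endRequest();
        System.out.println("instances created: " + created[0]);
    }
}
```

Whether something like this helps at all depends on the open question above: it only pays
off if the churn really comes from many nested calls inside a single request.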
This conversation is a perfect example of misinformation that causes us performance and
scalability problems within our code bases.
It’s just a surprising result. The pool saves a few allocations, but it also has the cost
of concurrency usage which can trigger blocking, additional barriers, and busy looping on
CAS. You also still have object churn in the underlying pool data structures that occurs
per invocation since every invocation is a check-out and a check-in (requires a new node
object instance), and if the semaphore blocks you have additional allocation for the entry
in the wait queue. You factor in the remaining allocation savings relative to other
allocations that are required for the invocation, and it should be a very small
percentage. For that very small percentage to lead to several times a difference in
performance to me hints at other factors being involved.
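
For anyone who wants to see where those costs sit, below is a minimal sketch of a
semaphore-bounded pool. It is an assumption-laden illustration, not the actual
StrictMaxPool source: the class BoundedPoolSketch and its method names are made up, and
a ConcurrentLinkedQueue stands in for whatever structure the real pool uses. The comments
mark the points mentioned above: the semaphore that can block, the CAS retries, and the
node object allocated on every check-in.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical bounded instance pool (illustration only, not WildFly code). */
final class BoundedPoolSketch<T> {
    private final Semaphore permits;   // caps concurrent check-outs
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    BoundedPoolSketch(int maxSize, Supplier<T> factory) {
        this.permits = new Semaphore(maxSize);
        this.factory = factory;
    }

    T checkOut() {
        // Blocks when all permits are taken; a parked thread is enqueued in
        // the semaphore's wait queue, which itself allocates a queue entry.
        permits.acquireUninterruptibly();
        T instance = idle.poll();      // lock-free, but retries CAS under contention
        return instance != null ? instance : factory.get();
    }

    void checkIn(T instance) {
        idle.offer(instance);          // allocates a fresh internal node on every call
        permits.release();
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        int[] created = {0};
        BoundedPoolSketch<Object> pool =
                new BoundedPoolSketch<>(4, () -> { created[0]++; return new Object(); });
        // Each invocation is one check-out plus one check-in.
        for (int i = 0; i < 1000; i++) {
            Object bean = pool.checkOut();
            pool.checkIn(bean);
        }
        System.out.println("bean instances created: " + created[0]);
    }
}
```

In this sketch the bean itself is created once, yet every invocation still allocates a
queue node on check-in, which is the kind of churn that makes the net saving look small.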
--
Jason T. Greene
WildFly Lead / JBoss EAP Platform Architect
JBoss, a division of Red Hat