On 6 Oct 2009, at 15:59, Galder Zamarreno wrote:
This is a rather long but please bear with me:
So, I've been looking at the number of threads created in our
testsuite
because quite often I'm getting OOM "unable to create thread" and note
that tests have evolved to be quite complex with several
CacheManagers,
Caches...etc.
In summary, I see plenty of maxThread=1 ThreadPoolExecutor (TPE)
instances created that then are either useless (i.e. they get
overriden
after rewiring) or are there even though they're not gonna be used in
the test (i.e. async listener notifier). Side note: Note that
maxThread
is used for the core pool size which means that this means that
there's
1 thread available.
For the useless TPEs, we need to avoid creating them in the 1st place.
True. We could do this lazily via a layer of indirection.
For the useful but not used in test, we should have an option that
allows for such executors to be created lazily in a testsuite env. I
see
the point of creating these in advance in a prod or user env but when
running the testsuite, it'd be better and would safe resources,
specially wrt to threads if we created them lazily.
+1. I'll have a think about this and re-post later today.
Examples:
For each CacheManager created, 2 x TPEs for
ASYNC_NOTIFICATION_EXECUTOR
are created. The second one during the re-wiring process. The result
is
that the 1st one is never used any more and it's left sitting there.
Why
is a 2nd one created? Something to do with non-volatility? Indeed it
is.
The j.u.e.ExecutorService instance is a j.u.c.TPE which has no
NonVolatile annotation and hence it gets recreated. Note that this
happens for all TPE usages (ASYNC_NOTIFICATION_EXECUTOR,
ASYNC_TRANSPORT_EXECUTOR, EVICTION_SCHEDULED_EXECUTOR,
ASYNC_REPLICATION_QUEUE_EXECUTOR)! For each one created, there's
another
one left there taking up resources!! So, what are the options? Two:
- A question that seems to come up more and more, get rid of the
Volatility/NonVolatility difference and treat everything as
NonVolatile?
Apart from simplyfying the code, it'd make easier to debug cos
figuring
out these type of issues are a royal PITA. I remember hearing Manik
that
since create and start phases were gone, this should be no problem.
Again, I will revisit this today - about time. :-)
- Or provide our own ExecutorService and TPE facades that allows use
to
annotate TPE as @NonVolatile.
For each Cache created, the same thing, 2 x TPE for
ASYNC_NOTIFICATION_EXECUTOR are created.
The lazyness I was refering about earlier comes into play here in the
sense that, why not start the TPE the 1st time we register an async
Listener? If we did so, we'd be saving resources if no async listeners
are enabled (This looks to me the most common case right now). I can
see
the point of having this ready for normal use but certainly for
testsuite use, this is waste when we're creating so many
CacheManagers,
Caches instances and have several threads running different tests at
the
same time.
Would it make sense to apply such lazyness to
ASYNC_TRANSPORT_EXECUTOR?
Hmmmm, even in a configured SYNC configuration, there might be stuff
we
wanna send asynchronously, i.e. PFERs, so don't think so.
With regards to EVICTION_SCHEDULED_EXECUTOR, this is a cache
component.
Currently, a TPE is created regardless of whether eviction is
enabled or
not and eviction is not a dynamic property. Now, if eviction is set to
NONE, neither eviction nor expiration happens, correct? In that
case, I
don't see why TPE should be maintained if eviction is disabled.
To finish with the lazy TPE startup topic, with
ASYNC_REPLICATION_QUEUE_EXECUTOR, we're certainly doing a good job. In
ReplicationQueueFactory, we only construct a ReplicationQueue and
corresponding TPE if replication queue is enabled
More comments now. Since there are 4 different usages of
ThreadPoolExecutor each should have, *by default*, a different thread
name prefix. Currently, there's no such thing which means that
figuring
out what a particular thread belongs to is difficult unless
threadNamePrefix are set in each set of properties.
Yes, there should be a way of handling this - perhaps related to the
component name if it isn't overridden with a property. I'll have a
look at this too.
Another thing that is very confusing is the
DefaultExecutorFactory.counter. What is it trying to represent?
Restarts of the cache in the same VM. It would look confusing now,
but with proper naming (as suggested above) it should be clearer.
For
example, in the testsuite thread dumps, I was seeing thread names
like:
asyncThread-150 and this is very confusing when taking in account that
maxThreads in each ThreadPoolExecutor in the testsuite is 1.
Finally, KnownComponentNames should probably be an enum, shouldn't it?
+1.
If you made it this far, thx!!
Thanks for the detailed analysis - useful stuff. :)
Cheers
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org