Galder,
I agree. I am also experiencing similar problems and looking into thread
dumps does not help a lot. We have gone overboard with TPE creation,
should be created only if needed. TPE threads names should be composed
as: node name + thread pool name + counter.
I can see what can be done about JGroups threads.
Cheers
On 09-10-06 10:59 AM, Galder Zamarreno wrote:
This is a rather long but please bear with me:
So, I've been looking at the number of threads created in our testsuite
because quite often I'm getting OOM "unable to create thread" and note
that tests have evolved to be quite complex with several CacheManagers,
Caches...etc.
In summary, I see plenty of maxThread=1 ThreadPoolExecutor (TPE)
instances created that then are either useless (i.e. they get overriden
after rewiring) or are there even though they're not gonna be used in
the test (i.e. async listener notifier). Side note: Note that maxThread
is used for the core pool size which means that this means that there's
1 thread available.
For the useless TPEs, we need to avoid creating them in the 1st place.
For the useful but not used in test, we should have an option that
allows for such executors to be created lazily in a testsuite env. I see
the point of creating these in advance in a prod or user env but when
running the testsuite, it'd be better and would safe resources,
specially wrt to threads if we created them lazily.
Examples:
For each CacheManager created, 2 x TPEs for ASYNC_NOTIFICATION_EXECUTOR
are created. The second one during the re-wiring process. The result is
that the 1st one is never used any more and it's left sitting there. Why
is a 2nd one created? Something to do with non-volatility? Indeed it is.
The j.u.e.ExecutorService instance is a j.u.c.TPE which has no
NonVolatile annotation and hence it gets recreated. Note that this
happens for all TPE usages (ASYNC_NOTIFICATION_EXECUTOR,
ASYNC_TRANSPORT_EXECUTOR, EVICTION_SCHEDULED_EXECUTOR,
ASYNC_REPLICATION_QUEUE_EXECUTOR)! For each one created, there's another
one left there taking up resources!! So, what are the options? Two:
- A question that seems to come up more and more, get rid of the
Volatility/NonVolatility difference and treat everything as NonVolatile?
Apart from simplyfying the code, it'd make easier to debug cos figuring
out these type of issues are a royal PITA. I remember hearing Manik that
since create and start phases were gone, this should be no problem.
- Or provide our own ExecutorService and TPE facades that allows use to
annotate TPE as @NonVolatile.
For each Cache created, the same thing, 2 x TPE for
ASYNC_NOTIFICATION_EXECUTOR are created.
The lazyness I was refering about earlier comes into play here in the
sense that, why not start the TPE the 1st time we register an async
Listener? If we did so, we'd be saving resources if no async listeners
are enabled (This looks to me the most common case right now). I can see
the point of having this ready for normal use but certainly for
testsuite use, this is waste when we're creating so many CacheManagers,
Caches instances and have several threads running different tests at the
same time.
Would it make sense to apply such lazyness to ASYNC_TRANSPORT_EXECUTOR?
Hmmmm, even in a configured SYNC configuration, there might be stuff we
wanna send asynchronously, i.e. PFERs, so don't think so.
With regards to EVICTION_SCHEDULED_EXECUTOR, this is a cache component.
Currently, a TPE is created regardless of whether eviction is enabled or
not and eviction is not a dynamic property. Now, if eviction is set to
NONE, neither eviction nor expiration happens, correct? In that case, I
don't see why TPE should be maintained if eviction is disabled.
To finish with the lazy TPE startup topic, with
ASYNC_REPLICATION_QUEUE_EXECUTOR, we're certainly doing a good job. In
ReplicationQueueFactory, we only construct a ReplicationQueue and
corresponding TPE if replication queue is enabled
More comments now. Since there are 4 different usages of
ThreadPoolExecutor each should have, *by default*, a different thread
name prefix. Currently, there's no such thing which means that figuring
out what a particular thread belongs to is difficult unless
threadNamePrefix are set in each set of properties.
Another thing that is very confusing is the
DefaultExecutorFactory.counter. What is it trying to represent? For
example, in the testsuite thread dumps, I was seeing thread names like:
asyncThread-150 and this is very confusing when taking in account that
maxThreads in each ThreadPoolExecutor in the testsuite is 1.
Finally, KnownComponentNames should probably be an enum, shouldn't it?
If you made it this far, thx!!