RejectedExecutionException for Teiid during high query load
-----------------------------------------------------------
Key: TEIID-3099
URL: https://issues.jboss.org/browse/TEIID-3099
Project: Teiid
Issue Type: Bug
Components: Query Engine, Server
Affects Versions: 8.7
Environment: Windows (x64) and Mainframe z/OS (x64)
Reporter: Mark Ackert
Assignee: Steven Hawkins
Labels: queryengine, rejectedexecutionexception, teiid-engine, threads
Fix For: 8.7.1, 8.9
Occasionally, when a standalone Teiid server is under high load from concurrent queries,
a RejectedExecutionException is thrown. The relevant part of the stack trace is below. I
investigated the ThreadReuseExecutor source, and I believe the cause lies in the
synchronized(poolLock) code. My analysis and a pseudo-code snippet follow the stack
trace.
Caused by: java.util.concurrent.RejectedExecutionException: Task org.teiid.dqp.internal.process.ThreadReuseExecutor$3@c75a8114 rejected from org.teiid.dqp.internal.process.ThreadReuseExecutor$2@8b1166ec[Running, pool size = 128, active threads = 128, queued tasks = 0, completed tasks = 692637]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) [rt.jar:1.7.0]
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) [rt.jar:1.7.0]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) [rt.jar:1.7.0]
    at org.teiid.dqp.internal.process.ThreadReuseExecutor.executeDirect(ThreadReuseExecutor.java:200) [teiid-engine-8.7.0.FinalCAFix-SNAPSHOT.jar:8.7.0.FinalCAFix-SNAPSHOT]
    at org.teiid.dqp.internal.process.ThreadReuseExecutor.execute(ThreadReuseExecutor.java:177) [teiid-engine-8.7.0.FinalCAFix-SNAPSHOT.jar:8.7.0.FinalCAFix-SNAPSHOT]
In the executeDirect method below, the race condition that appears plausible to me is as
follows. Suppose "activeCount" is at the thread pool size limit and a single thread is
finishing execution. activeCount is reduced by one in the synchronized block of the
Runnable passed to tpe.execute. The synchronized block exits and the thread moves on to
the warnWait/logging code. At this point a thread context switch occurs, and a new
PrioritizedRunnable (one which isn't waiting on the poolLock at the beginning of the
executeDirect method) comes through executeDirect and proceeds, because "poolLock" has
been released and "activeCount" is now sizeLimit-1. The tpe tries to execute a new
Runnable wrapping this PrioritizedRunnable, but the previous thread, which hasn't
completed its logging code yet, is still occupying a slot in the thread pool. As a
result we get a RejectedExecutionException, because more tasks are submitted than the
fixed-size thread pool has threads.
private void executeDirect(final PrioritizedRunnable command) {
    boolean atMaxThreads = false;
    synchronized (poolLock) {
        .... if activeCount != max_limit; activeCount++
    }
    .......
    tpe.execute(.....
        finally {
            synchronized (poolLock) {
                .......
                activeCount--;   // slot is reported free here
            }
            if (success) {
                ...... some log code (pool thread is still busy here)
            }
        }
    .....
}
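
To make the suspected interleaving concrete, here is a minimal standalone sketch that
forces it deterministically. It is not the Teiid code. The class name, the latches and
the pool size of 1 are hypothetical, and it assumes the pool is a fixed-size
ThreadPoolExecutor with a SynchronousQueue and the default AbortPolicy (the stack trace
shows AbortPolicy and "queued tasks = 0", but the queue type is my assumption). Only
poolLock and activeCount mirror names from the snippet above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectedExecutionRaceSketch {

    // Stand-ins for the fields named in the report; everything else is hypothetical.
    private static final Object poolLock = new Object();
    private static int activeCount = 0;
    private static final int MAX_THREADS = 1;

    /** Returns true if the second submission was rejected by the pool. */
    public static boolean reproduce() throws InterruptedException {
        synchronized (poolLock) {
            activeCount = 0; // reset so the sketch can be run repeatedly
        }
        // Assumption: fixed-size pool, direct hand-off queue, default AbortPolicy.
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(
                MAX_THREADS, MAX_THREADS, 0L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
        final CountDownLatch counterReleased = new CountDownLatch(1);
        final CountDownLatch finishLogging = new CountDownLatch(1);

        synchronized (poolLock) {
            activeCount++; // first task takes the only slot
        }
        tpe.execute(new Runnable() {
            @Override
            public void run() {
                try {
                    // the real work would happen here
                } finally {
                    synchronized (poolLock) {
                        activeCount--; // counter says the slot is free...
                    }
                    counterReleased.countDown();
                    try {
                        // ...but the pool thread is still busy ("logging")
                        finishLogging.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        });

        // Wait until the counter was decremented while the worker is still running.
        counterReleased.await();

        boolean admitted;
        synchronized (poolLock) {
            admitted = activeCount < MAX_THREADS;
            if (admitted) {
                activeCount++;
            }
        }

        boolean rejected = false;
        try {
            if (admitted) {
                // No idle worker and no room for a new one, so the pool rejects.
                tpe.execute(new Runnable() {
                    @Override
                    public void run() {
                    }
                });
            }
        } catch (RejectedExecutionException e) {
            rejected = true;
        } finally {
            finishLogging.countDown(); // let the first task finish
            tpe.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second task rejected: " + reproduce());
    }
}
```

Running main should print "second task rejected: true". In the sketch the latches stand
in for the thread context switch, so the window between activeCount-- and the end of the
Runnable is always hit. On a real server that window is only a few instructions wide,
which would be consistent with the exception appearing only occasionally under high load.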