[teiid-issues] [JBoss JIRA] (TEIID-3099) RejectedExecutionException for Teiid during high query load

Steven Hawkins (JIRA) issues at jboss.org
Fri Aug 22 10:37:00 EDT 2014


    [ https://issues.jboss.org/browse/TEIID-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995101#comment-12995101 ] 

Steven Hawkins commented on TEIID-3099:
---------------------------------------

Yes, I agree that can occur.  Even if we expand the scope of the poolLock, there is still a potential for a timing issue.  It seems best not to rely on a SynchronousQueue and instead use a LinkedBlockingQueue, but we'll also have to check that queue as part of the reuse logic in order to keep scheduling fairer.
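
To illustrate the difference between the two queue types, here is a minimal standalone sketch; it is not the ThreadReuseExecutor code. The class name QueueHandoffSketch, the helper names, and the pool size of 2 are assumptions made for the demo. It saturates a fixed-size java.util.concurrent.ThreadPoolExecutor and then submits one more task: with a SynchronousQueue there is no idle worker to hand the task to, so it is rejected, while a LinkedBlockingQueue simply holds the task until a worker frees up.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueHandoffSketch {

    /**
     * Saturates a pool of poolSize threads, submits one extra task,
     * and reports whether that extra task was rejected.
     */
    public static boolean extraTaskRejected(BlockingQueue<Runnable> queue, int poolSize) {
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(
                poolSize, poolSize, 1, TimeUnit.SECONDS, queue);
        CountDownLatch started = new CountDownLatch(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < poolSize; i++) {
                tpe.execute(() -> {
                    started.countDown();
                    awaitQuietly(release); // keep this worker busy
                });
            }
            awaitQuietly(started); // every worker is now occupied
            try {
                tpe.execute(() -> { });
                return false; // accepted: the task waits in the queue
            } catch (RejectedExecutionException e) {
                return true;  // no idle worker and no queue capacity
            }
        } finally {
            release.countDown(); // let the busy workers finish
            tpe.shutdown();      // queued tasks still run before the pool exits
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("SynchronousQueue rejected: "
                + extraTaskRejected(new SynchronousQueue<>(), 2));
        System.out.println("LinkedBlockingQueue rejected: "
                + extraTaskRejected(new LinkedBlockingQueue<>(), 2));
    }
}
```

An unbounded queue removes the rejection but also means tasks can sit behind the pool unnoticed, which is why the reuse logic would need to look at the queue as well.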

> RejectedExecutionException for Teiid during high query load
> -----------------------------------------------------------
>
>                 Key: TEIID-3099
>                 URL: https://issues.jboss.org/browse/TEIID-3099
>             Project: Teiid
>          Issue Type: Bug
>      Security Level: Public(Everyone can see) 
>          Components: Query Engine, Server
>    Affects Versions: 8.7
>         Environment: Windows (x64) and Mainframe z/OS (x64)
>            Reporter: Mark Ackert
>            Assignee: Steven Hawkins
>              Labels: queryengine, rejectedexecutionexception, teiid-engine, threads
>
> Occasionally, when a standalone Teiid server is under high load from concurrent queries, a RejectedExecutionException is thrown. The relevant part of the stack trace is below. I investigated the ThreadReuseExecutor source, and I believe the cause lies in the synchronized(poolLock) code; my analysis and a pseudo-code snippet follow the stack trace.
> Caused by: java.util.concurrent.RejectedExecutionException: Task org.teiid.dqp.internal.process.ThreadReuseExecutor$3@c75a8114 rejected from org.teiid.dqp.internal.process.ThreadReuseExecutor$2@8b1166ec[Running, pool size = 128, active threads = 128, queued tasks = 0, completed tasks = 692637]
> 	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) [rt.jar:1.7.0]
> 	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) [rt.jar:1.7.0]
> 	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) [rt.jar:1.7.0]
> 	at org.teiid.dqp.internal.process.ThreadReuseExecutor.executeDirect(ThreadReuseExecutor.java:200) [teiid-engine-8.7.0.FinalCAFix-SNAPSHOT.jar:8.7.0.FinalCAFix-SNAPSHOT]
> 	at org.teiid.dqp.internal.process.ThreadReuseExecutor.execute(ThreadReuseExecutor.java:177) [teiid-engine-8.7.0.FinalCAFix-SNAPSHOT.jar:8.7.0.FinalCAFix-SNAPSHOT]
> In the executeDirect below, the race condition that appears plausible to me is as follows. Suppose "activeCount" is at the thread pool size limit and a single thread is finishing execution. That thread decrements activeCount in the synchronized block of the tpe.execute runnable, exits the block, and moves on to the warnWait/logging code. At this point a thread context switch occurs, and a new PrioritizedRunnable (one which isn't waiting on the poolLock at the beginning of the executeDirect method) comes through executeDirect and proceeds, because "poolLock" has been released and "activeCount" is now sizeLimit-1. The tpe tries to execute a new Runnable wrapping this PrioritizedRunnable, but the previous thread, which hasn't completed its logging code yet, still occupies its slot in the thread pool. The result is a RejectedExecutionException, because the fixed-size thread pool has no free thread to accept the task.
> private void executeDirect(final PrioritizedRunnable command) {
>     boolean atMaxThreads = false;
>     synchronized (poolLock) {
>         .... if activeCount!=max_limit; activeCount++
>     }
>     .......
>     tpe.execute(.....
>         finally {
>             synchronized (poolLock) {
>                 .......
>                 activeCount--;
>             }
>             if (success) {
>                 ......some log code
>             }
>         }
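
The interleaving described in the report can be forced deterministically in a small standalone sketch. This is not the Teiid source: the class name ActiveCountRaceSketch, the limit of one thread, and the latches standing in for the context switch and the logging code are all assumptions made for the demo. The worker decrements the counter under the lock and then lingers, so a second submission passes the counter check while the pool thread is still occupied.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ActiveCountRaceSketch {
    private static final int MAX_THREADS = 1; // hypothetical limit, kept tiny for the demo

    /** Returns true if the counter admitted a task that the pool then rejected. */
    public static boolean reproduceRejection() {
        final Object poolLock = new Object();
        final int[] activeCount = {0}; // guarded by poolLock
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(
                MAX_THREADS, MAX_THREADS, 1, TimeUnit.SECONDS, new SynchronousQueue<>());
        CountDownLatch slotReleased = new CountDownLatch(1);
        CountDownLatch finishLogging = new CountDownLatch(1);
        try {
            synchronized (poolLock) {
                activeCount[0]++; // reserve the only slot
            }
            tpe.execute(() -> {
                synchronized (poolLock) {
                    activeCount[0]--; // the slot looks free from here on
                }
                slotReleased.countDown();
                awaitQuietly(finishLogging); // stands in for the post-lock logging code
            });

            awaitQuietly(slotReleased); // the "context switch" to a new submitter
            boolean admitted;
            synchronized (poolLock) {
                admitted = activeCount[0] < MAX_THREADS;
                if (admitted) {
                    activeCount[0]++;
                }
            }
            try {
                tpe.execute(() -> { }); // the worker has not returned to the pool yet
                return false;
            } catch (RejectedExecutionException e) {
                return admitted;
            }
        } finally {
            finishLogging.countDown();
            tpe.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected despite free counter slot: " + reproduceRejection());
    }
}
```

The sketch shows why widening the lock alone would not close the window: the counter can only say that a task has finished its work, not that its thread is back in the pool and ready to take a handoff.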



--
This message was sent by Atlassian JIRA
(v6.3.1#6329)

