[jboss-jira] [JBoss JIRA] (WFLY-13357) (Regression) Execution of concurrent batch jobs containg partitioned steps causes deadlock
James Perkins (Jira)
issues at jboss.org
Wed May 27 12:18:06 EDT 2020
[ https://issues.redhat.com/browse/WFLY-13357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Perkins updated WFLY-13357:
---------------------------------
Priority: Blocker (was: Major)
> (Regression) Execution of concurrent batch jobs containg partitioned steps causes deadlock
> ------------------------------------------------------------------------------------------
>
> Key: WFLY-13357
> URL: https://issues.redhat.com/browse/WFLY-13357
> Project: WildFly
> Issue Type: Bug
> Components: Batch
> Affects Versions: 19.0.0.Final
> Reporter: Felix König
> Assignee: Cheng Fang
> Priority: Blocker
> Fix For: 20.0.0.Final
>
>
> Hello,
> the issue described in JBERET-180 seems to have reappeared. I am running Wildfly 16 with jberet-1.3.3. Given that there is a default batch-thread count of 10 I was able to produce a deadlock by starting 10 instances of a partitioned job simultaneously. None of the job runs fast enough to finish before all 10 jobs have been started. All 10 Batch-threads are stuck here:
> {code}
> "Batch Thread - 1 at 33537" prio=5 tid=0x109 nid=NA waiting
> java.lang.Thread.State: WAITING
> at jdk.internal.misc.Unsafe.park(Unknown Source:-1)
> at java.util.concurrent.locks.LockSupport.park(Unknown Source:-1)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source:-1)
> at java.util.concurrent.ArrayBlockingQueue.take(Unknown Source:-1)
> at org.jberet.runtime.runner.StepExecutionRunner.beginPartition(StepExecutionRunner.java:350)
> at org.jberet.runtime.runner.StepExecutionRunner.runBatchletOrChunk(StepExecutionRunner.java:222)
> at org.jberet.runtime.runner.StepExecutionRunner.run(StepExecutionRunner.java:144)
> at org.jberet.runtime.runner.CompositeExecutionRunner.runStep(CompositeExecutionRunner.java:164)
> at org.jberet.runtime.runner.CompositeExecutionRunner.runFromHeadOrRestartPoint(CompositeExecutionRunner.java:88)
> at org.jberet.runtime.runner.JobExecutionRunner.run(JobExecutionRunner.java:60)
> at org.wildfly.extension.batch.jberet.deployment.BatchEnvironmentService$WildFlyBatchEnvironment$1.run(BatchEnvironmentService.java:180)
> at org.wildfly.extension.requestcontroller.RequestController$QueuedTask$1.run(RequestController.java:494)
> at org.jberet.spi.JobExecutor$2.run(JobExecutor.java:149)
> at org.jberet.spi.JobExecutor$1.run(JobExecutor.java:99)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source:-1)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source:-1)
> at java.lang.Thread.run(Unknown Source:-1)
> at org.jboss.threads.JBossThread.run(JBossThread.java:485)
> {code}
> which is this line of code:
> {code:java}
> completedPartitionThreads.take();
> {code}
> Rarely some threads also get stuck at line 364 instead, which is
> {code:java}
> final Serializable data = collectorDataQueue.take();
> {code}
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
More information about the jboss-jira
mailing list