Cheng Fang reassigned WFLY-13357:
---------------------------------
Assignee: Cheng Fang
(Regression) Execution of concurrent batch jobs containing partitioned
steps causes deadlock
------------------------------------------------------------------------------------------
Key: WFLY-13357
URL: https://issues.redhat.com/browse/WFLY-13357
Project: WildFly
Issue Type: Bug
Components: Batch
Affects Versions: 19.0.0.Final
Reporter: Felix König
Assignee: Cheng Fang
Priority: Major
Hello,
the issue described in JBERET-180 seems to have reappeared. I am running WildFly 16 with
jberet-1.3.3. Because the default batch thread count is 10, I was able to produce a
deadlock by starting 10 instances of a partitioned job simultaneously. None of the jobs
runs fast enough to finish before all 10 have been started, and all 10 batch threads are
stuck here:
{code}
"Batch Thread - 1@33537" prio=5 tid=0x109 nid=NA waiting
java.lang.Thread.State: WAITING
at jdk.internal.misc.Unsafe.park(Unknown Source:-1)
at java.util.concurrent.locks.LockSupport.park(Unknown Source:-1)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source:-1)
at java.util.concurrent.ArrayBlockingQueue.take(Unknown Source:-1)
at org.jberet.runtime.runner.StepExecutionRunner.beginPartition(StepExecutionRunner.java:350)
at org.jberet.runtime.runner.StepExecutionRunner.runBatchletOrChunk(StepExecutionRunner.java:222)
at org.jberet.runtime.runner.StepExecutionRunner.run(StepExecutionRunner.java:144)
at org.jberet.runtime.runner.CompositeExecutionRunner.runStep(CompositeExecutionRunner.java:164)
at org.jberet.runtime.runner.CompositeExecutionRunner.runFromHeadOrRestartPoint(CompositeExecutionRunner.java:88)
at org.jberet.runtime.runner.JobExecutionRunner.run(JobExecutionRunner.java:60)
at org.wildfly.extension.batch.jberet.deployment.BatchEnvironmentService$WildFlyBatchEnvironment$1.run(BatchEnvironmentService.java:180)
at org.wildfly.extension.requestcontroller.RequestController$QueuedTask$1.run(RequestController.java:494)
at org.jberet.spi.JobExecutor$2.run(JobExecutor.java:149)
at org.jberet.spi.JobExecutor$1.run(JobExecutor.java:99)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source:-1)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source:-1)
at java.lang.Thread.run(Unknown Source:-1)
at org.jboss.threads.JBossThread.run(JBossThread.java:485)
{code}
which corresponds to this line (StepExecutionRunner.java:350):
{code:java}
completedPartitionThreads.take();
{code}
Occasionally, some threads get stuck at line 364 of the same file instead, which is:
{code:java}
final Serializable data = collectorDataQueue.take();
{code}
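Both take() calls are the parent step waiting on its partitions. If the partitions need a thread from the same 10-thread batch pool that the 10 parent steps already occupy, nothing can ever put into those queues. The following is a minimal Java sketch of that thread-starvation pattern, assuming (not verified against JBeret's source) that parent steps and partitions share one fixed pool. It is not JBeret code: the class PartitionStarvationDemo, runJobs, startGate and the pool setup are hypothetical, and only the ArrayBlockingQueue.take() wait mirrors the frames in the trace above. The startGate latch models the condition in this report that all jobs are started before any of them finishes, so it should be used with jobs <= poolSize.

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PartitionStarvationDemo {

    /**
     * Hypothetical simulation: runs `jobs` partitioned jobs on ONE fixed pool of `poolSize`
     * threads. Each parent task submits its partitions to the same pool and then blocks on
     * ArrayBlockingQueue.take() until every partition has reported completion.
     * Returns true if all jobs finished within timeoutMillis. Assumes jobs <= poolSize.
     */
    static boolean runJobs(int poolSize, int jobs, int partitionsPerJob, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // Models "no job finishes before all jobs have been started": no parent submits
        // partitions until every parent is occupying a pool thread.
        CountDownLatch startGate = new CountDownLatch(jobs);
        CountDownLatch jobsDone = new CountDownLatch(jobs);
        try {
            for (int j = 0; j < jobs; j++) {
                pool.execute(() -> {
                    // Plays the role of completedPartitionThreads seen in the stack trace.
                    BlockingQueue<Integer> completed = new ArrayBlockingQueue<>(partitionsPerJob);
                    try {
                        startGate.countDown();
                        startGate.await();
                        for (int p = 0; p < partitionsPerJob; p++) {
                            final int id = p;
                            // Partitions need a thread from the SAME pool the parents are holding.
                            pool.execute(() -> completed.add(id));
                        }
                        for (int p = 0; p < partitionsPerJob; p++) {
                            completed.take(); // waits forever if no thread is free for partitions
                        }
                        jobsDone.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // released by shutdownNow() below
                    }
                });
            }
            return jobsDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // interrupts parents stuck in take() and drops queued partitions
        }
    }

    public static void main(String[] args) {
        System.out.println("10 jobs on 10 threads finished: " + runJobs(10, 10, 2, 500));
        System.out.println("9 jobs on 10 threads finished: " + runJobs(10, 9, 2, 2000));
    }
}
{code}

With 10 threads and 10 jobs every thread ends up parked in take() while all partitions sit in the executor's queue, so runJobs times out and returns false. With 9 jobs the tenth thread stays free and works through every queued partition in turn, so all jobs complete.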