]
Cheng Fang commented on WFLY-13438:
-----------------------------------
[~marcos-scholtz] I tried to reproduce it with WildFly 20.0.0.Final, but couldn't
reproduce it. I restarted the 3-server server group (with full-ha profile) many times
without seeing the "reinstate timer" errors.
Did you configure the timer data store with proper transaction isolation level
(READ_COMMITTED or SERIALIZABLE)?
Database Timers: instances sporadically fail to start in a cluster
------------------------------------------------------------------
Key: WFLY-13438
URL:
https://issues.redhat.com/browse/WFLY-13438
Project: WildFly
Issue Type: Bug
Components: EJB
Affects Versions: 19.1.0.Final
Reporter: Marcos Scholtz
Assignee: Cheng Fang
Priority: Major
We use a postgresql database for persistent timers, to control the execution of scheduled
tasks in a cluster. It mainly works, but when operating in a cluster and when a
server-group is restarted, sometimes individual instances in the cluster fail to start
because of an error when reinstating persistent timers.
Configuration of the Datasource:
{code}
<xa-datasource jndi-name="java:/datasources/MonitoringDS"
pool-name="MonitoringDS" enabled="true">
<xa-datasource-property name="ServerName">
appserver.local
</xa-datasource-property>
<xa-datasource-property name="PortNumber">
5432
</xa-datasource-property>
<xa-datasource-property name="DatabaseName">
monitoringdb
</xa-datasource-property>
<xa-datasource-class>org.postgresql.xa.PGXADataSource</xa-datasource-class>
<driver>postgresql</driver>
<security>
<user-name>xxx</user-name>
<password>xxx</password>
</security>
</xa-datasource>
{code}
Configuration of the timer service:
{code}
<timer-service thread-pool-name="default"
default-data-store="batch-clustered-store">
<data-stores>
<file-data-store name="default-file-store"
path="timer-service-data" relative-to="jboss.server.data.dir"/>
<database-data-store name="batch-clustered-store"
datasource-jndi-name="java:/datasources/MonitoringDS"
database="'postgresql'" partition="${batch.partition}"
refresh-interval="60000" allow-execution="true"/>
</data-stores>
</timer-service>
{code}
Note we use a system-property on the server-group to determine the partition name.
Also, we made NO changes to the SQL statements in "timer-sql.properties".
The timer is generated by a `@Schedule` annotation in an EJB like this:
{code}
@Schedule(
dayOfMonth = "${manual.scheduler.dayofmonth:*}",
dayOfWeek = "${manual.scheduler.dayofweek:*}",
hour = "${manual.scheduler.hour:2/6}",
minute = "${manual.scheduler.minute:2}"
)
@TransactionAttribute(TransactionAttributeType.NEVER)
void doPostboxImport() {
log.info("Starting Postbox Import");
BatchRuntime.getJobOperator().start("PostboxImportBatchlet", null);
}
{code}
The error we sporadically get when starting more than one instance at the same time (when
all instances try to reinstate timers) is like this:
{code}
14:48:28,117 WARN [com.arjuna.ats.arjuna] (ServerService Thread Pool -- 69)
{"tenant": "", "clientId": ""} ARJUNA012078: Abort
called illegaly on atomic action 0:ffff7f000001:-27f2652e:5eb2b20f:f
14:48:28,118 WARN [org.jboss.as.ejb3.timer] (ServerService Thread Pool -- 69)
{"tenant": "", "clientId": ""} WFLYEJB0161: Failed
to reinstate timer
'postbox-integration-service.postbox-integration-service.PostboxImport'
(id=unavailable) from its persistent state: javax.transaction.NotSupportedException:
WFTXN0001: A transaction is already in progress
at
org.wildfly.transaction.client@1.1.9.Final//org.wildfly.transaction.client.ContextTransactionManager.begin(ContextTransactionManager.java:60)
at
org.wildfly.transaction.client@1.1.9.Final//org.wildfly.transaction.client.ContextTransactionManager.begin(ContextTransactionManager.java:54)
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.getActivePersistentTimers(TimerServiceImpl.java:984)
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.restoreTimers(TimerServiceImpl.java:712)
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.activate(TimerServiceImpl.java:223)
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.component.EJBComponent.init(EJBComponent.java:595)
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.component.stateless.StatelessSessionComponent.init(StatelessSessionComponent.java:110)
at
org.jboss.as.ee@19.1.0.Final//org.jboss.as.ee.component.BasicComponent.start(BasicComponent.java:222)
at
org.jboss.as.ee@19.1.0.Final//org.jboss.as.ee.component.ComponentStartService$1.run(ComponentStartService.java:54)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at
org.jboss.threads@2.3.3.Final//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
at
org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
at
org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
at
org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
at java.base/java.lang.Thread.run(Thread.java:834)
at
org.jboss.threads@2.3.3.Final//org.jboss.threads.JBossThread.run(JBossThread.java:485)
14:48:28,153 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 69)
{"tenant": "", "clientId": ""} MSC000001: Failed
to start service
jboss.deployment.unit."postbox-integration-service-E-SNAPSHOT.war".component.PostboxImport.START:
org.jboss.msc.service.StartException in service
jboss.deployment.unit."postbox-integration-service-E-SNAPSHOT.war".component.PostboxImport.START:
java.lang.RuntimeException: java.lang.NullPointerException
at
org.jboss.as.ee@19.1.0.Final//org.jboss.as.ee.component.ComponentStartService$1.run(ComponentStartService.java:57)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at
org.jboss.threads@2.3.3.Final//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
at
org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
at
org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
at
org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
at java.base/java.lang.Thread.run(Thread.java:834)
at
org.jboss.threads@2.3.3.Final//org.jboss.threads.JBossThread.run(JBossThread.java:485)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.persistTimer(TimerServiceImpl.java:626)
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.createCalendarTimer(TimerServiceImpl.java:537)
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.loadAutoTimer(TimerServiceImpl.java:388)
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.restoreTimers(TimerServiceImpl.java:774)
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.activate(TimerServiceImpl.java:223)
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.component.EJBComponent.init(EJBComponent.java:595)
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.component.stateless.StatelessSessionComponent.init(StatelessSessionComponent.java:110)
at
org.jboss.as.ee@19.1.0.Final//org.jboss.as.ee.component.BasicComponent.start(BasicComponent.java:222)
at
org.jboss.as.ee@19.1.0.Final//org.jboss.as.ee.component.ComponentStartService$1.run(ComponentStartService.java:54)
... 8 more
Caused by: java.lang.NullPointerException
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.persistence.database.DatabaseTimerPersistence.addTimer(DatabaseTimerPersistence.java:336)
at
org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.persistTimer(TimerServiceImpl.java:607)
... 16 more
{code}
The error seems similar to the one described in WFLY-13386, but the stack trace and the
steps to reproduce it seem different to me, so I'm not sure if it's the same
problem. The problem also existed in Wildfly 14 , we just updated to 19.1 (also hoping
that his error would be corrected).