I found the following deadlock, which appears to be caused by a task for a
'delayed' rule running concurrently with an application thread that is
attempting to get access to a 'global'. Are there any recommendations for avoiding
this type of deadlock, besides not using rules with 'duration()' etc., which cause
asynchronous execution with respect to my main application thread?
This problem is somewhat difficult to reproduce on demand, but it comes up frequently
when the 'delayed' rule "DETECT MONITORING HAS STOPPED" is activated because
its trigger conditions are met.
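Because the hang is hard to trigger on demand, one way to capture it when it does occur is to poll the JVM for deadlocked threads. The following is a minimal sketch, assuming Java 6 or later (for ThreadMXBean.findDeadlockedThreads()); the class name DeadlockWatch and the method findDeadlocks() are hypothetical helpers, not part of the application or of Drools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports threads the JVM considers deadlocked.
class DeadlockWatch {

    /** Returns one description per deadlocked thread, or an empty list if there are none. */
    static List<String> findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Covers both object monitors and java.util.concurrent locks; null means no deadlock.
        long[] ids = mx.findDeadlockedThreads();
        List<String> report = new ArrayList<String>();
        if (ids == null) {
            return report;
        }
        ThreadInfo[] infos = mx.getThreadInfo(ids,
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // thread ended between the two calls
            }
            report.add(info.getThreadName()
                    + " is waiting for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = findDeadlocks();
        if (report.isEmpty()) {
            System.out.println("no deadlock detected");
        }
        for (String line : report) {
            System.out.println(line);
        }
    }
}
```

Calling findDeadlocks() periodically from a separate thread and logging a non-empty result would record which locks are involved without needing a debugger attached.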
===================================================================================
This thread, my application's EnterprisePolicyManager thread, is attempting to get
access to a global, policyMgr.
It owns the 'ReteooStatefulSession.actionQueue'
and is waiting for the 'lock.lock' on ReteooStatefulSession.
owns: java.util.LinkedList<E> (id=207)
waited by: Thread [pool-3-thread-1] (Suspended)
owns: com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine (id=208)
sun.misc.Unsafe.park(boolean, long) line: not available [native method] [local variables unavailable]
java.util.concurrent.locks.LockSupport.park() line: 118 [local variables unavailable]
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() line: 681 [local variables unavailable]
java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) line: 711
java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquire(int) line: 1041
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() line: 184 [local variables unavailable]
java.util.concurrent.locks.ReentrantLock.lock() line: 256 [local variables unavailable]
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).getGlobal(java.lang.String) line: 587
com.continuent.tungsten.cluster.manager.policy.Rule_IF_IN_MAINTENANCE_MODE__CONSUME_ALL_NOTIFICATIONS_0Eval0Invoker.evaluate(org.drools.spi.Tuple, org.drools.rule.Declaration[], org.drools.WorkingMemory, java.lang.Object) line: not available
org.drools.rule.EvalCondition.isAllowed(org.drools.spi.Tuple, org.drools.WorkingMemory, java.lang.Object) line: 117
org.drools.reteoo.EvalConditionNode.assertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 180
org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, org.drools.reteoo.LeftTuple) line: 117
org.drools.reteoo.SingleLeftTupleSinkAdapter.propagateAssertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.reteoo.RightTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, boolean) line: 28
org.drools.reteoo.JoinNode.assertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 175
org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 42
org.drools.reteoo.PropagationQueuingNode$AssertAction.execute(org.drools.reteoo.ObjectSinkPropagator, org.drools.common.InternalWorkingMemory) line: 326
org.drools.reteoo.PropagationQueuingNode.propagateActions(org.drools.common.InternalWorkingMemory) line: 221
org.drools.reteoo.PropagationQueuingNode$PropagateAction.execute(org.drools.common.InternalWorkingMemory) line: 394
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1486
org.drools.common.NamedEntryPoint.insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation) line: 158
org.drools.common.NamedEntryPoint.insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 122
org.drools.common.NamedEntryPoint.insert(java.lang.Object) line: 80
com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine.insertFact(com.continuent.tungsten.commons.cluster.resource.notification.NotificationStreamID, java.lang.Object, boolean) line: 162
com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager.run() line: 249
java.lang.Thread.run() line: 595
The rule implicated in the above thread is:
rule "IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS"
    salience 999
    when
        notification : ClusterResourceNotification() from entry-point "MONITORING"
        eval(policyMgr.getMode() == ClusterPolicyManagerMode.MAINTENANCE)
    then
        statistics.increment("IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS");
        retract(notification);
end
This other thread, apparently a scheduled thread for a rule with a 10 second duration,
is attempting to insert a fact. It owns the 'lock.lock' on ReteooStatefulSession
and is waiting for the 'ReteooStatefulSession.actionQueue'.
owns: org.drools.common.DefaultAgenda (id=4046)
waiting for: java.util.LinkedList<E> (id=207)
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1480
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation, org.drools.reteoo.ObjectTypeConf) line: 1051
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 1001
org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object, boolean) line: 114
org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object) line: 108
com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0.consequence(org.drools.spi.KnowledgeHelper, com.continuent.tungsten.commons.cluster.resource.notification.DataServerNotification, org.drools.FactHandle, java.lang.String, org.drools.FactHandle, com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager, org.apache.log4j.Logger) line: not available
com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0ConsequenceInvoker.evaluate(org.drools.spi.KnowledgeHelper, org.drools.WorkingMemory) line: not available
org.drools.common.DefaultAgenda.fireActivation(org.drools.spi.Activation) line: 934
org.drools.common.Scheduler$DuractionJob.execute(org.drools.time.JobContext) line: 70
org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 132
org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 110
java.util.concurrent.FutureTask$Sync.innerRun() line: 269
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(java.util.concurrent.FutureTask<V>).run() line: 123
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) line: 65
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 168
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) line: 650
java.util.concurrent.ThreadPoolExecutor$Worker.run() line: 675
java.lang.Thread.run() line: 595
The rule for this task looks like:
rule "DETECT MONITORING HAS STOPPED"
    duration(10s)
    salience 1000
    when
        lastNotification : DataServerNotification($resourceName : resourceName)
            from entry-point "MONITORING"
        not (DataServerNotification(resourceName == $resourceName,
                                    this after [10s] lastNotification)
            from entry-point "MONITORING")
        not (ManagerFailedAlarm(expired == false,
                                resourceName == $resourceName))
        not (DataSource(name == $resourceName,
                        state == ResourceState.SHUNNED ||
                        state == ResourceState.FAILED))
    then
        Object[] params = {$resourceName};
        if (policyMgr.getMode() != ClusterPolicyManagerMode.MAINTENANCE)
        {
            lastNotification.setResourceState(ResourceState.UNKNOWN);
            ManagerFailedAlarm alarm =
                new ManagerFailedAlarm(lastNotification, "rule detected monitor stop",
                                       6, AlarmSeverity.FAULT);
            logger.info(alarm.toString());
            insert(alarm);
            update(lastNotification);
        }
end
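
Taken together, the two traces look like a lock-order inversion: one thread holds the actionQueue monitor and wants the session lock, while the other holds the session lock and wants the actionQueue monitor. The following is a minimal sketch of that pattern only, not Drools code. The names LockOrderSketch, sessionLock, inversionObserved() and consistentOrderCompletes() are hypothetical stand-ins, and the 200 ms tryLock() timeout exists only so the demonstration terminates instead of hanging.

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-ins for the two locks named in the traces; this is not Drools code.
class LockOrderSketch {
    static final ReentrantLock sessionLock = new ReentrantLock();
    static final LinkedList<Object> actionQueue = new LinkedList<Object>();

    /** Returns true if the thread holding the queue monitor could not get sessionLock. */
    static boolean inversionObserved() throws InterruptedException {
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        final AtomicBoolean timedOut = new AtomicBoolean(false);

        Thread appThread = new Thread(new Runnable() {
            public void run() {
                synchronized (actionQueue) {          // first lock: queue monitor
                    try {
                        bothHoldFirstLock.countDown();
                        bothHoldFirstLock.await();
                        // second lock: bounded wait so the sketch ends instead of deadlocking
                        if (sessionLock.tryLock(200, TimeUnit.MILLISECONDS)) {
                            sessionLock.unlock();
                        } else {
                            timedOut.set(true);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        });

        Thread timerThread = new Thread(new Runnable() {
            public void run() {
                sessionLock.lock();                   // first lock: session lock
                try {
                    bothHoldFirstLock.countDown();
                    bothHoldFirstLock.await();
                    synchronized (actionQueue) {      // second lock: blocks until appThread gives up
                        actionQueue.add("alarm");
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    sessionLock.unlock();
                }
            }
        });

        appThread.start();
        timerThread.start();
        appThread.join();
        timerThread.join();
        return timedOut.get();
    }

    /** Both threads take sessionLock first, then the queue monitor; returns true if both finish. */
    static boolean consistentOrderCompletes() throws InterruptedException {
        Runnable lockThenQueue = new Runnable() {
            public void run() {
                sessionLock.lock();
                try {
                    synchronized (actionQueue) {
                        actionQueue.add("fact");
                    }
                } finally {
                    sessionLock.unlock();
                }
            }
        };
        Thread a = new Thread(lockThenQueue);
        Thread b = new Thread(lockThenQueue);
        a.start();
        b.start();
        a.join(2000);
        b.join(2000);
        return !a.isAlive() && !b.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("inversion observed: " + inversionObserved());
        System.out.println("consistent order completes: " + consistentOrderCompletes());
    }
}
```

With an unbounded lock() in place of tryLock(), the first scenario would hang in the same way as the traces. The second method only illustrates that a single acquisition order avoids the cycle; whether that order can be imposed on the session internals from application code is a separate question.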