[rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?

Edson Tirelli ed.tirelli at gmail.com
Wed Nov 4 20:05:45 EST 2009


   Edward,

   Are you able to provide us with a test case? That would help us ensure we
fix this and prevent future regressions.

   Thanks,
      Edson

2009/11/4 Edward Archibald <edward.archibald at continuent.com>

> Hi Greg,
>
> Thanks for the post.  I'll give this a shot.  Turns out that I can
> reproduce the issue often enough that I'll be able to see if this simple
> change resolves it.
>
> Regards,
>
> Edward
>
> ________________________________________
> From: rules-dev-bounces at lists.jboss.org [rules-dev-bounces at lists.jboss.org]
> On Behalf Of Greg Barton [greg_barton at yahoo.com]
> Sent: Tuesday, November 03, 2009 9:43 PM
> To: Rules Dev List
> Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any
> suggestions for resolution?
>
> Well, I'm not sure how to avoid the deadlock without changing the Drools
> codebase. I was, however, able to change the type of
> AbstractWorkingMemory.actionQueue to
> java.util.concurrent.ConcurrentLinkedQueue and remove the synchronization
> on the queue with no apparent ill effects. (Two drools-core tests failed,
> but they failed whether or not the change was made.) Also, I don't like the
> fact that the current code synchronizes on actionQueue but then exposes it
> outside the class through the getActionQueue() method, where access can be
> unsynchronized. Changing it to ConcurrentLinkedQueue makes it safe to
> expose externally. (Not to mention that, with the current code, outside
> code can grab the queue's monitor and effectively steal the lock.)
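For readers following along, here is a minimal sketch of the kind of change Greg describes: pending actions held in a ConcurrentLinkedQueue and drained with poll() instead of inside synchronized (actionQueue). The names ActionQueueSketch and QueuedAction are hypothetical stand-ins; this is not the real AbstractWorkingMemory code or Greg's diff, and it uses modern Java syntax.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for a queued working-memory action;
// not the real org.drools.common.WorkingMemoryAction interface.
interface QueuedAction {
    void execute();
}

public class ActionQueueSketch {
    // Before (as described): a LinkedList guarded by synchronized (actionQueue),
    // whose monitor outside code could also take via getActionQueue().
    // After (as proposed): a lock-free queue that is safe to share.
    private final Queue<QueuedAction> actionQueue = new ConcurrentLinkedQueue<>();

    public void queueAction(QueuedAction action) {
        actionQueue.add(action); // thread-safe without external synchronization
    }

    // Drains the queue without holding any monitor, so an action that blocks
    // on some other lock cannot leave the queue itself locked.
    public int executeQueuedActions() {
        int executed = 0;
        QueuedAction action;
        while ((action = actionQueue.poll()) != null) {
            action.execute();
            executed++;
        }
        return executed;
    }

    // Exposing a ConcurrentLinkedQueue is safe: callers need no lock.
    public Queue<QueuedAction> getActionQueue() {
        return actionQueue;
    }

    public static void main(String[] args) {
        ActionQueueSketch wm = new ActionQueueSketch();
        int[] counter = {0};
        for (int i = 0; i < 3; i++) {
            wm.queueAction(() -> counter[0]++);
        }
        System.out.println("executed " + wm.executeQueuedActions() + " actions");
    }
}
```

One trade-off worth checking: once the drain loop no longer holds the queue's monitor, two threads can execute queued actions at the same time, so any serialization the old synchronized block provided has to come from somewhere else.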
>
> Diff attached. If you can run Drools compiled from trunk, apply the diff
> and see if it resolves the deadlock. If it does, it's up to the Drools devs
> whether the change should be made. I'm just hacking about. :P
>
> --- On Tue, 11/3/09, Edward Archibald <edward.archibald at continuent.com> wrote:
>
> > From: Edward Archibald <edward.archibald at continuent.com>
> > Subject: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
> > To: "rules-dev at lists.jboss.org" <rules-dev at lists.jboss.org>
> > Date: Tuesday, November 3, 2009, 9:41 PM
> >
> > I found the following deadlock, which apparently arises when a task for a
> > 'delayed' rule executes concurrently with an application thread that is
> > trying to access a 'global'. Are there any recommendations for avoiding
> > this type of deadlock, other than not using rules with 'duration()' and
> > similar constructs that cause asynchronous execution with respect to my
> > main application thread?
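The asynchrony mentioned here is visible in the second stack trace later in this message, where the 'duration' consequence is fired from JDKTimerService on a ScheduledThreadPoolExecutor worker. The sketch below is hypothetical and contains no Drools code: it uses a plain ScheduledExecutorService to show that a delayed "consequence" runs on a pool thread rather than on the thread that scheduled it, which is why it can contend for the session's locks at the same time as the application thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class DurationFiringSketch {
    // Schedules a stand-in "consequence" after a delay and returns the name
    // of the thread that actually ran it.
    public static String fireLater(long delayMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            Callable<String> consequence = () -> Thread.currentThread().getName();
            ScheduledFuture<String> job =
                    timer.schedule(consequence, delayMillis, TimeUnit.MILLISECONDS);
            return job.get(); // the caller waits here only for the demo
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        String caller = Thread.currentThread().getName();
        String firedOn = fireLater(50);
        System.out.println("scheduled on " + caller + ", consequence ran on " + firedOn);
    }
}
```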
> >
> > This problem is somewhat difficult to reproduce on demand, but it does
> > come up frequently when the 'delayed' rule "DETECT MONITORING HAS STOPPED"
> > is activated by its trigger conditions.
> >
> >
> > ===================================================================================
> >
> > This thread, my application's EnterprisePolicyManager thread, is trying
> > to access a global, policyMgr. It owns the
> > 'ReteooStatefulSession.actionQueue' monitor and is waiting for the
> > 'lock.lock' on ReteooStatefulSession.
> >
> > owns: java.util.LinkedList<E>  (id=207)
> > waited by: Thread [pool-3-thread-1] (Suspended)
> > owns: com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine (id=208)
> > sun.misc.Unsafe.park(boolean, long) line: not available [native method] [local variables unavailable]
> > java.util.concurrent.locks.LockSupport.park() line: 118 [local variables unavailable]
> > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() line: 681 [local variables unavailable]
> > java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) line: 711
> > java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquire(int) line: 1041
> > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() line: 184 [local variables unavailable]
> > java.util.concurrent.locks.ReentrantLock.lock() line: 256 [local variables unavailable]
> > org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).getGlobal(java.lang.String) line: 587
> > com.continuent.tungsten.cluster.manager.policy.Rule_IF_IN_MAINTENANCE_MODE__CONSUME_ALL_NOTIFICATIONS_0Eval0Invoker.evaluate(org.drools.spi.Tuple, org.drools.rule.Declaration[], org.drools.WorkingMemory, java.lang.Object) line: not available
> > org.drools.rule.EvalCondition.isAllowed(org.drools.spi.Tuple, org.drools.WorkingMemory, java.lang.Object) line: 117
> > org.drools.reteoo.EvalConditionNode.assertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 180
> > org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, org.drools.reteoo.LeftTuple) line: 117
> > org.drools.reteoo.SingleLeftTupleSinkAdapter.propagateAssertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.reteoo.RightTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, boolean) line: 28
> > org.drools.reteoo.JoinNode.assertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 175
> > org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 42
> > org.drools.reteoo.PropagationQueuingNode$AssertAction.execute(org.drools.reteoo.ObjectSinkPropagator, org.drools.common.InternalWorkingMemory) line: 326
> > org.drools.reteoo.PropagationQueuingNode.propagateActions(org.drools.common.InternalWorkingMemory) line: 221
> > org.drools.reteoo.PropagationQueuingNode$PropagateAction.execute(org.drools.common.InternalWorkingMemory) line: 394
> > org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1486
> > org.drools.common.NamedEntryPoint.insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation) line: 158
> > org.drools.common.NamedEntryPoint.insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 122
> > org.drools.common.NamedEntryPoint.insert(java.lang.Object) line: 80
> > com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine.insertFact(com.continuent.tungsten.commons.cluster.resource.notification.NotificationStreamID, java.lang.Object, boolean) line: 162
> > com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager.run() line: 249
> > java.lang.Thread.run() line: 595
> >
> > The rule implicated in the above thread is:
> >
> > rule "IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS"
> > salience 999
> >   when
> >     notification : ClusterResourceNotification() from entry-point "MONITORING"
> >     eval(policyMgr.getMode() == ClusterPolicyManagerMode.MAINTENANCE)
> >   then
> >     statistics.increment("IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS");
> >     retract(notification);
> > end
> >
> >
> >
> > This other thread, apparently the scheduled thread for a rule with a
> > 10-second duration, is trying to insert a fact. It owns the 'lock.lock' on
> > ReteooStatefulSession and is waiting for the
> > 'ReteooStatefulSession.actionQueue' monitor.
> >
> > owns: org.drools.common.DefaultAgenda  (id=4046)
> > waiting for: java.util.LinkedList<E>  (id=207)
> > org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1480
> > org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation, org.drools.reteoo.ObjectTypeConf) line: 1051
> > org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 1001
> > org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object, boolean) line: 114
> > org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object) line: 108
> > com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0.consequence(org.drools.spi.KnowledgeHelper, com.continuent.tungsten.commons.cluster.resource.notification.DataServerNotification, org.drools.FactHandle, java.lang.String, org.drools.FactHandle, com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager, org.apache.log4j.Logger) line: not available
> > com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0ConsequenceInvoker.evaluate(org.drools.spi.KnowledgeHelper, org.drools.WorkingMemory) line: not available
> > org.drools.common.DefaultAgenda.fireActivation(org.drools.spi.Activation) line: 934
> > org.drools.common.Scheduler$DuractionJob.execute(org.drools.time.JobContext) line: 70
> > org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 132
> > org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 110
> > java.util.concurrent.FutureTask$Sync.innerRun() line: 269
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(java.util.concurrent.FutureTask<V>).run() line: 123
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) line: 65
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 168
> > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) line: 650
> > java.util.concurrent.ThreadPoolExecutor$Worker.run() line: 675
> > java.lang.Thread.run() line: 595
> >
> > The rule for this task looks like:
> > rule "DETECT MONITORING HAS STOPPED"
> > duration(10s)
> > salience 1000
> >   when
> >     lastNotification : DataServerNotification($resourceName : resourceName)
> >         from entry-point "MONITORING"
> >     not (DataServerNotification(resourceName == $resourceName,
> >                                 this after [10s] lastNotification)
> >         from entry-point "MONITORING")
> >     not (ManagerFailedAlarm(expired == false,
> >                             resourceName == $resourceName))
> >     not (DataSource(name == $resourceName,
> >                     state == ResourceState.SHUNNED ||
> >                     state == ResourceState.FAILED))
> >   then
> >     Object[] params = {$resourceName};
> >     if (policyMgr.getMode() != ClusterPolicyManagerMode.MAINTENANCE)
> >     {
> >       lastNotification.setResourceState(ResourceState.UNKNOWN);
> >       ManagerFailedAlarm alarm =
> >           new ManagerFailedAlarm(lastNotification, "rule detected monitor stop",
> >                                  6, AlarmSeverity.FAULT);
> >       logger.info(alarm.toString());
> >       insert(alarm);
> >       update(lastNotification);
> >     }
> > end
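The two stack traces above describe a lock-order inversion: the application thread holds the actionQueue monitor and waits for the session's ReentrantLock, while the timer thread holds the ReentrantLock and waits for the actionQueue monitor. A minimal, self-contained sketch of that pattern (plain Java, no Drools classes; the lock and thread names are hypothetical stand-ins) can reproduce it deterministically with a CountDownLatch and confirm it with ThreadMXBean.findDeadlockedThreads(). Something along these lines might be a starting point for the kind of test case Edson asks for, though a real regression test would need to drive an actual Drools session.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversionSketch {
    // Starts two threads that take the same two locks in opposite order and returns
    // how many threads the JVM reports as deadlocked (0 if none is found within ~5 s).
    public static int reproduce() {
        // Hypothetical stand-ins for the actionQueue monitor and the session's
        // ReentrantLock; these are not the real Drools fields.
        final Object actionQueue = new LinkedList<Object>();
        final ReentrantLock sessionLock = new ReentrantLock();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like the EnterprisePolicyManager thread: queue monitor, then session lock (getGlobal).
        Thread appThread = new Thread(() -> {
            synchronized (actionQueue) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                sessionLock.lock();
                sessionLock.unlock();
            }
        }, "app-thread");

        // Like the duration job: session lock, then queue monitor (executeQueuedActions).
        Thread timerThread = new Thread(() -> {
            sessionLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (actionQueue) {
                    // queued actions would be drained here
                }
            } finally {
                sessionLock.unlock();
            }
        }, "timer-thread");

        // Daemon threads let the JVM exit even though these two never finish.
        appThread.setDaemon(true);
        timerThread.setDaemon(true);
        appThread.start();
        timerThread.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) { // poll every 50 ms for up to ~5 s
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + reproduce());
    }
}
```

Acquiring the two locks in the same order on every path, or removing the monitor on the queue entirely as Greg proposes, breaks the cycle in this sketch.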
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > rules-dev mailing list
> > rules-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/rules-dev
> >
>
>



-- 
 Edson Tirelli
 JBoss Drools Core Development
 JBoss by Red Hat @ www.jboss.com