[rules-dev] Deadlock in the Drools core - suggested patch appears to resolve the issue

Greg Barton greg_barton at yahoo.com
Wed Nov 4 22:35:37 EST 2009


I'll poke around.

--- On Wed, 11/4/09, Michael Neale <michael.neale at gmail.com> wrote:

> From: Michael Neale <michael.neale at gmail.com>
> Subject: Re: [rules-dev] Deadlock in the Drools core - suggested patch appears to resolve the issue
> To: "Rules Dev List" <rules-dev at lists.jboss.org>
> Date: Wednesday, November 4, 2009, 9:01 PM
> That is great news. If there are other places we can move to
> java.util.concurrent in older code, that would also be welcome,
> j.u.concurrent is awesome magic which saves so much pain.
> 
> 
> 
> On Thu, Nov 5, 2009 at 12:06 PM, Edward Archibald <edward.archibald at continuent.com> wrote:
> > Hello Michael and Greg,
> >
> > I have pulled the drools head, applied the patch that Greg suggested
> > (thanks Greg!) and deployed the drools-core jar with my app. Prior to
> > this change, I was able to reproduce the deadlock - verified in the
> > debugger and in exactly the same place as my earlier post - roughly 50%
> > of the time. I have now tried the same test scenario 10 times with no
> > failures.
> >
> > From what I can tell, this problem will easily happen every time that I
> > am already in the 'delayed execution' code created by the rule with the
> > 'duration()' qualifier and, at the same time, I get a new 'fact' and
> > attempt to insert it. Knowing this, I can probably come up with a
> > simpler test case. I'll give it a shot.
> >
> > Thanks, again, to you both for the quick responses.
> >
> > Edward
> >
> > -----Original Message-----
> > From: rules-dev-bounces at lists.jboss.org [mailto:rules-dev-bounces at lists.jboss.org] On Behalf Of Michael Neale
> > Sent: Wednesday, November 04, 2009 2:26 AM
> > To: Rules Dev List
> > Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
> >
> > Ha - I was just musing with someone the other day about who uses
> > "duration" anymore ;) I guess it's still useful to people!
> >
> > I would say that the "duration" codebase is probably fairly "old" - in
> > the sense that it probably pre-dates the availability of
> > j.u.concurrent (which was Java 5, I think?) - so please try out that
> > patch. If it works, we can probably pull it in (hoping Edson can take
> > a look).
> >
> > On Wed, Nov 4, 2009 at 4:43 PM, Greg Barton <greg_barton at yahoo.com> wrote:
> >> Well, I'm not sure how to avoid the deadlock without changing the drools
> >> codebase. I was, however, able to change the type of
> >> AbstractWorkingMemory.actionQueue to
> >> java.util.concurrent.ConcurrentLinkedQueue and remove the
> >> synchronization over the queue with no apparent ill effects. (Two tests
> >> failed for drools-core, but they failed whether the change was made or
> >> not.) Also I don't like the fact that the current code synchronizes on
> >> actionQueue, but then exposes it outside the class through the
> >> getActionQueue() method, where access can be unsynchronized. Changing it
> >> to ConcurrentLinkedQueue makes it safe to expose externally. (Not to
> >> mention that the lock can be stolen externally with the current code.)
> >>
> >> diff attached. If you can run drools compiled from trunk, apply the diff
> >> and see if it resolves the deadlock. If it does it's up to the drools
> >> devs as to whether the change should be made. I'm just hacking about. :P
> >>
> >> --- On Tue, 11/3/09, Edward Archibald <edward.archibald at continuent.com> wrote:
> >>
> >>> From: Edward Archibald <edward.archibald at continuent.com>
> >>> Subject: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
> >>> To: "rules-dev at lists.jboss.org" <rules-dev at lists.jboss.org>
> >>> Date: Tuesday, November 3, 2009, 9:41 PM
> >>>
> >>> I found the following deadlock which is, apparently, due to the
> >>> concurrent execution of a task for a 'delayed' rule with a concurrently
> >>> executing application thread attempting to get access to a 'global'.
> >>> Any recommendations for avoiding this type of deadlock besides not
> >>> using rules with 'duration()' etc. which cause asynchronous execution
> >>> with respect to my main application thread?
> >>>
> >>> This problem is somewhat difficult to reproduce on demand but it does
> >>> come up frequently when the 'delayed' rule "DETECT MONITORING HAS
> >>> STOPPED" is activated as a result of the trigger conditions.
> >>>
> >>> ===================================================================================
> >>>
> >>> This thread, my application's EnterprisePolicyManager thread, is
> >>> attempting to get access to a global, policyMgr, and is waiting for
> >>> the 'lock.lock' on ReteooStatefulSession.
> >>>
> >>> It owns the 'ReteooStatefulSession.actionQueue' and is waiting for the
> >>> ReteooStatefulSession.lock.lock.
> >>>
> >>> owns: java.util.LinkedList<E> (id=207)
> >>> waited by: Thread [pool-3-thread-1] (Suspended)
> >>> owns: com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine (id=208)
> >>> sun.misc.Unsafe.park(boolean, long) line: not available [native method] [local variables unavailable]
> >>> java.util.concurrent.locks.LockSupport.park() line: 118 [local variables unavailable]
> >>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() line: 681 [local variables unavailable]
> >>> java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) line: 711
> >>> java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquire(int) line: 1041
> >>> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() line: 184 [local variables unavailable]
> >>> java.util.concurrent.locks.ReentrantLock.lock() line: 256 [local variables unavailable]
> >>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).getGlobal(java.lang.String) line: 587
> >>> com.continuent.tungsten.cluster.manager.policy.Rule_IF_IN_MAINTENANCE_MODE__CONSUME_ALL_NOTIFICATIONS_0Eval0Invoker.evaluate(org.drools.spi.Tuple, org.drools.rule.Declaration[], org.drools.WorkingMemory, java.lang.Object) line: not available
> >>> org.drools.rule.EvalCondition.isAllowed(org.drools.spi.Tuple, org.drools.WorkingMemory, java.lang.Object) line: 117
> >>> org.drools.reteoo.EvalConditionNode.assertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 180
> >>> org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, org.drools.reteoo.LeftTuple) line: 117
> >>> org.drools.reteoo.SingleLeftTupleSinkAdapter.propagateAssertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.reteoo.RightTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, boolean) line: 28
> >>> org.drools.reteoo.JoinNode.assertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 175
> >>> org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 42
> >>> org.drools.reteoo.PropagationQueuingNode$AssertAction.execute(org.drools.reteoo.ObjectSinkPropagator, org.drools.common.InternalWorkingMemory) line: 326
> >>> org.drools.reteoo.PropagationQueuingNode.propagateActions(org.drools.common.InternalWorkingMemory) line: 221
> >>> org.drools.reteoo.PropagationQueuingNode$PropagateAction.execute(org.drools.common.InternalWorkingMemory) line: 394
> >>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1486
> >>> org.drools.common.NamedEntryPoint.insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation) line: 158
> >>> org.drools.common.NamedEntryPoint.insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 122
> >>> org.drools.common.NamedEntryPoint.insert(java.lang.Object) line: 80
> >>> com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine.insertFact(com.continuent.tungsten.commons.cluster.resource.notification.NotificationStreamID, java.lang.Object, boolean) line: 162
> >>> com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager.run() line: 249
> >>> java.lang.Thread.run() line: 595
> >>>
> >>> The rule implicated in the above thread is:
> >>>
> >>> rule "IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS"
> >>> salience 999
> >>>   when
> >>>     notification : ClusterResourceNotification() from entry-point "MONITORING"
> >>>     eval(policyMgr.getMode() == ClusterPolicyManagerMode.MAINTENANCE)
> >>>   then
> >>>     statistics.increment("IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS");
> >>>     retract(notification);
> >>> end
> >>>
> >>>
> >>>
> >>> This other thread, apparently a scheduled thread for a rule with a 10
> >>> second duration, is attempting to insert a fact and owns the
> >>> 'lock.lock' on ReteooStatefulSession and is waiting for the
> >>> 'ReteooStatefulSession.actionQueue'.
> >>>
> >>> owns: org.drools.common.DefaultAgenda (id=4046)
> >>> waiting for: java.util.LinkedList<E> (id=207)
> >>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1480
> >>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation, org.drools.reteoo.ObjectTypeConf) line: 1051
> >>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 1001
> >>> org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object, boolean) line: 114
> >>> org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object) line: 108
> >>> com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0.consequence(org.drools.spi.KnowledgeHelper, com.continuent.tungsten.commons.cluster.resource.notification.DataServerNotification, org.drools.FactHandle, java.lang.String, org.drools.FactHandle, com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager, org.apache.log4j.Logger) line: not available
> >>> com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0ConsequenceInvoker.evaluate(org.drools.spi.KnowledgeHelper, org.drools.WorkingMemory) line: not available
> >>> org.drools.common.DefaultAgenda.fireActivation(org.drools.spi.Activation) line: 934
> >>> org.drools.common.Scheduler$DuractionJob.execute(org.drools.time.JobContext) line: 70
> >>> org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 132
> >>> org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 110
> >>> java.util.concurrent.FutureTask$Sync.innerRun() line: 269
> >>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(java.util.concurrent.FutureTask<V>).run() line: 123
> >>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) line: 65
> >>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 168
> >>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) line: 650
> >>> java.util.concurrent.ThreadPoolExecutor$Worker.run() line: 675
> >>> java.lang.Thread.run() line: 595
> >>>
> >>> The rule for this task looks like:
> >>> rule "DETECT MONITORING HAS STOPPED"
> >>> duration(10s)
> >>> salience 1000
> >>>   when
> >>>     lastNotification : DataServerNotification($resourceName : resourceName)
> >>>       from entry-point "MONITORING"
> >>>     not (DataServerNotification(resourceName == $resourceName,
> >>>       this after [10s] lastNotification)
> >>>       from entry-point "MONITORING")
> >>>     not (ManagerFailedAlarm(expired == false,
> >>>       resourceName == $resourceName))
> >>>     not (DataSource(name == $resourceName,
> >>>       state == ResourceState.SHUNNED ||
> >>>       state == ResourceState.FAILED))
> >>>   then
> >>>     Object[] params = {$resourceName};
> >>>     if (policyMgr.getMode() != ClusterPolicyManagerMode.MAINTENANCE)
> >>>     {
> >>>       lastNotification.setResourceState(ResourceState.UNKNOWN);
> >>>       ManagerFailedAlarm alarm =
> >>>         new ManagerFailedAlarm(lastNotification, "rule detected monitor stop",
> >>>           6, AlarmSeverity.FAULT);
> >>>       logger.info(alarm.toString());
> >>>       insert(alarm);
> >>>       update(lastNotification);
> >>>     }
> >>> end
> >>>
> _______________________________________________
> >>> rules-dev mailing list
> >>> rules-dev at lists.jboss.org
> >>> https://lists.jboss.org/mailman/listinfo/rules-dev
> >>>
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
> > --
> > Michael D Neale
> > home: www.michaelneale.net
> > blog: michaelneale.blogspot.com
> >
> >
> 
> 
> 
> -- 
> Michael D Neale
> home: www.michaelneale.net
> blog: michaelneale.blogspot.com
> 
> 
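
For reference, a minimal sketch of the shape of the change discussed in the quoted thread. It is not the attached diff, and SessionSketch, ActionQueueSketch and their methods are hypothetical stand-ins rather than Drools classes. In the reported traces, one thread held the monitor on actionQueue (a LinkedList) inside executeQueuedActions() and waited for the session's ReentrantLock in getGlobal(), while the timer thread held that ReentrantLock and waited for the actionQueue monitor. Assuming the queue is switched to ConcurrentLinkedQueue and the synchronized block is dropped, only one lock is left, so that particular cycle cannot form:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a working memory; NOT the Drools class.
class SessionSketch {
    // Plays the role of the session-wide lock seen in the stack traces.
    private final ReentrantLock lock = new ReentrantLock();
    // Thread-safe queue, so no synchronized block guards it.
    private final Queue<Runnable> actionQueue = new ConcurrentLinkedQueue<>();
    private int global = 0; // guarded by lock

    void queueAction(Runnable action) {
        actionQueue.add(action);
    }

    // Drains the queue without holding any monitor, so a queued action
    // that takes the session lock cannot close a monitor/lock cycle.
    void executeQueuedActions() {
        Runnable action;
        while ((action = actionQueue.poll()) != null) {
            action.run();
        }
    }

    // Same shape as a global lookup: takes the session lock briefly.
    int touchGlobal() {
        lock.lock();
        try {
            return ++global;
        } finally {
            lock.unlock();
        }
    }

    // Same shape as a timer-fired consequence: holds the session lock
    // while it queues and executes actions.
    void fireUnderLock(Runnable action) {
        lock.lock();
        try {
            queueAction(action);
            executeQueuedActions();
        } finally {
            lock.unlock();
        }
    }
}

class ActionQueueSketch {
    // Runs an "application" thread and a "timer" thread against one session
    // and returns how many queued actions were executed.
    static int runDemo() {
        final SessionSketch session = new SessionSketch();
        final AtomicInteger executed = new AtomicInteger();
        final Runnable action = () -> {
            session.touchGlobal(); // needs the session lock, like getGlobal()
            executed.incrementAndGet();
        };
        Thread app = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                session.queueAction(action);
                session.executeQueuedActions();
            }
        });
        Thread timer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                session.fireUnderLock(action);
            }
        });
        app.start();
        timer.start();
        try {
            app.join();
            timer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return executed.get();
    }

    public static void main(String[] args) {
        System.out.println("actions executed: " + runDemo());
    }
}
```

The sketch only shows that the two access patterns from the traces can interleave without blocking each other forever. It says nothing about whether other code in drools-core relies on the queue monitor for ordering.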


      


