[rules-users] [Drools Planner] Hard constraint appears to be ignored - How to isolate a working memory corruption in drools expert

Geoffrey De Smet ge0ffrey.spam at gmail.com
Tue Feb 28 05:52:27 EST 2012


Hi Reinis,

So drools-core 5.4.0.Beta2 doesn't fix it?
Then there might be another working memory corruption bug left in drools 
expert.

Read the manual section about "Delta score calculation" to understand 
why there's a presumed and a real workingMemory.

Isolating a working memory corruption is a PITA... but if you want to go 
for it, here's how:

> here's how I usually isolate such a bug to a small reproducable unit 
> test:
>
> - Turn on environmentMode TRACE so it crashes as fast as possible
> - Turn on TRACE logging.
> http://docs.jboss.org/drools/release/5.4.0.Beta1/drools-planner-docs/html_single/index.html#d0e2450 
>
> So you see something like this:
> INFO  Solver started: time spend (0), score (-6), new best score (-6), 
> random seed (0).
> TRACE         Ignoring not doable move (col0 at row0 =>  row0).
> TRACE         Move score (-4), accepted (true) for move (col0 at row0 =>  
> row1).
> TRACE         Move score (-4), accepted (true) for move (col0 at row0 =>  
> row2).
> TRACE         Move score (-4), accepted (true) for move (col0 at row0 =>  
> row3).
> ...
> TRACE         Move score (-3), accepted (true) for move (col1 at row0 =>  
> row3).
> ...
> TRACE         Move score (-3), accepted (true) for move (col2 at row0 =>  
> row3).
> ...
> TRACE         Move score (-4), accepted (true) for move (col3 at row0 =>  
> row3).
> DEBUG     Step index (0), time spend (6), score (-3), new best score 
> (-3), accepted move size (12) for picked step (col1 at row0 =>  row3).
> ...
> DEBUG     Step index (1), time spend (10), score (-2), new best score 
> (-2), accepted move size (12) for picked step (...).
> ...
> - If there's a construction heuristic that doesn't crash: remove it by:
> -- run only the construction heuristic config
> -- take the output solution as the new input solution
> -- remove the construction heuristic config
>
> - If it doesn't crash in step 0: remove non-crashing steps before it
> -- run it just before the step it crashes and take the output solution 
> as the input solution
> -- verify that it now crashes in step 0 (*)
>
> - If it doesn't crash in the first move: remove non-crashing move's 
> before it
> -- Open DefaultDecider.decideNextStep(LocalSearchStepScope) and add 
> moveIterator.next().
> For example, if move 6 crashes:
>
> public void decideNextStep(LocalSearchStepScope localSearchStepScope) {
> WorkingMemory workingMemory = localSearchStepScope.getWorkingMemory();
> Iterator<Move>  moveIterator = 
> selector.moveIterator(localSearchStepScope);
> // BEGIN
> moveIterator.next();
> moveIterator.next();
> moveIterator.next();
> moveIterator.next();
> moveIterator.next(); // 5 next()'s
> // END
> while (moveIterator.hasNext()) {
> Move move = moveIterator.next();
> ...
>
> -- verify that it now crashes in move 0 (*)
>
> (*) If it doesn't crash, that means that an earlier step or move 
> helped cause this issue.
> Remove as many steps and moves as you can.
> This means that the final unit test will have multiple fireAllRules():
> ... do move 1 ...
> ... fireAllRules
> ... undo move 1 ...
> (... fireAllRules)
> ... do move 2 ...
> ... fireAllRules
> // Now the WM is corrupted
>
>
> - Now (and not sooner!), remove rules by commented out all the rules 
> in the DRL that doesn't stop the crashing
> - Remove all the problem facts and planning entities that doesn't stop 
> the crashing.
> - Now the problem should be small enough to rewrite it in a simple 
> unit test for Drools Expert. 


Op 27-02-12 23:43, Reinis schreef:
> Hello,
>
> I did all the suggestions from Geoffrey (thanks sir) and now the TRACE is failing with an additional rule:
>
> java.lang.IllegalStateException: The presumedScore (0hard/0soft) is corrupted because it is not the realScore  (0hard/-1soft).
> Presumed workingMemory:
>     Score rule (penalize late assignments) has count (11) and weight total (8).
>     Score rule (reward Resource consolidation) has count (7) and weight total (7).
> Real workingMemory:
>     Score rule (penalize late assignments) has count (11) and weight total (*7*).
>     Score rule (reward Resource consolidation) has count (*8*) and weight total (*8*).
>
> The offending rules look like this:
>
> rule "penalize late Assignments"
>       when
>           $project : Project(kickOff != null, $kickOff : kickOff)
>           $assignment : Assignment(project == $project)
>           eval($assignment.getInterval().getStart().isAfter($kickOff))
>       then
>           insertLogical(new IntConstraintOccurrence("penalize late assignments", ConstraintType.NEGATIVE_SOFT,
> 			new Duration($kickOff, $assignment.getInterval().getStart()).toStandardDays().getDays(), $assignment, $project));
> end
>
> rule "reward Resource consolidation"
> 	when
> 		$leftAssignment : Assignment($project : project, $leftResource : resource != null, $leftId : id, $leftIntervalStartMillis : intervalStartMillis)
> 		$rightAssignment : Assignment(project == $project, resource == $leftResource, id != $leftId, intervalStartMillis<= $leftIntervalStartMillis)
> 		eval(new Duration($leftAssignment.getIntervalStartMillis() - $rightAssignment.getIntervalStartMillis()).getStandardDays() == 0)
> 	then
> 		insertLogical(new IntConstraintOccurrence("reward Resource consolidation", ConstraintType.POSITIVE
> 		, 1, $leftAssignment, $rightAssignment));
> end
> *
> If I comment the "reward Resource consolidation" rule out, the test case completes successfully.
> *
> Is there anything else I can try?
>
> br
> Reinis
>
> On 02/27/2012 09:46 AM, Geoffrey De Smet wrote:
>
>> Op 26-02-12 19:47, Reinis schreef:
>>> Hello,
>>>
>>>> 1) TRACE
>>> It did actually crash!
>> Good :)
>>> java.lang.IllegalStateException: The presumedScore (0hard/-860soft) is corrupted because it is not the realScore  (0hard/-858soft).
>> It's not crashing on the hard constraint difference, but that probably
>> means it doesn't get there yet.
>>> Presumed workingMemory:
>>>       Score rule (reward Resource consolidation) has count (44) and weight total (44).
>>> Real workingMemory:
>>>       Score rule (reward Resource consolidation) has count (46) and weight total (46).
>>>
>>> But the rule causing it looks "normal" to me:
>>> rule "reward Resource consolidation"
>>> 	when
>>> 		$leftAssignment : Assignment($project : project, $leftResource : resource != null, $leftId : id)
>>> 		$rightAssignment : Assignment(project == $project, resource == $leftResource, id != $leftId)
>>> 		eval(new Duration(Math.abs($leftAssignment.getInterval().getStartMillis() - $rightAssignment.getInterval().getStartMillis())).getStandardDays() == 0)
>> Looks good to me.
>>
>> What you could do is make a Assignment.getIntervalStartMillis() and
>> Assignment( ..., $leftStartMillis : intervalStartMillis)
>> and add that rightAssignment's startMillis should be>   then
>> leftAssignment's $leftStartMillis to avoid the Math.abs and double
>> counting of this constraint.
>>> 	then
>>> 		insertLogical(new IntConstraintOccurrence("reward Resource consolidation", ConstraintType.POSITIVE
>>> 		, 1, $leftAssignment, $rightAssignment));
>>> end
>>>
>>>> 2) drools version and planner version
>>> drools-core-5.3.1.Final
>> Can you try TRACE with drools-core, drools-compiler and knowledge-api
>> 5.4.0.Beta2?
>> There was a working memory corruption bug fixed in drools 5.4 over 5.3,
>> not sure if it made 5.3.1.
>>> drools-planner-core-5.3.1.Final
>> (No need to upgrade planner itself at this time - as that will force you
>> down the UpgradeRecipe.txt).
>>> Thank you for the hints! Is the "eval" causing all this trouble?
>>>
>>> br
>>> Reinis
>>>
>>>
>>> On 02/26/2012 06:45 PM, Geoffrey De Smet wrote:
>>>
>>>> 1) If you haven't already, can you turn on<environmentMode>TRACE</...>
>>>> and let it run a while (it will be much slower)?
>>>> Report here if that crashes or not.
>>>>
>>>> 2) What drools version and what planner version are you using?
>>>>
>>>> Op 26-02-12 17:51, Reinis schreef:
>>>>> Hello, it appears that hard constraint I have defined is being ignored
>>>>> in some rare cases. Here's an example:
>>>>>
>>>>> This is what my application logs after getting best solution after
>>>>> solver has returned from solve():
>>>>>
>>>>> 2012-02-23 21:28:54,595 [main] INFO  Project Project1 assignment
>>>>> 563:[2012-02-27T13:00:00.000/2012-02-27T17:00:00.000] was assigned to John
>>>>>
>>>>> 2012-02-23 21:28:54,597 [main] INFO  Project Project2 assignment
>>>>> 508:[2012-02-27T13:00:00.000/2012-02-27T17:00:00.000] was assigned to John
>>>>>
>>>>> (The log says that John was assigned twice in same time period to two
>>>>> different assignments (assignment 563 and assignment 508)
>>>>>
>>>>> Here short explanation of the relevant domain data
>>>>> assignment.id = 563
>>>>> assignment.interval = 2012-02-27T13:00:00.000/2012-02-27T17:00:00.000
>>>>> assignment.project = Project1
>>>>> assignment.resource = John
>>>>>
>>>>> And this is rule that, to my opinion should've prohibited that two
>>>>> assignments get assigned to the same resource:
>>>>>
>>>>> rule "only one assignment within interval for the same resource"
>>>>>           when
>>>>>               $leftAssignment : Assignment($leftId : id, $interval :
>>>>> interval, resource != null, $resource : resource)
>>>>>               $rightAssignment : Assignment(interval == $interval, resource
>>>>> == $resource, id != $leftId)
>>>>>           then
>>>>>               insertLogical(new IntConstraintOccurrence("only one assignment
>>>>> within interval for the same resource", ConstraintType.NEGATIVE_HARD, 1,
>>>>> $leftAssignment, $rightAssignment));
>>>>> end
>>>>>
>>>>> The score looks like this:
>>>>>
>>>>> 2012-02-23 21:28:54,594 [main] DEBUG The hard score is: 0 and the soft
>>>>> score is: -532
>>>>>
>>>>> So no hard constraints are broken according to planner.
>>>>>
>>>>> My question is not what is wrong with this particular example (unless it
>>>>> is obvious for you), but if you can think of any anti-pattern that would
>>>>> cause this behavior?
>>>>>
>>>>> What I already checked is:
>>>>> - this happens really sporadically (most of the time the result is
>>>>> CORRECT but in 1 from 5 runs I get this kind of error);
>>>>> - I checked, re-checked and re-re-checked all the hashCode, compareTo,
>>>>> solutionHashCode, solutionCompare, clone, ... methods - they appear to
>>>>> be correct;
>>>>> - moves are simple and I tried to keep them consistent with the way
>>>>> moves in drools-planner examples are constructed;
>>>>>
>>>>> Would be great if you could give me a hint to what direction I should
>>>>> look next.
>>>>>
>>>>> br
>>>>> Reinis
>>>>> _______________________________________________
>>>>> rules-users mailing list
>>>>> rules-users at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/rules-users
>>>>>
>>> _______________________________________________
>>> rules-users mailing list
>>> rules-users at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/rules-users
>>>
> _______________________________________________
> rules-users mailing list
> rules-users at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/rules-users
>

-- 
With kind regards,
Geoffrey De Smet





More information about the rules-users mailing list