[rules-dev] Miss Manners

James C. Owen jco at kbsc.com
Sat Mar 28 12:17:39 EDT 2009


Greetings:

Well, I really wish more people would read what Steve and I have been  
blogging about benchmarks for the past few years.  While one of us proposes
improving Miss Manners, the other proposes dropping it altogether.  My
personal feeling "used" to be that we could improve Miss Manners so
that it could still be used.  Now, I feel that any "improvement" will
come through massive amounts of data, because the rules are WAY too
simplistic - and then all you will be testing is the performance of
the system itself and NOT the rulebase.

Also, if you trace the firing of the rules, you will find that once  
started, for all practical purposes, only one rule keeps firing over  
and over.  Miss Manners was designed for one purpose: to stress-test
the Agenda Table by recursively putting all of the guests on the table  
over and over and over again.  Greg Barton - as stated earlier  
somewhere - repeated this test using straight-up Java and did it just  
as quickly.
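
For anyone who hasn't seen the point Greg was making, here is a minimal
sketch - purely illustrative, not his actual code - of the Manners-style
seating done in straight-up Java with a backtracking search: alternate the
sexes and require a shared hobby between neighbours. Hand-rolled like this
there is no agenda and no cross product to maintain, which is why it keeps
pace with the engines on this particular problem.

import java.util.*;

// Illustrative only (Java 16+ for records): seat guests so that neighbours
// alternate sex and share at least one hobby, using plain backtracking.
public class PlainJavaManners {

    record Guest(String name, char sex, Set<String> hobbies) {}

    static boolean compatible(Guest a, Guest b) {
        return a.sex() != b.sex() && !Collections.disjoint(a.hobbies(), b.hobbies());
    }

    // Backtracking: try to extend the current seating with any unseated guest.
    static boolean seat(List<Guest> seating, List<Guest> remaining) {
        if (remaining.isEmpty()) return true;
        Guest last = seating.get(seating.size() - 1);
        for (Guest g : new ArrayList<>(remaining)) {
            if (compatible(last, g)) {
                seating.add(g);
                remaining.remove(g);
                if (seat(seating, remaining)) return true;
                remaining.add(g);                          // undo and try the next candidate
                seating.remove(seating.size() - 1);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Guest> guests = new ArrayList<>(List.of(
            new Guest("g1", 'm', Set.of("chess", "golf")),
            new Guest("g2", 'f', Set.of("chess")),
            new Guest("g3", 'm', Set.of("chess", "tennis")),
            new Guest("g4", 'f', Set.of("golf", "tennis"))));
        List<Guest> seating = new ArrayList<>(List.of(guests.remove(0)));
        System.out.println(seat(seating, guests) ? seating : "no valid seating");
    }
}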

Regardless,  a rulebase benchmark should be composed of several tests:

Forward Chaining
Backward Chaining
Non-Monotonicity (The hallmark of a rulebase)
Complex Rules
Rules with a high level of Specificity
Lots of (maybe 100 or more) "simple" rules that chain between themselves (a sketch of that shape follows this list)
Stress the conflict resolution strategy
Stress pattern matching
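
To make the "simple rules that chain" item concrete, here is a tiny sketch
in plain Java - hypothetical, not tied to any engine's API - of the shape
such a test takes: rule i waits for fact f<i> and asserts f<i+1>, so a
single seed fact forces a long chain of firings through a naive
match-resolve-act loop. A real benchmark would express the same rules in
the engine's own rule language and measure the same chain.

import java.util.*;
import java.util.function.Predicate;

// Hypothetical chaining micro-benchmark: rule i matches fact "f<i>" and
// asserts "f<i+1>", so the seed fact "f0" forces n successive firings.
public class ChainBenchmark {

    record Rule(Predicate<Set<String>> when, String assertFact) {}

    public static void main(String[] args) {
        final int n = 100;                                 // number of chained rules
        List<Rule> rules = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final String needed = "f" + i;
            rules.add(new Rule(facts -> facts.contains(needed), "f" + (i + 1)));
        }

        Set<String> facts = new HashSet<>(Set.of("f0"));   // seed fact
        long start = System.nanoTime();

        // Naive match-resolve-act loop: keep firing until no rule adds a new fact.
        boolean fired = true;
        while (fired) {
            fired = false;
            for (Rule r : rules) {
                if (r.when().test(facts) && facts.add(r.assertFact())) {
                    fired = true;                          // a new fact was asserted; re-scan
                }
            }
        }

        System.out.printf("%d facts derived in %.2f ms%n",
                facts.size(), (System.nanoTime() - start) / 1e6);
    }
}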

Just having an overwhelming amount of data is not sufficient for a  
rulebase benchmark - that would be more in line with a test of the  
database efficiency and/or the available memory. Further, it has been  
"proven" over time that compiling rules into Java code or into C++  
code (something that vendors call "sequential rules") is much faster  
than using the inference engine. True, and it should be. After all,
most inference engines are themselves written in Java or C++, and the rules are
merely an extension - another layer of abstraction, if you will. But
sequential rules do not have the flexibility of the engine and, in  
most cases, have to be "manually" arranged so that they fire in the  
correct order. An inference engine, being non-monotonic, does not have  
that restriction.  Simply put, most rulebased systems cannot pass  
muster on the simple WaltzDB-16 benchmark. We now have a WaltzDB-200  
test should they want to try something more massive.
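
Here is a toy contrast - my own illustration, not any vendor's actual
"sequential mode" - of why sequential rules have to be manually ordered
while an engine-style loop does not: rule B depends on a fact that rule A
asserts, so a single fixed-order pass only works if A happens to come
before B, whereas a loop that keeps re-matching until nothing new is
asserted is indifferent to declaration order. (It only illustrates the
ordering point, not retraction.)

import java.util.*;
import java.util.function.Consumer;

// Toy contrast: the same two rules run once in a fixed (wrong) order versus
// run by a re-evaluating loop that keeps matching until nothing new appears.
public class SequentialVsInference {

    static void ruleA(Set<String> facts) { if (facts.contains("order"))   facts.add("invoice");  }
    static void ruleB(Set<String> facts) { if (facts.contains("invoice")) facts.add("shipment"); }

    public static void main(String[] args) {
        List<Consumer<Set<String>>> wrongOrder =
                List.of(SequentialVsInference::ruleB, SequentialVsInference::ruleA);

        // Sequential: each rule fires at most once, in list order.
        Set<String> seqFacts = new HashSet<>(Set.of("order"));
        wrongOrder.forEach(r -> r.accept(seqFacts));
        System.out.println("sequential, wrong order: " + seqFacts);    // no "shipment"

        // Engine-style loop: re-match until nothing new is asserted, so
        // declaration order no longer matters.
        Set<String> engineFacts = new HashSet<>(Set.of("order"));
        int before;
        do {
            before = engineFacts.size();
            wrongOrder.forEach(r -> r.accept(engineFacts));
        } while (engineFacts.size() > before);
        System.out.println("re-evaluating loop:      " + engineFacts);  // includes "shipment"
    }
}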

New Benchmarks: Perhaps we should try some of the NP-hard problems -  
that would eliminate most of the "also ran" tools. Also, perhaps we  
should be checking on the "flexibility" of a rulebase by processing on  
multiple platforms (not just Windows) as well as checking performance  
and scalability on multiple processors; perhaps 4, 8 or 16 (or more)  
CPU machines. An 8-core/16-thread Mac is now available at a reasonable
price, as is Intel's i7 (basically 4 cores/8 threads). But these are
64-bit CPUs, and some rule engines are not supported on 64-bit platforms.
Sad, but true. Some won't even run on Unix, only on Linux. Again,
sad, but true.
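
On the multi-processor point, the harness shape I have in mind is nothing
more exotic than the sketch below: run the same batch of tasks on 1, 2, 4,
... threads and see whether throughput actually scales. The workload here
is just a stand-in; a real benchmark would run one rulebase session per
task instead.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of a scalability check: time the same batch of tasks on growing
// thread counts. busyWork() is only a stand-in for a rulebase run.
public class ScalingHarness {

    static long busyWork(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) acc += (acc ^ i) * 31;
        return acc;
    }

    public static void main(String[] args) throws Exception {
        final int tasks = 64;
        final int iterationsPerTask = 2_000_000;
        for (int threads = 1; threads <= Runtime.getRuntime().availableProcessors(); threads *= 2) {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            Callable<Long> job = () -> busyWork(iterationsPerTask);
            long start = System.nanoTime();
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) futures.add(pool.submit(job));
            for (Future<Long> f : futures) f.get();        // wait for every task to finish
            pool.shutdown();
            System.out.printf("%2d threads: %.1f ms%n",
                    threads, (System.nanoTime() - start) / 1e6);
        }
    }
}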

So, any ideas? I'm thinking that someone, somewhere has a better  
suggestion than a massive decision table, 64-Queens, Sudoku or a  
revision of the Minnesota database benchmark. Hopefully... For now,
I strongly suggest that we resolve to use the WaltzDB-200 benchmark  
(which should satisfy all parties for this year) and develop something  
much better for 2010.

jco
"NOW do you believe?"
(Morpheus to Trinity in "The Matrix")
Buy USA FIRST !

http://www.kbsc.com [Home base for AI connections]
http://www.OctoberRulesFest.org [Home for AI conferences]
http://JavaRules.blogspot.com [Java-Oriented Rulebased Systems]
http://ORF2009.blogspot.com [October Rules Fest]
http://exscg.blogspot.com/ [Expert Systems Consulting Group]






On Mar 28, 2009, at 1:21 AM, Wolfgang Laun wrote:

> The organizers of RuleML 2009 have announced that one of their
> topics for the Challenge (http://www.defeasible.org/ruleml2009/challenge)
> of this year's event is going to be "benchmark for evaluation of  
> rule engines".
>
> Folks interested in benchmarks are cordially invited to provide
> requirements, outlines, ideas, etc., for benchmarks to be sent in for
> the Challenge, and, of course, to submit actual benchmarks to the
> RuleML Symposium.
>
> Spearhead RBS implementations such as Drools provide an
> excellent arena for real-world applications of rules. Personally, I  
> think
> that benchmarks should not only address purely FOL-lish
> pattern combinations but also assess how well the interaction with the
> embedding environment (such as predicate evaluation,
> the execution of RHS consequences with agenda updates, etc.,)
> is handled.
>
> Regards
> Wolfgang Laun
> RuleML 2009 Program Committee
>
>
> On Fri, Mar 27, 2009 at 11:09 PM, Steve Núñez <brms at illation.com.au>  
> wrote:
> Mark,
>
> Agreed that Manners needs improvement. In its current form, it's nearly
> useless as a comparative benchmark. You might want to check with Charles
> Young, if he's not on the list, who did a very thorough analysis of
> Manners a while back and may have some ideas.
>
> Whilst on the topic, I am interested in any other benchmarking ideas that
> folks may have. We're in the process of putting together a (hopefully)
> comprehensive set of benchmarks for performance testing.
>
> Cheers,
>    - Steve
>
> On 28/03/09 5:06 AM, "Mark Proctor" <mproctor at codehaus.org> wrote:
>
> > I was wondering if anyone fancied having a go at improving Miss Manners
> > to make it harder and less easy to cheat. The problem with Manners at
> > the moment is that it computes a large cross product, of which only one
> > rule fires and the other activations are cancelled. What many engines do
> > now is abuse the test by not calculating the full cross product and thus
> > not doing all the work.
> >
> > Manners is explained here:
> > https://hudson.jboss.org/hudson/job/drools/lastSuccessfulBuild/artifact/trunk/target/docs/drools-expert/html/ch09.html#d0e7455
> >
> > So I was thinking that first the amount of data needs to be increased
> > from, say, 128 guests to 512 guests. Then the problem needs to be made
> > harder, and the full conflict set needs to be forced to be evaluated. So
> > maybe the first assign_seating rule is as normal, where it just finds M/F
> > pairs with the same hobbies, but additionally we should have a scoring
> > process so that each of those matched in the first phase has a
> > compatibility score calculated, and then the one with the best score is
> > picked. Maybe people have other ways to improve the complexity of the
> > test, both in adding more rules and more complex rules and more data.
> >
> > Mark
> >
> > _______________________________________________
> > rules-dev mailing list
> > rules-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/rules-dev
>
> --
>
> Level 40
> 140 William Street
> Melbourne, VIC 3000
> Australia
>
> Phone:  +61 3 9607 8287
> Mobile: +61 4 0096 4240
> Fax:    +61 3 9607 8282
> http://illation.com.au
>
>
> _______________________________________________
> rules-dev mailing list
> rules-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/rules-dev
