[rules-users] Performance is too bad when using matches predicate

Jared Davis sunray at davisprogramming.com
Tue Jun 2 19:16:05 EDT 2009



starsavari wrote:
> 
> We are considering the Drools rule engine for data profiling (running the
> data through various regex patterns for data analysis) on a large amount of
> data. We have rules to detect SSNs, phone numbers, driver license
> numbers, tracking numbers, etc. Most of our rules are regex pattern matching.
> I've developed a sample program with 5 rules, all of them pattern matching
> rules, and ran it against a million facts/objects (each object has
> 2 string fields). It takes around 58 seconds versus 2 seconds when I run
> the same thing with plain Java code.
> 
> 


I ran into a similar issue but ran out of time before I could put together a
clean fix. Attached is a partial fix; test it at your own risk. It caches the
compiled regular expression instead of re-compiling it on every match. The
cache has no size limit, so some might call it a memory leak.
http://www.nabble.com/file/p23842739/MatchesEvaluatorsDefinition.java
MatchesEvaluatorsDefinition.java 
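
For anyone who just wants the general idea without opening the attachment,
here is a minimal sketch of pattern caching in plain Java. It is not the code
from the attached file and it does not touch any Drools API: the class name
PatternCache, its matches() helper and the maxEntries bound are all made up for
illustration. Unlike the attached fix, this sketch caps the cache size by
evicting the least recently used pattern, which is one possible way around the
unbounded-growth concern.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drools: caches compiled regex patterns
// keyed by their source string so each regex is compiled at most once
// while it stays in the cache.
class PatternCache {
    private final Map<String, Pattern> cache;

    // maxEntries is an assumed tuning knob; the least recently used
    // pattern is evicted once the cache grows past it.
    PatternCache(final int maxEntries) {
        this.cache = Collections.synchronizedMap(
            new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                    return size() > maxEntries;
                }
            });
    }

    // Returns true if the whole input matches the regex, compiling the
    // regex only when it is not already cached.
    boolean matches(String regex, String input) {
        if (regex == null || input == null) {
            return false;
        }
        Pattern p = cache.get(regex);
        if (p == null) {
            // Two threads may both compile the same regex here; that is
            // harmless because compiled patterns are immutable.
            p = Pattern.compile(regex);
            cache.put(regex, p);
        }
        return p.matcher(input).matches();
    }

    int size() {
        return cache.size();
    }
}
```

With only a handful of distinct regexes (5 rules in the original test), a small
bound such as new PatternCache(64) should never evict anything, so every match
after the first reuses the compiled pattern.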




See also this thread:
http://www.nabble.com/Help-with-MatchesEvaluatorsDefinition-that-caches-compiled-regular-expression-patterns-td23377792.html



