starsavari wrote:
We are considering the Drools rule engine for data profiling (running the
data through various regex patterns for analysis) on a large amount of
data. We have rules to detect SSNs, phone numbers, driver's license
numbers, tracking numbers, etc. Most of our rules are regex pattern matches.
I've developed a sample program with 5 rules, all of them pattern-matching
rules, and ran it against a million facts/objects (each object has
2 string fields). It takes around 58 seconds, versus 2 seconds when I run
the same thing in plain Java code.
I ran into a similar issue but ran out of time before I could put together a
clean fix. Attached is a partial fix; TEST at your own risk. It caches each
compiled regular expression instead of re-compiling it on every match. The
cache has no size limit, so some might call it a memory leak.
http://www.nabble.com/file/p23842739/MatchesEvaluatorsDefinition.java
MatchesEvaluatorsDefinition.java
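
For anyone who wants the idea without opening the attachment, here is a rough
sketch of the caching approach in plain Java. It is not the attached code and
not part of Drools: the class name PatternCache and its methods are made up
for illustration, and it assumes full-string matching like String.matches().
Unlike the attachment, it caps the cache size with a simple LRU policy, which
is one possible answer to the unbounded-growth concern.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Drools class): a bounded LRU cache of compiled patterns. */
final class PatternCache {
    private final Map<String, Pattern> cache;

    PatternCache(final int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently used,
        // so removeEldestEntry drops the least recently used pattern.
        this.cache = Collections.synchronizedMap(
            new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                    return size() > maxEntries;
                }
            });
    }

    /** Returns the cached Pattern for this regex, compiling it only on a miss. */
    Pattern get(String regex) {
        Pattern p = cache.get(regex);
        if (p == null) {
            // Two threads may both compile the same regex here; that is
            // wasteful but harmless, since either Pattern is equivalent.
            p = Pattern.compile(regex);
            cache.put(regex, p);
        }
        return p;
    }

    /** Full-string match using the cached Pattern. */
    boolean matches(String value, String regex) {
        return get(regex).matcher(value).matches();
    }

    int size() {
        return cache.size();
    }
}
```

Usage would look like new PatternCache(256).matches(value, regex) wherever the
evaluator currently compiles the pattern per match. The cap of 256 is an
arbitrary example; a profiling job with a handful of rules needs far fewer.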
See also this thread:
http://www.nabble.com/Help-with-MatchesEvaluatorsDefinition-that-caches-c...
--
View this message in context:
http://www.nabble.com/Performance-is-too-bad-when-using-matches-predicate...
Sent from the drools - user mailing list archive at
Nabble.com.