[rules-users] Re: Working memory batch insert performance

Zhuo Li milanello1998 at gmail.com
Tue Dec 20 09:19:36 EST 2011


Thanks, Vincent. Slicing the data is definitely an option, but I guess a
stateless session is a better fit for this solution? A stateful session means
you would have to keep your slices in separate working memories, which
intuitively does not seem efficient.
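Roughly what I had in mind is something like the sketch below (just an
illustration; DataRow, kbase and the chunk size are placeholders for our own
fact class, knowledge base and tuning, not anything from the benchmark):

import java.util.List;
import org.drools.KnowledgeBase;
import org.drools.runtime.StatelessKnowledgeSession;

public class ChunkedStatelessRunner {

    private static final int CHUNK_SIZE = 10000; // tune to heap size and rule complexity

    public static void runInChunks(KnowledgeBase kbase, List<DataRow> rows) {
        for (int from = 0; from < rows.size(); from += CHUNK_SIZE) {
            int to = Math.min(from + CHUNK_SIZE, rows.size());
            List<DataRow> chunk = rows.subList(from, to);

            // Each execute() call inserts the chunk, fires all rules and then
            // throws the working memory away, so nothing accumulates between chunks.
            StatelessKnowledgeSession session = kbase.newStatelessKnowledgeSession();
            session.execute(chunk);
        }
    }
}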

 

Best

Abe

 

From: rules-users-bounces at lists.jboss.org
[mailto:rules-users-bounces at lists.jboss.org] On Behalf Of Vincent Legendre
Sent: 20 December 2011 22:01
To: Rules Users List
Subject: Re: [rules-users] Working memory batch insert performance

 

There is a recent post on "poor performance from a simple join" that
highlights almost the same questions: because "insert" triggers RETE
propagation, the time to insert depends on the complexity of the rules.

Maybe you can start by looking at your rules to optimise them (see the
previous post for some tips).

If it is still too slow, maybe you can cut your data into smaller groups. The
main problem here is being able to cut the data into pertinent groups
according to the rules (problems can happen if you have some accumulate,
exists or not patterns). If you only have simple filters, you can cut your
data wherever you want. If your rules reason about the global existence or
absence of a fact, then you must ensure that, for example, a "not MyFact()" is
true because the fact does not exist at all, and not only because it is not
part of the current chunk.
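As a hypothetical sketch of that "pertinent groups" idea (DataRow and
getAccountId() are made-up names standing in for your own fact class and
whatever attribute the rules actually join on), partitioning by the join key
keeps patterns like "not MyFact()" or "exists MyFact()" correct inside each
chunk, because every fact relevant to a group ends up in the same chunk:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupPartitioner {

    // Groups the rows by the attribute the rules reason over, so each group
    // can be run through the engine independently of the others.
    public static Map<String, List<DataRow>> partitionByGroup(List<DataRow> rows) {
        Map<String, List<DataRow>> groups = new HashMap<String, List<DataRow>>();
        for (DataRow row : rows) {
            String key = row.getAccountId();      // assumed grouping attribute
            List<DataRow> group = groups.get(key);
            if (group == null) {
                group = new ArrayList<DataRow>();
                groups.put(key, group);
            }
            group.add(row);
        }
        return groups;                            // run the rules once per group
    }
}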


On 20/12/2011 14:27, Mark Proctor wrote: 

On 20/12/2011 13:09, Zhuo Li wrote: 

Hi, folks,

 

I recently did a benchmark on Drools 5.1.2 and noticed that inserting data
into a stateful session is very time consuming. It took me about 30 minutes to
insert 10,000 data rows on a JVM with a 512M heap. Hence I have to insert data
rows as I receive them and keep them in working memory, rather than loading
them as a batch at a given time. This is not friendly for disaster recovery,
and I have two questions to see if anybody has any thoughts:

10K rows? Is that 10K bean insertions? 30 minutes sounds bad. We know of
people doing far more than that, much quicker.



 

Is there any better way to improve the performance of inserting data into a
stateful session?

There is nothing faster than "insert".
If you don't need inference, you can try turning on "sequential" mode, but
in general the performance gain is < 5%.
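If you want to try sequential mode, enabling it looks roughly like this (a
minimal sketch against the Drools 5.x API; the compiled rule packages still
have to be added to the knowledge base, and the mode only applies to stateless
sessions without inference):

import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseConfiguration;
import org.drools.KnowledgeBaseFactory;
import org.drools.conf.SequentialOption;

public class SequentialKnowledgeBase {

    // Builds a knowledge base with sequential mode switched on.
    public static KnowledgeBase build() {
        KnowledgeBaseConfiguration config = KnowledgeBaseFactory.newKnowledgeBaseConfiguration();
        config.setOption(SequentialOption.YES);   // rules evaluated in order, no re-evaluation after firing
        return KnowledgeBaseFactory.newKnowledgeBase(config);
    }
}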



I noticed that there is a method called BatchExecution() for a stateless
session. I did not get a chance to test it yet, but is this a better way to
load data in a batch and then run the rules?

That is related to scripting the engine; it uses command objects to call the
insert() method, so it is definitely not faster.
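For reference, the command style looks roughly like this (a sketch assuming a
Drools 5.x stateless session; "kbase" and "rows" are placeholders for your own
knowledge base and fact collection). It ends up doing the same inserts
internally, just wrapped in command objects:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.drools.KnowledgeBase;
import org.drools.command.Command;
import org.drools.command.CommandFactory;
import org.drools.runtime.ExecutionResults;
import org.drools.runtime.StatelessKnowledgeSession;

public class BatchExecutionExample {

    // Inserts a collection of facts and fires the rules through a single
    // batch execution command on a stateless session.
    public static ExecutionResults run(KnowledgeBase kbase, Collection<?> rows) {
        List<Command> commands = new ArrayList<Command>();
        commands.add(CommandFactory.newInsertElements(rows));
        commands.add(CommandFactory.newFireAllRules());
        StatelessKnowledgeSession session = kbase.newStatelessKnowledgeSession();
        // Cast kept in case the factory method returns a raw Command in your version.
        return (ExecutionResults) session.execute(CommandFactory.newBatchExecution(commands));
    }
}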



 

My requirement is that I need to load a batch of data once at the end of the
day, and then run the rules to separate matched data from unmatched data. I
have a 3-hour processing window to complete this loading and matching process,
and the data I need to load is about 1 to 2 million rows. My JVM heap size can
be set up to 1024 M.

 

Best regards

Abe

 

 
