[rules-users] Working memory batch insert performance

Vincent Legendre vincent.legendre at eurodecision.com
Tue Dec 20 09:00:39 EST 2011


There is a recent post on "poor performance from a simple join" that 
highlights almost the same questions: because "insert" triggers RETE 
propagation, the time to insert depends on the complexity of the rules.

Maybe you can start by looking at your rules to optimise them (see the 
previous post for some tips).

If it is still too long, maybe you can cut your data into smaller groups 
(see the sketch below). The main problem here is being able to cut the 
data into pertinent groups according to the rules (problems can happen 
if you have some accumulates, or exists, or not ... If you only have 
simple filters, you can cut your data wherever you want. If your rules 
reason about the global existence or absence of a fact, then you must 
ensure that, for example, a "not MyFact()" is true because the fact does 
not exist at all, and not only because it is not part of the current 
chunk ...).
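As an illustration, here is a minimal sketch of chunked insertion using 
the Drools 5.x knowledge API (the ChunkedLoader class name is only 
illustrative, and the approach assumes your rules tolerate per-chunk 
evaluation as explained above):

import java.util.List;
import org.drools.KnowledgeBase;
import org.drools.runtime.StatefulKnowledgeSession;

public class ChunkedLoader {

    // Insert facts in chunks and fire the rules after each chunk.
    // Only safe when the rules do not reason about global existence
    // (no not/exists/accumulate over the whole data set).
    public static void loadInChunks(KnowledgeBase kbase, List<Object> facts, int chunkSize) {
        StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
        try {
            for (int i = 0; i < facts.size(); i += chunkSize) {
                for (Object fact : facts.subList(i, Math.min(i + chunkSize, facts.size()))) {
                    ksession.insert(fact);
                }
                ksession.fireAllRules();
            }
        } finally {
            ksession.dispose();
        }
    }
}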


On 20/12/2011 14:27, Mark Proctor wrote:
> On 20/12/2011 13:09, Zhuo Li wrote:
>>
>> Hi, folks,
>>
>> I recently did a benchmark on Drools 5.1.2 and noticed that data 
>> insert into a stateful session is very time consuming. It took me 
>> about 30 minutes to insert 10,000 data rows on a JVM with a 512M heap 
>> size. Hence I have to keep inserting data rows as I receive them and 
>> keep them in working memory, rather than loading them in a batch at a 
>> given time. This is not a friendly way for disaster recovery, and I 
>> have two questions here to see if anybody has any thoughts:
>>
> 10K rows? Is that 10K bean insertions? 30 minutes sounds bad. We know 
> of people doing far more than that, much quicker.
>>
>> 1. Is there any better way to improve the performance of data insert 
>> into a stateful session?
>>
> There is nothing faster than "insert".
> If you don't need inference, you can try turning on "sequential" mode, 
> but in general the performance gain is < 5%.
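For example, sequential mode can be turned on when building the 
knowledge base. A minimal sketch with the Drools 5.x knowledge API (the 
class name is only illustrative, and the rules must be added to the 
kbase as usual); note that sequential mode is meant for stateless 
processing without inference:

import java.util.Collection;
import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseConfiguration;
import org.drools.KnowledgeBaseFactory;
import org.drools.conf.SequentialOption;
import org.drools.runtime.StatelessKnowledgeSession;

public class SequentialExample {
    // Builds a knowledge base with sequential mode enabled and runs a
    // stateless session over the given facts (adding the compiled rule
    // packages to the kbase is omitted here).
    public static void run(Collection<Object> facts) {
        KnowledgeBaseConfiguration conf = KnowledgeBaseFactory.newKnowledgeBaseConfiguration();
        conf.setOption(SequentialOption.YES);
        KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase(conf);
        // ... kbase.addKnowledgePackages(...) with the compiled rules ...
        StatelessKnowledgeSession session = kbase.newStatelessKnowledgeSession();
        session.execute(facts);
    }
}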
>>
>> 2. I noticed that there is a method called BatchExecution() for a 
>> stateless session. I did not get a chance to test it yet, but is this 
>> a better way to load data in a batch and then run the rules?
>>
> That is related to scripting the engine; it uses command objects to 
> call the insert() method - so it is definitely not faster.
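For reference, this is roughly what command-based batch execution looks 
like on a stateless session (a minimal sketch with the Drools 5.x API; 
the helper class is only illustrative). It wraps the same insert calls 
in command objects, which is why it is not faster:

import java.util.ArrayList;
import java.util.List;
import org.drools.command.Command;
import org.drools.command.CommandFactory;
import org.drools.runtime.StatelessKnowledgeSession;

public class BatchExecutionExample {
    // Runs a batch of commands on a stateless session: insert all facts,
    // then fire the rules. Internally this still inserts each fact.
    public static void runBatch(StatelessKnowledgeSession session, List<Object> facts) {
        List<Command> commands = new ArrayList<Command>();
        commands.add(CommandFactory.newInsertElements(facts));
        commands.add(CommandFactory.newFireAllRules());
        session.execute(CommandFactory.newBatchExecution(commands));
    }
}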
>>
>> My requirement is that I need to load a batch of data once by the end 
>> of the day, and then run the rules to filter matched data from 
>> unmatched data. I have a 3-hour processing window to complete this 
>> loading and matching process, and the data I need to load is about 
>> 1 to 2 million rows. My JVM heap size can be set up to 1024 M.
>>
>> Best regards
>>
>> Abe
>>


