[rules-users] Re: Working memory batch insert performance

Vincent Legendre vincent.legendre at eurodecision.com
Tue Dec 20 09:53:10 EST 2011


As Mark said, stateless vs. stateful is not really relevant for speed. In 
fact, a stateless session is a wrapper around a stateful session (except 
in sequential mode). A stateless session only saves you from writing the 
explicit fact insertions (3 or 4 lines of code) and disposes the session 
at the end (1 line of code). For your problem, I recommend using a stateful 
session, as you will have to tune your fact insertion ... The only thing 
not to forget is to dispose the session at the end.
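A minimal sketch of that lifecycle, assuming the Drools 5.x knowledge API 
(a fragment: `kbase` is a KnowledgeBase already compiled from your rules, 
and `facts` is your batch of data):

```java
// Drools 5.x fragment: explicit stateful-session lifecycle.
StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
try {
    for (Object fact : facts) {   // the insertion loop a stateless
        ksession.insert(fact);    // session would write for you
    }
    ksession.fireAllRules();
} finally {
    ksession.dispose();           // the one thing not to forget
}
```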

Cutting your data into groups will result in using multiple ksessions (i.e. 
different working memories), whether stateless or stateful ... And this is 
the point: having multiple small ksessions is better than having one very 
big ksession, especially if you have a large number of joins. Creating new 
ksessions is quick, as long as you have a shared rulebase (all new 
ksessions are created by the shared rulebase, which is compiled once with 
all your rules. It is compilation that is time-consuming, not WM creation). 
And with multiple ksessions, you can do the job in parallel on a machine 
cluster, so it becomes easy to meet any deadline: just use more 
machines !!
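The chunk-per-session idea can be sketched in plain Java. Here 
`processChunk` is a hypothetical stand-in for the real per-chunk work 
(create a ksession from the shared rulebase, insert the chunk, fire the 
rules, dispose); the chunking and parallel dispatch are what the sketch 
actually shows:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedRun {
    // Split the facts into fixed-size chunks.
    static <T> List<List<T>> chunks(List<T> facts, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < facts.size(); i += size) {
            out.add(facts.subList(i, Math.min(i + size, facts.size())));
        }
        return out;
    }

    // Placeholder for: ksession = sharedKbase.newStatefulKnowledgeSession();
    // insert the chunk, fireAllRules(), dispose(). Here it just counts facts.
    static int processChunk(List<Integer> chunk) {
        return chunk.size();
    }

    public static void main(String[] args) throws Exception {
        List<Integer> facts = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) facts.add(i);

        // One small session per chunk, run in parallel.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> results = new ArrayList<>();
        for (List<Integer> chunk : chunks(facts, 1_000)) {
            results.add(pool.submit(() -> processChunk(chunk)));
        }
        int total = 0;
        for (Future<Integer> f : results) total += f.get();
        pool.shutdown();
        System.out.println(total); // prints 10000
    }
}
```

Each chunk gets a fresh, small working memory, so join costs stay bounded 
by the chunk size rather than the whole data set.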


On 20/12/2011 15:19, Zhuo Li wrote:
>
> Thanks Vincent. Slicing the data is definitely an option, but I guess a 
> stateless session is a better fit for this solution? A stateful session 
> means you will have to keep your slices in different working memory 
> slots, which intuitively will not be efficient.
>
> Best
>
> Abe
>
> *From:* rules-users-bounces at lists.jboss.org 
> [mailto:rules-users-bounces at lists.jboss.org] *On Behalf Of* Vincent Legendre
> *Sent:* 20 December 2011 22:01
> *To:* Rules Users List
> *Subject:* Re: [rules-users] Working memory batch insert performance
>
> There is a recent post on "poor performance from a simple join" that 
> highlights almost the same questions: because "insert" triggers RETE 
> propagation, the time to insert depends on rule complexity.
>
> Maybe you can start by looking at your rules to optimise them (see 
> the previous post for some tips).
>
> If it is still too long, maybe you can cut your data into smaller 
> groups. The main problem here is being able to cut the data into 
> pertinent groups according to the rules (problems can happen if you have 
> some accumulates, or exists, or not .... If you only have simple 
> filters, you can cut your data where you want. If your rules are 
> reasoning about the global existence or absence of a fact, then you must 
> ensure that, for example, a "not MyFact()" is true because the fact 
> does not exist at all, and not only because it is not part of the 
> chunk ...).
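One safe way to chunk under that constraint: if your rules only reason 
within some business key, group the facts by that key, so a "not MyFact()" 
evaluated inside one chunk sees every fact it could ever match. A minimal 
sketch in plain Java (the `Trade`/`accountId` names are illustrative, not 
from the original posts):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByKey {
    // A minimal fact with a grouping key.
    static class Trade {
        final String accountId;
        Trade(String accountId) { this.accountId = accountId; }
    }

    // Group facts by key: each group is self-contained for rules that
    // reason per account, so exists/not checks remain globally valid.
    static Map<String, List<Trade>> byAccount(List<Trade> facts) {
        Map<String, List<Trade>> groups = new HashMap<>();
        for (Trade t : facts) {
            groups.computeIfAbsent(t.accountId, k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Trade> facts = Arrays.asList(
            new Trade("A"), new Trade("B"), new Trade("A"));
        System.out.println(byAccount(facts).get("A").size()); // prints 2
    }
}
```

If no such key exists (the rules reason over the whole population), 
chunking is not safe and a single session is the correct choice.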
>
>
> On 20/12/2011 14:27, Mark Proctor wrote:
>
> On 20/12/2011 13:09, Zhuo Li wrote:
>
> Hi, folks,
>
> I recently did a benchmark on Drools 5.1.2 and noticed that inserting 
> data into a stateful session is very time-consuming. It took me 
> about 30 minutes to insert 10,000 data rows on a JVM with a 512M heap 
> size. Hence I have to keep inserting data rows as I receive them and keep 
> them in working memory, rather than loading them in a batch at a given 
> time. This is not a friendly way for disaster recovery, and I have two 
> questions here to see if anybody has any thoughts:
>
> 10K rows? Is that 10K bean insertions? 30 minutes sounds bad. We know 
> of people doing far more than that, much quicker.
>
> Is there any better way to improve the performance of inserting data into 
> a stateful session?
>
> There is nothing faster than "insert".
> If you don't need inference, you can try turning on "sequential" mode, 
> but in general the performance gain is < 5%.
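For reference, sequential mode in Drools 5.x is a knowledge-base option; a 
configuration fragment, assuming the 5.x knowledge API:

```java
// Drools 5.x fragment: enable sequential mode when building the kbase.
KnowledgeBaseConfiguration config =
    KnowledgeBaseFactory.newKnowledgeBaseConfiguration();
config.setOption(SequentialOption.YES);
KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase(config);
```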
>
> I noticed that there is a method called BatchExecution() for a 
> stateless session. I did not get a chance to test it yet, but is this a 
> better way to load data in a batch and then run the rules?
>
> That is related to scripting an engine; it uses command objects to 
> call the insert() method - so it is definitely not faster.
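For completeness, the command-based API Mark refers to looks roughly like 
this in Drools 5.x (a fragment; each insert command still calls insert() 
internally, which is why it is no faster):

```java
// Drools 5.x fragment: BatchExecution on a stateless session.
List<Command<?>> cmds = new ArrayList<Command<?>>();
for (Object fact : facts) {
    cmds.add(CommandFactory.newInsert(fact));
}
cmds.add(CommandFactory.newFireAllRules());
StatelessKnowledgeSession ksession = kbase.newStatelessKnowledgeSession();
ksession.execute(CommandFactory.newBatchExecution(cmds));
```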
>
> My requirement is that I need to load a batch of data once at the end of 
> the day, and then run the rules to filter matched data from 
> unmatched data. I have a 3-hour processing window to complete this 
> loading and matching process, and the data I need to load is about 1 
> to 2 million rows. My JVM heap size can be set up to 1024 M.
>
> Best regards
>
> Abe
>


