[rules-users] A few general questions on scaling StatefulKnowledgeSessions

Fri Aug 17 06:54:25 EDT 2012

To be honest it sounds to me as though you need to get one of the Red Hat guys in to do some proper consultancy for your specific use case.

However...

Aiming to produce a system with the "highest scalability and performance" sounds like a strategy for producing an over-engineered, overpriced solution. Google process over 30,000 search requests per second from millions of users around the world. Is that the level of scalability you need to achieve?

Are you trying to achieve 'web scale'?
http://kirkwylie.blogspot.co.uk/2010/09/cartoon-characters-discuss-web-scale.html

Given a slightly silly example...
Solution A - Processes 1 request per second on one server. Scales perfectly - i.e. Processes 2000 requests per second on 2000 servers.
Solution B - Processes 1 request per millisecond on one server. Can't run on multiple servers.

… which is the better solution? It depends very much on what your expected load is going to be and how much money you want to spend.
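
To put rough numbers on that silly example, here is a minimal Java sketch. The class and method names are made up for illustration, and it assumes Solution A really does scale linearly, which real systems rarely do.

```java
class CapacityComparison {
    // Solution A: 1 request per second per server, assumed to scale linearly.
    static final double A_REQUESTS_PER_SECOND_PER_SERVER = 1.0;
    // Solution B: 1 request per millisecond = 1000 requests per second, one server only.
    static final double B_MAX_REQUESTS_PER_SECOND = 1000.0;

    // Number of servers Solution A needs to carry the given load.
    static long serversNeededForA(double requestsPerSecond) {
        return (long) Math.ceil(requestsPerSecond / A_REQUESTS_PER_SECOND_PER_SERVER);
    }

    // Whether the single server of Solution B can carry the given load at all.
    static boolean solutionBCanHandle(double requestsPerSecond) {
        return requestsPerSecond <= B_MAX_REQUESTS_PER_SECOND;
    }

    public static void main(String[] args) {
        double[] loads = {10, 1000, 2000};
        for (double load : loads) {
            System.out.println("load=" + load + " req/s"
                    + ", A needs " + serversNeededForA(load) + " server(s)"
                    + ", B copes: " + solutionBCanHandle(load));
        }
    }
}
```

Below 1000 requests per second, B does the job on one box while A needs a server per request per second; above that, only A can cope at all. Which side of that line you sit on is exactly what the load estimate has to tell you.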

The only sensible way to make this decision is to estimate realistic load for your system, measure the system performance and optimise based on those measurements.

But here are some slightly more practical thoughts from my experience...
Inserting new facts is comparatively slow (although still sub-millisecond).
Evaluating rules is fast.
If the full set of facts required to evaluate your rules is large, then you're probably better off with a stateful session, where you load facts into working memory in advance and then fire in small request facts on which you wish to make a decision.

If the entirety of what you're making a decision on can be expressed in a handful of facts, then it may be reasonable to use a stateless session, as you will need to insert them for every request anyway.

If you're concerned about the size of the working memory, then I would have to assume that you have a large volume of facts to insert and would therefore be better off with a stateful session with most of those facts pre-loaded.

If you're truly interested in CEP (especially streaming events), then you need stateful sessions.

I hope that's a little bit useful.

Steve

On 17 Aug 2012, at 09:09, Wolfgang Laun <wolfgang.laun at gmail.com> wrote:

> See inline.
> 
> On 17/08/2012, Skiddlebop <lmalson at cisco.com> wrote:
>> Greetings All!
>> 
>> I humbly request your guidance and insights!
>> 
>> Overview:
>> We are currently undergoing evaluation of how to best proceed using the
>> Drools Suite to best meet the current and future business needs with the
>> highest system scalability and performance. We are attempting to make the
>> proper system design choices, particularly with respect to which of the two
>> KnowledgeSession types (stateful or stateless) to use and how to best use
>> them to scale the system.
>> 
>> The Context: We are using Drools for operational decision making,
>> monitoring, and workforce resource management; this naturally entails some
>> degree of event processing, temporal reasoning, state management, and
>> inference.
>> Given the nature of this context, it seems that a StatefulKnowledgeSession
>> is justified and best (but may not be entirely necessary).
>> 
>> The current approach:
>> Currently, our rule model is not very mature or stable... Consequently, the
>> approach is to use one very large long-running StatefulKnowledgeSession
>> containing all relevant operational data. This single
>> StatefulKnowledgeSession will be constructed and disposed of (and
>> reconstructed with operational state) on a very infrequent interval, say
>> every 24 hours. In this fashion, a single working memory network manages
>> the entire operational state and holds all relevant facts; Each fact is updated
>> on a per-event basis.
> 
> The number of expected events/time unit is an important factor in deciding
> about your system's architecture. What do you expect?
> 
>> 
>> The problem:
>> This approach has many drawbacks in my opinion... I'll mention just a
>> few...
>> StatefulSessions are not thread-safe (require sequential processing)
> 
> This is not the same thing: a Stateful Session is thread-safe so that its
> methods can be called by more than one thread. But it is correct that
> synchronization must guarantee mutual exclusion for all core operations,
> resulting in what (I think) you mean by sequential processing.
> 
>> and
>> consequently will not scale; it is also a single point of failure. Also, as
>> the size of Working Memory grows, processing time increases and garbage
>> collection becomes very messy and laggy (when performed).
> 
> Are these observations or just fears?
> 
>> 
>> The potential solution:
>> To enable greater scalability, differentiation, and parallelization, it
>> seems wise to partition the rules/facts into multiple specialized
>> concurrently operating StatefulKnowledgeSession (KnowledgeBase) instances.
>> However, if done improperly, this poses a difficult problem as with greater
>> separation, the more myopic our reasoning becomes.
>> 
>> Questions (some of which are intentionally dumb):
>> 
>> Given the nature of the above mentioned business context, does a Stateful
>> approach seem justified (or is it advised to follow KISS and remain
>> stateless)?
> 
> There's no gain in using a stateless session unless you can use it in
> sequential mode, which seems very unlikely from your narrative.
> 
> 
>> 
>> What are the recommended strategies to best scale StatefulKnowledgeSessions
>> (or a set of StatefulKnowledgeSessions) as they are inherently
>> single-threaded?
> 
> You can run multiple sessions in parallel.
> 
>> 
>> If multiple StatefulKnowledgeSessions/KnowledgeBases are used, what are the
>> recommended strategies to partition/individualize/classify them? Should we
>> do this according to type, instance value, unique identifier, group of
>> interrelated objects, etc? I understand this is very domain/use-case
>> specific, but I'm curious how others approach this matter.
>> 
> 
> You can partition your working memory and knowledge base on anything that
> permits processing all by itself. A set of rules investigating integer
> numbers might be divided in two sessions: one for even numbers, one
> for odd numbers.
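
The even/odd split can be sketched in plain Java without any Drools classes. Everything here is hypothetical: `Partition` merely stands in for one session with its own working memory, and a single-threaded executor per partition is just one assumed way of keeping the work inside a partition sequential while the partitions run in parallel.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PartitionedSessions {
    // Hypothetical stand-in for one rule session that owns its own facts.
    static class Partition {
        final List<Integer> facts = Collections.synchronizedList(new ArrayList<Integer>());
        // One worker thread per partition: inserts within a partition stay ordered.
        final ExecutorService worker = Executors.newSingleThreadExecutor();

        void insert(final int fact) {
            worker.execute(() -> facts.add(fact));
        }

        // Stop accepting work and wait for queued inserts to finish.
        void close() {
            worker.shutdown();
            try {
                if (!worker.awaitTermination(5, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("partition did not drain in time");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    // Routing key: 0 for even numbers, 1 for odd numbers (negatives included).
    static int partitionIndex(int number) {
        return Math.floorMod(number, 2);
    }

    // Routes each number to its partition and returns the facts each one received.
    static List<List<Integer>> route(int[] numbers) {
        Partition[] partitions = {new Partition(), new Partition()};
        for (int n : numbers) {
            partitions[partitionIndex(n)].insert(n);
        }
        List<List<Integer>> result = new ArrayList<>();
        for (Partition p : partitions) {
            p.close();
            result.add(new ArrayList<>(p.facts));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> routed = route(new int[] {1, 2, 3, 4, 5, 6});
        System.out.println("even partition: " + routed.get(0));
        System.out.println("odd partition:  " + routed.get(1));
    }
}
```

The point of the routing function is that no rule in one partition ever needs a fact held by the other; if that stops being true, the split is in the wrong place.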
> 
> Moreover, if processing implies stages, you might consider passing
> events from one session to the next.
> 
> But see my question w.r.t. the numbers you're expecting.
> 
> 
>> What are the recommended strategies with respect to frequency (or triggers)
>> of disposal of a StatefulKnowledgeSession and/or the retraction of the
>> facts
>> therein?
>> 
> 
> This doesn't make sense in combination with the processing model
> you've described.
> 
>> What would you say a healthy average working-memory size is?
> 
> Define "healthy" :) And if the *average* size is "healthy", it may mean
> that it sometimes grows into unhealthy dimensions :)
> 
> (Is this one of the "intentionally dumb" questions?)
> 
>> 
>> What would you say the average lifespan/duration of a
>> StatefulKnowledgeSession is?
> 
> For the application my company is running, the sessions run as long as
> possible, and so the average duration is O(months). - Generally
> speaking: make it as long as possible.
> 
> (Another one?)
> 
> 
>> 
>> If we have an external data store (which holds state) from which we can
>> query and reconstruct working-memory state for any given object (or set of
>> objects), would it be best to continue with the single large
>> StatefulKnowledgeSession approach?
> 
> Pardon my tartness: only if you can't manage to keep the data apart.
> 
>> 
>> Also, If we can always reconstruct state, is there any material difference
>> (capability-wise) between a Stateful and Stateless KnowledgeSession besides
>> inference/iterative decision making? Is it possible to do Temporal
>> Reasoning/CEP with a StatelessKnowledgeSession (I think not)?
> 
> I don't think you have fully grasped the restrictions accompanying a
> stateless session?
> 
> The answer to the second question is "partly, yes", since temporal
> operators work anywhere. Other features: no.
> 
> -W
> 
>> 
>> I entirely understand that most of this is very context specific, and that
>> no one else can solution this on my behalf; Some of these questions may be
>> very obtuse... However, I'd sincerely appreciate any insights from this
>> righteously authoritative community.
>> 
>> With Humility and Gratitude,
>> Skiddlebop
>> 
> _______________________________________________
> rules-users mailing list
> rules-users at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/rules-users
