[rules-users] To accumulate or not to accumulate

Wolfgang Laun wolfgang.laun at gmail.com
Fri Aug 19 11:25:59 EDT 2011


2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>

>
> I am not sure I understand what you mean by "random order" but I guess it
> has to do with my ArrayList result type.
> What I had in mind is to put all sentences in a TreeSet during the "action"
> method, and finally issue an ArrayList result object by iterating over the
> TreeSet and grouping sentences.
>
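
For reference, a minimal Java sketch of the TreeSet idea quoted above. It is
an illustration under assumptions, not the Drools accumulate API: Sentence,
SentenceWindow and WindowBuilder are hypothetical stand-ins, add() plays the
role of the accumulator's "action" step and windows(size) the role of its
"result" step. It also assumes that no two sentences share a start offset,
since the TreeSet would treat them as duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for the Sentence fact: only start/end offsets.
final class Sentence implements Comparable<Sentence> {
    final int start, end;
    Sentence(int start, int end) { this.start = start; this.end = end; }
    @Override public int compareTo(Sentence o) { return Integer.compare(start, o.start); }
}

// Hypothetical stand-in for SentenceWindow: spans first.start .. last.end.
final class SentenceWindow {
    final int start, end;
    SentenceWindow(int start, int end) { this.start = start; this.end = end; }
    @Override public String toString() { return "[" + start + "," + end + "]"; }
}

final class WindowBuilder {
    // The TreeSet keeps sentences ordered by start offset.
    private final TreeSet<Sentence> sorted = new TreeSet<>();

    // Plays the role of "action": the order of the calls does not matter.
    void add(Sentence s) { sorted.add(s); }

    // Plays the role of "result": slide a window of `size` sentences.
    List<SentenceWindow> windows(int size) {
        if (size < 1) throw new IllegalArgumentException("size must be >= 1");
        List<Sentence> ordered = new ArrayList<>(sorted);
        List<SentenceWindow> out = new ArrayList<>();
        for (int i = 0; i + size <= ordered.size(); i++) {
            out.add(new SentenceWindow(ordered.get(i).start, ordered.get(i + size - 1).end));
        }
        return out;
    }

    public static void main(String[] args) {
        WindowBuilder b = new WindowBuilder();
        b.add(new Sentence(40, 55)); // deliberately added out of order
        b.add(new Sentence(0, 10));
        b.add(new Sentence(25, 38));
        b.add(new Sentence(12, 24));
        System.out.println(b.windows(3)); // prints [[0,38], [12,55]]
    }
}
```

Sorting happens on insertion, so the grouping step only has to walk the
ordered sentences once.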

Heh :) I clean forgot that I had done this sort of thing not too long ago.


> My first guess was that such an accumulator might be faster than a
> construction of windows using rules.
> However I admit your suggestion is very elegant, and I thank you for that!
> I am probably still too imperative-minded...
>

Well, a procedural solution would be a reasonable alternative for this
problem.
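
A minimal sketch of what such a procedural solution might look like in plain
Java, assuming sentences are given as sorted, non-overlapping {start, end}
offset pairs with inclusive ends, and that each annotation lies inside a
single sentence. ProximitySketch, sentenceIndex and closePairs are
hypothetical names. Instead of materializing windows, it maps each annotation
to the index of its containing sentence: two annotations share some window of
windowSize consecutive sentences exactly when those indices differ by less
than windowSize (and at least windowSize sentences exist).

```java
import java.util.ArrayList;
import java.util.List;

final class ProximitySketch {

    // Binary search for the sentence containing the annotation [aStart, aEnd].
    // sentences: sorted, non-overlapping {start, end} pairs (ends inclusive).
    // Returns the sentence index, or -1 if no single sentence contains it.
    static int sentenceIndex(int[][] sentences, int aStart, int aEnd) {
        int lo = 0, hi = sentences.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (aStart < sentences[mid][0]) hi = mid - 1;
            else if (aStart > sentences[mid][1]) lo = mid + 1;
            else return aEnd <= sentences[mid][1] ? mid : -1;
        }
        return -1;
    }

    // Returns index pairs {i, j}, i < j, of annotations that lie in a common
    // window of windowSize consecutive sentences.
    static List<int[]> closePairs(int[][] sentences, int[][] annotations, int windowSize) {
        List<int[]> pairs = new ArrayList<>();
        if (windowSize < 1 || sentences.length < windowSize) return pairs; // no window exists
        int[] idx = new int[annotations.length];
        for (int i = 0; i < annotations.length; i++) {
            idx[i] = sentenceIndex(sentences, annotations[i][0], annotations[i][1]);
        }
        for (int i = 0; i < annotations.length; i++) {
            for (int j = i + 1; j < annotations.length; j++) {
                if (idx[i] >= 0 && idx[j] >= 0 && Math.abs(idx[i] - idx[j]) < windowSize) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        int[][] sentences = {{0, 10}, {12, 24}, {25, 38}, {40, 55}, {57, 70}};
        int[][] annotations = {{2, 5}, {30, 33}, {60, 65}};
        for (int[] p : closePairs(sentences, annotations, 3)) {
            System.out.println("a" + p[0] + " is close to a" + p[1]);
        }
        // prints:
        // a0 is close to a1
        // a1 is close to a2
    }
}
```

The pairwise loop is still quadratic in the number of annotations, but it
never touches the sentences again after the one binary search per annotation.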

-W


>
> Regards,
>
> Bruno.
>
> On 19/08/2011 16:05, Wolfgang Laun wrote:
>
> How would you write "buildwindows", given that its "action" method would be
> called once for each Sentence, in random order?
>
> It's very simple to write a very small set of rules to construct all
> SentenceWindow facts of size 1 and then to extend them to any desired size,
> depending on some parameter.
> 1. Given a Sentence and no Window beginning with it, create a Window of
> length 1.
> 2. Given a Window of size n < desiredSize and given a Sentence immediately
> following it, extend the Window to one of size n+1.
> 3a. For any Window of desiredSize, inspect it for "closely situated
> ManualAnnotations".
> 3b. If ManualAnnotations have been associated with their containing
> Sentences up-front, you just need to find Windows with more than 1
> ManualAnnotation, adding them in the RHS of rule 2 above.
>
> -W
>
>
>  2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>
>
>>
>> Hi Wolfgang,
>>
>> Thanks for your answer.
>> Sentences are not contiguous (might be some space characters in between)
>> but manual annotations cannot overlap sentences (interpret "overlap" in
>> terms of Drools Fusion terminology).
>> If I had an "inside" operator, do you think the following accumulate
>> option could be better?
>>
>> when
>>     $result : ArrayList() from accumulate ( $s: Sentence(), buildwindows($s))
>>     $w : SentenceWindows () from $result
>>     a1 : ManualAnnotation (this inside $w)
>>     a2 : ManualAnnotation (this != a1, this inside $w)
>> then
>>     ... do something with a1 and a2 since they are "close" to each other
>> end
>>
>> Does anyone know something about accumulator parametrization (looking at
>> the source code it does not seem to be possible, though)?
>> Maybe a syntax inspired by operator parametrization could be nice:
>>
>>      $result : ArrayList() from accumulate ( $s: Sentence(), buildwindows[3]($s) )
>>
>> Best regards,
>>
>> Bruno.
>>
>> On 19/08/2011 13:55, Wolfgang Laun wrote:
>>
>> There are some details that one should consider before deciding on a
>> particular implementation technique.
>>
>>    - Are all Sentences contiguous, i.e., s1.end = pred( s2.start )?
>>    - Can a ManualAnnotation start on one Sentence and end in the next or
>>    any further successor?
>>
>> As in all problems where constraints depend on an order between facts,
>> performance is going to be a problem with increasing numbers of Sentences
>> and ManualAnnotations.
>>
>> Your accumulate plan could be a very inefficient approach. Creating O(N*N)
>> pairs and then looking for an overlapping window is much worse than looking
>> at each window, for instance. But it depends on the expected numbers for
>> both.
>>
>> -W
>>
>>
>>
>> 2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>
>>
>>>  Hello,
>>>
>>> I am trying to implement rules handling "Sentence" and "ManualAnnotation"
>>> objects (imagine someone highlighting words of the document). Basically,
>>> "Sentence" objects have "start" and "end" positions (fields) in the text
>>> of a document, and they are Comparable according to their location in the
>>> document.
>>>
>>> I need to write rules using the notion of a "window of consecutive
>>> sentences".
>>>
>>>
>>> Basically, I am not very interested in those "SentenceWindow" objects
>>> themselves; I just need them to define a kind of proximity between
>>> "ManualAnnotation" objects.
>>> What I eventually need in the "when" of my rule is something like:
>>>
>>> when
>>>     ... maybe something creating the windows
>>>     a1 : ManualAnnotation ()
>>>     a2 : ManualAnnotation (this != a1)
>>>     SentenceWindow (this includes a1, this includes a2)
>>> then
>>>     ... do something with a1 and a2 since they are "close" to each other
>>> end
>>>
>>> As I don't know the "internals" of Drools, I would like to have your
>>> opinion about the best "idiom":
>>>
>>>    - create all SentenceWindow objects and insert them in the working
>>>    memory, then write rules against all the facts (SentenceWindow and
>>>    ManualAnnotation)
>>>    - implement an accumulator that will create a list of
>>>    SentenceWindow objects
>>>
>>>
>>> The first option could look like:
>>>
>>> rule "Create sentence windows"
>>>    when
>>>       # find 3 consecutive sentences
>>>       s1 : Sentence()
>>>       s2 : Sentence(this > s1)
>>>       s3 : Sentence(this > s2)
>>>       not Sentence(this != s2 && > s1 && < s3)
>>>    then
>>>       SentenceWindow swindow = new SentenceWindow();
>>>       swindow.setStart(s1.getStart());
>>>       swindow.setTheend(s3.getEnd());
>>>       insert(swindow);
>>> end
>>>
>>> ... Then use the first rule "as is".
>>>
>>> The accumulator option could look like (I am not really sure the syntax
>>> is correct) :
>>>
>>> when
>>>     $result : ArrayList() from accumulate ( $s: Sentence(), buildwindows($s))
>>>     a1 : ManualAnnotation ()
>>>     a2 : ManualAnnotation (this != a1)
>>>     SentenceWindows (this includes a1, this includes a2) from $result
>>> then
>>>     ... do something with a1 and a2 since they are "close" to each other
>>> end
>>>
>>> Is it possible to decide whether one way is better than the other?
>>>
>>> And one last question: is it possible to "parametrize" an accumulator (in
>>> order to provide the number of sentences that should be put in the windows)?
>>> I mean something like:
>>>
>>> when
>>>     $result : ArrayList() from accumulate ( $s: Sentence(), buildwindows(3, $s))
>>>
>>>
>>> Thanks in advance for your insights,
>>>
>>> Best regards,
>>>
>>> Bruno.
>>>
>>> _______________________________________________
>>> rules-users mailing list
>>> rules-users at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/rules-users
>>>
>>>