Hi Wolfgang,
I finally decided to test different implementations:
* first based on an accumulation function
* second (your suggestion) relying on drools to 1) build all
SentenceWindows then to 2) locate ManualAnnotations inside those
Windows
* third (your suggestion as well) relying on drools to 1) build only
SentenceWindows that might be interesting (containing one of the
ManualAnnotations I am looking for) then to 2) locate
ManualAnnotations inside those Windows
The first implementation performs quite well, but I am stuck on the
parametrization (I need to define build2windows, build3windows,
build4windows... functions): 47 ms on 100 sentences, 94 ms on 1000
sentences.
The second implementation is of course suboptimal, since it creates many
useless windows: 125 ms on 100 sentences, 14400 ms on 1000 sentences.
The third implementation is very versatile and its performance is
comparable to the accumulator solution: 93 ms on 100 sentences, 125 ms
on 1000 sentences.
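
To make the idea behind the third variant concrete outside of Drools, here is a minimal plain-Java sketch. It is not the rule set that was benchmarked; the class name WindowFilterSketch, the Span type, and the inclusive start/end offsets are assumptions made for illustration only. It slides a window of `size` consecutive sentences over the sorted sentence list and keeps only the windows that contain at least one annotation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from the actual rules.
final class WindowFilterSketch {

    /** A text span with inclusive start and end character offsets (assumed convention). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** True if 'other' lies entirely within this span. */
        boolean contains(Span other) {
            return start <= other.start && other.end <= end;
        }
    }

    /**
     * Returns the windows of 'size' consecutive sentences that contain at least
     * one annotation. 'sentences' must already be sorted by position.
     */
    static List<Span> interestingWindows(List<Span> sentences, List<Span> annotations, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<Span> windows = new ArrayList<>();
        for (int i = 0; i + size <= sentences.size(); i++) {
            // The window runs from the start of sentence i to the end of sentence i + size - 1.
            Span window = new Span(sentences.get(i).start, sentences.get(i + size - 1).end);
            for (Span annotation : annotations) {
                if (window.contains(annotation)) {
                    windows.add(window); // keep it once, however many annotations it holds
                    break;
                }
            }
        }
        return windows;
    }
}
```

Windows without any annotation are never created, which is where the saving over the second variant comes from.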
So thanks again for your suggestion; it was definitely useful :-).
Regards,
Bruno.
On 19/08/2011 17:25, Wolfgang Laun wrote:
2011/8/19 Bruno Freudensprung <bruno.freudensprung(a)temis.com>
I am not sure I understand what you mean by "random order" but I
guess it has to do with my ArrayList result type.
What I had in mind is to put all sentences in a TreeSet during the
"action" method, and finally issue an ArrayList result object by
iterating over the TreeSet and grouping sentences.
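
A rough plain-Java sketch of that TreeSet idea follows. It only mirrors the "action" and "result" phases of an accumulate function and deliberately does not implement any Drools interface; the class WindowAccumulatorSketch, the Sentence fields, and the `size` argument are assumptions for illustration. Passing `size` as a method argument here does not answer the question of how to parametrize an accumulator from DRL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of the TreeSet-based accumulator idea.
final class WindowAccumulatorSketch {

    /** Sentence ordered by its start offset; sentences are assumed not to overlap. */
    static final class Sentence implements Comparable<Sentence> {
        final int start;
        final int end;

        Sentence(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public int compareTo(Sentence other) {
            // Two sentences with the same start would be treated as duplicates by the TreeSet.
            return Integer.compare(start, other.start);
        }
    }

    private final TreeSet<Sentence> sorted = new TreeSet<>();

    /** "action" phase: may be called once per Sentence, in any order. */
    void accumulate(Sentence sentence) {
        sorted.add(sentence);
    }

    /** "result" phase: group consecutive sentences into {start, end} windows. */
    List<int[]> result(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<Sentence> ordered = new ArrayList<>(sorted); // iteration follows compareTo order
        List<int[]> windows = new ArrayList<>();
        for (int i = 0; i + size <= ordered.size(); i++) {
            windows.add(new int[] { ordered.get(i).start, ordered.get(i + size - 1).end });
        }
        return windows;
    }
}
```

Because the TreeSet sorts on insertion, the order in which the "action" calls arrive does not matter.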
Heh :) I clean forgot that I had done this sort of thing not too long ago.
My first guess was that such an accumulator might be faster than a
construction of windows using rules.
However I admit your suggestion is very elegant, and I thank you
for that! I am probably still too imperative-minded...
Well, a procedural solution would be a reasonable alternative for this
problem.
-W
Regards,
Bruno.
On 19/08/2011 16:05, Wolfgang Laun wrote:
> How would you write "buildwindows", given that its "action"
> method would be called once for each Sentence, in random order?
>
> It's very simple to write a very small set of rules to construct
> all SentenceWindow facts of size 1 and then to extend them to any
> desired size, depending on some parameter.
> 1. Given a Sentence and no Window beginning with it, create a
> Window of length 1.
> 2. Given a Window of size n < desiredSize and given a Sentence
> immediately following it, extend the Window to one of size n+1.
> 3a. For any Window of desiredSize, inspect it for "closely
> situated ManualAnnotations".
> 3b. If ManualAnnotations have been associated with their
> containing Sentences up-front, you just need to find Windows with
> more than 1 ManualAnnotation, adding them in the RHS of rule 2 above.
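
The logic of rules 1 and 2 above can be checked with a small plain-Java simulation. This is only a sketch, not DRL: sentences are represented by their index in document order (an assumption), the class name WindowRulesSketch is hypothetical, and the loop stands in for the engine firing rules until none applies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the "create, then extend" window rules.
final class WindowRulesSketch {

    /**
     * Returns {firstIndex, lastIndex} for every window that reaches desiredSize,
     * given sentenceCount sentences numbered in document order.
     */
    static List<int[]> buildWindows(int sentenceCount, int desiredSize) {
        if (desiredSize < 1) {
            throw new IllegalArgumentException("desiredSize must be at least 1");
        }
        // windowSize[i] == 0 means that no window begins with sentence i yet.
        int[] windowSize = new int[sentenceCount];
        boolean fired = true;
        while (fired) { // keep going until no rule can fire any more
            fired = false;
            for (int i = 0; i < sentenceCount; i++) {
                if (windowSize[i] == 0) {
                    // Rule 1: a Sentence with no Window beginning with it gets a Window of length 1.
                    windowSize[i] = 1;
                    fired = true;
                } else if (windowSize[i] < desiredSize && i + windowSize[i] < sentenceCount) {
                    // Rule 2: a Window of size n < desiredSize with a Sentence right after it grows to n + 1.
                    windowSize[i]++;
                    fired = true;
                }
            }
        }
        List<int[]> complete = new ArrayList<>();
        for (int i = 0; i < sentenceCount; i++) {
            if (windowSize[i] == desiredSize) {
                complete.add(new int[] { i, i + desiredSize - 1 });
            }
        }
        return complete;
    }
}
```

Windows near the end of the document that cannot reach desiredSize simply stay smaller and are left out of the result, so the desired size is a single parameter rather than one function per size.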
>
> -W
>
>
> 2011/8/19 Bruno Freudensprung <bruno.freudensprung(a)temis.com>
>
>
> Hi Wolfgang,
>
> Thanks for your answer.
> Sentences are not contiguous (there might be some space characters
> in between), but manual annotations cannot overlap sentences
> (interpret "overlap" in terms of Drools Fusion terminology).
> If I had an "inside" operator, do you think the following
> accumulate option could be better?
>
> when
>     $result : ArrayList() from accumulate ( $s: Sentence(),
>         buildwindows($s))
>     $w : SentenceWindow () from $result
>     a1 : ManualAnnotation (this inside $w)
>     a2 : ManualAnnotation (this != a1, this inside $w)
> then
>     ... do something with a1 and a2 since they are "close" to
>     each other
> end
>
> Does anyone know something about accumulator parametrization
> (looking at the source code it does not seem to be possible,
> though)?
> Maybe a syntax inspired by operator parametrization would be
> nice:
>
> $result : ArrayList() from accumulate ( $s: Sentence(),
>     buildwindows[3]($s))
>
> Best regards,
>
> Bruno.
>
> On 19/08/2011 13:55, Wolfgang Laun wrote:
>> There are some details that one should consider before
>> deciding on a particular implementation technique.
>>
>> * Are all Sentences contiguous, i.e., s1.end = pred( s2.start )?
>> * Can a ManualAnnotation start in one Sentence and end
>> in the next or any further successor?
>>
>> As in all problems where constraints depend on an order
>> between facts, performance is going to be a problem with
>> increasing numbers of Sentences and ManualAnnotations.
>>
>> Your accumulate plan could be a very inefficient approach.
>> Creating O(N*N) pairs and then looking for an overlapping
>> window is much worse than looking at each window, for
>> instance. But it depends on the expected numbers for both.
>>
>> -W
>>
>>
>>
>> 2011/8/19 Bruno Freudensprung <bruno.freudensprung(a)temis.com>
>>
>> Hello,
>>
>> I am trying to implement rules handling "Sentence" and
>> "ManualAnnotation" objects (imagine someone highlighting
>> words of the document). Basically, "Sentence" objects
>> have "start" and "end" positions (fields) in the text
>> of a document, and they are Comparable according to
>> their location in the document.
>>
>> I need to write rules using the notion "window of
>> consecutive sentences".
>>
>> Basically I am not very interested in those
>> "SentenceWindow" objects; I just need them to define a
>> kind of proximity between "ManualAnnotation" objects.
>> What I eventually need in the "when" of my rule is
>> something like:
>>
>> when
>> ... maybe something creating the windows
>> a1 : ManualAnnotation ()
>> a2 : ManualAnnotation (this != a1)
>> SentenceWindow (this includes a1, this includes a2)
>> then
>> ... do something with a1 and a2 since they are
>> "close" to each other
>> end
>>
>> As I don't know the "internals" of Drools, I would like
>> to have your opinion about which is the best "idiom":
>>
>> * create all SentenceWindow objects and insert them
>> in the working memory, then write rules against
>> all the facts (SentenceWindow and ManualAnnotation)
>> * implement an accumulator that will create a list
>> of SentenceWindow objects
>>
>>
>> The first option could look like:
>>
>> rule "Create sentence windows"
>> when
>>     # find 3 consecutive sentences
>>     s1 : Sentence()
>>     s2 : Sentence(this > s1)
>>     s3 : Sentence(this > s2)
>>     not Sentence(this != s2 && > s1 && < s3)
>> then
>>     SentenceWindow swindow = new SentenceWindow();
>>     swindow.setStart(s1.getStart());
>>     swindow.setTheend(s3.getEnd());
>>     insert(swindow);
>> end
>>
>> ... Then use the first rule "as is".
>>
>> The accumulator option could look like (I am not really
>> sure the syntax is correct) :
>>
>> when
>>     $result : ArrayList() from accumulate ( $s: Sentence(),
>>         buildwindows($s))
>>     a1 : ManualAnnotation ()
>>     a2 : ManualAnnotation (this != a1)
>>     SentenceWindow (this includes a1, this includes a2)
>>         from $result
>> then
>>     ... do something with a1 and a2 since they are
>>     "close" to each other
>> end
>>
>> Is it possible to decide whether one way is better than the other?
>>
>> And one last question: is it possible to "parametrize"
>> an accumulator (in order to provide the number of
>> sentences that should be put in the windows)?
>> I mean something like:
>>
>> when
>>     $result : ArrayList() from accumulate ( $s:
>>         Sentence(), buildwindows(3, $s))
>>
>>
>> Thanks in advance for your insights,
>>
>> Best regards,
>>
>> Bruno.
>>
>> _______________________________________________
>> rules-users mailing list
>> rules-users(a)lists.jboss.org
>> <mailto:rules-users@lists.jboss.org>
>> https://lists.jboss.org/mailman/listinfo/rules-users