<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<br>
Hi Wolfgang,<br>
<br>
I finally decided to test three different implementations:<br>
<ul>
<li>the first, based on an accumulate function;</li>
<li>the second (your suggestion), relying on Drools to 1) build all
SentenceWindows and then 2) locate the ManualAnnotations inside those
Windows;</li>
<li>the third (also your suggestion), relying on Drools to 1) build
only the SentenceWindows that might be interesting (those containing one
of the ManualAnnotations I am looking for) and then 2) locate the
ManualAnnotations inside those Windows.</li>
</ul>
The first implementation performs quite well, but I am stuck on the
parametrization (I need to define build2windows, build3windows,
build4windows... functions): 47 ms on 100 sentences, 94 ms on 1000
sentences.<br>
The second implementation is, of course, suboptimal since it creates
many useless windows: 125 ms on 100 sentences, 14400 ms on 1000
sentences.<br>
The third implementation is very versatile and its performance is
comparable to the accumulator solution: 93 ms on 100 sentences, 125
ms on 1000 sentences.<br>
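For illustration only, here is a plain-Java sketch (not Drools, and not the rules that were actually benchmarked) of why the third approach scales: with sentences numbered 0..N-1 and windows of a fixed size k, only the windows that contain an annotated sentence need to exist. InterestingWindows and its parameters are hypothetical names.<br>

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not taken from the thread. Sentences are numbered
// 0..sentenceCount-1 in document order; a window of size k starting at
// `start` covers sentences start..start+k-1.
final class InterestingWindows {
    private InterestingWindows() {}

    // Returns the start indices of the windows that contain at least one
    // annotated sentence, i.e. the only windows the third approach needs.
    static SortedSet<Integer> starts(int sentenceCount, int windowSize, int[] annotatedSentences) {
        SortedSet<Integer> starts = new TreeSet<>();
        int lastStart = sentenceCount - windowSize; // last start that still fits
        for (int s : annotatedSentences) {
            // Sentence s lies in the window starting at `start`
            // iff s - windowSize + 1 <= start <= s.
            int from = Math.max(0, s - windowSize + 1);
            int to = Math.min(s, lastStart);
            for (int start = from; start <= to; start++) {
                starts.add(start);
            }
        }
        return starts;
    }
}
```

With 1000 sentences, k = 3 and annotations in sentences 10, 11 and 500, this yields 7 window starts, compared with the 998 windows of size 3 produced when every window is built.<br>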
<br>
So thanks again for your suggestion; it was definitely useful :-).<br>
<br>
Regards,<br>
<br>
Bruno.<br>
<br>
On 19/08/2011 17:25, Wolfgang Laun wrote:
<blockquote
cite="mid:CANaj1Le17YGBqnoDWK6pzey7psxaJmRS4mFiEZk65Cqq++ssSA@mail.gmail.com"
type="cite">2011/8/19 Bruno Freudensprung <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:bruno.freudensprung@temis.com">bruno.freudensprung@temis.com</a>&gt;</span><br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="border-left: 1px solid
rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left:
1ex;">
<div bgcolor="#ffffff" text="#000000"> <br>
I am not sure I understand what you mean by "random order"
but I guess it has to do with my ArrayList result type.<br>
What I had in mind is to put all sentences in a TreeSet
during the "action" method, and finally issue an ArrayList
result object by iterating over the TreeSet and grouping
sentences.<br>
</div>
</blockquote>
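The TreeSet idea can be sketched in plain Java (Java 16+ for records). The Sentence record, WindowAccumulator and its windowSize constructor argument are illustrative assumptions; this is not the Drools AccumulateFunction interface, it only mirrors the action and result phases of an inline accumulate.<br>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model: a sentence with character offsets, ordered by position.
record Sentence(int start, int end) implements Comparable<Sentence> {
    public int compareTo(Sentence o) {
        return Integer.compare(start, o.start);
    }
}

// Plain-Java analogue of an accumulate: collect Sentences in any order,
// keep them sorted, then group them into windows of consecutive sentences.
final class WindowAccumulator {
    private final int windowSize; // the parameter the accumulator would need
    private final TreeSet<Sentence> sorted = new TreeSet<>();

    WindowAccumulator(int windowSize) {
        this.windowSize = windowSize;
    }

    // "action": called once per Sentence, in whatever order
    void action(Sentence s) {
        sorted.add(s);
    }

    // "result": slide over the sorted sentences, one window per start position
    List<List<Sentence>> result() {
        List<Sentence> all = new ArrayList<>(sorted);
        List<List<Sentence>> windows = new ArrayList<>();
        for (int i = 0; i + windowSize <= all.size(); i++) {
            windows.add(List.copyOf(all.subList(i, i + windowSize)));
        }
        return windows;
    }
}
```

Since the sentences are sorted only in result(), the order in which action() receives them does not matter.<br>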
<div><br>
Heh :) I clean forgot that I had done this sort of thing not
too long ago.<br>
</div>
<blockquote class="gmail_quote" style="border-left: 1px solid
rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left:
1ex;">
<div bgcolor="#ffffff" text="#000000">My first guess was that
such an accumulator might be faster than constructing
windows with rules.<br>
However I admit your suggestion is very elegant, and I thank
you for that! I am probably still too imperative-minded...<br>
</div>
</blockquote>
<div><br>
Well, a procedural solution would be a reasonable alternative
for this problem.<br>
<br>
-W<br>
</div>
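A minimal sketch of such a procedural solution, assuming each ManualAnnotation has already been mapped to the 0-based index of its containing Sentence and that these indices are sorted. Two annotations share a window of windowSize consecutive sentences exactly when their sentence indices differ by at most windowSize - 1 (given at least windowSize sentences). ClosePairs is a hypothetical name.<br>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical procedural alternative: each annotation is represented only
// by the index (0-based, document order) of the sentence containing it.
final class ClosePairs {
    private ClosePairs() {}

    // Input must be sorted ascending. Returns annotation index pairs {i, j}
    // with i < j whose sentences fit into one window of windowSize sentences.
    static List<int[]> find(int[] sentenceOfAnnotation, int windowSize) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < sentenceOfAnnotation.length; i++) {
            // Because the input is sorted, stop as soon as the gap is too large.
            for (int j = i + 1; j < sentenceOfAnnotation.length
                    && sentenceOfAnnotation[j] - sentenceOfAnnotation[i] <= windowSize - 1; j++) {
                pairs.add(new int[] {i, j});
            }
        }
        return pairs;
    }
}
```

Unlike a rule pairing a1 with any a2 where this != a1, which matches each pair in both orders, this reports each unordered pair once.<br>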
<blockquote class="gmail_quote" style="border-left: 1px solid
rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left:
1ex;">
<div bgcolor="#ffffff" text="#000000"> <br>
Regards,<br>
<br>
Bruno.<br>
<br>
On 19/08/2011 16:05, Wolfgang Laun wrote:
<div>
<div class="h5">
<blockquote type="cite">How would you write
"buildwindows", given that its "action" method would
be called once for each Sentence, in random order?<br>
<br>
It's very simple to write a very small set of rules to
construct all SentenceWindow facts of size 1 and then
to extend them to any desired size, depending on some
parameter.<br>
1. Given a Sentence and no Window beginning with it,
create a Window of length 1.<br>
2. Given a Window of size n &lt; desiredSize and given
a Sentence immediately following it, extend the Window
to one of size n+1.<br>
3a. For any Window of desiredSize, inspect it for
"closely situated ManualAnnotations".<br>
3b. If ManualAnnotations have been associated with
their containing Sentences up-front, you just need to
find Windows with more than one ManualAnnotation, adding
the annotations to the Window in the RHS of rule 2 above.<br>
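To make the rule chain concrete, here is a plain-Java simulation of rules 1 and 2, not Drools code: a FIFO queue stands in for the agenda, and each Sentence is identified by its 0-based position in document order. Window, WindowRules and this position-based representation are illustrative assumptions (Java 16+ for records).<br>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// A window covers sentence positions first..last (inclusive).
record Window(int first, int last) {
    int size() {
        return last - first + 1;
    }
}

final class WindowRules {
    private WindowRules() {}

    static List<Window> build(int sentenceCount, int desiredSize) {
        List<Window> complete = new ArrayList<>();
        Deque<Window> agenda = new ArrayDeque<>();
        // Rule 1: every sentence starts a window of length 1.
        for (int s = 0; s < sentenceCount; s++) {
            agenda.add(new Window(s, s));
        }
        while (!agenda.isEmpty()) {
            Window w = agenda.poll();
            if (w.size() < desiredSize && w.last() + 1 < sentenceCount) {
                // Rule 2: extend a short window by the immediately following sentence.
                agenda.add(new Window(w.first(), w.last() + 1));
            } else if (w.size() == desiredSize) {
                // Rules 3a/3b would inspect these windows for annotations.
                complete.add(w);
            }
        }
        return complete;
    }
}
```

Windows near the end of the document that cannot reach desiredSize are simply dropped here; in the rule version they would stay in working memory with a smaller size.<br>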
<br>
-W<br>
<br>
<br>
<div class="gmail_quote"> 2011/8/19 Bruno
Freudensprung <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:bruno.freudensprung@temis.com"
target="_blank">bruno.freudensprung@temis.com</a>&gt;</span><br>
<blockquote class="gmail_quote" style="border-left:
1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt
0.8ex; padding-left: 1ex;">
<div bgcolor="#ffffff" text="#000000"> <br>
Hi Wolfgang,<br>
<br>
Thanks for your answer.<br>
Sentences are not contiguous (there may be some
whitespace characters between them), but manual
annotations cannot overlap sentences (interpret
"overlap" in the Drools Fusion sense).<br>
If I had an "inside" operator, do you think the
following accumulate option could be better? <br>
<br>
<tt>when<br>
<b>$result : ArrayList() from accumulate
( $s: Sentence(), buildwindows($s))</b><br>
</tt><tt> <b> $w : SentenceWindow () </b></tt><b><tt>from
$result</tt></b><br>
<tt> </tt><tt> a1 : ManualAnnotation (this <b>inside
</b>$w)<br>
a2 : ManualAnnotation (this != a1, </tt><tt>this
<b>inside </b>$w</tt><tt>)<br>
</tt><tt>then<br>
... </tt><tt>do something with a1 and a2
since they are "close" to each other</tt><br>
<tt> end</tt><br>
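Because annotations never cross a sentence boundary, an "inside" test could be reduced to integer comparisons once each annotation has been mapped to its containing sentence. A minimal plain-Java sketch, assuming int offsets with an inclusive end; SentenceIndex, sentenceOf and inside are hypothetical names, not a Drools operator.<br>

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index: since an annotation never crosses a sentence
// boundary, its start offset alone identifies the candidate sentence.
final class SentenceIndex {
    // sentence start offset -> sentence position (0-based, document order)
    private final TreeMap<Integer, Integer> byStart = new TreeMap<>();
    private final int[] ends;

    // Each row is {start, end} with an inclusive end.
    SentenceIndex(int[][] sentences) {
        ends = new int[sentences.length];
        for (int i = 0; i < sentences.length; i++) {
            byStart.put(sentences[i][0], i);
            ends[i] = sentences[i][1];
        }
    }

    // Position of the sentence containing [start, end], or -1 if none
    // (for example, text in the whitespace between two sentences).
    int sentenceOf(int start, int end) {
        Map.Entry<Integer, Integer> e = byStart.floorEntry(start);
        if (e == null || end > ends[e.getValue()]) {
            return -1;
        }
        return e.getValue();
    }

    // "inside" for a window spanning sentence positions first..last
    static boolean inside(int sentence, int first, int last) {
        return sentence >= 0 && first <= sentence && sentence <= last;
    }
}
```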
<br>
Does anyone know whether an accumulator can be
parametrized? (Looking at the source code, it
does not seem to be possible.)<br>
Maybe a syntax inspired by operator
parametrization would be nice:<br>
<br>
<tt> $result : ArrayList() from accumulate (
$s: Sentence(), <b> buildwindows[3]($s)</b>)<br>
</tt><br>
Best regards,<br>
<br>
Bruno.<br>
<br>
On 19/08/2011 13:55, Wolfgang Laun wrote:
<blockquote type="cite">There are some details
that one should consider before deciding on a
particular implementation technique.<br>
<ul>
<li>Are all Sentences contiguous, i.e., is
s1.end = pred(s2.start)?</li>
<li>Can a ManualAnnotation start on one
Sentence and end in the next or any
further successor?</li>
</ul>
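The first question can be checked mechanically. A small Java sketch, assuming int offsets with an inclusive end, so that pred(x) is x - 1; Sentence and Contiguity are hypothetical names.<br>

```java
import java.util.List;

// Hypothetical sentence model with inclusive character offsets.
record Sentence(int start, int end) {}

final class Contiguity {
    private Contiguity() {}

    // s1.end == pred(s2.start): s2 begins right after s1 ends.
    static boolean contiguous(Sentence s1, Sentence s2) {
        return s1.end() == s2.start() - 1;
    }

    // True if every sentence in a position-sorted list touches its successor.
    static boolean allContiguous(List<Sentence> sorted) {
        for (int i = 0; i + 1 < sorted.size(); i++) {
            if (!contiguous(sorted.get(i), sorted.get(i + 1))) {
                return false;
            }
        }
        return true;
    }
}
```

Bruno's reply (sentences may be separated by whitespace) means this check can fail on his data, so "consecutive" has to be defined through ordering rather than through adjacent offsets.<br>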
As in all problems where constraints depend on
an order between facts, performance is going
to be a problem with increasing numbers of
Sentences and ManualAnnotations.<br>
<br>
Your accumulate plan could be a very
inefficient approach. Creating O(N*N) pairs
and then looking for an overlapping window is
much worse than, for instance, looking at each
window in turn. But it depends on the expected
numbers of both Sentences and ManualAnnotations.<br>
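A back-of-the-envelope count illustrates the difference under a naive nested-loop model (an assumption; a Rete network indexes and prunes joins, so real costs differ). With A annotations and W windows, building every ordered pair (a1, a2) with a1 != a2 and testing it against every window costs A(A-1)W checks, while visiting each window once and testing every annotation costs A*W. JoinCost is a hypothetical name.<br>

```java
// Naive candidate counts for the two evaluation orders (illustrative model only).
final class JoinCost {
    private JoinCost() {}

    // Every ordered pair (a1, a2), a1 != a2, tested against every window.
    static long pairsThenWindows(long annotations, long windows) {
        return annotations * (annotations - 1) * windows;
    }

    // Every window visited once, every annotation tested for membership.
    static long windowsThenAnnotations(long annotations, long windows) {
        return windows * annotations;
    }
}
```

For example, 100 annotations and 998 windows give 9,880,200 versus 99,800 checks, a factor of A - 1 = 99.<br>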
<br>
-W<br>
<br>
<br>
<br>
<div class="gmail_quote">2011/8/19 Bruno
Freudensprung <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:bruno.freudensprung@temis.com"
target="_blank">bruno.freudensprung@temis.com</a>&gt;</span><br>
<blockquote class="gmail_quote"
style="border-left: 1px solid rgb(204,
204, 204); margin: 0pt 0pt 0pt 0.8ex;
padding-left: 1ex;">
<div bgcolor="#ffffff" text="#000000">
Hello,<br>
<br>
I am trying to implement rules handling
"Sentence" and "ManualAnnotation" objects
(imagine someone highlighting words of
the document). Basically, "Sentence"
objects have "start" and "end" positions
(fields) in the text of a document,
and they are Comparable according to
their location in the document.<br>
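A minimal sketch of the Sentence model described here, assuming int offsets and Java 16+ records; only the start/end fields and the Comparable ordering come from the text.<br>

```java
// Hypothetical version of the "Sentence" fact: character offsets into the
// document, naturally ordered by position.
record Sentence(int start, int end) implements Comparable<Sentence> {
    Sentence {
        if (start > end) {
            throw new IllegalArgumentException("start must not exceed end");
        }
    }

    @Override
    public int compareTo(Sentence other) {
        // Sentences do not overlap, so the start offset alone fixes the order.
        return Integer.compare(start, other.start);
    }
}
```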
<br>
I need to write rules using the notion
of a "window of consecutive sentences".<br>
<br>
I am not really interested in those
"SentenceWindow" objects themselves; I just
need them to define a kind of proximity
between "ManualAnnotation" objects.<br>
What I eventually need in the "when" of
my rule is something like:<br>
<br>
<tt>when<br>
... maybe something creating the
windows<br>
a1 : ManualAnnotation ()<br>
a2 : ManualAnnotation (this != a1)<br>
SentenceWindow (this includes a1,
this includes a2)<br>
then<br>
... do something with a1 and a2
since they are "close" to each other<br>
end</tt><br>
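The condition SentenceWindow(this includes a1, this includes a2) amounts to a containment test. A plain-Java sketch with hypothetical records and inclusive int offsets (Java 16+); Drools Fusion's own includes operator works on event timestamps and may treat boundaries differently, so this is only an analogue.<br>

```java
// Hypothetical plain-Java models; offsets are inclusive character positions.
record ManualAnnotation(int start, int end) {}

record SentenceWindow(int start, int end) {
    // Plays the role of "this includes a" in the rule (inclusive bounds).
    boolean includes(ManualAnnotation a) {
        return start <= a.start() && a.end() <= end;
    }

    // The rule's condition: two distinct annotation objects in this window.
    boolean closeTogether(ManualAnnotation a1, ManualAnnotation a2) {
        return a1 != a2 && includes(a1) && includes(a2);
    }
}
```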
<br>
As I don't know the "internals" of
Drools, I would like to have your
opinion about which is the best "idiom":<br>
<ul>
<li>create all SentenceWindow objects
and insert them into working
memory, then write rules against all
the facts (SentenceWindow and
ManualAnnotation)<br>
</li>
<li>implement an accumulator that will
create a list of SentenceWindow
objects</li>
</ul>
<br>
The first option could look like:<br>
<br>
<code>rule "Create sentence
windows"<br>
when<br>
// find 3 consecutive sentences<br>
s1 : Sentence()<br>
s2 : Sentence(this > s1)<br>
s3 : Sentence(this > s2)<br>
not Sentence(this != s2
&amp;&amp; > s1 &amp;&amp; &lt; s3)<br>
then<br>
SentenceWindow swindow = new
SentenceWindow();<br>
swindow.setStart(s1.getStart());<br>
swindow.setTheend(s3.getEnd());<br>
insert(swindow);<br>
end</code><br>
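For comparison, the same windows can be computed procedurally once the sentences are sorted: being adjacent in the sorted list is exactly what the not pattern enforces. A plain-Java sketch generalized from 3 to k sentences (k of at least 1); the records and WindowBuilder are hypothetical, and the theend component only mirrors the setTheend call in the rule.<br>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java counterparts of the facts used by the rule.
record Sentence(int start, int end) {}

record SentenceWindow(int start, int theend) {}

final class WindowBuilder {
    private WindowBuilder() {}

    // `sorted` must be in document order; sentences adjacent in the list have
    // no other Sentence between them, which is what the `not` pattern checks.
    static List<SentenceWindow> build(List<Sentence> sorted, int k) {
        List<SentenceWindow> windows = new ArrayList<>();
        for (int i = 0; i + k <= sorted.size(); i++) {
            Sentence first = sorted.get(i);
            Sentence last = sorted.get(i + k - 1);
            windows.add(new SentenceWindow(first.start(), last.end()));
        }
        return windows;
    }
}
```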
<br>
... Then use the first rule "as is".<br>
<br>
The accumulator option could look like
(I am not really sure the syntax is
correct) :<br>
<br>
<tt>when<br>
<b>$result : ArrayList() from
accumulate ( $s: Sentence(),
buildwindows($s))</b><br>
</tt><tt> a1 : ManualAnnotation ()<br>
a2 : ManualAnnotation (this != a1)<br>
<b> SentenceWindow (this includes
a1, this includes a2) </b></tt><b><tt>from
$result</tt></b><br>
<tt> then<br>
... </tt><tt>do something with a1
and a2 since they are "close" to each
other</tt><br>
<tt> end</tt><br>
<br>
Is it possible to say whether one way is
better than the other?<br>
<br>
And one last question: is it possible to
"parametrize" an accumulator (in order
to provide the number of sentences that
should be put in the windows)?<br>
I mean something like:<br>
<br>
<tt>when<br>
$result : ArrayList() from
accumulate ( $s: Sentence(), <b>buildwindows(3,</b>
$s))<br>
</tt><br>
<br>
Thanks in advance for your insights,<br>
<br>
Best regards,<br>
<font color="#888888"> <br>
Bruno.<br>
</font></div>
<br>
_______________________________________________<br>
rules-users mailing list<br>
<a moz-do-not-send="true"
href="mailto:rules-users@lists.jboss.org"
target="_blank">rules-users@lists.jboss.org</a><br>
<a moz-do-not-send="true"
href="https://lists.jboss.org/mailman/listinfo/rules-users"
target="_blank">https://lists.jboss.org/mailman/listinfo/rules-users</a><br>
<br>
</blockquote>
</div>
<br>
</blockquote>
<br>
</div>
<br>
</blockquote>
</div>
<br>
</blockquote>
<br>
</div>
</div>
</div>
<br>
<br>
</blockquote>
</div>
<br>
</blockquote>
<br>
</body>
</html>