1) Finding all the duplicate pairs is causing a combinatorial explosion.
With 30000 objects the rule will compare every object against every
other object, which requires evaluating the rule about 450 MILLION times. If
you must discover every pair that is a duplicate, then I think you have
a hard problem. However, if you can stop considering an object once you
see it is a duplicate of at least one other datum, then you can add a
field to your object to record that it is already a duplicate and remove
it from further consideration, something like the following:
public class Data {
    private int id = 0;
    private boolean isDuplicated = false;

    public Data(int id) {
        this.id = id;
    }

    public int getId() {
        return id;
    }

    public boolean getIsDuplicated() {
        return isDuplicated;
    }

    public void setIsDuplicated( boolean isDuplicated ) {
        this.isDuplicated = isDuplicated;
    }
}
rule "Unique data"
when
    data : Data()
    old : Data( isDuplicated == false, this != data, this.id == data.id )
then
    log.log("Following data are not unique: " + data.getId() + " and " + old.getId());
    data.setIsDuplicated( true );
    old.setIsDuplicated( true );
    // Tell the engine the facts changed so the isDuplicated test is re-evaluated.
    update( data );
    update( old );
end
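To put a number on the explosion described above, here is a small sketch of the arithmetic. The class name PairCount and the method unorderedPairs are made up for illustration; the only assumption is that "every object against every other object" means counting pairs of distinct objects. Note that a rule whose two patterns are symmetric can match each pair in both orders, which doubles the count.

```java
class PairCount {
    // Number of unordered pairs among n objects: n * (n - 1) / 2.
    static long unorderedPairs(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        long n = 30000;
        long pairs = unorderedPairs(n);
        System.out.println(pairs);      // prints 449985000 (pairs of distinct objects)
        System.out.println(2 * pairs);  // prints 899970000 (each pair in both orders)
    }
}
```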
--------------------
2) I'd keep the test, this != $data1, unless your rule base is
configured to remove identities. By default the same object can match
both patterns, so without the test every object would be reported as a
duplicate of itself. Tagging the id fields will make the rule a little
more efficient and easier to read. There's no need to use 'this.id', it's
understood that 'id' is the id field of the object being matched.
rule "Unique data"
when
    $data1: Data( $id1: id )
    $data2: Data( isDuplicated == false, this != $data1, id == $id1 )
then
    log.log( "Duplicated id: " + $id1 );
    $data1.setIsDuplicated( true );
    $data2.setIsDuplicated( true );
    // Tell the engine the facts changed so the isDuplicated test is re-evaluated.
    update( $data1 );
    update( $data2 );
end
( I like to use dollar signs at the front of tag names because it makes
it easier to distinguish what is what. You don't have to. )
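If all you need is the set of ids that occur more than once, you can cross-check whatever the rule reports with a single pass in plain Java, outside the rule engine. This is only a sketch of that idea, not a Drools feature: the class name DuplicateIds and the method findDuplicatedIds are hypothetical, and it assumes the ids have already been pulled out of the Data objects into a list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class DuplicateIds {
    // Returns the ids that appear more than once, in ascending order.
    static Set<Integer> findDuplicatedIds(List<Integer> ids) {
        Set<Integer> seen = new HashSet<>();
        Set<Integer> duplicated = new TreeSet<>();
        for (Integer id : ids) {
            // add() returns false when the id was already in the set.
            if (!seen.add(id)) {
                duplicated.add(id);
            }
        }
        return duplicated;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 2, 5, 1, 1);
        System.out.println(findDuplicatedIds(ids)); // prints [1, 2]
    }
}
```

Each id is looked at once, so the work grows with the number of objects rather than with the number of pairs.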
ygaurav wrote:
Hi All
I am new to Drools and I am trying to see if we can use it. I have a simple file
public class Data {
    private int id = 0;

    public Data(int id) {
        this.id = id;
    }

    public int getId() {
        return id;
    }
}
rule "Unique data"
when
    data : Data()
    old : Data(this != data, this.id == data.id)
then
    log.log("Following data are not unique: " + data.getId() + " and " + old.getId());
end
When I try to load 30,000 Data objects into memory it takes a long time
(around 12 hours). Can anybody suggest a better way of doing it?
Thanks
Gaurav