[rules-users] UTF8 characters corrupted when embedded in DRL files

crafferty chris.rafferty at sidonis.com
Mon Aug 9 07:13:04 EDT 2010


Drools version 5.0.1, running on Java 1.6.0_12.

Our application generates rules as DRL files, and the rules contain non-Latin
characters. Here is an example rule:

//****************************************************************************
//
// TRIGGER_ RULE: Detect Dual Receiver Failure
//
//****************************************************************************

rule TRIGGER_Detect_Dual_Receiver_Failure_43296731577724
	no-loop true
	salience 0

	when

// When a Child Group exists with:

		child0:ChildGroup(
//	base class is "com.sidonis.vp.dtvexample.BroadcastEquipment"

			baseClass == "com.sidonis.vp.dtvexample.BroadcastEquipment",


//	parent base class is "com.sidonis.vp.dtvexample.BroadcastEquipment"

			parentBaseClass == "com.sidonis.vp.dtvexample.BroadcastEquipment",


//	parent sub class starts with "DualReceiver"

			child0_parentSubClass:parentSubClass -> (child0_parentSubClass.startsWith( "DualReceiver" )),


//	failed count is 2

			failed == 2
		)
	then
		StatisticsSystemRuleFunctions.incrementTrigRuleCount();
		StatePropagationFunctions.trigForceParentFailedViaChild(
			"TRIGGER_Detect_Dual_Receiver_Failure_43296731577724",
			"あえいおう",
			child0,
			true );
	end
	
When the static method trigForceParentFailedViaChild is invoked as a
consequence of fulfilling the rule conditions, the non-Latin text in the
second argument is corrupted when output to a file.
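
To narrow down where the corruption happens, it can help to dump the code points of the string as it arrives in the method, before anything is written to a file. The sketch below is only a diagnostic aid: the class name CodePointDump and its dump method are hypothetical and are not part of Drools or of our application. The literal is written with Unicode escapes so that the result does not depend on the encoding of the Java source file.

```java
public final class CodePointDump {

    private CodePointDump() {
    }

    /** Renders each code point of s as U+XXXX, separated by single spaces. */
    public static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by one or two chars depending on whether cp is supplementary.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The five hiragana from the example rule, written as escapes.
        String expected = "\u3042\u3048\u3044\u304A\u3046";
        System.out.println(dump(expected)); // prints "U+3042 U+3048 U+3044 U+304A U+3046"
    }
}
```

If a dump taken inside trigForceParentFailedViaChild already differs from the values above, the text was damaged before the method was called, that is, while the DRL was being read or compiled, and not by the code that writes the file.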

The DRL is loaded as a resource by our application and passed to Drools
thus:

		// Reader over the DRL file, decoded with the configured encoding
		Reader fr;
		try {
			final FileInputStream fis = new FileInputStream(drlFile);
			fr = new InputStreamReader(fis, textEncoding);
		} catch (FileNotFoundException e1) {
			log.error("Cannot open DRL file: " + drlFile.getName());
			return null;
		} catch (UnsupportedEncodingException e) {
			log.error("Text encoding configuration problem " + textEncoding);
			return null;
		}
		// Load the DRL into a KnowledgeBuilder
		Resource resource = ResourceFactory.newReaderResource(fr, textEncoding);

		KnowledgeBuilder knowledgeBuilder =
			KnowledgeBuilderFactory.newKnowledgeBuilder(knowledgeBuilderConfiguration);

		knowledgeBuilder.add(resource, ResourceType.DRL);

Note that 'textEncoding' is loaded from a properties file by the application
and is set to "UTF8". The system property "file.encoding" for the
application's JVM is also set to "UTF8".
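
As a sanity check on the configuration, the sketch below shows how the configured name can be resolved and compared with what the JVM actually uses as its default. The class name CharsetCheck is hypothetical. "UTF8" is an alias that the JDK resolves to the canonical charset "UTF-8". The default charset is normally fixed when the JVM starts, so the check assumes file.encoding was given on the command line; setting it later with System.setProperty may have no effect.

```java
import java.nio.charset.Charset;

public final class CharsetCheck {

    private CharsetCheck() {
    }

    /** Returns the canonical name of the charset that the configured name resolves to. */
    public static String canonicalName(String configured) {
        return Charset.forName(configured).name();
    }

    public static void main(String[] args) {
        // The value our properties file supplies as textEncoding.
        System.out.println("configured    -> " + canonicalName("UTF8"));
        // What the JVM uses when no encoding is passed explicitly.
        System.out.println("default       -> " + Charset.defaultCharset().name());
        // The raw system property, which may differ from the effective default.
        System.out.println("file.encoding -> " + System.getProperty("file.encoding"));
    }
}
```

If the default printed here is not UTF-8, any code path that opens the DRL without an explicit encoding will decode it with that default instead.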

We have also tried ResourceFactory's newFileResource, passing it the drlFile
object, but to no avail: the result is the same.

We then debugged the Drools 5.0.1 source and discovered that when the stream
representing the DRL is wrapped in an ANTLRInputStream (line 320 of
org.drools.compiler.DrlParser, in the drools-compiler project), the encoding
of the input stream is not passed on to the ANTLRInputStream. If we modify
the Drools code to use the overloaded ANTLRInputStream constructor that
accepts the stream encoding, the non-Latin text is passed correctly to our
method. We have also checked the code at the head of the repository and it
is the same. Although this seems like a bug, I want to make sure we're not
doing something wrong first.
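
The effect of dropping the encoding can be reproduced with the JDK alone. The sketch below does not use ANTLR or Drools: a plain InputStreamReader stands in for the ANTLRInputStream wrapper, and ISO-8859-1 is only an assumed example of a charset that is not UTF-8. The class name DecodeDemo is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public final class DecodeDemo {

    private DecodeDemo() {
    }

    /** Decodes the given bytes into a String using the given charset. */
    public static String decode(byte[] bytes, Charset charset) {
        try {
            Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset);
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        Charset latin1 = Charset.forName("ISO-8859-1"); // assumed "wrong" charset

        // The five hiragana from the rule, as they would be stored in a UTF-8 DRL file.
        String original = "\u3042\u3048\u3044\u304A\u3046";
        byte[] fileBytes = original.getBytes(utf8);

        // Encoding passed through: the text survives.
        System.out.println(decode(fileBytes, utf8).equals(original)); // prints "true"

        // Encoding dropped: every byte becomes its own character.
        System.out.println(decode(fileBytes, latin1).length()); // prints "15"
    }
}
```

Each of these hiragana takes three bytes in UTF-8, so decoding with a single-byte charset turns 5 characters into 15 unrelated ones, which matches the kind of corruption we see in the output file.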

-- 
View this message in context: http://drools-java-rules-engine.46999.n3.nabble.com/UTF8-characters-corrupted-when-embedded-in-DRL-files-tp1054399p1054399.html
Sent from the Drools - User mailing list archive at Nabble.com.



