[shrinkwrap-issues] [JBoss JIRA] (SHRINKDESC-97) XmlDomNodeDescriptorImporterImpl.from(String) uses the platform encoding to parse the String argument resulting in XML parser exceptions when parsing the XML Document

Vineet Reynolds (Created) (JIRA) jira-events at lists.jboss.org
Sat Oct 15 01:03:16 EDT 2011


XmlDomNodeDescriptorImporterImpl.from(String) uses the platform encoding to parse the String argument resulting in XML parser exceptions when parsing the XML Document
----------------------------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: SHRINKDESC-97
                 URL: https://issues.jboss.org/browse/SHRINKDESC-97
             Project: ShrinkWrap Descriptors
          Issue Type: Bug
         Environment: Windows 7 32-bit, Oracle/Sun Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
            Reporter: Vineet Reynolds


Going by the forum reference, the {{XmlDomNodeDescriptorImporterImpl.from(String)}} method in ShrinkWrap Descriptors fails when invoked by {{ConfigurationSysPropResolver.resolveSystemProperties(ArquillianDescriptor)}} in Arquillian-Core, when arquillian.xml contains umlauts. The following exception stacktrace was recorded when running the tests in the arquillian-tomcat-managed-7 project (with a modified arquillian.xml, posted below):

{noformat}
Running org.jboss.arquillian.container.tomcat.managed_7.TomcatManagedClientTestCase
org.apache.maven.surefire.booter.SurefireExecutionException: Could not create a new instance of class org.jboss.arquillian.test.impl.EventTestRunnerAdaptor see cause.; nested exception is java.lang.RuntimeException: Could not create a new instance of class org.jboss.arquillian.test.impl.EventTestRunnerAdaptor see cause.
java.lang.RuntimeException: Could not create a new instance of class org.jboss.arquillian.test.impl.EventTestRunnerAdaptor see cause.
	at org.jboss.arquillian.test.spi.SecurityActions.newInstance(SecurityActions.java:170)
	at org.jboss.arquillian.test.spi.TestRunnerAdaptorBuilder.build(TestRunnerAdaptorBuilder.java:52)
	at org.jboss.arquillian.junit.Arquillian.run(Arquillian.java:72)
	at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59)
	at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:115)
	at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:102)
	at org.apache.maven.surefire.Surefire.run(Surefire.java:180)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:350)
	at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1021)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.jboss.arquillian.test.spi.SecurityActions.newInstance(SecurityActions.java:166)
	... 12 more
Caused by: org.jboss.shrinkwrap.descriptor.api.DescriptorImportException: Could not import XML from stream
	at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporterImpl.importAsNode(XmlDomNodeImporterImpl.java:76)
	at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporter.importAsNode(XmlDomNodeImporter.java:46)
	at org.jboss.shrinkwrap.descriptor.spi.node.NodeDescriptorImporterBase.from(NodeDescriptorImporterBase.java:72)
	at org.jboss.shrinkwrap.descriptor.spi.DescriptorImporterBase.from(DescriptorImporterBase.java:142)
	at org.jboss.shrinkwrap.descriptor.spi.DescriptorImporterBase.from(DescriptorImporterBase.java:132)
	at org.jboss.arquillian.config.impl.extension.ConfigurationSysPropResolver.resolveSystemProperties(ConfigurationSysPropResolver.java:55)
	at org.jboss.arquillian.config.impl.extension.ConfigurationRegistrar.loadConfiguration(ConfigurationRegistrar.java:58)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.jboss.arquillian.core.impl.ObserverImpl.invoke(ObserverImpl.java:90)
	at org.jboss.arquillian.core.impl.EventContextImpl.invokeObservers(EventContextImpl.java:99)
	at org.jboss.arquillian.core.impl.EventContextImpl.proceed(EventContextImpl.java:81)
	at org.jboss.arquillian.core.impl.ManagerImpl.fire(ManagerImpl.java:134)
	at org.jboss.arquillian.core.impl.ManagerImpl.fire(ManagerImpl.java:114)
	at org.jboss.arquillian.core.impl.ManagerImpl.start(ManagerImpl.java:260)
	at org.jboss.arquillian.test.impl.EventTestRunnerAdaptor.<init>(EventTestRunnerAdaptor.java:56)
	... 17 more
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:470)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2792)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
	at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporterImpl.importAsNode(XmlDomNodeImporterImpl.java:67)
	... 34 more
{noformat}

The arquillian.xml file used in the test is listed below. The file was saved in UTF-8 encoding.

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<arquillian xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://jboss.org/schema/arquillian"
    xsi:schemaLocation="http://jboss.org/schema/arquillian http://jboss.org/schema/arquillian/arquillian_1_0.xsd">

    <container qualifier="tomcat" default="true">
        <configuration>
            <!-- <property name="catalinaHome">${CATALINA_HOME:target/apache-tomcat-7.0.20}</property> -->
            <property name="catalinaHome">D:/Apps/apache-tomcat-7.0.14</property>
            <property name="javaHome">${JAVA_HOME:C:/Java/jdk1.6.0_26}</property>
            <property name="javaVmArguments">-Xmx1024m -XX:MaxPermSize=512m</property>
            <property name="jmxPort">8089</property>
            <property name="bindHttpPort">8080</property>
            <property name="user">Schröder</property>
            <property name="pass">script</property>
            <property name="serverConfig">server.xml</property>
        </configuration>
    </container>
</arquillian>
{code}

The root cause is that {{XmlDomNodeDescriptorImporterImpl.from(String)}} creates a {{ByteArrayInputStream}} from the bytes of a {{String}} that was encoded using the platform's default encoding, i.e. {{file.encoding}}. In the test environment, the platform encoding was Cp1252, while the XML prolog stated that the encoding was UTF-8. The XML parser therefore attempted to decode the Cp1252-encoded stream as a UTF-8 sequence and threw a {{MalformedByteSequenceException}}.
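
The mismatch can be reproduced outside ShrinkWrap with a few lines of plain JAXP. The following is only a minimal sketch, not the ShrinkWrap code path: the class and method names are hypothetical, and ISO-8859-1 stands in for Cp1252 because both encode "ö" as the single byte 0xF6 and ISO-8859-1 is guaranteed to be available on every JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of ShrinkWrap or Arquillian.
class EncodingMismatchDemo {

    // The prolog declares UTF-8; \u00f6 is the "ö" from the arquillian.xml above.
    public static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<property name=\"user\">Schr\u00f6der</property>";

    /** Encodes the XML with the given charset and reports whether JAXP can parse the bytes. */
    public static boolean parses(String xml, Charset charset) {
        byte[] bytes = xml.getBytes(charset);
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXException | IOException e) {
            // Depending on the JAXP implementation, the malformed byte surfaces
            // as an IOException or wrapped in a SAXException.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable JAXP parser", e);
        }
    }

    public static void main(String[] args) {
        // Bytes match the declared encoding.
        System.out.println("UTF-8 bytes parse:      " + parses(XML, StandardCharsets.UTF_8));
        // Bytes contradict the declared encoding (0xF6 is not valid UTF-8 here).
        System.out.println("ISO-8859-1 bytes parse: " + parses(XML, StandardCharsets.ISO_8859_1));
    }
}
```

With the single-byte encoding, the byte 0xF6 looks like the lead byte of a 4-byte UTF-8 sequence, and the following "d" is not a valid continuation byte, which matches the "Invalid byte 2 of 4-byte UTF-8 sequence" message in the stack trace.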

As far as Arquillian is concerned, the fix is to use {{String.getBytes("UTF-8")}} in this context, because the {{ConfigurationSysPropResolver.resolveSystemProperties(ArquillianDescriptor)}} method in Arquillian-Core obtains an exported descriptor (whose XML declaration states an encoding of UTF-8) before invoking {{XmlDomNodeDescriptorImporterImpl.from(String)}}. Hard-coding UTF-8 is probably not a good idea for all circumstances, however.

I think that either {{XmlDomNodeDescriptorImporterImpl}}, or its superclasses {{NodeDescriptorImporterBase}} or {{DescriptorImporterBase}}, or the interface {{DescriptorImporter}} must provide an additional method {{from(String input, String encoding)}} that lets callers specify the charset of the input XML string, so that the charset can be passed to the {{String.getBytes}} invocation.
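
As a rough illustration of what such an overload would have to do internally, the sketch below converts the string to a stream using a caller-supplied charset instead of the platform default. The class and method names are hypothetical and do not exist in ShrinkWrap Descriptors; only the JDK calls are real.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper; a stand-in for the proposed from(String input, String encoding).
class CharsetAwareImportSketch {

    /** Encodes the input with the named charset rather than the platform default. */
    public static byte[] encode(String input, String encoding) {
        if (input == null) {
            throw new IllegalArgumentException("input must be specified");
        }
        if (encoding == null) {
            throw new IllegalArgumentException("encoding must be specified");
        }
        // Charset.forName throws an unchecked exception for unknown or illegal names.
        return input.getBytes(Charset.forName(encoding));
    }

    /** Wraps the explicitly encoded bytes in a stream that an XML parser could consume. */
    public static InputStream toStream(String input, String encoding) {
        return new ByteArrayInputStream(encode(input, encoding));
    }

    public static void main(String[] args) {
        // "ö" occupies two bytes in UTF-8 but one byte in ISO-8859-1.
        System.out.println(encode("\u00f6", "UTF-8").length);
        System.out.println(encode("\u00f6", "ISO-8859-1").length);
    }
}
```

A caller such as {{ConfigurationSysPropResolver}} would then pass the same charset name that the exported descriptor declares in its XML prolog.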

----

It should be noted that the JUnit test in {{SyspropReplacementInArqXmlTestCase}} succeeds in this scenario, because the Maven project's property {{project.build.sourceEncoding}} is set to UTF-8 in the {{org.jboss:jboss-parent:6-beta-2}} super POM, which results in the {{file.encoding}} JVM property being set to UTF-8. Setting this property is the suggested workaround if the Maven project does not specify an explicit source file encoding. If the project uses a different source file encoding (like ISO-8859-1), this workaround will fail; see the behavior of Xerces noted below.

----

Specifying the encoding as UTF-8 in the XML declaration is a bit pointless as far as the JAXP implementation in the Oracle/Sun JRE is concerned. In this scenario, where ShrinkWrap provides an {{InputStream}} to the parser, the built-in Xerces parser reads the first 4 bytes of the stream (to check for UTF-16BE/LE, UTF-8 with BOM, etc.) to determine the decoding to apply. The default encoding, if Xerces does not find a matching charset, is UTF-8. Xerces does not differentiate between ISO-8859-1, Cp1252 and UTF-8, as the first 4 bytes of a valid XML stream are the same in these encodings.
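
The point about the first 4 bytes can be checked directly. The sketch below does not reproduce the detection logic in Xerces; it only compares the leading bytes of the same XML prolog in different encodings. The class name is hypothetical, and ISO-8859-1 again stands in for Cp1252, since the two agree on all ASCII characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; it only inspects bytes and does not call any parser.
class PrologSniffDemo {

    public static final String PROLOG = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

    /** Returns the first four bytes of the XML text in the given encoding. */
    public static byte[] firstFourBytes(String xml, Charset charset) {
        return Arrays.copyOf(xml.getBytes(charset), 4);
    }

    public static void main(String[] args) {
        byte[] utf8 = firstFourBytes(PROLOG, StandardCharsets.UTF_8);
        byte[] latin1 = firstFourBytes(PROLOG, StandardCharsets.ISO_8859_1);
        byte[] utf16be = firstFourBytes(PROLOG, StandardCharsets.UTF_16BE);

        // "<?xm" is pure ASCII, so the single-byte encodings and UTF-8 agree.
        System.out.println("UTF-8 vs ISO-8859-1 identical: " + Arrays.equals(utf8, latin1));
        // UTF-16BE interleaves zero bytes, so it is distinguishable from the start.
        System.out.println("UTF-8 vs UTF-16BE identical:   " + Arrays.equals(utf8, utf16be));
    }
}
```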
