[shrinkwrap-issues] [JBoss JIRA] (SHRINKDESC-97) XmlDomNodeDescriptorImporterImpl.from(String) uses the platform encoding to parse the String argument resulting in XML parser exceptions when parsing the XML Document

Vineet Reynolds (Updated) (JIRA) jira-events at lists.jboss.org
Sat Oct 15 01:05:16 EDT 2011


     [ https://issues.jboss.org/browse/SHRINKDESC-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vineet Reynolds updated SHRINKDESC-97:
--------------------------------------

    Workaround Description: 
Start the JVM with the property {{file.encoding}} set to UTF-8. In Maven, set the project property {{project.build.sourceEncoding}} to UTF-8:

{code:xml}
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
...
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
...
</project>
{code}

The arquillian.xml file must also be encoded in UTF-8.

  was:
Start the JVM with the property {{file.encoding}} set to UTF-8. In Maven, set the project property {{project.build.sourceEncoding}} to UTF-8:

{{code:xml}}
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
...
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
...
</project>
{{code}}

The arquillian.xml file must also be encoded in UTF-8.

           Forum Reference: http://community.jboss.org/thread/173520?tstart=0  (was: http://community.jboss.org/thread/173520?tstart=0)
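To check whether the workaround actually took effect in the JVM that runs the tests, a small diagnostic can print {{file.encoding}} and the default charset that {{String.getBytes()}} falls back to when no charset is given. The following is a minimal sketch; the class name {{DefaultCharsetCheck}} and its helper are illustrative only. If the output does not show UTF-8, one option is to pass {{-Dfile.encoding=UTF-8}} directly to the forked test JVM (for example via the Surefire plugin's {{argLine}} parameter).

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetCheck {

    // True if String.getBytes() (no charset argument) yields the same bytes as
    // encoding with the JVM default charset, i.e. the platform encoding is what it uses.
    static boolean getBytesUsesDefaultCharset(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        // Value of -Dfile.encoding=... (or the platform default if none was given)
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // Charset that String.getBytes() uses when no charset is supplied
        System.out.println("default charset = " + Charset.defaultCharset());
        // \u00f6 (o-umlaut) avoids depending on the encoding of this source file
        System.out.println("getBytes() uses default charset: "
                + getBytesUsesDefaultCharset("Schr\u00f6der"));
    }
}
```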

    
> XmlDomNodeDescriptorImporterImpl.from(String) uses the platform encoding to parse the String argument resulting in XML parser exceptions when parsing the XML Document
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SHRINKDESC-97
>                 URL: https://issues.jboss.org/browse/SHRINKDESC-97
>             Project: ShrinkWrap Descriptors
>          Issue Type: Bug
>         Environment: Windows 7 32-bit, Oracle/Sun Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
>            Reporter: Vineet Reynolds
>
> According to the forum reference, the {{XmlDomNodeDescriptorImporterImpl.from(String)}} method in ShrinkWrap Descriptors fails when it is invoked by {{ConfigurationSysPropResolver.resolveSystemProperties(ArquillianDescriptor)}} in Arquillian Core and arquillian.xml contains umlauts. The following stack trace was recorded when running the tests in the arquillian-tomcat-managed-7 project with a modified arquillian.xml (listed below):
> {noformat}
> Running org.jboss.arquillian.container.tomcat.managed_7.TomcatManagedClientTestCase
> org.apache.maven.surefire.booter.SurefireExecutionException: Could not create a new instance of class org.jboss.arquillian.test.impl.EventTestRunnerAdaptor see cause.; nested exception is java.lang.RuntimeException: Could not create a new instance of class org.jboss.arquillian.test.impl.EventTestRunnerAdaptor see cause.
> java.lang.RuntimeException: Could not create a new instance of class org.jboss.arquillian.test.impl.EventTestRunnerAdaptor see cause.
> 	at org.jboss.arquillian.test.spi.SecurityActions.newInstance(SecurityActions.java:170)
> 	at org.jboss.arquillian.test.spi.TestRunnerAdaptorBuilder.build(TestRunnerAdaptorBuilder.java:52)
> 	at org.jboss.arquillian.junit.Arquillian.run(Arquillian.java:72)
> 	at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59)
> 	at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:115)
> 	at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:102)
> 	at org.apache.maven.surefire.Surefire.run(Surefire.java:180)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:350)
> 	at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1021)
> Caused by: java.lang.reflect.InvocationTargetException
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 	at org.jboss.arquillian.test.spi.SecurityActions.newInstance(SecurityActions.java:166)
> 	... 12 more
> Caused by: org.jboss.shrinkwrap.descriptor.api.DescriptorImportException: Could not import XML from stream
> 	at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporterImpl.importAsNode(XmlDomNodeImporterImpl.java:76)
> 	at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporter.importAsNode(XmlDomNodeImporter.java:46)
> 	at org.jboss.shrinkwrap.descriptor.spi.node.NodeDescriptorImporterBase.from(NodeDescriptorImporterBase.java:72)
> 	at org.jboss.shrinkwrap.descriptor.spi.DescriptorImporterBase.from(DescriptorImporterBase.java:142)
> 	at org.jboss.shrinkwrap.descriptor.spi.DescriptorImporterBase.from(DescriptorImporterBase.java:132)
> 	at org.jboss.arquillian.config.impl.extension.ConfigurationSysPropResolver.resolveSystemProperties(ConfigurationSysPropResolver.java:55)
> 	at org.jboss.arquillian.config.impl.extension.ConfigurationRegistrar.loadConfiguration(ConfigurationRegistrar.java:58)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.jboss.arquillian.core.impl.ObserverImpl.invoke(ObserverImpl.java:90)
> 	at org.jboss.arquillian.core.impl.EventContextImpl.invokeObservers(EventContextImpl.java:99)
> 	at org.jboss.arquillian.core.impl.EventContextImpl.proceed(EventContextImpl.java:81)
> 	at org.jboss.arquillian.core.impl.ManagerImpl.fire(ManagerImpl.java:134)
> 	at org.jboss.arquillian.core.impl.ManagerImpl.fire(ManagerImpl.java:114)
> 	at org.jboss.arquillian.core.impl.ManagerImpl.start(ManagerImpl.java:260)
> 	at org.jboss.arquillian.test.impl.EventTestRunnerAdaptor.<init>(EventTestRunnerAdaptor.java:56)
> 	... 17 more
> Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
> 	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
> 	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:470)
> 	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
> 	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
> 	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2792)
> 	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
> 	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
> 	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
> 	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
> 	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
> 	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
> 	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
> 	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
> 	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
> 	at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporterImpl.importAsNode(XmlDomNodeImporterImpl.java:67)
> 	... 34 more
> {noformat}
> The arquillian.xml file used in the test is listed below. The file was saved in UTF-8 encoding.
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <arquillian xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://jboss.org/schema/arquillian"
>     xsi:schemaLocation="http://jboss.org/schema/arquillian http://jboss.org/schema/arquillian/arquillian_1_0.xsd">
>     <container qualifier="tomcat" default="true">
>         <configuration>
>             <!-- <property name="catalinaHome">${CATALINA_HOME:target/apache-tomcat-7.0.20}</property> -->
>             <property name="catalinaHome">D:/Apps/apache-tomcat-7.0.14</property>
>             <property name="javaHome">${JAVA_HOME:C:/Java/jdk1.6.0_26}</property>
>             <property name="javaVmArguments">-Xmx1024m -XX:MaxPermSize=512m</property>
>             <property name="jmxPort">8089</property>
>             <property name="bindHttpPort">8080</property>
>             <property name="user">Schröder</property>
>             <property name="pass">script</property>
>             <property name="serverConfig">server.xml</property>
>         </configuration>
>     </container>
> </arquillian>
> {code}
> The underlying root cause is that {{XmlDomNodeDescriptorImporterImpl.from(String)}} creates a {{ByteArrayInputStream}} from the bytes of the {{String}}, encoded with the platform's default encoding (i.e. {{file.encoding}}). In the test environment the platform encoding was Cp1252, while the XML prolog declared UTF-8. In Cp1252 the character ö is the single byte 0xF6, which a UTF-8 decoder reads as the lead byte of a 4-byte sequence, and the byte that follows ("d") is not a valid continuation byte. The XML parser therefore decoded the Cp1252-encoded stream as UTF-8 and threw the {{MalformedByteSequenceException}} ("Invalid byte 2 of 4-byte UTF-8 sequence") shown above.
> As far as Arquillian is concerned, the fix is to use {{String.getBytes("UTF-8")}} in this context, because the {{ConfigurationSysPropResolver.resolveSystemProperties(ArquillianDescriptor)}} method in Arquillian Core obtains an exported descriptor (whose XML declaration states an encoding of UTF-8) before invoking {{XmlDomNodeDescriptorImporterImpl.from(String)}}. Hard-coding UTF-8 in the importer is probably not appropriate in all circumstances, however.
> I think that {{XmlDomNodeDescriptorImporterImpl}}, one of its superclasses ({{NodeDescriptorImporterBase}} or {{DescriptorImporterBase}}), or the {{DescriptorImporter}} interface should provide an additional method {{from(String input, String encoding)}} so that callers can specify the charset of the input XML string, which would then be passed to the {{String.getBytes}} invocation.
> ----
> Note that the JUnit test {{SyspropReplacementInArqXmlTestCase}} succeeds in this scenario, because the Maven project property {{project.build.sourceEncoding}} is set to UTF-8 in the {{org.jboss:jboss-parent:6-beta-2}} parent POM, resulting in the {{file.encoding}} JVM property being set to UTF-8. This is the suggested workaround if the Maven project does not specify an explicit source file encoding. If the project uses a different source file encoding (such as ISO-8859-1), this workaround will fail; see the Xerces behavior described below.
> ----
> Specifying UTF-8 as the encoding in the XML declaration makes little difference to the JAXP implementation in the Oracle/Sun JRE. When ShrinkWrap hands the parser an InputStream, the built-in Xerces parser reads the first 4 bytes of the stream (to check for UTF-16BE/LE, UTF-8 with a BOM, etc.) to decide which decoding to apply, and falls back to UTF-8 if no other charset matches. Xerces cannot distinguish ISO-8859-1, Cp1252 and UTF-8 this way, because the first 4 bytes of a valid XML stream are identical in all three encodings.
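The root cause and the proposed overload can be reproduced in plain JAXP, without ShrinkWrap. The sketch below is an illustrative reproduction, not ShrinkWrap code: the class {{EncodingMismatchDemo}} and its {{parse(String, Charset)}} helper are hypothetical stand-ins for {{from(String input, String encoding)}}. Encoding the string as ISO-8859-1 stands in for a Cp1252 platform default, since both encodings map ö to the single byte 0xF6. The sketch also confirms that the first 4 bytes of the document are identical in ISO-8859-1 and UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class EncodingMismatchDemo {

    // Same shape as the failing arquillian.xml: the prolog declares UTF-8 and the content
    // contains an umlaut. \u00f6 is used so the encoding of this source file does not matter.
    static final String XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<property name=\"user\">Schr\u00f6der</property>";

    // Hypothetical helper in the spirit of the proposed from(String input, String encoding):
    // the caller chooses the charset used to turn the String back into bytes.
    static Document parse(String xml, Charset charset) throws SAXException, IOException {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(charset)));
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the bytes produced with the given charset parse cleanly.
    static boolean parses(String xml, Charset charset) {
        try {
            parse(xml, charset);
            return true;
        } catch (SAXException | IOException e) {
            // Older JREs throw MalformedByteSequenceException (an IOException);
            // newer ones report the same problem as a SAXParseException.
            System.out.println(charset + " -> " + e.getMessage());
            return false;
        }
    }

    // Compares only the first 4 bytes, which is what Xerces inspects for auto-detection.
    static boolean sameFirstFourBytes(String xml, Charset a, Charset b) {
        return Arrays.equals(Arrays.copyOf(xml.getBytes(a), 4),
                Arrays.copyOf(xml.getBytes(b), 4));
    }

    public static void main(String[] args) {
        System.out.println("Same first 4 bytes: "
                + sameFirstFourBytes(XML, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
        // Mimics getBytes() on a Cp1252/ISO-8859-1 platform: \u00f6 becomes the byte 0xF6
        System.out.println("ISO-8859-1 bytes parse: " + parses(XML, StandardCharsets.ISO_8859_1));
        // Encodes with the charset named in the prolog, as the proposed overload would allow
        System.out.println("UTF-8 bytes parse: " + parses(XML, StandardCharsets.UTF_8));
    }
}
```

The exact exception type and message for the ISO-8859-1 case differ between JRE versions, which is why both {{SAXException}} and {{IOException}} are caught. Parsing from a {{java.io.Reader}} wrapped in an {{org.xml.sax.InputSource}} would avoid the byte conversion altogether, but that is a different approach from the overload proposed in this issue.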

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.jboss.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       


