[JBoss JIRA] (SHRINKDESC-97) XmlDomNodeDescriptorImporterImpl.from(String) uses the platform encoding to parse the String argument resulting in XML parser exceptions when parsing the XML Document
by Wolfgang Knauf (Jira)
[ https://issues.jboss.org/browse/SHRINKDESC-97?page=com.atlassian.jira.plu... ]
Wolfgang Knauf commented on SHRINKDESC-97:
------------------------------------------
This 8-year-old issue is still present, at least in the WildFly 14 toolchain, which seems to include Arquillian 1.1.13 (https://repository.jboss.org/nexus/content/groups/public/org/wildfly/bom/...).
For me, reading "arquillian.xml" failed because of an Umlaut in a comment.
I ran an integration test with the "maven-failsafe" plugin using Eclipse.
The workaround suggested in the issue description did not help - "project.build.sourceEncoding" was already set to "UTF-8".
It worked after setting the environment variable "JAVA_TOOL_OPTIONS" to "-Dfile.encoding=UTF8" in the environment variables section of the run configuration.
What can be done to get this issue fixed ;-)?
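The effect of the platform encoding can be checked in isolation. The following is a minimal sketch, not code from ShrinkWrap or Arquillian: the class name EncodingMismatchDemo and the helper parses are made up for illustration, and ISO-8859-1 stands in for Cp1252 because both encode "ö" as the single byte 0xF6. It encodes a UTF-8-declared document with two different charsets and reports whether the JAXP parser accepts the resulting bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of ShrinkWrap Descriptors or Arquillian.
public class EncodingMismatchDemo {

    // The prolog declares UTF-8; the content holds one non-ASCII character (o-umlaut).
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<property name=\"user\">Schr\u00f6der</property>";

    // Encodes the string with the given charset and reports whether the parser accepts the bytes.
    static boolean parses(String xml, Charset charset) throws ParserConfigurationException {
        byte[] bytes = xml.getBytes(charset);
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXException | IOException e) {
            // Malformed UTF-8 surfaces as one of these, depending on the parser version.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // What String.getBytes() without arguments would use on this JVM.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("ISO-8859-1 bytes accepted: " + parses(XML, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 bytes accepted: " + parses(XML, StandardCharsets.UTF_8));
    }
}
```

If the first line printed is not UTF-8, a no-argument String.getBytes() call on that JVM produces the kind of bytes that the parser rejects in the second line.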
> XmlDomNodeDescriptorImporterImpl.from(String) uses the platform encoding to parse the String argument resulting in XML parser exceptions when parsing the XML Document
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SHRINKDESC-97
> URL: https://issues.jboss.org/browse/SHRINKDESC-97
> Project: ShrinkWrap Descriptors
> Issue Type: Bug
> Environment: Windows 7 32-bit, Oracle/Sun Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
> Reporter: Vineet Reynolds
> Priority: Major
>
> Going by the forum reference, the {{XmlDomNodeDescriptorImporterImpl.from(String)}} method in ShrinkWrap Descriptors fails when invoked by {{ConfigurationSysPropResolver.resolveSystemProperties(ArquillianDescriptor)}} in Arquillian-Core, when arquillian.xml contains umlauts. The following exception stacktrace was recorded when running the tests in the arquillian-tomcat-managed-7 project (with a modified arquillian.xml, posted below):
> {noformat}
> Running org.jboss.arquillian.container.tomcat.managed_7.TomcatManagedClientTestCase
> org.apache.maven.surefire.booter.SurefireExecutionException: Could not create a new instance of class org.jboss.arquillian.test.impl.EventTestRunnerAdaptor see cause.; nested exception is java.lang.RuntimeException: Could not create a new instance of class org.jboss.arquillian.test.impl.EventTestRunnerAdaptor see cause.
> java.lang.RuntimeException: Could not create a new instance of class org.jboss.arquillian.test.impl.EventTestRunnerAdaptor see cause.
> at org.jboss.arquillian.test.spi.SecurityActions.newInstance(SecurityActions.java:170)
> at org.jboss.arquillian.test.spi.TestRunnerAdaptorBuilder.build(TestRunnerAdaptorBuilder.java:52)
> at org.jboss.arquillian.junit.Arquillian.run(Arquillian.java:72)
> at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59)
> at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:115)
> at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:102)
> at org.apache.maven.surefire.Surefire.run(Surefire.java:180)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:350)
> at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1021)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at org.jboss.arquillian.test.spi.SecurityActions.newInstance(SecurityActions.java:166)
> ... 12 more
> Caused by: org.jboss.shrinkwrap.descriptor.api.DescriptorImportException: Could not import XML from stream
> at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporterImpl.importAsNode(XmlDomNodeImporterImpl.java:76)
> at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporter.importAsNode(XmlDomNodeImporter.java:46)
> at org.jboss.shrinkwrap.descriptor.spi.node.NodeDescriptorImporterBase.from(NodeDescriptorImporterBase.java:72)
> at org.jboss.shrinkwrap.descriptor.spi.DescriptorImporterBase.from(DescriptorImporterBase.java:142)
> at org.jboss.shrinkwrap.descriptor.spi.DescriptorImporterBase.from(DescriptorImporterBase.java:132)
> at org.jboss.arquillian.config.impl.extension.ConfigurationSysPropResolver.resolveSystemProperties(ConfigurationSysPropResolver.java:55)
> at org.jboss.arquillian.config.impl.extension.ConfigurationRegistrar.loadConfiguration(ConfigurationRegistrar.java:58)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.jboss.arquillian.core.impl.ObserverImpl.invoke(ObserverImpl.java:90)
> at org.jboss.arquillian.core.impl.EventContextImpl.invokeObservers(EventContextImpl.java:99)
> at org.jboss.arquillian.core.impl.EventContextImpl.proceed(EventContextImpl.java:81)
> at org.jboss.arquillian.core.impl.ManagerImpl.fire(ManagerImpl.java:134)
> at org.jboss.arquillian.core.impl.ManagerImpl.fire(ManagerImpl.java:114)
> at org.jboss.arquillian.core.impl.ManagerImpl.start(ManagerImpl.java:260)
> at org.jboss.arquillian.test.impl.EventTestRunnerAdaptor.<init>(EventTestRunnerAdaptor.java:56)
> ... 17 more
> Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
> at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
> at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:470)
> at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
> at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2792)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
> at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
> at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
> at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
> at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporterImpl.importAsNode(XmlDomNodeImporterImpl.java:67)
> ... 34 more
> {noformat}
> The arquillian.xml file used in the test is listed below. The file was saved in the UTF-8 encoding.
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <arquillian xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://jboss.org/schema/arquillian"
> xsi:schemaLocation="http://jboss.org/schema/arquillian http://jboss.org/schema/arquillian/arquillian_1_0.xsd">
> <container qualifier="tomcat" default="true">
> <configuration>
> <!-- <property name="catalinaHome">${CATALINA_HOME:target/apache-tomcat-7.0.20}</property> -->
> <property name="catalinaHome">D:/Apps/apache-tomcat-7.0.14</property>
> <property name="javaHome">${JAVA_HOME:C:/Java/jdk1.6.0_26}</property>
> <property name="javaVmArguments">-Xmx1024m -XX:MaxPermSize=512m</property>
> <property name="jmxPort">8089</property>
> <property name="bindHttpPort">8080</property>
> <property name="user">Schröder</property>
> <property name="pass">script</property>
> <property name="serverConfig">server.xml</property>
> </configuration>
> </container>
> </arquillian>
> {code}
> The underlying root cause is that {{XmlDomNodeDescriptorImporterImpl.from(String)}} creates a {{ByteArrayInputStream}} from a {{String}} encoded using the platform's default encoding, i.e. {{file.encoding}}. In the test environment, the platform encoding was Cp1252, while the XML prolog stated that the encoding was UTF-8. The XML parser therefore attempted to read the Cp1252-encoded stream as a UTF-8 sequence and threw a {{MalformedByteSequenceException}}.
> As far as Arquillian is concerned, the fix is to use {{String.getBytes("UTF-8")}} in this context, because the {{ConfigurationSysPropResolver.resolveSystemProperties(ArquillianDescriptor)}} method in Arquillian-Core obtains an exported descriptor (containing an encoding of UTF-8 in the XML declaration) before invoking {{XmlDomNodeDescriptorImporterImpl.from(String)}}. Hard-coding UTF-8 is probably not a good idea for all circumstances.
> I think that either {{XmlDomNodeDescriptorImporterImpl}} or its superclasses {{NodeDescriptorImporterBase}} or {{DescriptorImporterBase}}, or the interface {{DescriptorImporter}}, must provide an additional method {{from(String input, String encoding)}} for callers to specify the charset of the input XML string, so that the charset may be specified in the {{String.getBytes}} invocation.
> ----
> It should be noted that the JUnit test in {{SyspropReplacementInArqXmlTestCase}} succeeds in this scenario, as the Maven project's property {{project.build.sourceEncoding}} is set to UTF-8 in the {{org.jboss:jboss-parent:6-beta-2}} super POM, resulting in the {{file.encoding}} JVM property being set to UTF-8. This is the suggested workaround if the Maven project does not specify an explicit source file encoding. If the project uses a different source file encoding (like ISO-8859-1), this workaround will fail; see the behavior of Xerces noted below.
> ----
> Specifying the encoding as UTF-8 in the XML declaration is a bit pointless as far as the JAXP implementation in the Oracle/Sun JRE is concerned. In this scenario, where ShrinkWrap provides an InputStream to the parser, the built-in Xerces parser reads the first 4 bytes of the stream (to check for UTF-16BE/LE, UTF-8 with BOM etc.) to determine the decoding to apply. The default encoding, if Xerces does not find a matching charset, is UTF-8. Xerces does not differentiate between ISO-8859-1, Cp1252 and UTF-8, as the values for the first 4 bytes of a valid XML stream in these encodings are the same.
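The {{from(String input, String encoding)}} overload proposed in the description could look roughly like the following. This is only a sketch under assumptions: EncodingAwareImport is a hypothetical standalone class, not ShrinkWrap API, and it returns a plain DOM Document instead of a descriptor. The point it shows is that the caller-supplied charset name is handed to String.getBytes, so the platform default never comes into play.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper mirroring the proposed overload; not part of ShrinkWrap Descriptors.
public class EncodingAwareImport {

    // Encodes the XML string with the caller-supplied charset name instead of file.encoding.
    static Document from(String input, String encoding) throws Exception {
        // Throws UnsupportedEncodingException if the charset name is unknown.
        byte[] bytes = input.getBytes(encoding);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<property name=\"user\">Schr\u00f6der</property>";
        Document doc = from(xml, "UTF-8");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

In this sketch the caller has to pass the same encoding that the XML declaration names; passing a different one reproduces the original mismatch.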
--
This message was sent by Atlassian Jira
(v7.12.1#712002)