[JBoss JIRA] (SHRINKDESC-97) XmlDomNodeDescriptorImporterImpl.from(String) uses the platform encoding to parse the String argument resulting in XML parser exceptions when parsing the XML Document
by Vineet Reynolds (Created) (JIRA)
Key: SHRINKDESC-97
URL: https://issues.jboss.org/browse/SHRINKDESC-97
Project: ShrinkWrap Descriptors
Issue Type: Bug
Environment: Windows 7 32-bit, Oracle/Sun Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Reporter: Vineet Reynolds
According to the forum reference, the {{XmlDomNodeDescriptorImporterImpl.from(String)}} method in ShrinkWrap Descriptors fails when it is invoked by {{ConfigurationSysPropResolver.resolveSystemProperties(ArquillianDescriptor)}} in Arquillian-Core and arquillian.xml contains umlauts. The following stack trace was recorded when running the tests in the arquillian-tomcat-managed-7 project with a modified arquillian.xml (listed below):
{noformat}
Running org.jboss.arquillian.container.tomcat.managed_7.TomcatManagedClientTestCase
org.apache.maven.surefire.booter.SurefireExecutionException: Could not create a new instance of class org.jboss.arquillian.test.impl.EventTestRunnerAdaptor see cause.; nested exception is java.lang.RuntimeException: Could not create a new instance of class org.jboss.arquillian.test.impl.EventTestRunnerAdaptor see cause.
java.lang.RuntimeException: Could not create a new instance of class org.jboss.arquillian.test.impl.EventTestRunnerAdaptor see cause.
at org.jboss.arquillian.test.spi.SecurityActions.newInstance(SecurityActions.java:170)
at org.jboss.arquillian.test.spi.TestRunnerAdaptorBuilder.build(TestRunnerAdaptorBuilder.java:52)
at org.jboss.arquillian.junit.Arquillian.run(Arquillian.java:72)
at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59)
at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:115)
at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:102)
at org.apache.maven.surefire.Surefire.run(Surefire.java:180)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:350)
at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1021)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.jboss.arquillian.test.spi.SecurityActions.newInstance(SecurityActions.java:166)
... 12 more
Caused by: org.jboss.shrinkwrap.descriptor.api.DescriptorImportException: Could not import XML from stream
at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporterImpl.importAsNode(XmlDomNodeImporterImpl.java:76)
at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporter.importAsNode(XmlDomNodeImporter.java:46)
at org.jboss.shrinkwrap.descriptor.spi.node.NodeDescriptorImporterBase.from(NodeDescriptorImporterBase.java:72)
at org.jboss.shrinkwrap.descriptor.spi.DescriptorImporterBase.from(DescriptorImporterBase.java:142)
at org.jboss.shrinkwrap.descriptor.spi.DescriptorImporterBase.from(DescriptorImporterBase.java:132)
at org.jboss.arquillian.config.impl.extension.ConfigurationSysPropResolver.resolveSystemProperties(ConfigurationSysPropResolver.java:55)
at org.jboss.arquillian.config.impl.extension.ConfigurationRegistrar.loadConfiguration(ConfigurationRegistrar.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.jboss.arquillian.core.impl.ObserverImpl.invoke(ObserverImpl.java:90)
at org.jboss.arquillian.core.impl.EventContextImpl.invokeObservers(EventContextImpl.java:99)
at org.jboss.arquillian.core.impl.EventContextImpl.proceed(EventContextImpl.java:81)
at org.jboss.arquillian.core.impl.ManagerImpl.fire(ManagerImpl.java:134)
at org.jboss.arquillian.core.impl.ManagerImpl.fire(ManagerImpl.java:114)
at org.jboss.arquillian.core.impl.ManagerImpl.start(ManagerImpl.java:260)
at org.jboss.arquillian.test.impl.EventTestRunnerAdaptor.<init>(EventTestRunnerAdaptor.java:56)
... 17 more
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:470)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2792)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporterImpl.importAsNode(XmlDomNodeImporterImpl.java:67)
... 34 more
{noformat}
The arquillian.xml file used in the test is listed below. The file was saved in UTF-8.
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<arquillian xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://jboss.org/schema/arquillian"
    xsi:schemaLocation="http://jboss.org/schema/arquillian http://jboss.org/schema/arquillian/arquillian_1_0.xsd">
    <container qualifier="tomcat" default="true">
        <configuration>
            <!-- <property name="catalinaHome">${CATALINA_HOME:target/apache-tomcat-7.0.20}</property> -->
            <property name="catalinaHome">D:/Apps/apache-tomcat-7.0.14</property>
            <property name="javaHome">${JAVA_HOME:C:/Java/jdk1.6.0_26}</property>
            <property name="javaVmArguments">-Xmx1024m -XX:MaxPermSize=512m</property>
            <property name="jmxPort">8089</property>
            <property name="bindHttpPort">8080</property>
            <property name="user">Schröder</property>
            <property name="pass">script</property>
            <property name="serverConfig">server.xml</property>
        </configuration>
    </container>
</arquillian>
{code}
The underlying root cause is that {{XmlDomNodeDescriptorImporterImpl.from(String)}} creates a {{ByteArrayInputStream}} from the bytes of the {{String}}, encoded using the platform's default charset (i.e. {{file.encoding}}). In the test environment the platform encoding was Cp1252, while the XML prolog declared UTF-8. The XML parser therefore decoded the Cp1252-encoded stream as UTF-8: in Cp1252, {{ö}} is the single byte 0xF6, which UTF-8 treats as the lead byte of a 4-byte sequence, and the following {{d}} (0x64) is not a valid continuation byte. The result is the {{MalformedByteSequenceException}} reporting "Invalid byte 2 of 4-byte UTF-8 sequence".
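The failure can be reproduced outside Arquillian with plain JAXP. The sketch below (the class and method names are hypothetical, not part of ShrinkWrap) encodes a fragment of the arquillian.xml above with windows-1252, Java's canonical name for Cp1252, and hands the bytes to a {{DocumentBuilder}}, mimicking {{String.getBytes()}} on a Cp1252 platform. The same fragment encoded as UTF-8 parses without error.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingMismatchDemo {

    // Trimmed-down arquillian.xml fragment: the prolog claims UTF-8 and the text contains an umlaut.
    static final String XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<property name=\"user\">Schr\u00f6der</property>";

    // Encodes the XML with the given charset (as String.getBytes() does with the platform
    // charset) and returns true if the JDK's built-in parser accepts the resulting bytes.
    static boolean parsesWhenEncodedAs(Charset charset) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but keeps "[Fatal Error]" off the console.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(XML.getBytes(charset)));
            return true;
        } catch (Exception e) {
            // For windows-1252 bytes the built-in Xerces is expected to complain about an
            // invalid byte in a multi-byte UTF-8 sequence.
            System.out.println(charset + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesWhenEncodedAs(Charset.forName("windows-1252"))); // false
        System.out.println(parsesWhenEncodedAs(StandardCharsets.UTF_8));           // true
    }
}
```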
As far as Arquillian is concerned, the fix is to use {{String.getBytes("UTF-8")}} here, because the {{ConfigurationSysPropResolver.resolveSystemProperties(ArquillianDescriptor)}} method in Arquillian-Core obtains an exported descriptor (whose XML declaration states an encoding of UTF-8) before invoking {{XmlDomNodeDescriptorImporterImpl.from(String)}}. Hard-coding UTF-8 in ShrinkWrap, however, is probably not a good idea in all circumstances.
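A minimal sketch of the Arquillian-specific fix, assuming the string always comes from an exported descriptor whose prolog declares UTF-8. The helper {{toUtf8Stream}} is hypothetical; the point is that encoding with an explicit UTF-8 charset yields the same bytes regardless of {{file.encoding}}. {{StandardCharsets}} requires Java 7; on the Java 6 runtime from the report, {{getBytes("UTF-8")}} does the same but throws the checked {{UnsupportedEncodingException}}.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class Utf8ImportSketch {

    // Hypothetical helper: always encodes as UTF-8, matching the encoding="UTF-8"
    // declaration of an exported descriptor, instead of using the platform default.
    static InputStream toUtf8Stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(toUtf8Stream(xml));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<property name=\"user\">Schr\u00f6der</property>";
        String text = parse(xml).getDocumentElement().getTextContent();
        // The umlaut survives the round trip whatever file.encoding is set to.
        System.out.println(text.equals("Schr\u00f6der")); // true
    }
}
```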
I think that either {{XmlDomNodeDescriptorImporterImpl}}, its superclasses {{NodeDescriptorImporterBase}} or {{DescriptorImporterBase}}, or the interface {{DescriptorImporter}} must provide an additional method {{from(String input, String encoding)}} so that callers can specify the charset of the input XML string, which can then be passed to the {{String.getBytes}} invocation.
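One possible shape for the proposed overload, written as a standalone class rather than as a change to ShrinkWrap. The class name and the DOM {{Document}} return type are assumptions for illustration; the real {{DescriptorImporter}} returns a descriptor. The single-argument variant mirrors the current platform-default behavior, while the two-argument variant lets the caller name the charset, which should match the encoding declared in the XML prolog.

```java
import java.io.ByteArrayInputStream;
import java.io.UnsupportedEncodingException;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical importer illustrating the proposed from(String input, String encoding) overload.
public class CharsetAwareImporter {

    // Current behavior: String.getBytes() uses the platform default charset (file.encoding).
    public Document from(String input) throws Exception {
        return parse(input.getBytes());
    }

    // Proposed overload: the caller states which charset turns the string into bytes.
    public Document from(String input, String encoding) throws Exception {
        byte[] bytes;
        try {
            bytes = input.getBytes(encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
        return parse(bytes);
    }

    private Document parse(byte[] bytes) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }
}
```

For example, {{from(xml, "ISO-8859-1")}} on a document declared as {{encoding="ISO-8859-1"}} keeps the umlaut intact even on a UTF-8 platform, since the bytes and the declaration agree.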
----
It should be noted that the JUnit test in {{SyspropReplacementInArqXmlTestCase}} succeeds in this scenario because the Maven project property {{project.build.sourceEncoding}} is set to UTF-8 in the {{org.jboss:jboss-parent:6-beta-2}} super POM, which results in the {{file.encoding}} JVM property being set to UTF-8. Setting {{project.build.sourceEncoding}} to UTF-8 is therefore the suggested workaround if the Maven project does not specify an explicit source file encoding. If the project uses a different source file encoding (such as ISO-8859-1), this workaround fails because of the Xerces behavior noted below.
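The workaround depends on the fact that the no-argument {{String.getBytes()}} encodes with {{Charset.defaultCharset()}}, which the JVM derives from {{file.encoding}} at startup (so it has to be passed on the command line, e.g. {{-Dfile.encoding=UTF-8}}, rather than set later with {{System.setProperty}}). The check below uses only the JDK; the class name is arbitrary. Note that since JDK 18 (JEP 400) the default charset is UTF-8 unless overridden, so this failure mode mainly concerns older runtimes such as the 1.6 JRE in the report.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetCheck {

    // True if the no-argument getBytes() produced the same bytes as an explicit
    // encode with Charset.defaultCharset(), i.e. the platform charset is what gets used.
    static boolean getBytesUsesDefaultCharset(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println(getBytesUsesDefaultCharset("Schr\u00f6der")); // true
    }
}
```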
----
Specifying the encoding as UTF-8 in the XML declaration is somewhat pointless as far as the JAXP implementation in the Oracle/Sun JRE is concerned. When ShrinkWrap provides an {{InputStream}} to the parser, as in this scenario, the built-in Xerces parser reads the first 4 bytes of the stream (to check for UTF-16BE/LE, UTF-8 with a BOM, etc.) to determine which decoding to apply. If Xerces finds no matching signature, it defaults to UTF-8. Xerces cannot distinguish between ISO-8859-1, Cp1252 and UTF-8 this way, because the first 4 bytes of a valid XML stream are identical in all three encodings.
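The claim about the 4-byte signature is easy to check with the JDK alone (the class name is arbitrary). An XML declaration starts with the same bytes 3C 3F 78 6D ({{<?xm}}) in UTF-8, ISO-8859-1 and windows-1252; UTF-16LE, shown for contrast, produces the distinct pattern 3C 00 3F 00 that the parser can recognize.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrologBytesCheck {

    // First 4 bytes of an XML declaration, encoded with the given charset.
    static byte[] firstFourBytes(Charset charset) {
        return Arrays.copyOf("<?xml version=\"1.0\"?>".getBytes(charset), 4);
    }

    public static void main(String[] args) {
        byte[] utf8 = firstFourBytes(StandardCharsets.UTF_8);
        byte[] latin1 = firstFourBytes(StandardCharsets.ISO_8859_1);
        byte[] cp1252 = firstFourBytes(Charset.forName("windows-1252"));
        byte[] utf16le = firstFourBytes(StandardCharsets.UTF_16LE);

        System.out.println(Arrays.toString(utf8));    // [60, 63, 120, 109]
        System.out.println(Arrays.toString(utf16le)); // [60, 0, 63, 0]
        // The three single-byte-compatible encodings are indistinguishable at this point.
        System.out.println(Arrays.equals(utf8, latin1) && Arrays.equals(utf8, cp1252)); // true
    }
}
```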