[jbossws-issues] [JBoss JIRA] Updated: (JBWS-1716) Erroneous UTF-8 character encoding when marshalling on machines with non-UTF-8 default encoding
Thomas Diesler (JIRA)
jira-events at lists.jboss.org
Fri Jul 6 08:29:08 EDT 2007
[ http://jira.jboss.com/jira/browse/JBWS-1716?page=all ]
Thomas Diesler updated JBWS-1716:
---------------------------------
Component/s: jbossws-jaxrpc
jbossws-jaxws
> Erroneous UTF-8 character encoding when marshalling on machines with non-UTF-8 default encoding
> -----------------------------------------------------------------------------------------------
>
> Key: JBWS-1716
> URL: http://jira.jboss.com/jira/browse/JBWS-1716
> Project: JBoss Web Services
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: jbossws-jaxws, jbossws-jaxrpc
> Affects Versions: jbossws-1.2.1
> Reporter: floe fliep
> Fix For: jbossws-2.0.1
>
> Attachments: a-jbossws1.2.1GA-utf8-patch.jar
>
>
> When a client request includes a non-ASCII character such as the "ç" in "Français" and is sent from a machine whose default character encoding is not UTF-8, the character is encoded incorrectly. For example, when the machine's default charset is windows-1252 (Cp1252, the Windows default in this case), the "ç" above is marshalled onto the network stream as 0xC3 0x83 0xC2 0xA7 instead of the legal UTF-8 sequence 0xC3 0xA7.
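The byte sequence quoted above is the signature of double encoding: the UTF-8 bytes are decoded as windows-1252, and the resulting characters are encoded as UTF-8 again. The following minimal sketch reproduces it. It assumes the JDK provides the windows-1252 charset; standard JDKs do, although it is not one of the charsets every Java platform must support. The class and method names are illustrative only. The "ç" is written as the escape \u00E7 so that the demo does not itself depend on the compiler's source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {

    // Formats bytes like "0xC3 0xA7" so they can be compared with the report.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Simulates the bug: UTF-8 bytes decoded with the wrong charset, then re-encoded as UTF-8.
    static byte[] corrupt(byte[] utf8Bytes, Charset wrongCharset) {
        // With windows-1252, "\u00E7" (2 UTF-8 bytes) is misread as "\u00C3\u00A7".
        String misread = new String(utf8Bytes, wrongCharset);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] utf8 = "\u00E7".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8));                  // 0xC3 0xA7
        System.out.println(hex(corrupt(utf8, cp1252))); // 0xC3 0x83 0xC2 0xA7
    }
}
```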
> A workaround is to set the system property file.encoding=UTF-8, but this causes as many problems elsewhere as it fixes (especially for legacy code that reads platform-specific files).
> This forum post very likely describes the same problem: http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4030510#4030510
> After several hours of stepping through the JBossWS code, I found what I believe is the culprit, in the method XMLFragment.writeSourceInternal(Writer writer):
> ....
> if (reader == null)
> reader = new InputStreamReader(streamSource.getInputStream());
> Here streamSource.getInputStream() returns a stream that is already UTF-8 encoded. However, an InputStreamReader created without an explicit charset uses the machine's default character encoding, so the bytes of the UTF-8 stream are decoded with a different encoding scheme and the data is corrupted.
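The usual remedy is to pass the charset to the InputStreamReader constructor instead of relying on the platform default. The sketch below contrasts the two approaches. It assumes, as the report states, that the stream is UTF-8; a fuller fix might instead honor the encoding actually declared for the message. The helper names are hypothetical. In the excerpt above, the analogous change would be new InputStreamReader(streamSource.getInputStream(), "UTF-8"). That String-name overload also works on pre-Java 7 JDKs, which lack StandardCharsets, but it throws the checked UnsupportedEncodingException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetReader {

    // Reads every character from the reader into a String.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Mirrors the reported pattern. new InputStreamReader(in) would use the platform
    // default; the default is passed in here so the demo behaves the same on any machine.
    static String decodeWithDefault(InputStream in, Charset platformDefault) {
        return readAll(new InputStreamReader(in, platformDefault));
    }

    // Fix: decode with the encoding the bytes were actually written in.
    static String decodeAsUtf8(InputStream in) {
        return readAll(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String text = "Fran\u00E7ais";
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(decodeWithDefault(new ByteArrayInputStream(payload), cp1252).equals(text)); // false
        System.out.println(decodeAsUtf8(new ByteArrayInputStream(payload)).equals(text));              // true
    }
}
```

The demo prints booleans rather than the decoded strings, because printing non-ASCII text to the console would itself depend on the console's encoding.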
> Each time data passes through the marshalling, more corruption is added, so the number of wrong characters grows as data is passed back and forth.
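The damage compounds because each faulty pass turns every byte of the previous output into a character of its own, and non-ASCII characters then take two or three bytes in UTF-8. The sketch below makes the same windows-1252 assumption as the report and uses hypothetical helper names. It tracks the byte length of "ç" over repeated faulty passes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CompoundingCorruption {

    // One faulty marshalling pass: decode UTF-8 bytes with the wrong charset, re-encode as UTF-8.
    static byte[] faultyPass(byte[] utf8, Charset wrongCharset) {
        return new String(utf8, wrongCharset).getBytes(StandardCharsets.UTF_8);
    }

    // Returns the byte length before any pass, then after each of the given number of passes.
    static int[] lengthsAfterPasses(String text, int passes, Charset wrongCharset) {
        int[] lengths = new int[passes + 1];
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        lengths[0] = bytes.length;
        for (int i = 1; i <= passes; i++) {
            bytes = faultyPass(bytes, wrongCharset);
            lengths[i] = bytes.length;
        }
        return lengths;
    }

    public static void main(String[] args) {
        int[] lengths = lengthsAfterPasses("\u00E7", 3, Charset.forName("windows-1252"));
        System.out.println(Arrays.toString(lengths)); // [2, 4, 8, 18]
    }
}
```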
> I would suggest attaching a Reader to the StreamSource held in the source instance variable, so that it keeps track of its encoding, but that might break things elsewhere.
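The reporter's suggestion can be sketched with the standard javax.xml.transform.stream.StreamSource API, which can carry a Reader (setReader/getReader) as well as an InputStream. The attachReader helper below is hypothetical and not part of JBossWS. It assumes the encoding of the bytes is known when the source is created. It also assumes that consuming code checks getReader() before falling back to the byte stream, as the "if (reader == null)" in the excerpt suggests. Whether the change breaks other JBossWS code paths remains the open question the report raises.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.stream.StreamSource;

class SourceWithEncoding {

    // Hypothetical helper: if the source only carries a byte stream, attach a Reader that
    // decodes it with the known encoding, so code that checks getReader() first never
    // needs to fall back to a default-charset InputStreamReader.
    static StreamSource attachReader(StreamSource source, Charset encoding) {
        InputStream in = source.getInputStream();
        if (source.getReader() == null && in != null) {
            source.setReader(new InputStreamReader(in, encoding));
        }
        return source;
    }

    // Reads every character from the reader into a String.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<lang>Fran\u00E7ais</lang>";
        StreamSource source = new StreamSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        attachReader(source, StandardCharsets.UTF_8);
        System.out.println(readAll(source.getReader()).equals(xml)); // true
    }
}
```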
--
This message is automatically generated by JIRA.