David Ward [http://community.jboss.org/people/dward] replied to the discussion
"Problem in retrieving WSDL from remote endpoint"
To view the discussion, visit:
http://community.jboss.org/message/539893#539893
--------------------------------------------------------------
The problem is more pervasive than I realized, but I do have a fix. There are a couple
of issues that need to be discussed, however, so in the end it is a good thing that this
is in the developer forum. ;)
First, the fix:
1. StreamUtils.readStreamString(stream, charset):String was incorrectly implemented. It
just read in the bytes and created a new String object from the bytes and the specified
charset. This does +not+ convert the bytes into the specified charset; it only decodes
them on the assumption that they are already in that charset. So, if the bytes you are
reading in are actually encoded with KOI8-R instead of UTF-8, for example, then when you
output that String, the characters will be garbled. (For example, in Firefox, they will
show up as question marks.) Unfortunately, there is +nothing+ in the JDK that can
determine the character encoding of a set of bytes or a stream, and knowing it is
essential to creating a String that can be output correctly. There is InputStreamReader,
but you have to know the encoding of the stream first. :( Luckily, there is a utility
out there we can use: ICU4J (http://site.icu-project.org/). What I did to fix the method
was call upon a helper method in CharsetDetector:
http://icu-project.org/apiref/icu4j/com/ibm/icu/text/CharsetDetector.html...,
java.lang.String)
2. Next, the HttpGatewayServlet (which is sometimes used for exposing WSDL via
<http-gateway/>, for example, for SOAPProxy) had to be changed to get the now UTF-8
encoded bytes, and output that, making sure to set the Content-Length response header to
the length of the byte array, +not+ the length of the String:
http://mark.koli.ch/2009/09/remember-kids-an-http-content-length-is-the-n...
3. Finally, the contract.jsp page (which is sometimes used for exposing WSDL via
<jbr-provider/>) had to be changed to declare <%@page contentType="text/xml;
charset=UTF-8"%> at the top of the page, so that <%=contractData%> would be
output correctly. See notes on the JSP page directive and its effect here:
http://www.w3.org/International/O-HTTP-charset
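The first two steps above can be sketched roughly as follows. This is only an illustration, not the actual StreamUtils or HttpGatewayServlet code: the class and method names are made up, and the charset detection that the fix delegates to ICU4J is replaced here by a charset name passed in by the caller. It shows that decoding with the wrong charset garbles the text, and that the UTF-8 byte count differs from the String length.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; names are assumptions,
// not the real ESB classes.
final class WsdlEncodingSketch {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    // new String(bytes, charset) does not convert anything: it decodes the
    // bytes assuming they are ALREADY in the given charset. The caller must
    // therefore supply the charset the bytes were really written in
    // (in the fix described above, that comes from ICU4J's detection).
    static String decode(byte[] raw, String actualCharsetName) {
        return new String(raw, Charset.forName(actualCharsetName));
    }

    // Re-encode the decoded text as UTF-8 for output. The length of THIS
    // array, not text.length(), is what belongs in Content-Length.
    static byte[] toUtf8(String text) {
        return text.getBytes(UTF_8);
    }
}
```

In a servlet, the array returned by toUtf8 would be written to the response output stream and its length passed to response.setContentLength(...). For a six-letter Cyrillic word, the String length is 6 but the UTF-8 body is 12 bytes.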
After making the changes above, I tested successfully using both JDK 5 + ESB in AS 4, and
JDK 6 + ESB in AS 5. I also ran a clean integration build.
Next, the two issues that have to be discussed:
1. Are we allowed to add icu4j-4_4.jar and icu4j-charsets-4_4.jar as project dependencies,
and ship them with both our community ESB and supported SOA Platform? I can't seem to
find out what the ICU4J license is...
2. Now that the implementation of StreamUtils.readStreamString(stream, charset):String
is corrected, any code that calls that method will get correct character encoding.
+However+, any code which calls the underlying StreamUtils.readStream(stream):byte[]
method, and decides to create its +own+ String from the returned byte array (instead of
just using readStreamString), could still misinterpret the character encoding. This is
all over the place in our code, and I'm a bit hesitant to go change all of it. Maybe for
the purpose of this bug, we just stick to fixing the WSDL encoding problem? Luckily,
this problem only rears its head if the stream being read contains extended characters
in an encoding other than UTF-8 (like KOI8-R or cp1251, for example).
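To make the second issue concrete, here is a minimal sketch of the risky pattern and a safer alternative. The class below is a stand-in written for this example, not the real StreamUtils; its method names merely mirror the ones discussed above, and the charset is assumed to be known by the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Stand-in for illustration; NOT the actual StreamUtils implementation.
final class StreamReadSketch {
    // Reads the whole stream into a byte array, with no charset knowledge.
    static byte[] readStream(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Risky pattern seen in callers: new String(readStream(in)) decodes with
    // the platform default charset, whatever that happens to be.
    // Safer: decode with the charset the stream was actually written in.
    static String readStreamString(InputStream in, String actualCharsetName) {
        return new String(readStream(in), Charset.forName(actualCharsetName));
    }
}
```

Callers that only ever see ASCII content will not notice a difference between the two patterns, which is why the problem stays hidden until KOI8-R or cp1251 content shows up.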
Feedback appreciated. Thanks!
--------------------------------------------------------------