[jboss-dev-forums] [JBoss ESB Development] - Problem in retrieving WSDL from remote endpoint

David Ward do-not-reply at jboss.com
Tue Apr 27 17:12:48 EDT 2010


David Ward [http://community.jboss.org/people/dward] replied to the discussion

"Problem in retrieving WSDL from remote endpoint"

To view the discussion, visit: http://community.jboss.org/message/539893#539893

--------------------------------------------------------------
The problem is more pervasive than I realized, but I do have a fix.  There are a couple of issues that need to be discussed, however, so in the end it is a good thing that this is in the developer forum.   ;)

First, the fix:
1. StreamUtils.readStreamString(stream, charset):String was incorrectly implemented.  It just read in the bytes and created a new String object with the bytes and the specified charset.  That constructor only +interprets+ the bytes as if they were already in the specified charset; it does +not+ convert them into it.  So, if the bytes you are reading in are actually encoded with KOI8-R and you pass UTF-8, for example, then when you output that String, the characters will be garbled.  (For example, in Firefox, they will become question marks.)  Unfortunately, there is +nothing+ in the JDK that can determine what character encoding a set of bytes or a stream uses, and knowing that is essential to creating a String that can be output correctly.  There is InputStreamReader, but you have to know the encoding of the stream first.    :(   Luckily, there is a utility out there we can use: ICU4J (http://site.icu-project.org/).  What I did to fix the method was call a helper method in CharsetDetector:  http://icu-project.org/apiref/icu4j/com/ibm/icu/text/CharsetDetector.html#getString(byte[], java.lang.String)
2. Next, the HttpGatewayServlet (which is sometimes used for exposing WSDL via <http-gateway/>, for example, for SOAPProxy) had to be changed to get the now UTF-8 encoded bytes and output those, making sure to set the Content-Length response header to the length of the byte array, +not+ the length of the String:  http://mark.koli.ch/2009/09/remember-kids-an-http-content-length-is-the-number-of-bytes-not-the-number-of-characters.html
3. Finally, the contract.jsp page (which is sometimes used for exposing WSDL via <jbr-provider/>) had to be changed to declare <%@page contentType="text/xml; charset=UTF-8"%> at the top of the page, so that <%=contractData%> would be output correctly.  See notes on the JSP page directive and its effect here:  http://www.w3.org/International/O-HTTP-charset
After making the changes above, I tested successfully using both JDK 5 + ESB in AS 4, and JDK 6 + ESB in AS 5.  I also ran a clean integration build.
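
To make points 1 and 2 concrete, here is a minimal sketch using only the JDK.  It is +not+ the actual ESB fix: the class and method names are hypothetical, it uses newer JDK APIs than the ones tested above, and it assumes the real charset of the bytes is already known (in the fix, that is the part CharsetDetector takes care of).  It shows that decoding KOI8-R bytes as UTF-8 garbles the text, and that once the text is re-encoded as UTF-8, the byte count (what Content-Length needs) differs from the character count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of JBoss ESB.
final class CharsetSketch {

    // Decode the bytes with the charset they were really written in,
    // then re-encode the resulting text as UTF-8.
    static byte[] transcodeToUtf8(byte[] raw, Charset actual) {
        String text = new String(raw, actual);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset koi8r = Charset.forName("KOI8-R");
        // Six Cyrillic letters, written as escapes so the source file
        // encoding does not matter.
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442";
        byte[] raw = original.getBytes(koi8r);

        // What the old implementation effectively did when handed the
        // wrong charset: the bytes are interpreted, not converted.
        String wrong = new String(raw, StandardCharsets.UTF_8);
        System.out.println(original.equals(wrong)); // false

        // Decode correctly, then produce UTF-8 bytes for the response.
        byte[] utf8 = transcodeToUtf8(raw, koi8r);
        String right = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(original.equals(right)); // true

        // Content-Length must be utf8.length, not right.length().
        System.out.println(right.length() + " chars, " + utf8.length + " bytes"); // 6 chars, 12 bytes
    }
}
```

In a servlet, the value to pass to the Content-Length header would be the length of the UTF-8 byte array, since each of these Cyrillic characters takes two bytes in UTF-8.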

Next, the two issues that have to be discussed:
1. Are we allowed to add icu4j-4_4.jar and icu4j-charsets-4_4.jar as project dependencies, and ship them with both our community ESB and supported SOA Platform?  I can't seem to find out what the ICU4J license is...
2. With the corrected implementation of StreamUtils.readStreamString(stream, charset):String, any code that calls that method will now be guaranteed correct character encoding.  +However+, any code which calls the underlying StreamUtils.readStream(stream):byte[] method and decides to create its +own+ String from the returned byte array (instead of just using readStreamString) could misinterpret the character encoding.  This is all over the place in our code, and I'm a bit hesitant to go change all of it.  Maybe for the purpose of this bug, we just stick to fixing the WSDL encoding problem?  Luckily, this problem only rears its head if the stream being read is encoded in an extended character set other than UTF-8 (like KOI8-R or cp1251, for example).
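
For anyone auditing call sites for issue 2, this is roughly the pattern to look for.  It is a sketch under assumptions: readStream below is a hypothetical stand-in written for this example, not the real StreamUtils code, and the two decode methods are invented names.  The first one mimics a caller that builds its own String with an assumed charset; the second decodes with the charset the bytes were actually written in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of JBoss ESB.
final class ReadStreamSketch {

    // Stand-in for StreamUtils.readStream(stream):byte[] (assumed behaviour:
    // read the whole stream into a byte array).
    static byte[] readStream(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Risky call-site pattern: the caller assumes UTF-8 regardless of
    // what the stream really contains.
    static String decodeAssumingUtf8(InputStream in) {
        return new String(readStream(in), StandardCharsets.UTF_8);
    }

    // Safer pattern: the caller supplies the charset the bytes are in.
    static String decodeWith(InputStream in, Charset actual) {
        return new String(readStream(in), actual);
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("windows-1251");
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442";
        byte[] raw = original.getBytes(cp1251);

        String assumed = decodeAssumingUtf8(new ByteArrayInputStream(raw));
        String correct = decodeWith(new ByteArrayInputStream(raw), cp1251);

        System.out.println("assumed UTF-8 matches: " + original.equals(assumed));
        System.out.println("explicit cp1251 matches: " + original.equals(correct));
    }
}
```

Plain ASCII content decodes identically either way, which is why such call sites only misbehave with data like the cp1251 example here.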

Feedback appreciated.  Thanks!

--------------------------------------------------------------
