Re: [jboss-dev-forums] [JBoss ESB Development] - Problem in retrieving WSDL from remote endpoint
by David Ward
David Ward [http://community.jboss.org/people/dward] replied to the discussion
"Problem in retrieving WSDL from remote endpoint"
To view the discussion, visit: http://community.jboss.org/message/540047#540047
--------------------------------------------------------------
Well, if you look at every place that calls that method, our code consistently hardcodes "UTF-8". So maybe we should question why we are passing in a charset at all.
Regarding determining the charset, I have done a lot of reading on this. You can't always depend on magic bytes to tell you the encoding, so some portion of the bytes has to be examined and matched against conversion tables to find a likely encoding. ICU4J's homepage says, "+ICU's conversion tables are based on *charset data collected by IBM over the course of many decades*, and is the most complete available anywhere.+" I am inclined to believe this, as I have read elsewhere that much of the I18N code in JDK >= 1.1 came from them and was fed directly into Java via a partnership: "IBM and the ICU team played a key role in providing globalization technology into Sun's Java".
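To illustrate the "magic bytes" limitation, here is a minimal sketch of byte-order-mark sniffing using only the JDK. The class and method names (BomSniffer, fromBom) are hypothetical and not part of the ESB code, and it simplifies by ignoring UTF-32. Anything without a BOM, which includes KOI8-R, CP-1251, ISO-8859-1 and BOM-less UTF-8, simply comes back as "unknown":

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    /**
     * Returns the charset indicated by a leading byte-order mark,
     * or null when the bytes carry no BOM. Hypothetical helper;
     * UTF-32 BOMs are deliberately not handled in this sketch.
     */
    public static Charset fromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        // No BOM: single-byte encodings and BOM-less UTF-8 all end up here.
        return null;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?'};
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(fromBom(withBom)); // UTF-8
        System.out.println(fromBom(latin1));  // null
    }
}
```

This is why a detector has to look at the byte distribution itself rather than only at the first few bytes.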
--------------------------------------------------------------
Reply to this message by going to Community
[http://community.jboss.org/message/540047#540047]
Start a new discussion in JBoss ESB Development at Community
[http://community.jboss.org/choose-container!input.jspa?contentType=1&cont...]
14 years, 10 months
Re: [jboss-dev-forums] [JBoss ESB Development] - Problem in retrieving WSDL from remote endpoint
by David Ward
David Ward [http://community.jboss.org/people/dward] replied to the discussion
"Problem in retrieving WSDL from remote endpoint"
To view the discussion, visit: http://community.jboss.org/message/540036#540036
--------------------------------------------------------------
Magesh,
I disagree that StreamUtils.readStreamString(stream, charset):String is correctly implemented. It creates a String from the passed bytes using the passed encoding; however, the passed bytes may contain character data that is not in that encoding! Using ICU4J can fix this.
Specifically, change this:
    public static String readStreamString(InputStream stream, String charset) throws UnsupportedEncodingException {
        return new String(StreamUtils.readStream(stream), charset);
    }
to this:
    public static String readStreamString(InputStream stream, String charset) {
        return new CharsetDetector().getString(StreamUtils.readStream(stream), charset);
    }
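To make the failure mode concrete, here is a small JDK-only sketch (it does not use ICU4J or StreamUtils; the class and method names are made up for illustration). It decodes ISO-8859-1 bytes with a blindly trusted UTF-8 charset, the way the original method does, and shows that the non-ASCII character is silently replaced:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    /** Mirrors the original approach: trust the caller's charset without checking the bytes. */
    public static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1: the last byte is 0xE9.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        String wrong = decodeAs(latin1, StandardCharsets.UTF_8);
        String right = decodeAs(latin1, StandardCharsets.ISO_8859_1);

        // 0xE9 is not a complete UTF-8 sequence, so it becomes U+FFFD.
        System.out.println(wrong.equals("caf\uFFFD")); // true
        System.out.println(right.equals("caf\u00e9")); // true
    }
}
```

No exception is thrown in the wrong case, which is why the problem only shows up later as garbled output.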
Regarding the statement: "The user application may be expecting only the specified encoding, whereas due to intelligent scanning we might read all encoded streams", I do worry about this a bit from a performance perspective (although that has not been substantiated yet); however, I do not worry about it as a security issue. Either way we're building up a String. I should also note that I didn't change any methods that just read an InputStream into a byte[]. I only changed the methods readStreamString and getResourceAsString - the two utility methods that want those bytes +as a *String*+, and not just as a data-bucket.
I would be okay with just sticking to fixing the WSDL encoding problem, which would mean removing my change to StreamUtils and converting the characters "closer" to the WSDL handling code, however:
1. I wouldn't add a property called "wsdl-encoding" to any Action. Again, we would be depending on the user to specify it correctly, which we've already seen here they don't do.
2. What do we do about all the other areas of our code that do String handling/outputting and assume the bytes contain characters encoded as UTF-8, when they don't?
I think the thing that intrigues me the most about your reply is that non-UTF-8 WSDL should not be allowed according to Basic Profile. This could change everything. I have to admit I was ignorant of this. Now, should we:
1. Say "all WSDL must adhere to Basic Profile so we are rejecting this jira." - Seems a bit harsh, but might be acceptable.
2. Understand that many people (myself included up until now) don't know about this, or +don't *want* to adhere to it+, so we should handle the possibility of documents encoded in something other than UTF-8 or UTF-16, just in case?
Thanks for the feedback,
David
--------------------------------------------------------------
Re: [jboss-dev-forums] [JBoss ESB Development] - Problem in retrieving WSDL from remote endpoint
by David Ward
David Ward [http://community.jboss.org/people/dward] replied to the discussion
"Problem in retrieving WSDL from remote endpoint"
To view the discussion, visit: http://community.jboss.org/message/540034#540034
--------------------------------------------------------------
Keith,
Yes, the original example WSDL declares UTF-8, incorrectly. However, my mention of KOI8-R above is not directly related to the original example WSDL. A fellow Red-Hatter sent me various other Russian WSDL examples, with encodings in CP-1251, KOI8-R and UTF-8, and that's what I've been testing with and got working correctly by reading the byte stream into a String using ICU4J. My change also handles the situation of the original example WSDL, incorrect declaration and all.
There is shared code for retrieving the WSDL, so I don't want to do something different for http:// (looking for the Content-Type field) than I do for reading in from file:// or classpath://. Besides, the Content-Type field could be wrong, just as we saw the declaration was wrong in the original example WSDL.
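For reference, the HTTP-only approach I'm arguing against would look roughly like the sketch below (the class and method names are hypothetical, and the parsing is simplified: it does not handle semicolons inside quoted values). It only helps when there is a header at all, and it returns whatever the server claims, right or wrong:

```java
import java.util.Locale;

public class ContentTypeCharset {
    /**
     * Returns the charset parameter of a Content-Type header value,
     * or null if the header is missing or has no charset parameter.
     * Hypothetical helper with deliberately simplified parsing.
     */
    public static String charsetOf(String contentType) {
        if (contentType == null) {
            return null; // e.g. file:// or classpath:// sources have no header
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/xml; charset=KOI8-R")); // KOI8-R
        System.out.println(charsetOf("text/xml"));                 // null
        System.out.println(charsetOf(null));                       // null
    }
}
```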
David
--------------------------------------------------------------
Re: [jboss-dev-forums] [JBoss ESB Development] - Problem in retrieving WSDL from remote endpoint
by Magesh Bojan
Magesh Bojan [http://community.jboss.org/people/mageshbk%40jboss.com] replied to the discussion
"Problem in retrieving WSDL from remote endpoint"
To view the discussion, visit: http://community.jboss.org/message/540032#540032
--------------------------------------------------------------
> StreamUtils.readStreamString(stream, charset):String was incorrectly implemented.
Well, it is correctly implemented: it is meant to read bytes into a String using the passed encoding, not to perform conversion.
> Unfortunately, there is nothing in the JDK that can determine what character encoding a set of bytes or stream is, and this is paramount to creating a String that can be outputted correctly.
Yes, that is why most applications do not try to guess the encoding themselves. Guessing or intelligently identifying the encoding could pose problems, e.g.:
"The user application may be expecting only the specified encoding, whereas due to intelligent scanning we might read all encoded streams"
Will this lead to a security issue? I do not know ATM.
> Maybe for the purpose of this bug, we just stick to fixing the WSDL encoding problem?
I would do this. A setting in the SOAPProxy action like this would be sufficient for downloading and converting the file.
<action name="proxy" ..>
    <property name="wsdl-encoding" value="ISO-8859-1" />
</action>
But the major setback is this [1]: the Basic Profile mandates that WSDL and SOAP be only UTF-8 or UTF-16.
One more thing is that the attached ItemService.wsdl is actually an "ISO-8859-1" encoded file that has merely been declared as "UTF-8":
This type of WSDL should not be allowed according to BP!
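A check along these lines could reject such a file without guessing anything. This is only a sketch using the JDK's strict decoder; the class and method names are hypothetical and this is not existing ESB code. Note that it can only prove that bytes are not valid in a multi-byte encoding such as UTF-8; a single-byte charset like ISO-8859-1 accepts any byte sequence, so it says nothing about which legacy encoding was actually used:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeCheck {
    /** Returns true only if every byte decodes cleanly in the given charset. */
    public static boolean isValidIn(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ISO-8859-1 bytes, as in a file that is declared UTF-8 but is not.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidIn(latin1, StandardCharsets.UTF_8));      // false
        System.out.println(isValidIn(latin1, StandardCharsets.ISO_8859_1)); // true
    }
}
```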
[1] http://www.ws-i.org/Profiles/BasicProfile-1.0-2004-04-16.html#refinement1...
--------------------------------------------------------------