David Ward [http://community.jboss.org/people/dward] replied to the discussion
"Problem in retrieving WSDL from remote endpoint"
To view the discussion, visit:
http://community.jboss.org/message/540036#540036
--------------------------------------------------------------
Magesh,
I disagree with your view that StreamUtils.readStreamString(stream, charset):String is
correctly implemented. It creates a String from the passed bytes using the passed
encoding; however, the passed bytes contain character data that might not be in the
passed encoding! ICU4J can fix this by detecting the actual encoding.
Specifically, change this:
    public static String readStreamString(InputStream stream, String charset)
            throws UnsupportedEncodingException {
        return new String(StreamUtils.readStream(stream), charset);
    }
to this:
    public static String readStreamString(InputStream stream, String charset) {
        return new CharsetDetector().getString(StreamUtils.readStream(stream), charset);
    }
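To make the failure mode concrete, here is a minimal JDK-only sketch of what goes wrong when the bytes are not in the charset the caller passes. It deliberately does not use ICU4J or StreamUtils; the class name EncodingMismatchDemo and the helper decode are hypothetical and only mirror the shape of the original readStreamString.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of JBoss ESB or ICU4J.
final class EncodingMismatchDemo {

    // Mirrors the original readStreamString: trust the caller's charset blindly.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The last character is e-acute, a single byte (0xE9) in ISO-8859-1.
        String original = "caf\u00e9";
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the wrong charset silently corrupts the text:
        // 0xE9 is not valid UTF-8 on its own, so it becomes U+FFFD.
        String wrong = decode(latin1, StandardCharsets.UTF_8);
        String right = decode(latin1, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 decode matches original:      " + wrong.equals(original));
        System.out.println("ISO-8859-1 decode matches original: " + right.equals(original));
    }
}
```

No exception is thrown in the mismatched case, which is why the problem only shows up later as corrupted characters.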
Regarding the statement: "The user application may be expecting only the specified
encoding, whereas due to intelligent scanning we might read all encoded streams", I
do worry about this a bit from a performance perspective (although it has not been
substantiated yet); however, I do not worry about it as a security issue. Either way
we're building up a String. I should also note that I didn't change any methods
that just read an InputStream into a byte[]. I only changed the methods
readStreamString and getResourceAsString, the two utility methods that want those bytes
as a *String*, and not just as a data bucket.
I would be okay with just sticking to fixing the WSDL encoding problem, which would mean
removing my change to StreamUtils and converting the characters "closer" to the
WSDL handling code; however:
1. I wouldn't add a property called "wsdl-encoding" to any Action. Again,
we would be depending on the user to specify it correctly, which, as we've already
seen here, they don't do.
2. What do we do about all the other areas of our code that do String
handling/outputting and assume the bytes contain characters encoded as UTF-8 when
they don't?
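For the WSDL-only route, one option that needs no "wsdl-encoding" property is to never build a String from the bytes at all and instead hand them straight to an XML parser, which reads the encoding from the XML declaration. The following is only a sketch under that assumption, using the JDK's DOM parser; the class name WsdlBytesParseSketch and the sample document are hypothetical, and this is not how the ESB code currently handles WSDL.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical sketch; not part of JBoss ESB.
final class WsdlBytesParseSketch {

    // Parse raw bytes; the parser picks the charset from the XML declaration,
    // so no caller-supplied encoding is involved.
    static Document parse(byte[] xmlBytes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmlBytes));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML bytes", e);
        }
    }

    public static void main(String[] args) {
        // A tiny stand-in for a WSDL that declares a non-UTF-8 encoding.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<definitions name=\"caf\u00e9\"/>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        Document doc = parse(bytes);
        System.out.println(doc.getDocumentElement().getAttribute("name"));
    }
}
```

This only helps where the bytes are XML with a correct declaration; it does nothing for the general String-handling case raised in point 2.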
I think the thing that intrigues me the most about your reply is that non-UTF-8 WSDL
should not be allowed according to Basic Profile. This could change everything. I have
to admit I was ignorant of this. Now, should we:
1. Say "all WSDL must adhere to Basic Profile so we are rejecting this jira." -
Seems a bit harsh, but might be acceptable.
2. Understand that many people (myself included up until now) don't know about this,
or don't *want* to adhere to it, so we should just handle the possibility of
documents that are not encoded as UTF-8 or UTF-16, just in case?
Thanks for the feedback,
David
--------------------------------------------------------------