Hi,
I have a customer who regularly copies text from Word documents and pastes it into
forms I created for him on his web site. The text often contains non-ASCII punctuation such
as U+2019 (right single quotation mark) or U+201C (left double quotation mark). We were
having some problems storing these characters in our database, so I added a filter that
replaces them with the plain ASCII quotes.
I tested my work by deploying to a local copy of JBoss on my workstation, which is a
Windows XP computer, and it worked fine. I did the conversion using the String.replace
function, for example:
s = s.replace('\u201C', '"');
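A fuller filter along the same lines might look like the sketch below. The class name QuoteFilter and the method normalizeQuotes are hypothetical, and covering U+2018 and U+201D in addition to the two characters mentioned above is an assumption about what Word typically produces, not something taken from the actual application.

```java
public class QuoteFilter {
    // Hypothetical helper: maps Word-style "smart" quotes to ASCII quotes.
    // U+2018 and U+201D are included by assumption; the post names only
    // U+2019 and U+201C.
    static String normalizeQuotes(String s) {
        return s.replace('\u2018', '\'')  // left single quotation mark
                .replace('\u2019', '\'')  // right single quotation mark
                .replace('\u201C', '"')   // left double quotation mark
                .replace('\u201D', '"');  // right double quotation mark
    }

    public static void main(String[] args) {
        String pasted = "\u201CIt\u2019s fine\u201D";
        // Prints the text with plain ASCII quotes only.
        System.out.println(normalizeQuotes(pasted));
    }
}
```

A filter like this can only match if the string really contains those code points when it reaches the filter.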
However, when I deployed this to my production environment - which has the same version of
Java and the same version of JBoss, but runs Linux - the replacement no longer happened. To
see what was going on, I logged every character of the input string using s.codePointAt().
It turns out that instead of U+201C and U+2019, I'm getting U+FFFD (the Unicode
replacement character) in both cases.
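The kind of logging described above can be sketched as follows. The helper dumpCodePoints is hypothetical, and the sketch assumes Java 7 or later for StandardCharsets. The main method also shows one well-known way U+FFFD can appear: bytes written in one encoding (windows-1252 is assumed here) being decoded as UTF-8. Whether that is what happens on the Linux server is not established by anything in this post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodePointDump {
    // Hypothetical helper: lists each code point of s in U+XXXX form.
    static String dumpCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x93 and 0x92 are the windows-1252 bytes for U+201C and U+2019.
        byte[] raw = { (byte) 0x93, (byte) 0x92 };

        // Decoded with the encoding they were written in.
        String asCp1252 = new String(raw, Charset.forName("windows-1252"));
        // Decoded as UTF-8, where these bytes are malformed input.
        String asUtf8 = new String(raw, StandardCharsets.UTF_8);

        System.out.println(dumpCodePoints(asCp1252)); // U+201C U+2019
        System.out.println(dumpCodePoints(asUtf8));   // U+FFFD U+FFFD
    }
}
```

Once a character has been decoded to U+FFFD, the original is lost, so a replace on U+201C or U+2019 has nothing left to match.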
Does anyone understand why this is happening? I have been working with Java for almost 7
years, and I have never encountered an inconsistency between its behavior on Linux and
Windows before.
Thanks,
Frank
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4001916#...