[Tomcat, HTTPD, Servlets & JSP] - Unicode character issue - happens only on Linux - jboss-user

Monday, 15 January 2007

Hi,

I have a customer who regularly copies text from Word documents and pastes it into
forms I created for him on his web site. The text often contains non-ASCII characters such
as U+2019 for single quotes or U+201C for double quotes. We were having some problems
storing these characters in our database, so I added a filter that replaces them with the
standard ASCII quotes.

I tested my work by deploying to a local copy of JBoss on my workstation, which is a
Windows XP computer, and it worked fine. I did the conversion using the String.replace
method, for example:

s = s.replace('\u201C', '"');
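
For illustration, a filter of this kind might look roughly like the sketch below. The class and method names are hypothetical, and the extra mappings for U+2018 and U+201D are an assumption added for symmetry; only U+2019 and U+201C are mentioned in this post.

```java
final class QuoteFilter {
    // Hypothetical helper: maps common "smart" quotes to their ASCII equivalents.
    // U+2018 and U+201D are assumed additions; the post only names U+2019 and U+201C.
    static String normalizeQuotes(String s) {
        return s.replace('\u2018', '\'')   // left single quotation mark
                .replace('\u2019', '\'')   // right single quotation mark
                .replace('\u201C', '"')    // left double quotation mark
                .replace('\u201D', '"');   // right double quotation mark
    }

    public static void main(String[] args) {
        String pasted = "\u201CIt\u2019s fine\u201D";
        System.out.println(normalizeQuotes(pasted)); // prints "It's fine" (with ASCII quotes)
    }
}
```

Note that a replacement like this only works if the string really contains U+201C or U+2019 by the time the filter runs; it cannot match characters that were already altered earlier in the request handling.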

However, when I deployed this to my production environment - which has the same version of
Java and the same version of JBoss, but runs Linux - it failed. To see what was going on, I
tried logging all the characters of the input string using s.codePointAt(). It turns out
that instead of getting U+201C and U+2019, I'm getting U+FFFD (the Unicode replacement
character) in both cases.
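
The logging described above can be sketched as follows. The class and method names are hypothetical. The second half of main is only an assumption about how U+FFFD can arise in general, not a confirmed diagnosis of this case: it decodes the windows-1252 bytes for the two quote characters (0x93 and 0x92) as UTF-8, which Java's decoder reports as malformed input and replaces with U+FFFD.

```java
import java.nio.charset.StandardCharsets;

final class CodePointDump {
    // Hypothetical helper: renders each code point of s as U+XXXX, space separated.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // advance past surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What the filter expects to see.
        System.out.println(dump("\u201Chi\u2019")); // U+201C U+0068 U+0069 U+2019

        // Assumed scenario: windows-1252 bytes decoded with a mismatched charset.
        byte[] cp1252 = { (byte) 0x93, 'h', 'i', (byte) 0x92 };
        String decoded = new String(cp1252, StandardCharsets.UTF_8);
        System.out.println(dump(decoded)); // U+FFFD U+0068 U+0069 U+FFFD
    }
}
```

Once a character has been decoded to U+FFFD the original value is lost, so a replace on U+201C or U+2019 has nothing to match.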

Does anyone understand why this is happening? I have been working with Java for almost 7
years, and I have never encountered an inconsistency between its behavior on Linux and
Windows before.

Thanks,
Frank

View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4001916#...

