Took a look at this yesterday and could confirm the problem. Seems to be
isolated to MySQL as it works fine on H2 and PostgreSQL at least.
Can you create a JIRA for it please?
On 16 August 2016 at 15:38, Igor Zuk <igor.zuk(a)qualitytaskforce.com> wrote:
Hi Stian
I hope you had a good time during holiday. Your autoresponse asked me to
resend you the email when you’re back, so here it is.
*From:* Igor Zuk
*Sent:* Wednesday, August 03, 2016 4:37 PM
*To:* 'stian(a)redhat.com' <stian(a)redhat.com>
*Cc:* keycloak-user(a)lists.jboss.org
*Subject:* RE: [keycloak-user] Keycloak user data encoding
Sorry for a delayed response, I had to temporarily suspend the
investigation.
Yes, I can reproduce the issue anytime using Docker.
I set up as default as possible environment with latest MySQL:
*docker run --name mysql -e MYSQL_ROOT_PASSWORD=root -e
MYSQL_DATABASE=keycloak -e MYSQL_USER=keycloak -e MYSQL_PASSWORD=keycloak
-P -d mysql:latest*
and Keycloak in the same version as where I found the issue, 1.9.2.Final:
*docker run --name keycloak -e MYSQL_DATABASE=keycloak -e
MYSQL_USER=keycloak -e MYSQL_PASSWORD=keycloak -e
MYSQL_PORT_3306_TCP_ADDR=192.168.99.100 -e MYSQL_PORT_3306_TCP_PORT=32779
-P -d jboss/keycloak-mysql:1.9.2.Final*
The results were identical, special letters in names were replaced with
question marks. It turned out, that Keycloak created all its tables with
the DB's default encoding, latin1 (ISO-8859-1). I've checked it with a
query:
*SELECT character_set_name FROM information_schema.`COLUMNS` WHERE
table_name = "USER_ENTITY" AND column_name = "FIRST_NAME";*
Once again I've manually changed the encoding of a single column:
*ALTER TABLE `USER_ENTITY` MODIFY `FIRST_NAME` VARCHAR(255) CHARACTER SET
utf8;*
It worked, but Keycloak was still putting there names with question marks,
so the issue was fully reproduced.
Because it seems, that Keycloak uses the DB's default encoding, I tried
changing it in MySQL. I've removed its container entirely and started it
with two parameters appended to Docker run command:
*--character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci*
It seems, that it worked, but then Keycloak refused to start at all
throwing exception with error:
*Row size too large. The maximum row size for the used table type, not
counting BLOBs, is 65535. This includes storage overhead, check the manual.
You have to change some columns to TEXT or BLOBs [Failed SQL: ALTER TABLE
keycloak.REALM MODIFY CERTIFICATE VARCHAR(4000)]*
It looks that Keycloak is not able to work with UTF-8 in databases at all!
The full startup log is here:
http://pastebin.com/VMTARqgF
Because 1.9.2.Final is quite dated, I've checked the latest available
MySQL-preconfigured version, 2.0.0.Final. I've repeated all the steps and
the results were identical.
The example name I'm working with is Mściwy Żółw. Only the letter 'ó' is
working, that's because it's encodable in ISO-8859-1.
*From:* Stian Thorgersen [mailto:sthorger@redhat.com <sthorger(a)redhat.com>]
*Sent:* Friday, July 15, 2016 6:48 AM
*To:* Igor Zuk <igor.zuk(a)qualitytaskforce.com>
*Cc:* keycloak-user(a)lists.jboss.org
*Subject:* Re: [keycloak-user] Keycloak user data encoding
It's strange that no one else has reported this. We had several people
report the issue with umlats, but no one else seems to have the issue with
the database encoding. Maybe there's something different with your database
config? Could you try with a default MySQL database installation and see if
you can reproduce the issue? Also, can you give me a sample name that shows
the problems.
I added a test for umlats to registration and account management, see
https://github.com/keycloak/keycloak/pull/3036. Once it's in I'll
schedule a run with CI, which tests with a range of different databases.
On 12 July 2016 at 16:13, Igor Zuk <igor.zuk(a)qualitytaskforce.com> wrote:
Thank you for a quick response.
I’m using 1.9.2.Final and the problem is a bit different, it’s not limited
to registration screen.
I’m saying, that ISO-8859-1 is the default encoding, because all the text
columns in USER_ENTITY table had encoding latin1. The table was created
completely by Keycloak as the database was empty in the beginning. I
manually switched encoding of FIRST_NAME to UTF-8 and modified it so it
contained special letters. I started the user editor in Keycloak admin
console and this name was displayed correctly. I added a single character
to it, saved, and then the name got messed up with question marks instead
of all special characters.
*From:* Stian Thorgersen [mailto:sthorger@redhat.com]
*Sent:* Tuesday, July 12, 2016 3:43 PM
*To:* Stian Thorgersen <stian(a)redhat.com>
*Cc:* Igor Zuk <igor.zuk(a)qualitytaskforce.com>;
keycloak-user(a)lists.jboss.org
*Subject:* Re: [keycloak-user] Keycloak user data encoding
By the way this was fixed in 1.6.0.Final, see
https://issues.jboss.org/
browse/KEYCLOAK-1830?jql=project%20%3D%20KEYCLOAK%20AND%20text%20~%20%
22encoding%22
Are you using an old version?
On 12 July 2016 at 15:37, Stian Thorgersen <sthorger(a)redhat.com> wrote:
Why are you saying the default encoding is ISO-8859-1? All forms are
encoded as UTF-8 and all strings passed to the database should be UTF-8
encoded as well.
The only thing that is ISO-8859-1 is the message properties, but those are
converted to UTF-8 when added to HTML pages.
On 12 July 2016 at 14:58, Igor Zuk <igor.zuk(a)qualitytaskforce.com> wrote:
Hi
I have an encoding problem. By default users' data fields (e.g. first name
and last name) are encoded using ISO-8859-1. People from many countries
can't properly create accounts as their personal data is silently messed
up. How can I fix it?
· The MySQL DB receives already damaged names. By default all
columns are ISO-8859-1-encoded, but manually converting them to UTF-8
doesn't help.
· Manual account modification from admin console has same effect.
· Change of default server (Wildfly) encoding to UTF-8 doesn't do
anything.
Best regards
Igor Żuk
_______________________________________________
keycloak-user mailing list
keycloak-user(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/keycloak-user