CLOB truncation on DB2 when using 2 or 3 byte chars (UTF8)
----------------------------------------------------------
Key: HHH-2401
URL: http://opensource.atlassian.com/projects/hibernate/browse/HHH-2401
Project: Hibernate3
Type: Bug
Components: core
Versions: 3.2.0.ga
Environment: Hibernate 3.2.0 GA + JBoss 4.0.4.GA on Windows XP + DB2 UDB for iSeries
V5R3 + IBM JT Open driver 4.9
Reporter: Simon Jongsma
Priority: Minor
Attachments: ClobTruncated.zip
A CLOB column in DB2 is mapped to a Java String using the Hibernate "text" mapping
type.
The column in DB2 has CCSID 1208, which means UTF-8 (Unicode).
The truncation occurs when the string contains characters that UTF-8 encodes in more
than one byte.
In that case the string is truncated when it is persisted to the database.
For example, take the string
"Granpré Molière†; 0123456789". This string contains three non-ASCII characters.
The é and è are each encoded in two bytes in UTF-8, and the † in three bytes.
This string is stored as "Granpré Molière†; 012345".
So the trailing "6789" is not stored.
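
The byte widths quoted above can be checked with a few lines of plain Java. This is
only a minimal sketch: the class and method names (Utf8CharWidths, utf8Width) are made
up for illustration and are not part of Hibernate or the attached test case. The
string is written with Unicode escapes so that the result does not depend on the
encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of Hibernate).
class Utf8CharWidths {

    // Number of bytes UTF-8 needs to encode a single Unicode code point.
    static int utf8Width(int codePoint) {
        String single = new String(Character.toChars(codePoint));
        return single.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "Granpré Molière†; 0123456789" with the non-ASCII characters escaped.
        String s = "Granpr\u00e9 Moli\u00e8re\u2020; 0123456789";

        // List every character that takes more than one byte in UTF-8.
        s.codePoints()
         .filter(cp -> utf8Width(cp) > 1)
         .forEach(cp -> System.out.println(
                 "U+" + String.format("%04X", cp) + " needs " + utf8Width(cp) + " bytes"));
    }
}
```

Together these three characters account for four bytes beyond the character count,
which matches the four characters ("6789") that go missing.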
It appears that Hibernate does not take into account that a character can occupy more
than one byte in UTF-8.
The number of characters missing from the end is exactly:
string.getBytes("UTF-8").length minus string.length()
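
The formula above can be applied to the example string as follows. This is a sketch
that only models the observed symptom; it makes no claim about where in Hibernate the
truncation happens. The names ClobTruncationEstimate, missingChars and predictedStored
are hypothetical and chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that models the observed symptom; not Hibernate code.
class ClobTruncationEstimate {

    // Characters expected to be lost: UTF-8 byte length minus Java char length.
    static int missingChars(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length - s.length();
    }

    // The value expected in the database if exactly that many trailing
    // characters are dropped (never fewer than zero characters are kept).
    static String predictedStored(String s) {
        int keep = Math.max(0, s.length() - missingChars(s));
        return s.substring(0, keep);
    }

    public static void main(String[] args) {
        // "Granpré Molière†; 0123456789" with the non-ASCII characters escaped.
        String s = "Granpr\u00e9 Moli\u00e8re\u2020; 0123456789";
        System.out.println("chars:   " + s.length());
        System.out.println("bytes:   " + s.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("missing: " + missingChars(s));
        System.out.println("stored:  " + predictedStored(s));
    }
}
```

For the example string this gives 28 characters, 32 bytes and 4 missing characters,
so the predicted stored value ends in "012345", as observed.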
It is not a problem in DB2 or the JT Open driver:
storing the same String directly into the DB2 table via JDBC (from a Java program) and
then retrieving it works without any truncation.
So this clearly points to a problem somewhere in Hibernate.