[hibernate-issues] [Hibernate-JIRA] Commented: (HHH-2401) CLOB truncation on DB2 when using 2 or 3 byte chars (UTF8)

Thu Jan 29 03:25:38 EST 2009

    [ http://opensource.atlassian.com/projects/hibernate/browse/HHH-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=32217#action_32217 ] 

Simon Jongsma commented on HHH-2401:
------------------------------------

This issue was logged internally at IBM as PMR 56030 (data truncation problem).
IBM provided a solution to me in April 2008.
I tested the fix and it worked.

I don't think the problem was logged at SourceForge.
At least I couldn't find it there.

A JT Open version from, let's say, June/July 2008 or later should contain the fix.

I have added a very simple test program to check whether a given jt400.jar contains the fix.

> CLOB truncation on DB2 when using 2 or 3 byte chars (UTF8)
> ----------------------------------------------------------
>
>                 Key: HHH-2401
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HHH-2401
>             Project: Hibernate Core
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.2.0.ga
>         Environment: Hibernate 3.2.0 GA  + JBoss 4.0.4.GA on Windows XP + DB2 UDB for ISeries V5R3 + IBM JT Open driver 4.9
>            Reporter: Simon Jongsma
>            Priority: Minor
>         Attachments: ClobTruncated.zip
>
>
> A CLOB column in DB2 is mapped to a String in Java with the Hibernate "text" mapping.
> The column in DB2 has CCSID 1208, which means UTF-8 (Unicode).
> The CLOB truncation occurs when the string contains characters that are encoded in more than one byte in UTF-8.
> In that case the string is truncated when persisted to the database.
> For example, take the string
> "Granpré Molière†; 0123456789". This string contains three non-ASCII characters.
> The é and è are encoded in two bytes in UTF-8 and the † in three bytes.
> This string will be stored as "Granpré Molière†; 012345".
> So "6789" is not stored.
> It appears as though Hibernate does not take into account that a character can occupy more than one byte in UTF-8.
> The number of missing chars at the end is exactly: string.getBytes("UTF-8").length minus string.length()
> It is not a problem of DB2 or the JT Open driver:
> Storing the same String directly via JDBC into the DB2 table (from a Java program) and retrieving it works 100% fine.
> So this clearly points to a problem somewhere in Hibernate.
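
The arithmetic in the description above can be checked with a small standalone Java program. This is only a sketch of the reported symptom: the class and method names (ClobTruncationCheck, extraUtf8Bytes, simulateTruncation) are hypothetical and are not part of Hibernate or JT Open, and the truncation is simulated rather than reproduced against a real DB2 table.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the class and method names are hypothetical
// and do not come from Hibernate or JT Open.
final class ClobTruncationCheck {

    // Number of UTF-8 bytes in excess of one byte per Java char (UTF-16 code unit).
    static int extraUtf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length - s.length();
    }

    // Simulates the reported symptom by dropping that many chars from the end.
    // This mimics the observed result; it is not the actual faulty code path.
    static String simulateTruncation(String s) {
        int kept = Math.max(0, s.length() - extraUtf8Bytes(s));
        return s.substring(0, kept);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the compiler's file encoding:
        // \u00e9 = e with acute, \u00e8 = e with grave, \u2020 = dagger
        String original = "Granpr\u00e9 Moli\u00e8re\u2020; 0123456789";

        System.out.println("chars:       " + original.length());
        System.out.println("UTF-8 bytes: " + original.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("missing:     " + extraUtf8Bytes(original));
        System.out.println("stored as:   " + simulateTruncation(original));
    }
}
```

For the example string this gives 28 chars and 32 UTF-8 bytes, so 4 chars are lost and the simulated result ends in "012345", matching the description. The same difference can be computed for any suspect value before persisting it, to predict how many trailing chars an affected driver version would drop.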
