[hibernate-issues] [Hibernate-JIRA] Commented: (HHH-2401) CLOB truncation on DB2 when using 2 or 3 byte chars (UTF8)

Wed Jul 15 09:54:12 EDT 2009

    [ http://opensource.atlassian.com/projects/hibernate/browse/HHH-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=33613#action_33613 ] 

Guillermo Turk commented on HHH-2401:
-------------------------------------

I am still seeing this problem. I was able to work around it, but I think the underlying bug is still there.
We have a program that uses both Oracle and DB2/400 and stores XML strings in CLOBs. Our data includes Spanish, Arabic and Chinese characters.

When we use a prepared statement ins:

    ins.setObject(columnIndex, rs.getObject(columnIndex), attribute.datatype)

and attribute.datatype is Types.CLOB, the problem described here still occurs. Since these are XML strings, they are persisted without their last few characters, so they can no longer be parsed when read back from DB2/400.

We have worked around the problem like this:

    for (TableAttribute attribute : destination.getAttributes()) {
        if (attribute.datatype == Types.CLOB) {
            // Read the whole CLOB as a String and bind it with setString,
            // which avoids the truncation seen with setObject(..., Types.CLOB).
            Clob data = (Clob) rs.getObject(columnIndex);
            String inData = data.getSubString(1, (int) data.length());
            ins.setString(columnIndex, inData);
        } else {
            ins.setObject(columnIndex, rs.getObject(columnIndex), attribute.datatype);
        }
        columnIndex++;
    }
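
The workaround above can also be sketched as a small reusable helper. This is only an illustration under assumptions: the class and method names (ClobCopy, clobToString, copyClobColumn) are hypothetical, and the null, empty and oversize guards are additions that the loop above does not have.

```java
import java.sql.Clob;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;

public class ClobCopy {

    /** Reads the whole CLOB into a String; returns null for SQL NULL. */
    static String clobToString(Clob clob) throws SQLException {
        if (clob == null) {
            return null;
        }
        long length = clob.length();
        if (length == 0) {
            // Some Clob implementations reject getSubString(1, 0).
            return "";
        }
        if (length > Integer.MAX_VALUE) {
            throw new SQLException("CLOB too large for a String: " + length);
        }
        // CLOB positions are 1-based.
        return clob.getSubString(1, (int) length);
    }

    /** Copies one CLOB column by binding it as a String instead of a CLOB. */
    static void copyClobColumn(ResultSet rs, PreparedStatement ins, int columnIndex)
            throws SQLException {
        String text = clobToString(rs.getClob(columnIndex));
        if (text == null) {
            ins.setNull(columnIndex, Types.CLOB);
        } else {
            ins.setString(columnIndex, text);
        }
    }
}
```

Inside the loop, the Types.CLOB branch would then reduce to a single call to copyClobColumn(rs, ins, columnIndex). Note that this loads the entire value into memory, which is acceptable for moderately sized XML documents but not for very large CLOBs.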

> CLOB truncation on DB2 when using 2 or 3 byte chars (UTF8)
> ----------------------------------------------------------
>
>                 Key: HHH-2401
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HHH-2401
>             Project: Hibernate Core
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.2.0.ga
>         Environment: Hibernate 3.2.0 GA  + JBoss 4.0.4.GA on Windows XP + DB2 UDB for ISeries V5R3 + IBM JT Open driver 4.9
>            Reporter: Simon Jongsma
>            Priority: Minor
>         Attachments: clob_bug_demo.java, ClobTruncated.zip
>
>
> A CLOB column is used in DB2 mapped to a String in Java with the Hibernate "Text" mapping. 
> The column in DB2 has a CCSID 1208 which means "UTF8" (= Unicode).
> The CLOB truncation occurs when the string contains characters that are encoded in more than one byte in UTF8. 
> In that case the string is truncated when persisted in the database.
> For example the string
> "Granpré Molière†; 0123456789". This string contains three non-ASCII characters. 
> The é and è are coded in two bytes in UTF8 and the † in three bytes.
> This string will be stored as "Granpré Molière†; 012345".
> So "6789" is not stored.
> It appears as though Hibernate does not take into account that a character can be more than 1 byte in UTF8.
> The number of missing characters at the end is exactly:  string.getBytes("UTF-8").length minus string.length()
> It is not a problem of DB2 or the JT Open driver:
> Storing the same String directly via JDBC (from a Java program) into the DB2 table and retrieving it works 100% fine. 
> So this clearly points to a problem somewhere in Hibernate.
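
The arithmetic in the quoted report can be checked with a few lines of plain Java. This is a sketch of the reported symptom only, not of Hibernate's internals; the class and method names are hypothetical, and the non-ASCII characters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

public class ClobTruncationArithmetic {

    /** Extra bytes UTF-8 needs beyond one byte per Java char. */
    static int missingChars(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length - s.length();
    }

    /** The string as the report says it ends up in the database. */
    static String truncatedAsReported(String s) {
        return s.substring(0, s.length() - missingChars(s));
    }

    public static void main(String[] args) {
        // "Granpré Molière†; 0123456789"
        String original = "Granpr\u00e9 Moli\u00e8re\u2020; 0123456789";
        System.out.println(missingChars(original));        // 4  (1 + 1 + 2 extra bytes)
        System.out.println(truncatedAsReported(original)); // ends in "012345"
    }
}
```

The é and è each contribute one extra byte and the † contributes two, which gives the four missing characters "6789".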
