[jboss-jira] [JBoss JIRA] (JGRP-2062) TYPE_STRING does not handle unicode

Tue May 10 12:41:00 EDT 2016

     [ https://issues.jboss.org/browse/JGRP-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cody Ebberson updated JGRP-2062:
--------------------------------
    Description: 
Several places in the org.jgroups.util.Util class assume that every character of a String fits in one byte.

For example, see objectToByteBuffer lines 561-567:

https://github.com/belaban/JGroups/blob/master/src/org/jgroups/util/Util.java#L561-L567

{{case TYPE_STRING:
    String str=(String)obj;
    int len=str.length();
    ByteBuffer retval=ByteBuffer.allocate(Global.BYTE_SIZE + len).put(TYPE_STRING);
    for(int i=0; i < len; i++)
        retval.put((byte)str.charAt(i));
    return retval.array();}}

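This code incorrectly encodes any String that contains non-ASCII characters: the cast to byte keeps only the low 8 bits of each char, so any character above U+00FF loses its upper bits.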

There are several ways to fix this. You could use str.getBytes(StandardCharsets.UTF_8) to get a proper byte encoding (sizing the buffer from the resulting byte array rather than from str.length()), or you could use the existing TYPE_SERIALIZABLE code path.
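
To make the difference concrete, here is a minimal sketch comparing the quoted loop with a UTF-8 variant. It is not the actual JGroups code: the class name, the method names, and the value of the TYPE_STRING constant are hypothetical, and the marker is assumed to occupy one byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringEncodingSketch {
    // Hypothetical stand-in for the TYPE_STRING marker; the real JGroups value may differ.
    static final byte TYPE_STRING = 18;

    // Mirrors the loop quoted above: one byte per char, upper 8 bits of each char dropped.
    static byte[] encodeTruncating(String str) {
        int len = str.length();
        ByteBuffer buf = ByteBuffer.allocate(1 + len).put(TYPE_STRING);
        for (int i = 0; i < len; i++)
            buf.put((byte) str.charAt(i));
        return buf.array();
    }

    // UTF-8 variant: the buffer is sized from the encoded byte count, not str.length().
    static byte[] encodeUtf8(String str) {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + bytes.length).put(TYPE_STRING).put(bytes).array();
    }

    // Decodes the UTF-8 variant by skipping the one-byte type marker.
    static String decodeUtf8(byte[] buf) {
        return new String(buf, 1, buf.length - 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "h\u20acllo"; // contains U+20AC (euro sign)
        // The truncating encoder stores U+20AC as the single byte 0xAC.
        System.out.println(Arrays.toString(encodeTruncating(s)));
        // The UTF-8 encoder stores it as three bytes and round-trips.
        System.out.println(decodeUtf8(encodeUtf8(s)).equals(s));
    }
}
```

With the UTF-8 variant the decoder cannot infer the character count from the buffer length, so any reader of this format would need the matching decode step shown in decodeUtf8.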

> TYPE_STRING does not handle unicode
> -----------------------------------
>
>                 Key: JGRP-2062
>                 URL: https://issues.jboss.org/browse/JGRP-2062
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Cody Ebberson
>            Assignee: Bela Ban
