[jboss-jira] [JBoss JIRA] (JGRP-2062) TYPE_STRING does not handle unicode

Cody Ebberson (JIRA) issues at jboss.org
Tue May 17 13:47:00 EDT 2016


    [ https://issues.jboss.org/browse/JGRP-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238875#comment-13238875 ] 

Cody Ebberson commented on JGRP-2062:
-------------------------------------

No worries.  The workaround is straightforward enough.

I haven't tried it, but simply removing "PRIMITIVE_TYPES.put(String.class,TYPE_STRING);" on line 111 of Util.java might fix the issue entirely, since Strings would then fall back to TYPE_SERIALIZABLE.  There could be a small performance impact, but I assume it would be negligible compared to network transmission, etc.  The built-in serialization for java.lang.String appears to be UTF-8 plus a ~7 byte header, so total transmission size would be roughly the same.
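
As a rough check of the ~7 byte estimate, here is a minimal sketch that compares plain java.io serialization of a String with its raw UTF-8 length. The class and method names are made up for illustration, and it measures ObjectOutputStream on its own, not the TYPE_SERIALIZABLE path in Util.java, which may add bytes of its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of JGroups.
public class SerializedSizeSketch {

    // Number of bytes produced by standard Java serialization of a String.
    public static int serializedSize(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Number of bytes in the plain UTF-8 encoding of the same String.
    public static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = { "hello", "h\u00e9llo", "\u4e16\u754c" };
        for (String s : samples) {
            System.out.println("utf8=" + utf8Size(s)
                    + " serialized=" + serializedSize(s)
                    + " overhead=" + (serializedSize(s) - utf8Size(s)));
        }
    }
}
```

For short Strings the overhead is the stream header plus a type tag and a 2-byte length. Very long Strings use a wider length field, and the serialized form is Java's modified UTF-8, so the NUL character and supplementary characters take a different number of bytes than in standard UTF-8.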

If the guidance is to use manual marshalling, then you could consider deprecating the org.jgroups.Message constructors that take an Object argument.  That would make it explicit that the user must implement their own marshalling.  To a newcomer to JGroups, the general-purpose constructor looks like the obvious option, so I imagine this could be a common pitfall.

Thanks for taking a look!  And thanks for JGroups overall, I think it's great.

> TYPE_STRING does not handle unicode
> -----------------------------------
>
>                 Key: JGRP-2062
>                 URL: https://issues.jboss.org/browse/JGRP-2062
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Cody Ebberson
>            Assignee: Bela Ban
>            Priority: Minor
>             Fix For: 4.0
>
>
> In several places throughout the org.jgroups.util.Util class, it is assumed that Strings are one byte per character.
> For example, see objectToByteBuffer lines 561-567:
> https://github.com/belaban/JGroups/blob/master/src/org/jgroups/util/Util.java#L561-L567
> {{case TYPE_STRING:
>     String str=(String)obj;
>     int len=str.length();
>     ByteBuffer retval=ByteBuffer.allocate(Global.BYTE_SIZE + len).put(TYPE_STRING);
>     for(int i=0; i < len; i++)
>         retval.put((byte)str.charAt(i));
>     return retval.array();}}
> This code will incorrectly encode any String containing non-ASCII characters, because the cast to byte keeps only the low 8 bits of each char.
> There are several options to fix this.  You could use str.getBytes(StandardCharsets.UTF_8) to get a proper byte encoding, or you could use the existing TYPE_SERIALIZABLE code path.
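
The truncation and the getBytes option described in the issue can be sketched as follows. This is a standalone illustration, not the JGroups code: the class name, the TYPE_STRING value, and both decode methods are assumptions (the decoding side of Util.java is not quoted in the issue), and a real fix would have to change the reader and writer together.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone class; only encodeNarrow mirrors the quoted snippet.
public class StringEncodingSketch {

    // Placeholder type marker; the real constant lives in org.jgroups.util.Util.
    static final byte TYPE_STRING = 18;

    // Same approach as the quoted code: one byte per char, high 8 bits dropped.
    public static byte[] encodeNarrow(String str) {
        int len = str.length();
        ByteBuffer buf = ByteBuffer.allocate(1 + len).put(TYPE_STRING);
        for (int i = 0; i < len; i++)
            buf.put((byte) str.charAt(i));
        return buf.array();
    }

    // Assumed counterpart to encodeNarrow: widen each byte back to a char.
    public static String decodeNarrow(byte[] buf) {
        char[] chars = new char[buf.length - 1];
        for (int i = 1; i < buf.length; i++)
            chars[i - 1] = (char) (buf[i] & 0xff);
        return new String(chars);
    }

    // Suggested alternative: encode the payload as UTF-8 after the type byte.
    public static byte[] encodeUtf8(String str) {
        byte[] payload = str.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + payload.length)
                .put(TYPE_STRING)
                .put(payload)
                .array();
    }

    // Decode the UTF-8 payload that follows the type byte.
    public static String decodeUtf8(byte[] buf) {
        return new String(buf, 1, buf.length - 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo \u4e16\u754c";
        String narrow = decodeNarrow(encodeNarrow(original));
        String utf8 = decodeUtf8(encodeUtf8(original));
        System.out.println("narrow round trip intact: " + original.equals(narrow));
        System.out.println("utf8 round trip intact:   " + original.equals(utf8));
    }
}
```

Note that the UTF-8 buffer is sized from the encoded payload rather than from str.length(), since the two differ as soon as a character needs more than one byte.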


