[jboss-jira] [JBoss JIRA] (JGRP-1772) Optimize marshalling of strings

Mon Feb 17 06:15:48 EST 2014

    [ https://issues.jboss.org/browse/JGRP-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12945112#comment-12945112 ] 

Bela Ban edited comment on JGRP-1772 at 2/17/14 6:13 AM:
---------------------------------------------------------

Re less space in memory: if we have 1000 "hello" strings, then the memory requirements are as follows (assuming 8-byte references, i.e. not taking CompressedOops into account):
* For String: 
** 1000 pointers to String instances: 8000 bytes
** 1000 pointers to the char[] arrays: 8000 bytes
** 1000 char[] arrays of 5 chars (2 bytes each): 10000 bytes
** 1000 {{hash}} ints (4 bytes): 4000 bytes
** 1000 {{hash32}} ints (4 bytes): 4000 bytes
*** TOTAL: *34000 bytes*
* For AsciiString:
** 1000 pointers to AsciiString instances: 8000 bytes
** 1000 pointers to the byte[] arrays: 8000 bytes
** 1000 byte[] arrays of 5 single-byte 'chars': 5000 bytes
*** TOTAL: *21000 bytes*

*This saves us roughly 13000 bytes for 1000 strings*. Memory padding has not been taken into account though...
If we consider that we might have hundreds of thousands of messages (each message having a TpHeader with the cluster name) in memory, then the savings might be significant.
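
The arithmetic above can be reproduced with a small sketch. The class and method names ({{StringFootprint}}, {{stringBytes}}, {{asciiStringBytes}}) are hypothetical, and the model uses only the fields counted in the lists above: 8-byte references, 2-byte chars, and the two 4-byte hash fields. Object headers and padding are ignored, as in the comment.

```java
public class StringFootprint {
    static final int POINTER = 8; // 64-bit reference, compressed oops not considered
    static final int INT = 4;     // size of an int field

    // Bytes for `count` String instances of `len` chars, using the comment's model
    static long stringBytes(int count, int len) {
        long perString = POINTER   // reference to the String instance
                       + POINTER   // reference to the char[] array
                       + 2L * len  // chars, 2 bytes each
                       + INT       // hash field
                       + INT;      // hash32 field
        return count * perString;
    }

    // Bytes for `count` AsciiString instances of `len` single-byte chars
    static long asciiStringBytes(int count, int len) {
        long perString = POINTER   // reference to the AsciiString instance
                       + POINTER   // reference to the byte[] array
                       + len;      // one byte per char
        return count * perString;
    }

    public static void main(String[] args) {
        long str = stringBytes(1000, 5);
        long ascii = asciiStringBytes(1000, 5);
        System.out.println("String:      " + str + " bytes");           // 34000
        System.out.println("AsciiString: " + ascii + " bytes");         // 21000
        System.out.println("Saved:       " + (str - ascii) + " bytes"); // 13000
    }
}
```

Under this model the saving grows with string length, since each additional char costs 2 bytes in a String and 1 byte in an AsciiString.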

> Optimize marshalling of strings
> -------------------------------
>
>                 Key: JGRP-1772
>                 URL: https://issues.jboss.org/browse/JGRP-1772
>             Project: JGroups
>          Issue Type: Enhancement
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.5
>
>
> Currently, we use DataOutput.writeUTF() for all sorts of strings. This is implemented inefficiently and potentially uses more than 1 byte per char.
> Add another method writeString() which converts double byte chars to single byte chars so that only ASCII is supported. This can be used by a lot of internal code which never uses chars above 127.
> For external code, such as {{JChannel.connect(String cluster_name)}}, we need to see whether this is ok. Since cluster names are mainly used to differentiate clusters, perhaps it is ok to mangle the names to chars below 128, although this would change cluster names which use multi-byte chars.
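
A minimal sketch of what such a {{writeString()}} / {{readString()}} pair might look like, not the actual JGroups implementation. The class name {{AsciiMarshal}}, the 2-byte length prefix, and the use of -1 to encode null are all assumptions made for illustration. The issue leaves open how chars above 127 should be handled; this sketch rejects them rather than mangling them, which is also an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class AsciiMarshal {

    // Hypothetical helper: writes a 2-byte length followed by one byte per char
    static void writeString(String s, DataOutput out) throws IOException {
        if (s == null) {
            out.writeShort(-1); // assumed encoding for null
            return;
        }
        int len = s.length();
        if (len > Short.MAX_VALUE)
            throw new IllegalArgumentException("string too long: " + len);
        out.writeShort(len);
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (c > 127) // assumption: reject non-ASCII instead of mangling it
                throw new IllegalArgumentException("non-ASCII char at index " + i);
            out.writeByte(c);
        }
    }

    // Hypothetical counterpart: reads the length, then decodes the bytes as ASCII
    static String readString(DataInput in) throws IOException {
        int len = in.readShort();
        if (len < 0)
            return null;
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        writeString("hello", new DataOutputStream(bos));
        DataInput in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(readString(in)); // prints "hello"
    }
}
```

Rejecting non-ASCII input keeps the round trip lossless; a mangling variant, as floated for cluster names, would instead have to define a replacement for every char above 127.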
