JBM 1.4 uses UTF-8 encoding for all strings sent in messages, e.g. properties, text
message bodies etc.
This gives good compression unless higher Unicode characters (e.g. Chinese) are used a lot;
however, Java's UTF-8 encoding is *really slow*.
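To make the size trade-off concrete, here is a small sketch that uses only the JDK's standard charsets. It compares the UTF-8 size of a few sample strings with a fixed two-bytes-per-char layout (UTF-16BE, no byte-order mark). The class name and the sample strings are illustrative only. ASCII text takes one byte per character in UTF-8. CJK characters take three bytes in UTF-8 but two in the fixed layout.

```java
import java.nio.charset.StandardCharsets;

public class EncodedSizes {
    // Bytes needed for s in UTF-8: 1 for ASCII, 2-3 for most other BMP chars, 4 for supplementary
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed if every Java char is written as two bytes (UTF-16BE, no byte-order mark)
    public static int twoBytesPerChar(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Illustrative samples: plain ASCII, accented Latin, and two Chinese characters
        String[] samples = {"hello world", "h\u00E9llo w\u00F6rld", "\u4F60\u597D"};
        for (String s : samples) {
            System.out.printf("%-12s UTF-8: %2d bytes, 2 bytes/char: %2d bytes%n",
                    s, utf8Bytes(s), twoBytesPerChar(s));
        }
    }
}
```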
For JBM 2.0 we're currently using the SimpleString class I wrote (which, unlike String,
doesn't copy itself at the drop of a hat), and we marshal it as a simple sequence of
bytes.
In my tests this is about 40 times faster than UTF-8 encoding the same string. :)
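The sketch below shows roughly what "a simple sequence of bytes" can mean. It uses a length prefix followed by each char written as two big-endian bytes. This is an assumption about the layout. `TwoByteCodec` and its methods are hypothetical names, not the actual SimpleString code. The point is that each char is copied with no charset lookup and no branching on its value.

```java
import java.nio.ByteBuffer;

public final class TwoByteCodec {
    // Hypothetical helper, NOT the JBM SimpleString API.
    // Layout assumed here: int length (in chars), then each char as two big-endian bytes.
    public static void write(ByteBuffer buf, String s) {
        buf.putInt(s.length());
        for (int i = 0; i < s.length(); i++) {
            buf.putChar(s.charAt(i)); // straight copy: no charset lookup, no per-char branching
        }
    }

    public static String read(ByteBuffer buf) {
        int len = buf.getInt();
        char[] chars = new char[len];
        for (int i = 0; i < len; i++) {
            chars[i] = buf.getChar();
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String s = "hello, JBM";
        ByteBuffer buf = ByteBuffer.allocate(4 + 2 * s.length());
        write(buf, s);
        buf.flip(); // switch from writing to reading
        System.out.println(read(buf)); // prints: hello, JBM
    }
}
```

The trade-off is that every character costs two bytes on the wire, even plain ASCII.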
The problem is that SimpleString currently stores each character as two bytes, which is fine
for the vast majority of Unicode characters but won't encode the far reaches of
Unicode, which require 4 bytes.
I could change SimpleString to use 4 bytes per character, but this would make the
marshalled form big, especially for standard Latin or European characters:
about 4 times the size of the UTF-8 encoding!
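Here is a rough sketch of the 4-bytes-per-character option, which amounts to UTF-32BE. Each Unicode code point is written as one 4-byte int, so characters above U+FFFF fit. `FourByteCodec` is a hypothetical name used only for illustration. The `main` method compares the result against UTF-8 for a plain Latin string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public final class FourByteCodec {
    // Hypothetical fixed-width encoding (equivalent to UTF-32BE):
    // every code point, including supplementary ones above U+FFFF, takes exactly 4 bytes.
    public static byte[] encode(String s) {
        int[] codePoints = s.codePoints().toArray(); // surrogate pairs become single code points
        ByteBuffer buf = ByteBuffer.allocate(4 * codePoints.length);
        for (int cp : codePoints) {
            buf.putInt(cp);
        }
        return buf.array();
    }

    public static String decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        StringBuilder sb = new StringBuilder();
        while (buf.hasRemaining()) {
            sb.appendCodePoint(buf.getInt());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String latin = "Hello, world";
        int utf8 = latin.getBytes(StandardCharsets.UTF_8).length;
        int fixed = encode(latin).length;
        // For ASCII text the fixed-width form is 4x the UTF-8 size (12 vs 48 bytes here)
        System.out.println(utf8 + " bytes in UTF-8 vs " + fixed + " bytes at 4 per character");
    }
}
```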
How do you think we should deal with this?
One possibility is that we write our own UTF-like encoding implementation...
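As one possible starting point, here is a hypothetical hand-written encoder. It produces standard UTF-8 directly into a byte array, with a fast path for leading ASCII, and skips the JDK charset machinery. `FastUtf8` is an illustrative name. The sketch assumes well-formed input with no unpaired surrogates. Whether it actually beats the JDK encoder on a given JVM would need measuring; no benchmark is implied here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class FastUtf8 {
    // Hypothetical hand-rolled UTF-8 encoder; assumes no unpaired surrogates in the input.
    public static byte[] encode(String s) {
        int len = s.length();
        // Worst case is 3 bytes per char (a surrogate pair is 2 chars -> 4 bytes, still under 6)
        byte[] out = new byte[len * 3];
        int pos = 0;
        int i = 0;
        // Fast path: leading ASCII characters map to one byte each
        while (i < len) {
            char c = s.charAt(i);
            if (c >= 0x80) break;
            out[pos++] = (byte) c;
            i++;
        }
        // General path: encode by code point so surrogate pairs become 4-byte sequences
        while (i < len) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                out[pos++] = (byte) cp;
            } else if (cp < 0x800) {
                out[pos++] = (byte) (0xC0 | (cp >> 6));
                out[pos++] = (byte) (0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out[pos++] = (byte) (0xE0 | (cp >> 12));
                out[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[pos++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                out[pos++] = (byte) (0xF0 | (cp >> 18));
                out[pos++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                out[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[pos++] = (byte) (0x80 | (cp & 0x3F));
            }
        }
        return Arrays.copyOf(out, pos); // trim to the bytes actually written
    }

    public static void main(String[] args) {
        String s = "caf\u00E9 \u4F60\u597D \uD83D\uDE00";
        boolean same = Arrays.equals(encode(s), s.getBytes(StandardCharsets.UTF_8));
        System.out.println("matches JDK UTF-8: " + same);
    }
}
```

The output is byte-for-byte ordinary UTF-8, so the wire format would stay compatible with the JBM 1.4 encoding. Only the encoding code path changes.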
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4149433#...