UTF-8 is a little more complex than that. Chars 0-0x7F are represented by one byte, 0-0x7F. Beyond that, characters are represented with 2 to 4 bytes. This means that every character costs at least one range comparison, and anything outside ASCII also costs shifts and masks, with marker bits being set or cleared in each output byte.
UTF-16 on the other hand, being the native encoding for Java, is written one char at a time without transcoding - no shifts, no comparisons, no bitmasks. It's just a straight write of chars. You can't possibly do better than that in terms of processing speed.
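To make the per-character cost concrete, here is a minimal sketch of a hand-written UTF-8 encoder. It is not the MINA or JDK implementation; `Utf8Sketch` and `toUtf8` are hypothetical names, `java.nio.ByteBuffer` stands in for the buffer class, and the input is assumed to contain no unpaired surrogates.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class Utf8Sketch {
    // Encodes a string as UTF-8 by hand. Assumes well-formed input
    // (no unpaired surrogates); a real encoder would have to handle those.
    static byte[] toUtf8(String str) {
        // Worst case is 3 bytes per Java char (a surrogate pair is
        // 2 chars and encodes to 4 bytes).
        ByteBuffer buf = ByteBuffer.allocate(str.length() * 3);
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
                buf.put((byte) cp);
            } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
                buf.put((byte) (0xC0 | (cp >> 6)));
                buf.put((byte) (0x80 | (cp & 0x3F)));
            } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                buf.put((byte) (0xE0 | (cp >> 12)));
                buf.put((byte) (0x80 | ((cp >> 6) & 0x3F)));
                buf.put((byte) (0x80 | (cp & 0x3F)));
            } else {                               // 4 bytes: 11110xxx + three 10xxxxxx
                buf.put((byte) (0xF0 | (cp >> 18)));
                buf.put((byte) (0x80 | ((cp >> 12) & 0x3F)));
                buf.put((byte) (0x80 | ((cp >> 6) & 0x3F)));
                buf.put((byte) (0x80 | (cp & 0x3F)));
            }
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    public static void main(String[] args) {
        System.out.println(toUtf8("h\u00e9llo").length + " bytes");
    }
}
```

Even in the ASCII-only case the loop pays for one comparison per character, which the plain UTF-16 write does not.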
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4150822#4150822
"trustin" wrote :
|
| for (int i = 0; i < str.length(); i ++) {
| buf.putChar(str.charAt(i));
| }
Well... UTF-8 is just putting a sequence of bytes generated from bitwise operations on the chars? So it should be fast, right?
Maybe the overhead is due to the creation of the encoder class etc. I don't really know, I haven't profiled it yet...
Maybe we should just write our own encoding that writes directly onto a SimpleString...
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4150819#4150819
"timfox" wrote : "trustin" wrote : I'd prefer David's suggestion to use UTF-16 although it's not so efficient for ASCII strings.
|
| I'm not really sure I understand why UTF-16 is going to be more performant than UTF-8.
It's because it's as simple as writing a series of short integers. It should look like this, for example:
for (int i = 0; i < str.length(); i++) {
    buf.putChar(str.charAt(i));
}
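For completeness, a runnable sketch of that loop together with the matching read side. `java.nio.ByteBuffer` stands in for the MINA buffer here, and the `int` length prefix is an assumption of this sketch, not something decided in the thread; `Utf16RoundTrip`, `writeString` and `readString` are hypothetical names.

```java
import java.nio.ByteBuffer;

class Utf16RoundTrip {
    // Writes an int char count, then each char as 2 bytes
    // (big-endian, the ByteBuffer default byte order).
    static void writeString(ByteBuffer buf, String str) {
        buf.putInt(str.length());
        for (int i = 0; i < str.length(); i++) {
            buf.putChar(str.charAt(i));
        }
    }

    // Reads back what writeString wrote: no decoding step, just chars.
    static String readString(ByteBuffer buf) {
        int len = buf.getInt();
        char[] chars = new char[len];
        for (int i = 0; i < len; i++) {
            chars[i] = buf.getChar();
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String text = "hello";
        ByteBuffer buf = ByteBuffer.allocate(4 + 2 * text.length());
        writeString(buf, text);
        buf.flip();
        System.out.println(readString(buf));
    }
}
```

The buffer size is known up front (4 bytes for the count plus 2 bytes per char), so no worst-case over-allocation is needed.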
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4150810#4150810
"jmesnil" wrote :
| It seems the bottleneck is the generation of the UUID.
|
| When using UUID.randomUUID(), the implementation uses a SecureRandom.
| If I use Random instead and create the UUID with new UUID(rand.nextLong(), rand.nextLong()), the perf increases significantly.
|
I don't think we should be using random UUIDs at all, since they're... well, random (so they can clash).
Instead I think we should use a variant 2 UUID.
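A small sketch of the two approaches discussed above, using only `java.util.UUID` and `java.util.Random`; `UuidSketch` and `fromPlainRandom` are hypothetical names. Two things worth noting: in `java.util.UUID` terms, `randomUUID()` already produces variant 2 (the standard layout) with version 4, and the time-based type is version 1, for which the JDK has no built-in generator. Also, `new UUID(long, long)` stores the two longs as given, so a UUID built from raw `Random` output does not carry valid version or variant bits.

```java
import java.util.Random;
import java.util.UUID;

class UuidSketch {
    // Fast, but the version/variant bits are whatever the random longs
    // happen to contain, and a seeded Random is fully predictable.
    static UUID fromPlainRandom(Random rand) {
        return new UUID(rand.nextLong(), rand.nextLong());
    }

    public static void main(String[] args) {
        UUID secure = UUID.randomUUID(); // backed by SecureRandom
        System.out.println(secure + " version=" + secure.version()
                + " variant=" + secure.variant());

        UUID plain = fromPlainRandom(new Random());
        System.out.println(plain + " version=" + plain.version()
                + " variant=" + plain.variant());
    }
}
```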
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4150802#4150802
"timfox" wrote : 2) Text messages. Currently text message bodies are encoded using UTF-8 with the encoding methods on the MINA IoBuffer class. This is slow (as mentioned in another thread).
One of my friends told me just writing a byte array generated by String.getBytes(enc) performs better. YMMV though. I'd prefer David's suggestion to use UTF-16 although it's not so efficient for ASCII strings.
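A quick way to see the size trade-off is to compare the `String.getBytes` output for the two encodings. This is only a size comparison, not a speed benchmark; `GetBytesSketch` and its methods are hypothetical names, and `UTF_16BE` is used so that no byte-order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

class GetBytesSketch {
    static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16Size(String s) {
        // UTF_16BE writes no byte-order mark, unlike plain UTF_16.
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String cjk = "\u4f60\u597d";
        System.out.println("ASCII: utf8=" + utf8Size(ascii) + " utf16=" + utf16Size(ascii));
        System.out.println("CJK:   utf8=" + utf8Size(cjk) + " utf16=" + utf16Size(cjk));
    }
}
```

For ASCII text UTF-16 doubles the payload, while for CJK text it is the smaller of the two.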
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4150795#4150795