[jboss-jira] [JBoss JIRA] (JGRP-1718) Message corruption under heavy load

Thu Oct 17 21:48:01 EDT 2013

    [ https://issues.jboss.org/browse/JGRP-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12823066#comment-12823066 ] 

Rich DiCroce commented on JGRP-1718:
------------------------------------

I figured it out! Mostly.

After digging around in the Kryo source code, I discovered that when deserializing from a byte array, Kryo flips a bit in that array in order to read a String, then flips the bit back. I created a test where a bunch of threads all tried to deserialize the same byte array at the same time, and was able to reproduce the corruption issue perfectly.
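
To make the flip-and-restore window concrete, here is a minimal sketch. It is not Kryo's actual code: the class FlipBitReadSketch, the methods encode and readInPlace, and the end-marker encoding (high bit set on the last byte of a non-empty ASCII string) are all assumptions made for illustration. The midRead callback stands in for any other thread that looks at the array while a read is in progress.

```java
import java.nio.charset.StandardCharsets;

public class FlipBitReadSketch {
    // Hypothetical encoding: 7-bit ASCII with the high bit of the last byte
    // set as an end-of-string marker. Assumes a non-empty string.
    public static byte[] encode(String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        b[b.length - 1] |= (byte) 0x80;
        return b;
    }

    // Reads the string by clearing the marker bit in place, building the
    // String, then restoring the bit. midRead runs while the bit is cleared
    // and stands in for a concurrent user of the same array.
    public static String readInPlace(byte[] buf, Runnable midRead) {
        int last = buf.length - 1;
        buf[last] &= 0x7F;          // the shared array is modified here
        midRead.run();
        String s = new String(buf, StandardCharsets.US_ASCII);
        buf[last] |= (byte) 0x80;   // and put back here
        return s;
    }

    public static void main(String[] args) {
        byte[] wire = encode("java.util.Date");
        byte[] seen = new byte[wire.length];

        // Snapshot the array in the middle of the read.
        String s = readInPlace(wire, () -> System.arraycopy(wire, 0, seen, 0, wire.length));

        System.out.println(s);                                            // java.util.Date
        System.out.println(wire[wire.length - 1] == seen[seen.length - 1]); // false
    }
}
```

The reader itself gets the right String, and the array is back to its original state afterwards, which is why a single-threaded check on the sending node finds nothing wrong. Anything that observes the array during the window sees bytes without the end marker.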

So that then leads to the question: why are multiple threads using the same byte array at the same time? I think the answer lies in Message.copy(). That method ultimately calls copy(boolean, boolean), which calls setBuffer() to copy the buffer. But setBuffer() doesn't make a copy of the byte array; instead it just copies the reference to the byte array. This explains why the corruption happened more often with loopback = true, because the byte array got passed back up the stack where Kryo mucked with it at the same time as the transport was reading the byte array to multicast it. It also explains why the corruption still happened with loopback = false, because presumably a similar situation could occur if the Message got lost and needed to be re-transmitted.
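
The difference between copying the reference and copying the bytes can be sketched as follows. The Msg class here is a hypothetical stand-in, not the JGroups Message class, and shallowCopy and deepCopy are names invented for this example.

```java
import java.util.Arrays;

public class BufferCopySketch {
    /** Hypothetical stand-in for a message; not the JGroups Message class. */
    public static final class Msg {
        private final byte[] buf;

        public Msg(byte[] buf) { this.buf = buf; }   // keeps the caller's array
        public byte[] getBuffer() { return buf; }

        /** Copies only the reference, so both messages share one array. */
        public Msg shallowCopy() { return new Msg(buf); }

        /** Copies the bytes, so each message owns its own array. */
        public Msg deepCopy() { return new Msg(Arrays.copyOf(buf, buf.length)); }
    }

    public static void main(String[] args) {
        Msg original = new Msg(new byte[] {1, 2, 3});
        Msg shallow = original.shallowCopy();
        Msg deep = original.deepCopy();

        // Simulate a reader that modifies the array it was handed.
        shallow.getBuffer()[0] = 99;

        System.out.println(original.getBuffer()[0]); // 99: the change leaked through the shared array
        System.out.println(deep.getBuffer()[0]);     // 1: the deep copy is unaffected
    }
}
```

On the application side, one possible workaround under the same assumption is to hand the deserializer its own copy of the received bytes (for example via Arrays.copyOf), at the cost of an extra allocation per message.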

The fact that the byte array isn't being deep-copied seems like a bit of a bug to me. :-) Regardless, I don't like the fact that Kryo changes something that it's reading, so I think I'll be investigating other options for serialization.

> Message corruption under heavy load
> -----------------------------------
>
>                 Key: JGRP-1718
>                 URL: https://issues.jboss.org/browse/JGRP-1718
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.2.12
>         Environment: JBoss AS 7.2.0.Final (includes JGroups 3.2.7.Final, upgrading to 3.2.12.Final had no effect)
>            Reporter: Rich DiCroce
>            Assignee: Bela Ban
>
> In my project, I'm using the Kryo serialization library to serialize/deserialize objects to/from byte[]. Some of these byte arrays appear to be getting corrupted during transmission.
> The problem happens very infrequently and only under load. I have to send tens of thousands of messages from hundreds of threads just to get two or three errors. The exception occurs when Kryo tries to deserialize the byte[] on the receiving node. So far, all of the errors I've seen happen when Kryo tries to find a class to deserialize an object, and the stack traces end with something like:
> Caused by: java.lang.ClassNotFoundException: java.util.Date[SOH][GS]
> Note that the [SOH] and [GS] are not literally those strings, they're unprintable ASCII 0x1 and 0x1D.
> I'm confident that Kryo itself is not the source of the problem. I added some code to my project to deserialize the message on the sending node, immediately after serializing it, and log any exception thrown from the deserialization. The sending node did not log an exception, but the receiving node still did. My code passes the byte[] to the Message constructor without doing anything to it.
> It also appears that I am not the first person to encounter this issue. See https://code.google.com/p/kryo/issues/detail?id=102
