[jbosscache-dev] Critical CacheMarshaller issue

Manik Surtani manik at jboss.org
Mon Nov 12 13:17:41 EST 2007


On 12 Nov 2007, at 17:18, Jason T. Greene wrote:

> Manik Surtani wrote:
>> On 8 Nov 2007, at 03:00, Jason T. Greene wrote:
>>> Manik Surtani wrote:
>>>> A nasty bug, spotted by someone in the user forum (initially as a  
>>>> CCE)
>>>>   http://jira.jboss.org/jira/browse/JBCACHE-1211
>>>> Copying from the JIRA:
>>>> "This is a nasty. What started life as an optimisation for  
>>>> certain types of objects in a marshalled stream (Fqn,  
>>>> GlobalTransaction, String and Serializable) has become a major  
>>>> limitation in that a single stream can only hold up to 32767  
>>>> different (not equal()) instances of such objects.
>>>> Basically the optimisation was, for example, instead of writing  
>>>> "hello" to a stream twice, just write it once and use a reference  
>>>> for all subsequent times. Unfortunately this reference was  
>>>> encoded as a short, hence the limitation of 32767.
>>>> Fixing this will definitely break wire compatibility with JBoss  
>>>> Cache 2.0.0, although JBC does allow backward compatibility by  
>>>> specifying replication version in your configuration, thanks to  
>>>> the VersionAwareMarshaller. "
>>>> So I guess this mandates the need for a CacheMarshaller210.  The  
>>>> question is how do we fix this.  The obvious thing is to replace  
>>>> the short references with integers.  The 2 ^ 31 - 1 number of  
>>>> references this would allow should be plenty!  The drawback  
>>>> though, is larger streams.  4-byte refs instead of 2-byte refs  
>>>> can be an unnecessary overhead especially if objects aren't  
>>>> repeated much.
>>>
>>> I wouldn't worry too much about the extra bytes. However, you  
>>> could maintain backwards compatibility, and save the 2 bytes, by  
>>> stealing the sign bit on the short. If byte1 & 0x80 then read 3  
>>> more bytes, else read only 1 more.
>>>
>> Still wouldn't help if you needed a million Strings in a  
>> collection.  :-)
>
> Sure it would, since you get the full positive range of a signed int  
> (2^31 - 1). The only difference is that if it's <= 32767 you write  
> only two bytes, and when it's greater you write an encoded int that  
> can be detected (only 4 bytes).

Of course, yeah, you'd read 3 more bytes.  But the extra bytes mean
this would still break backward compatibility for streams with more
than 32767 refs.  Existing code wouldn't be able to deserialize such
a stream.  Then again, that case is currently *broken*, so existing
code can't deserialize such a stream anyway!
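
To make the sign-bit idea concrete, here is a rough sketch of how such
a codec could look.  The class and method names (SignBitRefCodec,
writeRef, readRef) are made up for illustration and are not part of
any existing marshaller; it also assumes refs are never negative.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch of the "steal the sign bit" encoding discussed above.
final class SignBitRefCodec {

    // Refs up to 32767 are written as a plain 2-byte short (high bit clear),
    // exactly as the old format did. Larger refs are written as a 4-byte int
    // with the high bit set as a marker.
    static void writeRef(DataOutput out, int ref) throws IOException {
        if (ref < 0) {
            throw new IllegalArgumentException("ref must be non-negative: " + ref);
        }
        if (ref <= Short.MAX_VALUE) {
            out.writeShort(ref);
        } else {
            out.writeInt(ref | 0x80000000);
        }
    }

    // If the high bit of the first byte is set, read 3 more bytes;
    // otherwise read only 1 more.
    static int readRef(DataInput in) throws IOException {
        int b1 = in.readUnsignedByte();
        if ((b1 & 0x80) == 0) {
            return (b1 << 8) | in.readUnsignedByte();
        }
        int rest = (in.readUnsignedByte() << 16)
                 | (in.readUnsignedByte() << 8)
                 | in.readUnsignedByte();
        return ((b1 & 0x7F) << 24) | rest;
    }
}
```

With this layout a stream holding at most 32767 refs is byte-for-byte
what an old reader expects; only streams that go past that limit
contain the 4-byte form.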

Still, I'd prefer to make the change to the stream explicit, as a
separate marshaller for 2.1.0.  I do like the variable int approach,
since for a small number of refs (< 128, which is probably the
majority of use cases) I'd just encode a single byte.
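
For illustration, one common way to do a variable-length int is 7 data
bits per byte, with the high bit flagging "more bytes follow".  This
is only a sketch under that assumption, not necessarily what a
CacheMarshaller210 would end up doing; VarIntRefCodec and its methods
are hypothetical names.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch of a variable-length reference encoding.
final class VarIntRefCodec {

    // Writes the low 7 bits first; the high bit of each byte is set
    // while more bytes follow. Refs below 128 take a single byte.
    static void writeRef(DataOutput out, int ref) throws IOException {
        if (ref < 0) {
            throw new IllegalArgumentException("ref must be non-negative: " + ref);
        }
        while ((ref & ~0x7F) != 0) {
            out.writeByte((ref & 0x7F) | 0x80);
            ref >>>= 7;
        }
        out.writeByte(ref);
    }

    // Reads up to 5 bytes, stopping at the first byte with the high bit clear.
    static int readRef(DataInput in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("malformed variable-length reference");
    }
}
```

The trade-off in this particular form is that refs from 128 to 16383
take 2 bytes, larger ones take 3 or more, and the very largest ints
take 5 bytes rather than 4.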

Cheers,
--
Manik Surtani
Lead, JBoss Cache
manik at jboss.org








