[jbosscache-dev] Critical CacheMarshaller issue

Jason T. Greene jason.greene at redhat.com
Mon Nov 12 14:12:13 EST 2007


Manik Surtani wrote:
> 
> On 12 Nov 2007, at 17:18, Jason T. Greene wrote:
> 
>> Manik Surtani wrote:
>>> On 8 Nov 2007, at 03:00, Jason T. Greene wrote:
>>>> Manik Surtani wrote:
>>>>> A nasty bug, spotted by someone in the user forum (initially as a CCE)
>>>>>   http://jira.jboss.org/jira/browse/JBCACHE-1211
>>>>> Copying from the JIRA:
>>>>> "This is a nasty. What started life as an optimisation for certain 
>>>>> types of objects in a marshalled stream (Fqn, GlobalTransactio, 
>>>>> String and Serializable) has become a major limitation in that a 
>>>>> single stream can only hold up to 32767 different (not equal()) 
>>>>> instances of such objects.
>>>>> Basically the optimisation was, for example, instead of writing 
>>>>> "hello" to a stream twice, just write it once and use a reference 
>>>>> for all subsequent times. Unfortunately this reference was encoded 
>>>>> as a short, hence the limitation of 32767.
>>>>> Fixing this will definitely break wire compatibility with JBoss 
>>>>> Cache 2.0.0, although JBC does allow backward compatibility by 
>>>>> specifying replication version in your configuration, thanks to the 
>>>>> VersionAwareMarshaller. "
>>>>> So I guess this mandates the need for a CacheMarshaller210.  The 
>>>>> question is how do we fix this.  The obvious thing is to replace 
>>>>> the short references with integers.  The 2 ^ 31 - 1 number of 
>>>>> references this would allow should be plenty!  The drawback though, 
>>>>> is larger streams.  4-byte refs instead of 2-byte refs can be an 
>>>>> unnecessary overhead especially if objects aren't repeated much.
>>>>
>>>> I wouldn't worry too much about the extra bytes. However, you could 
>>>> maintain backwards compatibility, and save the 2 bytes, by stealing 
>>>> the sign bit on the short. If byte1 & 0x80 then read 3 more bytes, 
>>>> else read only 1 more.
>>>>
>>> Still wouldn't help if you needed a million Strings in a collection.  
>>> :-)
>>
>> Sure it would, since you get the full positive rang of a signed int 
>> (2^31 - 1). The only difference is that if its <= 32767 you write only 
>> two bytes, and when it's greater you write an encoded int that can be 
>> detected (only 4 bytes).
> 
> Of course, yeah, you'd read 3 more bytes.  But that would mean (with the 
> adding of more bytes) this would break backward compatibility for > 
> 32767 refs anyway.  Existing code wouldn't be able to deserialize such a 
> stream.  Then again, for such cases, it is currently *broken* and even 
> existing code wouldn't be able to deserialize such a stream anyway!!

Right, there is no way to fix the limitation on old code. The only 
advantage is that you only need 1 marshaller.

> Still I'd prefer to make the change to the stream explicit though, as a 
> separate marshaller for 2.1.0 - I do like the variable int approach 
> since for a small number of refs (< 128, which is probably the majority 
> of use cases) I'd just encode a single byte.

Probably the most efficient change, would be to introduce a new magic 
number for each size:
MAGICNUMBER_REF
MAGICNUMBER_REF16
MAGICNUMBER_REF32
MAGICNUMBER_REF64

This would give you the full unsigned size of each type (provided you 
changed the reference generation logic to support it).
0 - 255 refs : 1 byte
256 - 65535 refs : 2 bytes
65536 - 4294967295 : 4 bytes
4294967296 - 18446744073709551616 : 8 bytes

-Jason

-- 
Jason T. Greene
JBoss, a division of Red Hat



More information about the jbosscache-dev mailing list