On 12 Nov 2007, at 17:18, Jason T. Greene wrote:
Manik Surtani wrote:
> On 8 Nov 2007, at 03:00, Jason T. Greene wrote:
>> Manik Surtani wrote:
>>> A nasty bug, spotted by someone in the user forum (initially as a
>>> CCE)
>>>
http://jira.jboss.org/jira/browse/JBCACHE-1211
>>> Copying from the JIRA:
>>> "This is a nasty. What started life as an optimisation for
>>> certain types of objects in a marshalled stream (Fqn,
>>> GlobalTransactio, String and Serializable) has become a major
>>> limitation in that a single stream can only hold up to 32767
>>> different (not equal()) instances of such objects.
>>> Basically the optimisation was, for example, instead of writing
>>> "hello" to a stream twice, just write it once and use a reference
>>> for all subsequent times. Unfortunately this reference was
>>> encoded as a short, hence the limitation of 32767.
>>> Fixing this will definitely break wire compatibility with JBoss
>>> Cache 2.0.0, although JBC does allow backward compatibility by
>>> specifying replication version in your configuration, thanks to
>>> the VersionAwareMarshaller. "
>>> So I guess this mandates the need for a CacheMarshaller210. The
>>> question is how do we fix this. The obvious thing is to replace
>>> the short references with integers. The 2 ^ 31 - 1 number of
>>> references this would allow should be plenty! The drawback
>>> though, is larger streams. 4-byte refs instead of 2-byte refs
>>> can be an unnecessary overhead especially if objects aren't
>>> repeated much.
>>
>> I wouldn't worry too much about the extra bytes. However, you
>> could maintain backwards compatibility, and save the 2 bytes, by
>> stealing the sign bit on the short. If byte1 & 0x80 then read 3
>> more bytes, else read only 1 more.
>>
> Still wouldn't help if you needed a million Strings in a
> collection. :-)
Sure it would, since you get the full positive rang of a signed int
(2^31 - 1). The only difference is that if its <= 32767 you write
only two bytes, and when it's greater you write an encoded int that
can be detected (only 4 bytes).
Of course, yeah, you'd read 3 more bytes. But that would mean (with
the adding of more bytes) this would break backward compatibility for
32767 refs anyway. Existing code wouldn't be able to deserialize
such a stream. Then again, for such cases, it is currently *broken*
and even existing code wouldn't be able to deserialize such a stream
anyway!!
Still I'd prefer to make the change to the stream explicit though, as
a separate marshaller for 2.1.0 - I do like the variable int approach
since for a small number of refs (< 128, which is probably the
majority of use cases) I'd just encode a single byte.
Cheers,
--
Manik Surtani
Lead, JBoss Cache
manik(a)jboss.org