[infinispan-dev] Let me understand DIST

Fri Mar 16 07:42:21 EDT 2012

On Fri, Mar 16, 2012 at 9:36 AM, Bela Ban <bban at redhat.com> wrote:
>
>
> On 3/15/12 4:25 PM, Manik Surtani wrote:
>>
>> On 15 Mar 2012, at 04:31, Bela Ban wrote:
>>
>>> If we touch a lot of keys, then sending *all* of the keys to all owners
>>> may be sub-optimal; as an optimization, we may want to send only the
>>> keys to the nodes which need to store them. This would make the PREPARES
>>> potentially much smaller.
>>
>>
>> Not really - if we are sending different prepares to different nodes (containing different writes) then the serialisation overhead goes up since each recipient gets a different byte buffer.  In the current case we reuse the byte buffer.
>
>
> In my experience, serialization is never the bottleneck, it is
> *de-serialization*, so this shouldn't be a problem. Also, I don't see
> why serialization of keys [1..9] shouldn't take roughly the same time as
> serialization of keys [1..3], [4..6], [7..9].
>

Bela, we would split the keys into [1..3], [4..6], [7..9] only if numOwners==1.
With numOwners==2, it would more likely be something like this:
[1..6], [4..9], [7..9, 1..3]. Because each key has 2 owners, it
has to be serialized twice (and kept in memory on the originator in 2
serialized copies).
With numOwners==3 the serialization cost will be triple, both in CPU
and in memory usage.
And so on...
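
To make the split concrete, here is a rough sketch in Python. The placement rule is a toy one I made up for illustration (contiguous key blocks, backups on the next node of the ring); it is not Infinispan's consistent hash, and `owners_of` / `split_by_target` are hypothetical names.

```python
from collections import defaultdict


def owners_of(key, num_keys, nodes, num_owners):
    """Return the nodes that store `key` (keys are numbered 1..num_keys).

    Toy placement, NOT Infinispan's consistent hash: keys are cut into
    contiguous blocks, one block per node, and each backup copy goes to
    the next node on the ring.
    """
    primary = (key - 1) * len(nodes) // num_keys
    copies = min(num_owners, len(nodes))  # no more copies than nodes
    return [nodes[(primary + i) % len(nodes)] for i in range(copies)]


def split_by_target(num_keys, nodes, num_owners):
    """Group the keys by the node that has to store them."""
    per_node = defaultdict(list)
    for key in range(1, num_keys + 1):
        for node in owners_of(key, num_keys, nodes, num_owners):
            per_node[node].append(key)
    return dict(per_node)


if __name__ == "__main__":
    nodes = ["A", "B", "C"]
    for num_owners in (1, 2, 3):
        groups = split_by_target(9, nodes, num_owners)
        # Every key shows up once per owner, so this is what the
        # originator has to serialize if each target gets its own PREPARE.
        serialized = sum(len(keys) for keys in groups.values())
        print(num_owners, groups, "entries serialized:", serialized)
```

With 9 keys on 3 nodes, the number of entries serialized grows linearly with numOwners under this placement, while the single shared PREPARE always serializes 9.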

So when the cluster is small and all the owners would receive most of
the keys anyway, the savings in deserialization are offset by the
increased costs in serialization. In fact, we'd have to special-case
clusters with <= numOwners nodes, or we'd always spend more time in
serialization + deserialization than we do now.
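
A back-of-the-envelope model of that trade-off, in Python. The assumptions are mine: cost is linear in the number of entries, the touched keys are spread so that every node in the cluster is a recipient, and `ser_cost` / `deser_cost` are abstract per-entry weights. `prepare_cost` is a hypothetical helper, not anything in the code base.

```python
def prepare_cost(num_keys, cluster_size, num_owners,
                 ser_cost=1.0, deser_cost=1.0):
    """Return (shared_buffer_cost, per_target_cost) in abstract work units.

    Assumes every node receives the PREPARE and that cost is linear in
    the number of entries serialized or deserialized.
    """
    # A key cannot have more owners than there are nodes.
    copies = min(num_owners, cluster_size)
    # Current approach: serialize every key once; every recipient
    # deserializes all of them.
    shared = ser_cost * num_keys + deser_cost * cluster_size * num_keys
    # Per-target approach: each key is serialized and deserialized
    # once per owner.
    per_target = (ser_cost + deser_cost) * copies * num_keys
    return shared, per_target


if __name__ == "__main__":
    # Small cluster: 2 nodes, numOwners == 2.
    print(prepare_cost(1000, cluster_size=2, num_owners=2))
    # Larger cluster: 10 nodes, numOwners == 2.
    print(prepare_cost(1000, cluster_size=10, num_owners=2))
```

In this model the per-target variant costs ser_cost * numKeys * (clusterSize - 1) more than the shared buffer whenever clusterSize <= numOwners, regardless of how the two weights compare. It only starts to win once ser_cost * (numOwners - 1) < deser_cost * (clusterSize - numOwners).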

> If we have a TX which touches a lot of keys, unless we break the key set
> down into their associated targets, we would have to potentially
> transfer a *big* state !
>

If we serialize the keys separately for each target, the originator
will have to hold in memory numOwners * numKeys serialized entries
until the messages are garbage collected. With the current approach,
we only have to hold in memory numKeys serialized entries on the
originator - at the expense of extra load on the recipients.
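
The memory side can be illustrated the same way. This is only a sketch: `pickle` stands in for the real marshaller, the 100-byte values and the key-to-node mapping are made up, and `buffers_held` is a hypothetical name.

```python
import pickle


def buffers_held(entries, targets):
    """Return (shared_bytes, per_target_bytes) for one transaction.

    `entries` maps key -> value; `targets` maps node -> list of keys that
    node must store. pickle is a stand-in for the real marshaller.
    """
    # Current approach: one buffer with every entry, reused for all
    # recipients.
    shared = len(pickle.dumps(sorted(entries.items())))
    # Per-target approach: one distinct buffer per recipient, all of them
    # alive on the originator until the messages are collected.
    per_target = sum(
        len(pickle.dumps([(k, entries[k]) for k in keys]))
        for keys in targets.values()
    )
    return shared, per_target


if __name__ == "__main__":
    entries = {k: bytes([k]) * 100 for k in range(1, 10)}
    # numOwners == 2 on three nodes: every key appears in two lists.
    targets = {
        "A": [1, 2, 3, 7, 8, 9],
        "B": [1, 2, 3, 4, 5, 6],
        "C": [4, 5, 6, 7, 8, 9],
    }
    print(buffers_held(entries, targets))
```

Because every key appears in numOwners of the per-target buffers, the second number should come out at roughly numOwners times the first, plus a small per-buffer overhead.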