[infinispan-dev] Store as binary

Galder Zamarreño galder at redhat.com
Tue Feb 4 02:14:23 EST 2014


On 21 Jan 2014, at 17:45, Mircea Markus <mmarkus at redhat.com> wrote:

> 
> On Jan 21, 2014, at 2:13 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
> 
>> On 21 January 2014 13:37, Mircea Markus <mmarkus at redhat.com> wrote:
>>> 
>>> On Jan 21, 2014, at 1:21 PM, Galder Zamarreño <galder at redhat.com> wrote:
>>> 
>>>>> What's the point for these tests?
>>>> 
>>>> +1
>>> 
>>> To validate if storing the data in binary format yields better performance than storing it as a POJO.
>> 
>> That will highly depend on the scenarios you want to test for. AFAIK
>> this started after Paul described how session replication works in
>> WildFly, and we already know that both strategies are suboptimal with
>> the current options available: in his case the active node will always
>> write on the POJO, while the backup node will essentially only need to
>> store the buffer "just in case" he might need to take over.
> 
> Indeed as it is today, it doesn't make sense for WildFly's session replication.
> 
>> 
>> Sure, one will be slower, but if you want to make a suggestion to him
>> about which configuration he should be using, we should measure his
>> use case, not a different one.
>> 
>> Even then as discussed in Palma, an in memory String representation
>> might be way more compact because of pooling of strings and a very
>> high likelihood for repeated headers (as common in web frameworks),
> 
> pooling like in String.intern()? 
> Even so, if most of your access to the String is to serialize it and send it remotely, then you have a serialization cost (CPU) to pay for the reduced size.

Serialization has a cost, but nothing compared with the transport itself, and you don’t have to go very far to see the impact of transport. Just recently we were chasing a performance regression, and even though there were some changes in serialization, the impact of my improvements there was minimal, 2-3% at most. Optimal network and transport configuration is more important IMO; once again, misconfiguration in that layer is what was making us ~20% slower.
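
To get a rough feel for the serialization side of that comparison, something like the sketch below could be used. It is only an illustration under stated assumptions: it uses plain JDK serialization as a stand-in (not the marshaller Infinispan actually uses), and `SerializationCostSketch` and `samplePayload()` are hypothetical names with made-up data. It says nothing about transport cost, which has to be measured on a real cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Hypothetical micro-measurement; JDK serialization is an assumption, not Infinispan's marshaller.
class SerializationCostSketch {

    // Serialize a value to a byte array.
    static byte[] toBytes(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Deserialize a value from a byte array.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Made-up session-like payload with repeated header-style values.
    static HashMap<String, String> samplePayload() {
        HashMap<String, String> session = new HashMap<>();
        for (int i = 0; i < 50; i++) {
            session.put("header-" + i, "text/html; charset=UTF-8");
        }
        return session;
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, String> payload = samplePayload();
        int rounds = 10_000;
        byte[] last = null;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            last = toBytes(payload);
        }
        long avgNanos = (System.nanoTime() - start) / rounds;
        // Average CPU time per serialization, to be compared with a measured network round trip.
        System.out.println("bytes=" + last.length + " avgSerializeNanos=" + avgNanos);
    }
}
```

The number printed is only meaningful next to a round-trip time measured on the same hardware; without JIT warm-up and a proper harness it is a ballpark figure at best.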

> 
>> so
>> you might want to measure the CPU vs storage cost on the receiving
>> side.. but then again your results will definitely depend on the input
>> data and assumptions on likelihood of failover, how often it is being
>> written on the owner node vs on the other node (since he uses
>> locality), etc.. many factors I'm not seeing being considered here and
>> which could make a significant difference.
> 
> I'm looking for the default setting of storeAsBinary in the configurations we ship. I think the default configs should be optimized for distribution, random key access (every read/write for any key executes on every node of the cluster with the same probability) for both reads and writes.

I’m with Sanne on this. I still think this is not a really useful exercise, since serialization is not a huge cost in the total time spent. Our latency is driven by waiting for others to reply to our requests, and that’s the driver in sync mode. In async mode, you can forget about the serialization cost if you use putAsync().
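
The idea behind the async remark, that the caller does not wait for the encoding and storing work, can be illustrated with a small stand-in. This is a sketch only: `AsyncStoreSketch` is a hypothetical class built on the JDK's `CompletableFuture`, not Infinispan's API, and its `putAsync` merely mimics the shape of the call mentioned above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a cache; NOT Infinispan's implementation.
final class AsyncStoreSketch {
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Returns immediately; encoding and storing run on the worker thread.
    CompletableFuture<Void> putAsync(String key, String value) {
        return CompletableFuture.runAsync(
                () -> store.put(key, value.getBytes(StandardCharsets.UTF_8)), worker);
    }

    byte[] get(String key) {
        return store.get(key);
    }

    // Stop the worker thread so the JVM can exit.
    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        AsyncStoreSketch cache = new AsyncStoreSketch();
        CompletableFuture<Void> pending = cache.putAsync("k", "v");
        // The caller is free to do other work here and only blocks if it needs the result.
        pending.join();
        System.out.println("stored bytes: " + cache.get("k").length);
        cache.shutdown();
    }
}
```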

I find it way more useful to look at Infinispan all the time and consider what things we should be ditching to make our configuration smaller, our memory consumption smaller, and a smaller code base.

> 
>> 
>>> As of now, it doesn't so I need to check why.
>> 
>> You could play with the test parameters until it produces an output
>> you like better, but I still see no point?
> 
> the point is to provide the best default params for the default config, and see how useful storeAsBinary is.
> 
>> This is not a realistic
>> scenario, at best it could help us document suggestions about which
>> scenarios you'd want to keep the option enabled vs disabled, but then
>> again I think we're wasting time as we could implement a better
>> strategy for Paul's use case: one which never deserializes a value
>> received from a remote node until it's been requested as a POJO, but
>> keeps the POJO as-is when it's stored locally.
> 
> I disagree: Paul's scenario, whilst very important, is quite specific. For what I consider the general case (random key access, see above), your approach is suboptimal.  
> 
> 
>> I believe that would
>> make sense also for OGM and probably most other users of Embedded.
>> Basically, that would re-implement something similar to the previous
>> design but simplifying it a bit so that it doesn't allow for a
>> back-and-forth conversion between storage types but rather dynamically
>> favors a specific storage strategy.
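
The strategy Sanne describes (keep the bytes received from a remote node until the value is requested as a POJO, keep a locally stored POJO as-is, and never convert back and forth) could be sketched roughly as follows. `LazyValue` is a hypothetical holder, not an existing Infinispan class, and JDK serialization is used here only as a placeholder for the real marshaller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical holder: one-way transition from bytes to POJO, never back.
final class LazyValue<T extends Serializable> {
    private byte[] bytes; // set when the value arrived from a remote node
    private T pojo;       // set when stored locally, or after the first get()

    private LazyValue(byte[] bytes, T pojo) {
        this.bytes = bytes;
        this.pojo = pojo;
    }

    // Value written on the local node: keep the POJO as-is.
    static <T extends Serializable> LazyValue<T> fromLocal(T pojo) {
        return new LazyValue<>(null, pojo);
    }

    // Value received from a remote node: keep the buffer, do not deserialize yet.
    static <T extends Serializable> LazyValue<T> fromRemote(byte[] bytes) {
        return new LazyValue<>(bytes, null);
    }

    // Deserialize on first request only, then drop the buffer.
    @SuppressWarnings("unchecked")
    synchronized T get() throws IOException, ClassNotFoundException {
        if (pojo == null) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                pojo = (T) in.readObject();
            }
            bytes = null;
        }
        return pojo;
    }

    synchronized boolean isDeserialized() {
        return pojo != null;
    }

    // Wire form for replication: reuse the buffer if still held, else serialize the POJO.
    synchronized byte[] toBytes() throws IOException {
        if (pojo == null) {
            return bytes;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(pojo);
        }
        return out.toByteArray();
    }
}
```

With this shape a backup node that never serves a read only ever holds the buffer, while the active node only ever holds the POJO and pays for serialization when it replicates.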
> 
> It all boils down to what we want to optimize for: random key access or some degree of affinity. I think the former is the default.
> One way or the other, from the test Radim ran with random key access, the storeAsBinary doesn't bring any benefit and it should: http://lists.jboss.org/pipermail/infinispan-dev/2009-October/004299.html
> 
>> 
>> Cheers,
>> Sanne
>> 
>>> 
>>> Cheers,
>>> --
>>> Mircea Markus
>>> Infinispan lead (www.infinispan.org)
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> 
> 
> Cheers,
> -- 
> Mircea Markus
> Infinispan lead (www.infinispan.org)
> 
> 
> 
> 
> 


--
Galder Zamarreño
galder at redhat.com
twitter.com/galderz

Project Lead, Escalante
http://escalante.io

Engineer, Infinispan
http://infinispan.org



