On 21 Jan 2014, at 17:45, Mircea Markus <mmarkus(a)redhat.com> wrote:
On Jan 21, 2014, at 2:13 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:
> On 21 January 2014 13:37, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>
>> On Jan 21, 2014, at 1:21 PM, Galder Zamarreño <galder(a)redhat.com> wrote:
>>
>>>> What's the point for these tests?
>>>
>>> +1
>>
>> To validate if storing the data in binary format yields better performance than
>> storing it as a POJO.
>
> That will highly depend on the scenarios you want to test for. AFAIK
> this started after Paul described how session replication works in
> WildFly, and we already know that both strategies are suboptimal with
> the current options available: in his case the active node will always
> write on the POJO, while the backup node will essentially only need to
> store the buffer "just in case" he might need to take over.
Indeed as it is today, it doesn't make sense for WildFly's session replication.
>
> Sure, one will be slower, but if you want to make a suggestion to him
> about which configuration he should be using, we should measure his
> use case, not a different one.
>
> Even then as discussed in Palma, an in memory String representation
> might be way more compact because of pooling of strings and a very
> high likelihood for repeated headers (as common in web frameworks),
pooling like in String.intern()?
Even so, if most of your access to the String is to serialize it and send it remotely,
then you have a serialization cost (CPU) to pay for the reduced size.
Serialization has a cost, but nothing compared with the transport itself, and you don’t
have to go very far to see the impact of transport. Just recently we were chasing some
performance regression and even though there were some changes in serialization, the
impact of my improvements was minimal, max 2-3%. Optimal network and transport
configuration is more important IMO, and once again, misconfiguration in that layer is
what was causing us to be ~20% slower.
> so
> you might want to measure the CPU vs storage cost on the receiving
> side.. but then again your results will definitely depend on the input
> data and assumptions on likelihood of failover, how often data is
> written on the owner node vs on the other node (since he uses
> locality), etc.. many factors I'm not seeing being considered here and
> which could make a significant difference.
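To make the pooling point above concrete, here is a minimal Java sketch of what String.intern() does for repeated header names. The class and method names are hypothetical and this is not Infinispan code; it only shows that 100 separately received copies of the same header collapse to a single instance once interned. It says nothing about the CPU cost of serializing those strings again, which would have to be measured separately.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Illustrative only: names are hypothetical, not part of Infinispan.
final class HeaderPoolingSketch {

    // Simulates a header name arriving from the wire: every call yields a new String instance.
    static String received(String value) {
        return new String(value.toCharArray());
    }

    // Counts distinct String instances by identity (==), not by equals().
    static int distinctInstances(List<String> values) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        seen.addAll(values);
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        List<String> pooled = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            String header = received("Content-Type");
            raw.add(header);             // one instance per request
            pooled.add(header.intern()); // canonical instance from the JVM string pool
        }
        System.out.println("instances without pooling: " + distinctInstances(raw));
        System.out.println("instances with intern():   " + distinctInstances(pooled));
    }
}
```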
I'm looking for the default setting of storeAsBinary in the configurations we ship. I
think the default configs should be optimized for distribution with random key access (a
read or write for any key is equally likely to execute on any node of the cluster).
I’m with Sanne on this. I still think this is not really a useful exercise, since
serialization is not a huge cost in the total time spent. Our latency is driven by waiting
for others to reply to our requests, and that’s the driver in sync mode. In async, you can
forget about the serialization cost if you use putAsync().
I find it way more useful to look at Infinispan all the time and consider what things we
should be ditching to make our configuration smaller, our memory consumption smaller, and
our code base smaller.
>
>> As of now, it doesn't so I need to check why.
>
> You could play with the test parameters until it produces an output
> you like better, but I still see no point?
the point is to provide the best default params for the default config, and see
what's the usefulness of storeAsBinary.
> This is not a realistic
> scenario, at best it could help us document suggestions about which
> scenarios you'd want to keep the option enabled vs disabled, but then
> again I think we're wasting time as we could implement a better
> strategy for Paul's use case: one which never deserializes a value
> received from a remote node until it's been requested as a POJO, but
> keeps the POJO as-is when it's stored locally.
I disagree: Paul's scenario, whilst very important, is quite specific. For what I
consider the general case (random key access, see above), your approach is suboptimal.
> I believe that would
> make sense also for OGM and probably most other users of Embedded.
> Basically, that would re-implement something similar to the previous
> design but simplifying it a bit so that it doesn't allow for a
> back-and-forth conversion between storage types but rather dynamically
> favors a specific storage strategy.
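For reference, the strategy Sanne describes above could be sketched roughly as follows. This is a hypothetical value holder, not Infinispan's storeAsBinary implementation: the LazyValue name, its methods, and the use of plain java.io serialization are all assumptions made for illustration. A value written locally stays a POJO; a value received from a remote node stays as bytes until someone asks for it, and after that the POJO form is kept with no conversion back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

// Hypothetical sketch of the proposed strategy; not Infinispan code.
final class LazyValue<T extends Serializable> {
    private T pojo;               // set for local writes, or after the first read of a remote value
    private byte[] bytes;         // set for values received from a remote node
    private int deserializations; // how many times the bytes were actually deserialized

    private LazyValue(T pojo, byte[] bytes) {
        this.pojo = pojo;
        this.bytes = bytes;
    }

    // Local write: keep the POJO as-is, no serialization for storage.
    static <T extends Serializable> LazyValue<T> storedLocally(T pojo) {
        return new LazyValue<>(Objects.requireNonNull(pojo), null);
    }

    // Remote write: keep the received buffer, do not deserialize yet.
    static <T extends Serializable> LazyValue<T> receivedFromRemote(byte[] bytes) {
        return new LazyValue<>(null, Objects.requireNonNull(bytes));
    }

    // Deserializes at most once, on first access, then favors the POJO form.
    @SuppressWarnings("unchecked")
    synchronized T get() {
        if (pojo == null) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                pojo = (T) in.readObject();
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException("Cannot deserialize value", e);
            }
            bytes = null; // no back-and-forth conversion between storage types
            deserializations++;
        }
        return pojo;
    }

    synchronized boolean isDeserialized() {
        return pojo != null;
    }

    synchronized int deserializationCount() {
        return deserializations;
    }

    // Helper that stands in for the marshalling done by the sending node.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException("Cannot serialize value", e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        LazyValue<String> backup = LazyValue.receivedFromRemote(serialize("session-state"));
        System.out.println("deserialized before read: " + backup.isDeserialized());
        System.out.println("value: " + backup.get());
        System.out.println("deserialized after read: " + backup.isDeserialized());
    }
}
```

With affinity (the owner writes, the backup only stores), the backup never pays the deserialization cost unless it takes over. With random key access, most reads would hit the deserialization path at least once, which is the case discussed in the reply below.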
It all boils down to what we want to optimize for: random key access or some degree of
affinity. I think the former is the default.
One way or the other, in the test Radim ran with random key access, storeAsBinary
doesn't bring any benefit, and it should:
http://lists.jboss.org/pipermail/infinispan-dev/2009-October/004299.html
>
> Cheers,
> Sanne
>
>>
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev(a)lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
--
Galder Zamarreño
galder(a)redhat.com
twitter.com/galderz
Project Lead, Escalante
http://escalante.io
Engineer, Infinispan
http://infinispan.org