Hi all,
during the design document reviews I remember raising some objections
about the wording around "performance", "time" and "state
transfer",
but since I then stopped following this aspect I don't know how far
the implementation picked this up.
I have been pointing out that the goal should not be to minimize the
time it takes to complete the state transfer, but to minimize the
performance impact during state transfers.
The design should be able to deal with multiple state transfers being
triggered while the current one(s) haven't completed; this is doable
as long as we don't abort partial state transfers and instead "move
on" from the intermediate states towards the stable state. I think
it's also easy to demonstrate that such a design minimizes the bytes
flying around the network, as the intermediate states are certainly
"closer" (in terms of distribution of data) to the end state than the
initial state.
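To make the "closer in terms of data distribution" point concrete, here is a minimal sketch. Everything in it is an assumption for illustration, not the actual Infinispan consistent hash: a topology is modelled as a map from segment to owner, and the cost of a transfer is simply the number of segments whose owner changes (each must be shipped).

```java
import java.util.Map;

public class TransferCost {
    // Cost of moving between topologies: the number of segments whose
    // owner differs, since each such segment must be shipped over the wire.
    static int cost(Map<Integer, String> from, Map<Integer, String> to) {
        int c = 0;
        for (Map.Entry<Integer, String> e : to.entrySet()) {
            if (!e.getValue().equals(from.get(e.getKey()))) c++;
        }
        return c;
    }

    public static void main(String[] args) {
        // Hypothetical 4-segment cluster that sees two view changes in a row:
        // node C joins, then node B leaves before the first transfer completes.
        Map<Integer, String> initial = Map.of(0, "A", 1, "A", 2, "B", 3, "B");
        Map<Integer, String> mid     = Map.of(0, "A", 1, "C", 2, "B", 3, "C");
        Map<Integer, String> fin     = Map.of(0, "A", 1, "C", 2, "C", 3, "C");

        // Continuing from the partial (intermediate) state towards the end state.
        int continued = cost(initial, mid) + cost(mid, fin);
        // Aborting: the initial->mid work is wasted and we restart from scratch.
        int restarted = cost(initial, mid) + cost(initial, fin);
        System.out.println("continue from intermediate: " + continued + " segments shipped");
        System.out.println("abort and restart:          " + restarted + " segments shipped");
    }
}
```

With these (made-up) topologies, continuing ships fewer segments than aborting and restarting, because the intermediate state already agrees with the end state on some segments.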
A consequence is that the longer it takes to finalize a state
transfer, the higher the chances of aggregating multiple view changes
into fewer data-shipping events, and so the less work is done overall.
In short, it's good if the state transfer doesn't take too long -
as data might be lost if multiple failures happen in that timespan -
but I definitely think we should *not* have minimizing this time as a
goal; the goal should be to minimize response time. Ideally this
should be as good during ST as when we're stable: clients expect some
degree of reliability measured in terms of maximum & average response
times.
Cheers,
Sanne
On 12 October 2012 16:23, Bela Ban <bban(a)redhat.com> wrote:
Rather than (or in addition to) using log parsing, how about using
Byteman to inject aspects measuring things such as join time, state
transfer time, view installation time, state transfer time / state size
and so on ?
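A Byteman rule of the shape Bela suggests could look like the sketch below. The CLASS/METHOD targets are pure assumptions about where the state-transfer boundaries live and would need to be matched to the real entry points; only `traceln()` (a standard Byteman helper) and `System.currentTimeMillis()` are known quantities here.

```
# Hypothetical rules: the CLASS/METHOD names are placeholders,
# not the actual Infinispan entry points.
RULE mark state transfer start
CLASS StateConsumerImpl
METHOD onTopologyUpdate
AT ENTRY
IF TRUE
DO traceln("ST begin @" + System.currentTimeMillis())
ENDRULE

RULE mark state transfer end
CLASS StateConsumerImpl
METHOD onTopologyUpdate
AT EXIT
IF TRUE
DO traceln("ST end @" + System.currentTimeMillis())
ENDRULE
```

Subtracting the two timestamps from the trace gives the interval without touching the product code, which is the appeal of the aspect-injection approach over log parsing.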
On 10/12/12 1:52 PM, Radim Vansa wrote:
> Hi,
>
> yes, we did this kind of test for ispn 5.1 releases. It was pretty easy to
analyze the join time by parsing the logs for debug messages from CacheViewsManagerImpl; one
such example is
>
>
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-60-radargun-elas...
>
> However, currently there is no such obvious start/end. I have created a log parser
isolating some info (note that this is from two consecutive runs in RadarGun with two
configurations)
>
>
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/ispn-52-radargun-res...
>
> It is not as nice, but still better than the logs themselves.
> Therefore, if we are to benchmark some interval, we have to state exactly which
events should mark the start and the end. Could you suggest anything?
> We should also define the type of load. Should the load be random, or should we let
each client query for one key and check when it is able to acquire the key and when it
has to wait for a long time (because the segment holding this key is being transferred)?
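The per-key probe described in the question could be sketched roughly as follows. This is not RadarGun code: the cache is stood in for by a plain `Function`, and the 50 ms "slow" threshold is an arbitrary assumption for the sketch.

```java
import java.util.function.Function;

public class KeyProbe {
    // Repeatedly read a single key and report any get() that stalls,
    // e.g. because the segment owning the key is being transferred.
    // 'cache' stands in for the real cache's get(); the 50 ms threshold
    // is an arbitrary choice for this sketch.
    static long worstLatencyMillis(Function<String, Object> cache,
                                   String key, int iterations) {
        long worst = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            cache.apply(key);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMs > 50) {
                System.out.println("slow get for " + key + ": " + elapsedMs + " ms");
            }
            worst = Math.max(worst, elapsedMs);
        }
        return worst;
    }

    public static void main(String[] args) {
        // Stand-in cache that answers instantly; a real probe would wrap
        // the actual cache and run one such loop per client, one key each.
        long worst = worstLatencyMillis(k -> "value-" + k, "key-1", 1000);
        System.out.println("worst observed latency: " + worst + " ms");
    }
}
```

Running one probe per client against distinct keys would expose exactly the per-key blocking windows the question is asking about, as opposed to random load, which averages them away.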
>
> Radim
>
> ----- Original Message -----
> | From: "Mircea Markus"<mircea.markus(a)jboss.com>
> | To: "infinispan"<infinispan-dev(a)lists.jboss.org>, "Martin
Gencur"<mgencur(a)redhat.com>
> | Sent: Friday, October 12, 2012 1:16:04 PM
> | Subject: [infinispan-dev] State transfer performance
> |
> |
> |
> |
> | Hi,
> |
> | One of the targets of NBST is to minimise the downturn in throughput
> | during topology changes. Now that NBST is getting there, I think
> | that a test to measure how long it takes for a node to join,
> | under different levels of cluster load, is very desirable in order
> | to see where we are and also to help us profile and improve the
> | state transfer performance.
> | Martin, are we doing this kind of performance testing? It would be
> | nice to have it integrated into RadarGun or something similar so
> | that we can run it quickly.
--
Bela Ban, JGroups lead (
http://www.jgroups.org)
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev