Hi Sanne,
thanks! My first YouTube video, so next time I might venture into
editing the video... :-)
The video was initially meant to show the perf when adding 2 backup
sites, but then (because I had to include config info) 'degenerated'
into a tutorial on xsite. I know, bad job on that, but we should come up
with a tutorial that does this in a much more comprehensive way...
Comments inline
On 12/15/12 2:01 AM, Sanne Grinovero wrote:
> That was very, very nice to see.
> Assuming you also asked for feedback to improve this as a talk:
> 1# you stress several times that reads are going to be local, so very
> fast. I think you meant "local to the site"? Some ~33% of entries will
> still need to be fetched from peers on the same site.
Yes, 'local' always refers to the local site (cluster). The scenario
we're looking at is dist-sync within the site and async xsite repl
between sites. I should also have mentioned that latency within a site
is very small (e.g. 0.05ms) whereas we might have up to 60 *ms* between
sites.
> 2# you aren't actually running this on multiple sites, are you?
Correct. This was all within our Boston lab, every node was running on a
different box though. The ultimate goal is to inject latency into the
system, e.g. using the DELAY protocol in the global cluster.
But as a first step, I wanted to get the base performance for xsite repl.
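To sketch what I mean: DELAY sits in the bridge (global) stack, right
above the transport, and delays each message by a random amount of up to
in_delay/out_delay ms (the values below are illustrative):

    <TCP bind_port="7800" />
    <DELAY in_delay="60" out_delay="60" />

That would give us up to ~60 ms each way between sites while the
intra-site stacks stay fast.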
> When pointing out the different IP addresses you say something about
> needing them to be different, but I didn't understand if you needed
> them different because they are in different places, or to separate
> otherwise local machines to have them appear as in different places.
The reason to separate mcast_addr/mcast_port was to simulate 3 sites on
the same box. Had I not used different addresses/ports for the 3 sites,
all nodes of the 3 sites would have found each other and formed a
cluster of 9.
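For illustration, the 3 site configs differed essentially only in the
transport's multicast address and port, along these lines (values made
up):

    <!-- lon.xml -->  <UDP mcast_addr="232.5.5.5" mcast_port="45588" />
    <!-- nyc.xml -->  <UDP mcast_addr="232.6.6.6" mcast_port="45589" />
    <!-- sfo.xml -->  <UDP mcast_addr="232.7.7.7" mcast_port="45590" />

Nodes only discover peers on the same mcast_addr/mcast_port, so each
triple forms its own 3-node cluster.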
> 3# Since get operations are always local (site), they are, as you say,
> not meaningful for the benchmark; now since put operations are also
> not meaningful as they're async... what is the benchmark measuring?
Well, in the meantime I modified the test and now we do support reads;
you can define a read/write ratio.
This scenario mimics how a large prospective xsite customer will use
xsite repl: dist-sync for intra-site and xsite async repl for inter-site
communication. One thing we ran into was a 20% perf degradation *per
backup site* we added, even if that site was down! The root cause was
that async xsite repl does not mark a site as offline in Infinispan even
if it is down in JGroups. This will get fixed in Infinispan and should
improve async xsite repl performance when a backup site is down; see [1]
for details.
Note that the test can also measure sync xsite replication between
sites; this is just a matter of configuring the cache differently
(sketched below). But as the initial scenario is async xsite repl,
that's what we're focusing on for now.
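Switching is a one-attribute change in the cache's backup config;
roughly (element and attribute names as in the 5.2-style xsite config,
from memory):

    <namedCache name="clusteredCache">
       <sites>
          <backups>
             <!-- ASYNC: a put returns immediately; change strategy to
                  "SYNC" to block until the backup site has acked -->
             <backup site="NYC" strategy="ASYNC" />
          </backups>
       </sites>
    </namedCache>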
> 4# There seems to be some degree of redundancy when explaining the
> LON/SFO/NYC setting as the local site vs the backup sites. Wouldn't it
> make more sense to be able to configure all backup sites the same and
> have it automatically ignore the "self" element as a backup site? So
> your script would only need to specify what the local site is. If that
> makes any sense, it would even be nice to extend this to the IP
> addresses being defined in the zones area, so that they are applied
> both to the JGroups configuration for the local cluster and to the
> bridge configuration.
Regarding the mcast_addr/mcast_port settings: yes, I could have used
only 1 config file (local.xml) and set these properties as variables.
I've already changed and committed this.
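E.g. with JGroups' system property substitution (the property names here
are made up):

    <UDP mcast_addr="${site.mcast_addr:232.5.5.5}"
         mcast_port="${site.mcast_port:45588}" />

Each site is then started with -Dsite.mcast_addr=... and
-Dsite.mcast_port=..., falling back to the defaults after the ':' when
the property isn't set.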
Regarding the setup of the self and backup sites: yes, this could have
been done. Again, this is just a matter of setup and laziness on my
part :-)
> 5# I was initially surprised to see x-site configuration as part of a
> cache configuration; I understand the reasons for options like
> "strategy" which one might want to specify differently on each cache,
> but what about "take offline"?
Take offline/online is currently available via JMX operations. Taking a
site offline is also done automatically, but currently only when xsite
repl is *sync*. There's a JIRA that'll fix this for async xsite repl.
> that sounds more like something which
> should be globally managed at the channel level - not sure if in
> JGroups directly, but if it's to be handled in Infinispan I would
> expect to have all caches use the same policy, consistent with FD.
Actually this doesn't use JGroups failure detection, as we can't use it
across sites. This is where the <takeOffline ...> element comes in (as I
mentioned, currently only for async).
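Roughly, and with illustrative values (attribute names as I remember
them from the Infinispan config):

    <backup site="NYC" strategy="SYNC">
       <!-- mark NYC offline after 500 consecutive failed backups,
            but wait at least 60 seconds before doing so -->
       <takeOffline afterFailures="500" minTimeToWait="60000" />
    </backup>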
> Also it doesn't look like you have much of a choice as to which sites
> you want to replicate to, as relay is set up at the JGroups level, so
> affecting all caches: is relay going to be ignored by caches having no
> x-site enabled?
Actually, you can define the backupSites, so LON could choose *not* to
replicate to a backup site at all, and NYC could pick only SFO as its
backup site.
Yes, RELAY2 can be ignored on a per-message basis: we have a NO_RELAY
flag that AFAIR Mircea uses to exclude certain messages from getting
relayed.
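On the sending side this looks roughly as follows (JGroups 3.x API;
channel setup and payload are of course made up, and how Infinispan uses
the flag internally may differ):

    import org.jgroups.JChannel;
    import org.jgroups.Message;

    JChannel channel = new JChannel("lon.xml");
    channel.connect("lon-cluster");

    // null destination = multicast within the local site; the NO_RELAY
    // flag tells RELAY2 not to forward this message across the bridge
    Message msg = new Message(null, "intra-site only");
    msg.setFlag(Message.Flag.NO_RELAY);
    channel.send(msg);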
Note that in the demo I defined xsite repl in the defaults section and
clusteredCache simply used it. I can define an empty <sites/> inside
clusteredCache if I don't want xsite repl for that particular cache. Or
it could be done the other way round: don't define a default xsite
config, but define it per cache that wants xsite repl.
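As a sketch (mirroring the demo config):

    <default>
       <sites>
          <backups>
             <backup site="NYC" strategy="ASYNC" />  <!-- inherited by all caches -->
          </backups>
       </sites>
    </default>

    <namedCache name="noBackupCache">
       <sites/>  <!-- empty element overrides the default: no xsite repl -->
    </namedCache>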
> And is it going to be relayed only to one site if the Infinispan
> configuration lists a single site?
Yes
> Not sure if this makes any sense, I just found it contrasting with my
> naive expectations of what such a configuration would look like.
> thanks a lot, I hope this is proof enough that your video was pretty catchy :)
Thanks for the feedback!
[1] https://issues.jboss.org/browse/JGRP-1543
--
Bela Ban, JGroups lead (http://www.jgroups.org)