Hi all,
Since my requirements for x-datacenter replication are intertwined with
discussions of the various approaches, I was asked to lay them out
independently of any solution.
- I have 3 data centers. I'm modeling both session and resource data in my
data grid usage. Resources are modeled off of physical devices.
- There are no backing stores on any of the caches. We're operating as a
true PaaS model.
- In normal usage, this data is geographically partitioned so there won't be
concurrent writes *on the same key* across the three sites
- Requesting clients will be routed to their "local" data center. These
client will look up and cache the load balancer address of their local data
center. A network failure of a data center will cause an immediate
failover. However, because this is a legacy vendor providing the client, a
fail-back can take ~5 minutes, as clients slowly bleed off their backup data
center. Because of this, there is a need that between-site consistency
should not be lost for transactional writes.
- My customer wants all sessions and resources requested by these clients to
be "owned" by the local data center, with replicas of data on BOTH backup
data centers.
- The sessions are mostly ephemeral data and are a good candidate for async
replication. The resource data is highly transactional and MUST be
synchronous. This does not really present a problem for my customer, as the
round trip time between data centers is at most 70ms.
- My customer wants a data center outage to be as minimally obtrusive as
possible (which really goes without saying). In addition, there are a lot
of planned outages as well. Right now, this means the clients would be
routed to the next proximate data center to slowly take the affected data
center "out of service."
- Any state transfer between nodes *should* be localized to the data center.
Cross-data center traffic should be minimized as much as possible.
Furthermore, the "local" data center should have 2 replicas of a key, while
the backup data centers have 1.
Here are the design challenges with the proposed solutions, as I see it.
#1. Virtual Cluster.
- RELAY needs to support > 2 sites. Like 3 =)
- We would need a custom CH that can place data into its local site.
- We would need a configuration that extends numOwners into primary and
backup sites.
- In the current state transfer model, a node joining means that all the
nodes from all the sites are involved in rehashing. The rehash operation
should be enhanced in this case to prefer using the local nodes as state
providers. I know that this cannot be necessarily guaranteed, as a loss of
multiple nodes at a site might mean the state only exists on another data
center. But for a minimum of 10M records, I can't afford the existing
limitations on state transfer. (And it goes without saying the NBST is an
absolute must)
- A data center outage will result in a rehash in this model. My customer
doesn't want this. If we can write to 2 of the 3 data centers, we can
always retransmit the replicas once the data center comes back online.
- Conversely, this model DOES handle the case where there are concurrent
writes due to clients migrating to their data center. This simply will
result in more latency for these requests during this process, as they would
have to be routed to their owner for processing.
#2. Hot Rod
This isn't an option, as I have confirmed I cannot open up this many ports.
Plus HR is missing many features I take advantage of in embedded mode.
#3. Cluster Bridge.
- Most of the design challenges in #1 are greatly mitigated by #3 because
they are independent clusters, e.g. there is no need for a custom CH, state
transfer only affects one data center, etc.
- However, the largest challenge here is guaranteeing that cross-data center
writes remain consistent in the period that the clients migrate.
- There is also the question of the management of the cluster. My customer,
in particular, would want to manage the aggregate of all three clusters.
Hopefully this facilitates design discussion for the meetings this week.
Let me know if you need anything else.
Erik
-----Original Message-----
From: infinispan-dev-bounces(a)lists.jboss.org
[mailto:infinispan-dev-bounces@lists.jboss.org] On Behalf Of Mircea Markus
Sent: Thursday, February 16, 2012 9:53 AM
To: infinispan -Dev List
Subject: Re: [infinispan-dev] Use cases for x-datacenter replication
Yes, this makes sense.
Although I doubt you will lose an entire data center. But an
intermediate switch can always crash, making the DC inaccessible,
which
is the same problem for clients as if the entire DC crashed. Also, if
we
implement the follow-the-sun approach, then clients need to be
switched
over gracefully from one DC to another one, so this functionality
definitely needs to be provided.
We probaby need to differentiate between a crash-failover and a
graceful
failover; in the latter case, clients can be switched over to the new
DC
within a certain time frame, so we minimize load spikes that might be
caused by switching over all clients at exactly the same time.
I added your comment to
https://community.jboss.org/wiki/CrossDatacenterReplication-Design,
so
we can discuss this next week (we have a meeting on cross-data center
clustering WED-FRI in London).
Keep the suggestions coming !
+100
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev