Hi all,
Bela was kind enough to have a discussion with me last week regarding my data center
replication requirements.
At a high level, I have 3 independent data centers (sites A, B, C). The latency between
data centers is high (~200 ms round trip), so initially I was thinking about using a
backing store (like Cassandra) to handle the replication between data centers. Each
center would have its own individual grid to manage "local" resources. So when
a local TX is committed successfully, it is replicated to the stores in the other data
centers. That way, on a data center failure, the requests can be directed to the other
data centers by loading from the backing store.
The largest drawback: certain distributed applications require highly serialized access
to resources in the grid, which means lots of explicit locking of keys in a single
transaction. If a request is directed to, say, Data Center B because of an intermittent
failure of Data Center A, there is currently the possibility that a stale copy of the
resource is still resident in B's grid. It follows that there will have to be application
logic for the grid in each data center to know which resources it "owns", and once the
backing store receives an update from another data center, it will need to aggressively
evict non-owned resources from the grid.
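To make the ownership/eviction idea concrete, here is a minimal sketch, with the caveat that the class, the ownership set, and the notification hook are all hypothetical illustrations, not an actual Infinispan or Cassandra API:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the ownership check described above: when the backing
// store reports an update that originated in another data center, evict the key
// from the local grid unless this site "owns" it. All names are illustrative.
public class OwnershipEvictor {
    private final String localSite;
    private final Set<String> ownedKeys;  // keys this site "owns" (assumption)
    private final Map<String, Object> localGrid = new ConcurrentHashMap<>();

    public OwnershipEvictor(String localSite, Set<String> ownedKeys) {
        this.localSite = localSite;
        this.ownedKeys = ownedKeys;
    }

    public void put(String key, Object value) {
        localGrid.put(key, value);
    }

    // Called when the backing store replicates an update from another site.
    public void onRemoteUpdate(String key, String originSite) {
        if (!originSite.equals(localSite) && !ownedKeys.contains(key)) {
            // Aggressively evict the possibly-stale local copy.
            localGrid.remove(key);
        }
    }

    public boolean contains(String key) {
        return localGrid.containsKey(key);
    }
}
```

The point is only that each grid needs some notion of ownership so that a remote update can invalidate non-owned entries before a failed-over request reads them.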
I (and the customer) would like to use a single data grid across multiple data centers.
Bela detailed an option based on JGroups RELAY that is a candidate solution:
- When doing a 2PC, Infinispan broadcasts the PREPARE to all nodes (in A, B and C).
*However*, it only expects responses from *local* nodes (in this case nodes in data center
A). Infinispan knows its own siteId and can extract the siteId from every address, so it
can grab the current view (say A1, A2, A3... A10, B1-B10, C1-C10) and remove non-local
nodes, to arrive at a sanitized list A1-10. This means it expects responses to its PREPARE
message only from A1-10; any response from a non-local node is simply discarded.
- On rollback, a ROLLBACK(TX) message is broadcast to the entire virtual cluster (A, B and
C)
- On commit, a COMMIT(TX) is broadcast to the entire virtual cluster (A, B and C).
The downside here is that the 2PC won't be atomic across sites: it is atomic for A, but
not for B or C. A PREPARE might fail on a node in B or C, yet the 2PC won't get rolled
back as long as all nodes in A sent back a successful PREPARE-OK response. The same is
true, though, of the current backing-store solution.
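The view-sanitizing step above can be sketched roughly as follows. This is a toy illustration: the string-encoded addresses and the `siteOf` helper are assumptions for the sketch, not the actual JGroups Address API (which would carry the siteId in the address itself):

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy sketch: sanitize a merged cross-site view (A1..A10, B1..B10, C1..C10)
// down to local members only. PREPARE responses are then awaited solely from
// this sanitized list; replies from other sites are discarded on arrival.
public class SiteFilter {
    // Addresses encoded as "A1", "B3", etc.: first character is the siteId.
    // (An encoding assumption for illustration only.)
    static String siteOf(String address) {
        return address.substring(0, 1);
    }

    // Keep only members whose siteId matches ours.
    static List<String> localMembers(List<String> view, String localSite) {
        return view.stream()
                   .filter(a -> siteOf(a).equals(localSite))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> view = List.of("A1", "A2", "B1", "B2", "C1");
        // Members we expect PREPARE-OK from:
        System.out.println(localMembers(view, "A")); // [A1, A2]
    }
}
```

ROLLBACK and COMMIT, by contrast, would still be broadcast to the entire virtual cluster, as in the bullets above.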
Comments? Thoughts?
Erik Salter
esalter(a)bnivideo.com
Software Architect
BNI Video
Cell: (404) 317-0693