[infinispan-dev] Use cases for x-datacenter replication

Mircea Markus mmarkus at redhat.com
Mon Feb 13 13:47:01 EST 2012



----- Original Message -----
> From: "Erik Salter" <an1310 at hotmail.com>
> To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
> Sent: Monday, February 13, 2012 6:07:37 PM
> Subject: Re: [infinispan-dev] Use cases for x-datacenter replication
> 
> Hi all,
> 
> Since I'm deploying my applications into 3 data centers, I commented
> at the
> wiki site.  Since the bulk of the discussion seems to be here, I'll
> offer
> some additional thoughts.
> 
> 
> - First and foremost, I have three data centers (JGRP-1313).  All of
> my data
> centers follow a model where the data is partitioned between them
> with
> geographic failover.

How does the backup work in your use case: does A back up data to B, B to C, and C to A? Or is every site a data owner?
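To make the first alternative concrete, here is a rough sketch of the "ring" topology (A -> B -> C -> A), where each site backs its data up to exactly one other site. The site names and the `backupSiteFor` helper are purely illustrative, not Infinispan API:

```java
import java.util.List;

// Illustrative sketch only: each site in the ring backs up to the next one.
public class RingBackupSketch {
    static final List<String> SITES = List.of("A", "B", "C");

    // Returns the site that receives backups from the given site.
    static String backupSiteFor(String site) {
        int i = SITES.indexOf(site);
        if (i < 0) throw new IllegalArgumentException("unknown site: " + site);
        return SITES.get((i + 1) % SITES.size());
    }

    public static void main(String[] args) {
        for (String s : SITES)
            System.out.println(s + " backs up to " + backupSiteFor(s));
    }
}
```

In the second alternative ("everybody is a data owner") there is no ring at all: every site holds a full or partitioned copy and no site is purely a backup.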

> - These data centers have about 50-70 ms latency (RTT) between them.
>  That
> means I (and the customer) can live with synchronous replication for
> the
> time being, especially for the transactional scenario.  In fact, in
> one of
> my cache use cases, this is almost required.
> - In the current state transfer model, a node joining means that all
> the
> nodes from all the sites are involved in rehashing.  The rehash
> operation
> should be enhanced in this case to prefer using the local nodes as
> state
> providers.  I know that this cannot be necessarily guaranteed, as a
> loss of
> multiple nodes at a site might mean the state only exists on another
> data
> center.  But for a minimum of 10M records, I can't afford the
> existing
> limitations on state transfer.  (And it goes without saying the NBST
> is an
> absolute must)
Indeed, that would be the case if we decide to go with solution #1 (virtual nodes) as described here: https://community.jboss.org/wiki/CrossDatacenterReplication-Design
Otherwise, resilience to local failures should be handled through numOwners.

> - I will need a custom consistent hash that is data center-aware that
> can
> group keys into a local site.  All of my data is site-specific.
It seems you've decided to implement your use case based on solution #1. If #2 or #3 is used, you don't have this custom CH problem.
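For illustration, a site-aware owner selection along the lines Erik describes could look roughly like this. This is a sketch only, not Infinispan's ConsistentHash interface; the Node record and the `locateOwners` helper are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of site-aware owner selection: owners are chosen from
// the key's home site first, spilling over to remote sites only when the
// home site has fewer nodes than numOwners.
public class SiteAwareHashSketch {
    record Node(String name, String site) {}

    static List<Node> locateOwners(String keyHomeSite, List<Node> nodes, int numOwners) {
        List<Node> owners = new ArrayList<>();
        for (Node n : nodes)                       // prefer nodes in the key's home site
            if (owners.size() < numOwners && n.site().equals(keyHomeSite))
                owners.add(n);
        for (Node n : nodes)                       // fall back to nodes in other sites
            if (owners.size() < numOwners && !owners.contains(n))
                owners.add(n);
        return owners;
    }
}
```

With all of a key's owners kept in its home site, rehash traffic for that key stays local, which is the behavior Erik is after.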

> - I mentioned a +1 model for local key ownership on the wiki,
"+1" means that main datacenter has numOwners copies and the backup datacenter has always 1? 
 
> Taking
> that a
> step further, has there been any thought to a quorum model?  Here's
> what I'm
> concerned about.  Most of the time, a data center won't fail --
> rather
> there'll be some intermittent packet loss between the two.
The bridge between the datacenters should take care of retransmission and of fault tolerance at the bridge ends. I don't think packet loss is possible there.

> If I have
> 2
> owners on the key's local data center and 1 each on the backup sites,
> I may
> not want to rehash if I can't reach a data center.  I can understand
> if a
> prepared tx fails -- those I can retry in application code.
You shouldn't have to: the bridge should retry/retransmit the lost packets transparently.
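For reference, the quorum model Erik describes (2 owners in the key's local datacenter plus 1 in each backup site) boils down to a majority check before acting on an apparent failure. A minimal sketch, with illustrative names only:

```java
// Illustrative sketch of a quorum check: with 4 owners total (2 local +
// 1 per backup site), proceeding with a write or a rehash could require a
// strict majority of owners to be reachable, so transient loss of a single
// remote site does not trigger a rehash.
public class QuorumSketch {
    static boolean haveQuorum(int reachableOwners, int totalOwners) {
        return reachableOwners >= totalOwners / 2 + 1; // strict majority
    }
}
```

Under this rule, losing one backup site (3 of 4 owners reachable) keeps quorum, while losing the two local owners (2 of 4) does not.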
> 
> I'm sure I'll have more =)

Thanks again for your input Erik!


More information about the infinispan-dev mailing list