[infinispan-dev] [Cloudtm-discussion] [SPAM] Re: Primary-Backup replication scheme in Infinispan
Paolo Romano
romano at inesc-id.pt
Wed Mar 9 13:58:56 EST 2011
On 3/9/11 11:41 AM, Manik Surtani wrote:
> Paolo,
>
>
> On 17 Feb 2011, at 19:51, Paolo Romano wrote:
>
>> First, an important premise, which I think was not clear in their previous message. We are considering here the *full replication mode* of Infinispan, in which every node maintains a copy of every data item. This implies that there is no need to fetch data from remote nodes during transaction execution. Also, Infinispan was configured NOT to use eager locking. In other words, during a transaction's execution Infinispan acquires locks only locally.
> So are you suggesting that this scheme maintains a single, global master node for the entire cluster, for *all* keys? Doesn't this become a bottleneck, and how do you deal with the master node failing?
Hi Manik,
of course the primary (or master) can become a bottleneck if the volume
of update transactions is very large. If the percentage of write
transactions is very high, though, we have to distinguish two cases:
low vs. high contention.
At high contention, in fact, the 2PC-based replication scheme used by
Infinispan (2PC from now on, for the sake of brevity ;-) ) falls prey to
deadlocks and starts thrashing. This is the reason why 2PC's performance
is so poor in the plot attached to Diego and Sebastiano's mail for the
case of 1000 keys. With the primary-backup scheme, concurrency is
regulated locally at the primary, and much more efficiently, so overall
performance is much better.
Clearly 2PC *can* scale much better when contention is low and the
workload is write-intensive, as it can use the horsepower of more nodes
to process writes... but this does depend on the workload.
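(For reference, the setup assumed throughout, synchronous full
replication with eager locking off, corresponds roughly to the following
4.x-style XML snippet. I am quoting the element and attribute names from
memory, so treat them as approximate:)

    <namedCache name="fullyReplicated">
       <!-- every node keeps a full copy; updates are pushed synchronously -->
       <clustering mode="replication">
          <sync/>
       </clustering>
       <!-- eager locking OFF: locks are acquired only locally while the
            transaction executes -->
       <transaction useEagerLocking="false"/>
    </namedCache>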
One of the results we would like to achieve with Cloud-TM is designing
mechanisms that adaptively switch between multiple replication schemes
depending on the scale of the platform and its current workload. This is
just a first step in that direction!
Concerning failures of the master, this is not an issue. In fact, the
primary waits synchronously for the replies of the backups. Thus, if it
fails, it will be excluded from the current view, and as soon as a new
view is delivered we can elect a new primary. The only glitch that may
occur is in a more complicated failure scenario:
1. The primary sends an update, say "u", to the backups {B1, ..., Bn}.
2. The primary crashes while doing so.
3. B1 receives u, while the others B2, ..., Bn do not.
4. B1 runs some read-only transaction that sees u (and money is
   dispensed, missiles are fired and other horrible things happen as a
   result).
5. B1 crashes.
6. A new view is delivered which excludes the former primary and B1.
7. B2 (for instance) is elected the new primary, but "u" is lost.
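To make the window concrete, here is a rough sketch of the primary's
commit path in the current scheme. This is NOT Infinispan code; every
name below is made up purely for illustration:

    import java.util.List;
    import java.util.concurrent.CountDownLatch;

    // Hypothetical sketch of the naive primary-backup commit path.
    interface Backup {
        void send(byte[] update);       // asynchronous point-to-point send
    }

    class Primary {
        void commit(byte[] update, List<Backup> backups, CountDownLatch acks)
                throws InterruptedException {
            applyLocally(update);       // locks were only ever taken here
            for (Backup b : backups) {
                b.send(update);         // a crash inside this loop leaves only
            }                           // a subset of backups holding "u"
            acks.await();               // synchronous: one countDown() per backup
            ackToClient(update);        // only now is the commit reported
        }
        void applyLocally(byte[] u) { /* write to the local store */ }
        void ackToClient(byte[] u)  { /* reply to the caller */ }
    }

Each backup applies the update as soon as it arrives, so a backup that
received it can serve reads observing it before the rest of the group
knows anything about it: exactly the window in the scenario above.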
This can be avoided by having the backups acknowledge to each other the
reception of the updates before committing them (more formally, we
should have the primary disseminate updates via Uniform Reliable
Broadcast [1])... but at the moment we are not doing this, mainly for
the sake of simplicity, and we expect the results to be very similar.
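A minimal sketch of that fix, in the same made-up style (and a
simplification: it handles a single update and ignores view changes and
retransmissions): each backup relays the update and commits it only once
it has heard from the whole group.

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical URB-style delivery at a backup; one instance per update.
    class UrbBackup {
        private final int groupSize;                  // primary + backups
        private final Set<Integer> ackedBy = new HashSet<>();
        private boolean relayed = false;

        UrbBackup(int groupSize) { this.groupSize = groupSize; }

        // Invoked when the update (or a relay of it) arrives from senderId.
        void onReceive(byte[] update, int senderId, int myId) {
            ackedBy.add(senderId);
            if (!relayed) {
                relayed = true;
                relayToAll(update);     // re-broadcast, so the update survives
                ackedBy.add(myId);      // even if the primary crashed mid-send
            }
            if (ackedBy.size() == groupSize) {
                commitLocally(update);  // safe: every member has the update,
            }                           // so no view change can lose it
        }
        void relayToAll(byte[] u)    { /* broadcast to the whole group */ }
        void commitLocally(byte[] u) { /* apply and expose to readers */ }
    }

In the scenario above, B1 would then not have committed "u" before
B2, ..., Bn had received it, so its read-only transaction could never
have observed a value that a later view forgets.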
Cheers,
Paolo
[1] Uniform Reliable Multicast in a Virtually Synchronous Environment,
citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.5421
> Cheers
> Manik
>
> --
> Manik Surtani
> manik at jboss.org
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Paolo Romano, PhD
Senior Researcher
INESC-ID
Rua Alves Redol, 9
1000-059, Lisbon Portugal
Tel. + 351 21 3100300
Fax + 351 21 3145843
Webpage http://www.gsd.inesc-id.pt/~romanop