<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    Hi Mark,<br>

    <br>

    let me try to clarify the scope and goals of the work Sebastiano and

    Diego are doing, as I think there may have been some

    misunderstanding here.<br>

    <br>

    First an important premise, which I think was not clear in their

    previous message. We are here considering the *full replication

    mode* of Infinispan, in which every node maintains all copies of the

    same data items. This implies that there is no need to fetch data

    from remote nodes during transaction execution. Also, Infinispan was

    configured NOT TO use eager locking. In other words, during

    transaction's execution Infinispan acquires locks only locally.<br>

    <br>

    What it is being compared is a single-master replication protocol

    (the classic primary-backup scheme) vs the 2PC based multi-master

    replication protocol (which is the one currently implemented by

    Infinispan). <br>

    <br>

    The two protocols behave identically during transactions' execution.

    What differs between the two protocols is the handling of the commit

    phase. <br>

    <br>

    In the single master mode, as no remote conflict can occur and local

    conflicts are managed by local concurrency control, data updates can

    be simply pushed to the backups. More precisely (this should answer

    also a question of Mircea), the primary DURING the commit phase, in

    a synchronous fashion, propagates the updates to the backups, waits

    for their ack, and only then returns from the commit request to the

    application.<br>

    <br>

    In the multi-master mode, 2PC is used to materialize conflicts among

    transactions executed at different nodes, and to ensure agreement on

    the transaction's outcome. Thus 2PC serves&nbsp; to enforce both

    atomicity and isolation.<br>

    <br>

    The plots attached in the mail from Diego and Sebastiano are

    comparing the performance of two replication strategies that provide

    the same degree of consistency/failure resiliency...at least

    assuming perfect failure detection, I'd prefer to avoid delving into

    a (actually interesting) discussion of non-blocking guarantees of

    the two protocols in partially or eventually synchronous

    environments in this mailing list. Thus I disagree when you say that

    they are not comparable. <br>

    <br>

    IMO, the only arguable unfairness in the comparison is that the

    multi-master scheme allows processing update transactions at every

    node, whereas the single-master scheme should&nbsp; rely either on 1) a

    load balancing scheme to ensure redirecting all the update

    transactions to the master, or 2) rely on a redirection scheme to

    redirect towards the master any update transactions "originated" by

    the backups. The testbed considered in the current evaluation is

    "emulating" option&nbsp; 1, which does favour the PB scheme. On the other

    hand, we are currently studying a non-intrusive way to implement a

    transparent redirection scheme (option 2) in Infinispan, so

    suggestions are very welcome! ;-)<br>

    <br>

    I hope this post clarified your perplexities!! :-)<br>

    <br>

    Cheers,<br>

    <br>

    &nbsp;&nbsp;&nbsp; Paolo<br>

    <br>

    <br>

    <br>

    On 2/17/11 3:53 PM, Mark Little wrote:

    <blockquote

      cite="mid:2F42CB44-FDCF-4D7C-BF13-39BD2259210B@redhat.com"

      type="cite">Hi

      <div><br>

        <div>

          <div>On 11 Feb 2011, at 17:56, Sebastiano Peluso wrote:</div>

          <br class="Apple-interchange-newline">

          <blockquote type="cite">

            <div>Hi all,<br>

              <br>

              first let us introduce ourselves given that it's the first

              time we write in this mailing list.<br>

              <br>

              Our names are Sebastiano Peluso and Diego Didona, and we

              are working at INESC-ID Lisbon in the context of the

              Cloud-TM project. Our work is framed in the context of

              self-adaptive replication mechanisms (please refer to

              previous message by Paolo on this mailing list and to the

              <a moz-do-not-send="true" href="http://www.cloudtm.eu">www.cloudtm.eu</a>

              website for additional details on this project).<br>

              <br>

              Up to date we have been working on developing a relatively

              simple primary-backup (PB) replication mechanism, which we

              integrated within Infinispan 4.2 and 5.0. In this kind of

              scheme only one node (called the primary) is allowed to

              process update transactions, whereas the remaining nodes

              only process read-only transactions. This allows coping

              very efficiently with write transactions, as the primary

              does not have to incur in the cost of remote coordination

              schemes nor in distributed deadlocks, which can hamper

              performance at high contention with two phase commit

              schemes. With PB, in fact, the primary can serialize

              transactions locally, and simply propagate the updates to

              the backups for fault-tolerance as well as to allow them

              to process read-only transactions on fresh data of course.

              On the other hand, the primary is clearly prone to become

              the bottleneck especially in large clusters and write

              intensive workloads.<br>

            </div>

          </blockquote>

          <div><br>

          </div>

          So when you say "serialize transactions locally" does this

          mean you don't do distributed locking? Or is "locally" not

          referring simply to the primary?</div>

        <div><br>

          <blockquote type="cite">

            <div><br>

              Thus, this scheme does not really represent a replacement

              for the default 2PC protocol, but rather an alternative

              approach that results particularly attractive (as we will

              illustrate in the following with some radargun-based

              performance results) in small scale clusters, or, in

              "elastic" cloud scenarios, in periods where the workload

              is either read dominated or not very intense.</div>

          </blockquote>

          <div><br>

          </div>

          <div>I'm not sure what you mean by a "replacement" for 2PC?

            2PC is for atomicity (consensus). It's the A in ACID. It's

            not the C, I or D.</div>

          <br>

          <blockquote type="cite">

            <div> Being this scheme particularly efficient in these

              scenarios, in fact, its adoption would allow to minimize

              the number of resources acquired from the cloud provider

              in these periods, with direct benefits in terms of cost

              reductions. In Cloud-TM, in fact, we aim at designing

              autonomic mechanisms that would dynamically switch among

              multiple replication mechanisms depending on the current

              workload characteristics.<br>

            </div>

          </blockquote>

          <div><br>

          </div>

          Replication and transactions aren't mutually inconsistent

          things. They can be merged quite well :-)</div>

        <div><br>

        </div>

        <div><a moz-do-not-send="true"

            href="http://www.cs.ncl.ac.uk/publications/articles/papers/845.pdf">http://www.cs.ncl.ac.uk/publications/articles/papers/845.pdf</a></div>

        <div><a moz-do-not-send="true"

            href="http://www.cs.ncl.ac.uk/publications/inproceedings/papers/687.pdf">http://www.cs.ncl.ac.uk/publications/inproceedings/papers/687.pdf</a></div>

        <div><a moz-do-not-send="true"

            href="http://www.cs.ncl.ac.uk/publications/inproceedings/papers/586.pdf">http://www.cs.ncl.ac.uk/publications/inproceedings/papers/586.pdf</a></div>

        <div><a moz-do-not-send="true"

            href="http://www.cs.ncl.ac.uk/publications/inproceedings/papers/596.pdf">http://www.cs.ncl.ac.uk/publications/inproceedings/papers/596.pdf</a></div>

        <div><a moz-do-not-send="true"

            href="http://www.cs.ncl.ac.uk/publications/trnn/papers/117.pdf">http://www.cs.ncl.ac.uk/publications/trnn/papers/117.pdf</a></div>

        <div><a moz-do-not-send="true"

            href="http://www.cs.ncl.ac.uk/publications/inproceedings/papers/621.pdf">http://www.cs.ncl.ac.uk/publications/inproceedings/papers/621.pdf</a></div>

        <div><br>

        </div>

        <div>The reason I mention these is because I don't quite

          understand what your goal is/was with regards to transactions

          and replication.</div>

        <div><br>

          <blockquote type="cite">

            <div><br>

              Before discussing the results of our preliminary

              benchmarking study, we would like to briefly overview how

              we integrated this replication mechanism within

              Infinispan. Any comment/feedback is clearly highly

              appreciated. First of all we have defined a new command,

              namely PassiveReplicationCommand, that is a subclass of

              PrepareCommand. We had to define a new command because we

              had to design customized "visiting" methods for the

              interceptors. Note that our protocol only affects the

              commit phase of a transaction, specifically, during the

              prepare phase, in the prepare method of the

              TransactionXaAdapter class, if the Primary Backup mode is

              enabled, then a PassiveReplicationCommand is built by the

              CommandsFactory and it is passed to the invoke method of

              the invoker. The PassiveReplicationCommand is then visited

              by the all the interceptors in the chain, by means of the

              visitPassiveReplicationCommand methods. We describe more

              in detail the operations performed by the non-trivial

              interceptors:<br>

              <br>

              -TxInterceptor: like in the 2PC protocol, if the context

              is not originated locally, then for each modification

              stored in the PassiveReplicationCommand the chain of

              interceptors is invoked.<br>

              -LockingInterceptor: first the next interceptor is called,

              then the cleanupLocks is performed with the second

              parameter set to true (commit the operations). This

              operation is always safe: on the primary it is called only

              after that the acks from all the slaves are received &nbsp;(see

              the ReplicationInterceptor below); on the slave there are

              no concurrent conflicting writes since these are already

              locally serialized by the locking scheme performed at the

              primary.<br>

              -ReplicationInterceptor: first it invokes the next

              iterceptor; then if the predicate

              shouldInvokeRemoteTxCommand(ctx) is true, then the method

              rpcManager.broadcastRpcCommand(command, true, false) is

              performed, that replicates the modifications in a

              synchronous mode (waiting for an explicit ack from all the

              backups).<br>

              <br>

              As for the commit phase:<br>

              -Given that in the Primary Backup the prepare phase works

              as a commit too, the commit method on a

              TransactionXaAdapter object in this case simply returns.<br>

              <br>

              On the resulting extended Infinispan version a subset of

              unit/functional tests were executed and successfully

              passed:<br>

              - commands.CommandIdUniquenessTest<br>

              - replication.SyncReplicatedAPITest<br>

              - replication.SyncReplImplicitLockingTest<br>

              - replication.SyncReplLockingTest<br>

              - replication.SyncReplTest<br>

              - replication.SyncCacheListenerTest<br>

              - replication.ReplicationExceptionTest<br>

              <br>

              <br>

              We have tested this solution using a customized version of

              Radargun. Our customizations were first of all aimed at

              having each thread accessing data within transactions,

              instead of executing single put/get operations. In

              addition, now every Stresser thread accesses with uniform

              probability all of the keys stored by Infinispan, thus

              generating conflicts with a probability proportional to

              the number of concurrently active threads and inversely

              proportional to the total number of keys maintained.<br>

              <br>

              As already hinted, our results highlight that, depending

              on the current workload/number of nodes in the system, it

              is possible to identify scenarios where the PB scheme

              significantly outperforms the current 2PC scheme, and vice

              versa.</div>

          </blockquote>

          <div><br>

          </div>

          I'm obviously missing something here, because it still seems

          like you are attempting to compare incomparable things. It's

          not like comparing apples and oranges (at least they are both

          fruit), but more like apples and broccoli ;-)</div>

        <div><br>

          <blockquote type="cite">

            <div> Our experiments were performed in a cluster of

              homogeneous 8-core (<a class="moz-txt-link-abbreviated" href="mailto:Xeon@2.16GhZ">Xeon@2.16GhZ</a>) nodes interconnected via

              a Gigabit Ethernet and running a Linux Kernel 64 bit

              version 2.6.32-21-server. The results were obtained by

              running for 30 seconds 8 parallel Stresser threads per

              nodes, and letting the number of node vary from 2 to 10.

              In 2PC, each thread executes transactions which consist of

              10 (get/put) operations, with a 10% of probability of

              generating a put operation. With PB, the same kind of

              transactions are executed on the primary, but the backups

              execute read-only transactions composed of 10 get

              operations. This allows to compare the maximum throughput

              of update transactions provided by the two compared

              schemes, without excessively favoring PB by keeping the

              backups totally idle.<br>

              <br>

              The total write transactions' throughput exhibited by the

              cluster (i.e. not the throughput per node) is shown in the

              attached plots, relevant to caches containing 1000, 10000

              and 100000 keys. As already discussed, the lower the

              number of keys, the higher the chance of contention and

              the probability of aborts due to conflicts. In particular

              with the 2PC scheme the number of failed transactions

              steadily increases at high contention up to 6% (to the

              best of our understanding in particular due to distributed

              deadlocks). With PB, instead the number of failed txs due

              to contention is always 0.<br>

            </div>

          </blockquote>

          <div><br>

          </div>

          Maybe you don't mean 2PC, but pessimistic locking? I'm still

          not too sure. But I am pretty sure it's not 2PC you mean.</div>

        <div><br>

          <blockquote type="cite">

            <div><br>

              Note that, currently, we are assuming that backup nodes do

              not generate update transactions. In practice this

              corresponds to assuming the presence of some load

              balancing scheme which directs (all and only) update

              transactions to the primary node, and read transactions to

              the backup.</div>

          </blockquote>

          <div><br>

          </div>

          Can I say it again? Serialisability?</div>

        <div><br>

          <blockquote type="cite">

            <div> In the negative case (a put operation is generated on

              a backup node), we simply throw a

              PassiveReplicationException at the CacheDelegate level.

              This is probably suboptimal/undesirable in real settings,

              as update transactions may be transparently rerouted (at

              least in principle!) to the primary node in a RPC-style.

              Any suggestion on how to implement such a redirection

              facility in a transparent/non-intrusive manner would be

              highly appreciated of course! ;-)<br>

              <br>

              To conclude, we are currently working on a statistical

              model that is able to predict the best suited replication

              scheme given the current workload/number of machines, as

              well as on a mechanism to dynamically switch from one

              replication scheme to the other.<br>

              <br>

              <br>

              We'll keep you posted on our progresses!<br>

              <br>

              Regards,<br>

              &nbsp;&nbsp;Sebastiano Peluso, Diego Didona<br>

              <span>&lt;PCvsPR_WRITE_TX_1000keys.png&gt;</span><span>&lt;PCvsPR_WRITE_TX_10000keys.png&gt;</span><span>&lt;PCvsPR_WRITE_TX_100000keys.png&gt;</span>------------------------------------------------------------------------------<br>

              The ultimate all-in-one performance toolkit: Intel(R)

              Parallel Studio XE:<br>

              Pinpoint memory and threading errors before they happen.<br>

              Find and fix more than 250 security defects in the

              development cycle.<br>

              Locate bottlenecks in serial and parallel code that limit

              performance.<br>

              <a moz-do-not-send="true"

href="http://p.sf.net/sfu/intel-dev2devfeb_______________________________________________">http://p.sf.net/sfu/intel-dev2devfeb_______________________________________________</a><br>

              Cloudtm-discussion mailing list<br>

              <a class="moz-txt-link-abbreviated" href="mailto:Cloudtm-discussion@lists.sourceforge.net">Cloudtm-discussion@lists.sourceforge.net</a><br>

<a class="moz-txt-link-freetext" href="https://lists.sourceforge.net/lists/listinfo/cloudtm-discussion">https://lists.sourceforge.net/lists/listinfo/cloudtm-discussion</a><br>

            </div>

          </blockquote>

        </div>

        <br>

        <div>

          <div style="word-wrap: break-word; font-size: 12px;">

            <div>

              <div>---</div>

              <div>Mark Little</div>

              <div><a moz-do-not-send="true"

                  href="mailto:mlittle@redhat.com">mlittle@redhat.com</a></div>

              <div><br class="webkit-block-placeholder">

              </div>

              <div>JBoss, by Red Hat</div>

              <div>Registered Address: Red Hat UK Ltd, Amberley Place,

                107-111 Peascod Street, Windsor, Berkshire, SI4 1TE,

                United Kingdom.</div>

              <div>Registered in UK and Wales under Company Registration

                No. 3798903 Directors: Michael Cunningham (USA), Charlie

                Peters (USA), Matt Parsons (USA) and Brendan Lane

                (Ireland).</div>

            </div>

            <div><br>

            </div>

          </div>

          <br class="Apple-interchange-newline">

          <br class="Apple-interchange-newline">

        </div>

        <br>

      </div>

      <pre wrap="">

<fieldset class="mimeAttachmentHeader"></fieldset>

------------------------------------------------------------------------------

The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:

Pinpoint memory and threading errors before they happen.

Find and fix more than 250 security defects in the development cycle.

Locate bottlenecks in serial and parallel code that limit performance.

<a class="moz-txt-link-freetext" href="http://p.sf.net/sfu/intel-dev2devfeb">http://p.sf.net/sfu/intel-dev2devfeb</a></pre>

      <pre wrap="">

<fieldset class="mimeAttachmentHeader"></fieldset>

_______________________________________________

Cloudtm-discussion mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Cloudtm-discussion@lists.sourceforge.net">Cloudtm-discussion@lists.sourceforge.net</a>

<a class="moz-txt-link-freetext" href="https://lists.sourceforge.net/lists/listinfo/cloudtm-discussion">https://lists.sourceforge.net/lists/listinfo/cloudtm-discussion</a>

</pre>

    </blockquote>

    <br>

    <br>

  </body>

</html>