[infinispan-issues] [JBoss JIRA] (ISPN-5570) Cross-site: retry backup commands

Dan Berindei (JIRA) issues at jboss.org
Mon Jun 22 06:50:05 EDT 2015


Dan Berindei created ISPN-5570:
----------------------------------

             Summary: Cross-site: retry backup commands
                 Key: ISPN-5570
                 URL: https://issues.jboss.org/browse/ISPN-5570
             Project: Infinispan
          Issue Type: Bug
          Components: Core, Cross-Site Replication
    Affects Versions: 7.2.3.Final
            Reporter: Dan Berindei
             Fix For: 8.0.0.Final


There are 3 phases in a backup RPC, and each can fail for different reasons:

1. Sender -> Local site master: fails when the local site master is shutting down or crashing, or when there is a network split.
2. Local site master -> Remote site master:
2.1. Local site master is no longer a site master, e.g. because it's shutting down or because it's no longer coordinator after a merge.
2.2. Remote site master is no longer a site master.
2.3. Link between local site and remote site is down.
3. Remote site master -> Backup targets

Replication failures in phase 3 are handled by retrying (except for TimeoutExceptions), because {{BaseBackupReceiver}} uses regular cache methods to perform the updates.

But replication failures in phases 1 and 2 are not handled in any way, except for causing the remote site to be taken offline after a certain number of replication failures (if backup is synchronous). We should instead retry backup RPCs when we get a {{SuspectException}} or {{UnreachableException}}, and perhaps even when we get no response (2.2?), and only stop when the timeout expires or when the backup is taken offline.
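
The retry behaviour proposed above could look roughly like the following sketch. It is not Infinispan code: the {{BackupRetrySketch}} class, the {{invokeWithRetry}} method, and its parameters are hypothetical, and the two exception classes are simplified local stand-ins for the real {{SuspectException}} and {{UnreachableException}} so that the example compiles on its own. The fixed delay between attempts is also an assumption; the issue does not specify a backoff policy.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of the proposed retry loop; not part of Infinispan.
class BackupRetrySketch {

    // Simplified stand-ins for the real exception types named in the issue.
    static class SuspectException extends RuntimeException {
        SuspectException(String message) { super(message); }
    }

    static class UnreachableException extends RuntimeException {
        UnreachableException(String message) { super(message); }
    }

    /**
     * Invokes backupRpc, retrying on SuspectException or UnreachableException
     * until the timeout expires or the backup site is reported offline.
     * Any other exception is propagated immediately without retrying.
     */
    static <T> T invokeWithRetry(Supplier<T> backupRpc,
                                 BooleanSupplier siteOffline,
                                 long timeoutMillis,
                                 long retryDelayMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            // Stop as soon as the backup has been taken offline.
            if (siteOffline.getAsBoolean()) {
                throw new IllegalStateException("Backup site is offline");
            }
            try {
                return backupRpc.get();
            } catch (SuspectException | UnreachableException e) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    // Timeout expired: give up and report the last failure.
                    throw e;
                }
                long sleepMillis = Math.min(retryDelayMillis,
                        TimeUnit.NANOSECONDS.toMillis(remainingNanos));
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and abort the retry loop.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while retrying backup RPC", ie);
                }
            }
        }
    }
}
```

A caller would wrap the actual backup invocation in the supplier and pass a check of the site's online/offline status as the second argument. Whether a missing response (case 2.2) should also trigger a retry is left open here, as it is in the description.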

Async backup probably needs retrying as well, and perhaps even a more sophisticated approach like I-RAC (ISPN-2634).


