[keycloak-dev] [Keycloak Operator] Backup and restore

Tue Oct 29 06:45:04 EDT 2019

Thank you very much for the write-up, Sebastian.

We need to meet all of the requirements listed by Dave Martin.
Additionally, it needs to work with both OpenShift 3 (Kubernetes 1.11.x)
and OpenShift 4. I am in favour of your last suggestion: take the path of
least resistance and implement backups the same way as the existing
operator, but architect it so that it can be somewhat extensible in the
future. That said, I am generally in favour of refactoring and only
abstracting out when it is needed.

DAVID FFRENCH

Principal software engineer, CLOUD SERVICES

Red Hat Waterford <https://www.redhat.com/>

Communications House, Cork Road

Waterford, Ireland

dffrench at redhat.com

On Mon, Oct 28, 2019 at 9:57 AM Sebastian Laskawiec <slaskawi at redhat.com>
wrote:

> Hey!
>
> Over the last few days I've been investigating different options for
> performing backup and restore for Keycloak Operator. I've been discussing
> different parts of this functionality with some of you, and now I'd like to
> bring everybody on the same page.
>
> Grab a large cup of coffee and let's dig in!
>
> 1. Old Operator's behavior
>
> The old Keycloak Operator used a CronJob to schedule a database backup
> and upload its result to AWS S3 [1]. However, it seems (at least I
> cannot find it) that the Operator doesn't perform a restore operation.
>
> The biggest advantage of this approach is consistency (across other
> Operators created by the Integreatly Team [2]). But there are some
> downsides as well - it's limited to AWS, it uploads large images across
> the open Internet, and there's no way to enforce any retention policy (as
> we just upload the backup and forget about it).
>
> 3. Related work
>
> There's ongoing work on Persistent Volume Snapshots [3], which is
> targeted at Kubernetes 1.16, which is probably OpenShift 4.3+ [4]. The
> functionality allows us to create a Persistent Volume Snapshot, which could
> be used as a backup. Later on, we could mount such a snapshot into a
> Postgresql Pod. However, I'm not sure what will happen with the
> in-flight transactions when we take such a snapshot from Postgresql. Once
> the snapshot functionality is in, we can test it out.
>
> I asked the Storage SIG if there are some plans to create an automatic
> tool that makes creating backups for the whole cluster a bit easier but
> I've been told the Storage SIG will create only building blocks (like the
> snapshots) for it. Nothing more.
>
> The Snapshot functionality could be taken one step further and we could
> imagine creating a backup for the whole namespace (with all ConfigMaps,
> Secrets etc). It seems there's no out-of-the-box tool that does this. The
> closest is Velero [5] by Heptio/VMware. I've also heard (from my private
> channels) about companies selling closed-source projects like that.
>
> The Persistent Volumes in Kubernetes use Storage Classes [6] to indicate
> which underlying storage mechanism to use. Some of them (like GlusterFS or
> AzureDisk) may natively support backups. Externally configured backups
> might be tightly connected to budget (e.g. in AWS the slower the disk, the
> less money you pay for it).
>
> 4. An implementation idea with pg_dump
>
> I did some experiments with spinning up a new Pod (or a Job) with a brand
> new Persistent Volume and using the pg_dumpall utility for backing up
> Postgresql [7]. We may also let the user decide on the storage class at
> this point (e.g. use slow and cheap storage for backups).
>
> The idea is to act when a user creates a KeycloakBackup CR:
>
> apiVersion: keycloak.org/v1alpha1
> kind: KeycloakBackup
> metadata:
>   name: example-keycloakbackup
> spec:
>   # This field will be used for restoring a backup, I will explain it a
>   # bit later
>   #restore: true
>   instanceSelector:
>     matchLabels:
>       app: keycloak
>
> This triggers the Operator to create a Pod (or a Job) with a new Persistent
> Volume mounted and use pg_dumpall to create a backup. Once the backup is
> created, we leave the Persistent Volume in the cluster. Sending it to
> external storage would be the user's responsibility. At this point we also
> don't care about periodic backups. If someone wishes to create them, they
> need to create a CronJob that will create KeycloakBackups on their
> behalf (here's a link showing how to call the Kubernetes API from a Job/Pod
> [8]). Once a user decides to restore a backup, they just set the restore
> flag to `true`. Then the CR is in its terminal state - you can't do
> anything with a restored backup.
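
The lifecycle described above can be sketched as a small state machine.
This is only an illustration under stated assumptions: the Phase type, its
values and the next function are hypothetical names, and the proposal does
not define a status field at all. Only the restore flag comes from the
example CR.

```go
package main

import "fmt"

// Phase is a hypothetical status value; the proposal does not define one.
type Phase string

const (
	PhaseCreating Phase = "Creating" // backup Pod/Job still running pg_dumpall
	PhaseCreated  Phase = "Created"  // dump finished, Persistent Volume left in the cluster
	PhaseRestored Phase = "Restored" // terminal: nothing more can be done with this CR
)

// KeycloakBackup mirrors only the fields discussed in the thread.
type KeycloakBackup struct {
	Name    string
	Restore bool  // spec.restore from the example CR
	Phase   Phase // assumed status field, not part of the proposal
}

// next returns the phase the operator would move the CR to on one
// reconcile pass. dumpDone reports whether the backup Pod/Job completed.
func next(b KeycloakBackup, dumpDone bool) Phase {
	switch b.Phase {
	case PhaseRestored:
		// Terminal state: later changes to the CR are ignored.
		return PhaseRestored
	case PhaseCreated:
		if b.Restore {
			return PhaseRestored
		}
		return PhaseCreated
	default: // empty phase or Creating
		if dumpDone {
			return PhaseCreated
		}
		return PhaseCreating
	}
}

func main() {
	b := KeycloakBackup{Name: "example-keycloakbackup"}
	b.Phase = next(b, false) // backup job still running
	fmt.Println(b.Phase)
	b.Phase = next(b, true) // backup job finished
	fmt.Println(b.Phase)
	b.Restore = true // user flips the restore flag
	b.Phase = next(b, true)
	fmt.Println(b.Phase)
	b.Restore = false // flipping it back has no effect
	b.Phase = next(b, true)
	fmt.Println(b.Phase)
}
```

Keeping the transition logic in one pure function like this would make the
"restored is terminal" rule easy to unit test without a cluster.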
>
> This solution has some advantages - it creates a nice 1:1 mapping between
> a CR and a backup. It also maintains this mapping with each restore.
> Finally, we don't need to care about retention policy or scheduled backups
> - it's the user's (or K8s admin's) responsibility to do that. We just
> create an additional Persistent Volume that contains a database backup. Of
> course, the lack of retention policy and scheduling might be considered a
> drawback - it's a valid point of view.
>
> 5. Integreatly Team requirements
>
> @David Martin <davmarti at redhat.com> sent me a set of requirements for the
> Keycloak Operator around backups:
>
> - The operator can do backups (scheduled and manually triggered)
> - The backup process should push resources offsite. Otherwise I have to
> code that bit. If it doesn't do this, why bother with any backup logic at
> all
> - The operator should make the configuration of this as easy as possible
> e.g. allow a schedule to be configured, and a location/credentials to push
> - The operator should make a restore easy to do e.g. point to offsite
> location & credentials
> - Not directly related to this thread, but mentioning it for the larger
> picture. Problems with the backup should trigger an alert in Prometheus.
>
> 6. Final thoughts
>
> Unfortunately there's no ultimate solution for backing up the whole
> namespace yet. We are at the point where all necessary building blocks are
> being built as we speak (like Persistent Volume Snapshots). But we're
> months if not years from the final solution.
>
> I like the idea of backups I explained in #4. I believe it's very
> extensible but it doesn't fulfill most of the requirements from David's
> list. However, we could do a few tricks to make the situation a bit better:
> - we might implement a CronJob that will create KeycloakBackup CRs
> according to the given schedule.
> - we might reuse (or slightly modify) the Integreatly upload utility to
> support Persistent Volumes that already contain a backup. In other words
> - the utility will need to skip the pg_dump call.
>
> Alternatively - we may take the path of least resistance and implement
> backups the same way as in other Operators, but separate the
> implementation behind a clean and nice interface (so that we could extend
> it in the future).
>
> Thanks,
> Sebastian
>
> [1]
> https://github.com/integr8ly/keycloak-operator/blob/d4aa7f0fdcf765b578ed1107733e85b5aa17a1bb/pkg/keycloak/phaseHandler.go#L225
> [2] https://github.com/search?q=integreatly%2Fbackup-container&type=Code
> [3] https://kubernetes-csi.github.io/docs/snapshot-restore-feature.html
> [4]
> https://blog.openshift.com/wp-content/uploads/Red-Hat-OpenShift-4.0-Roadmap-Public-Feb-2019-Ali.pdf
> [5] https://velero.io/
> [6]
> https://kubernetes.io/docs/concepts/storage/storage-classes/#the-storageclass-resource
> [7]
> https://github.com/slaskawi/keycloak-operator/blob/INTLY-3367-Backups/backup.yaml
> [8]
> https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#without-using-a-proxy
>