[keycloak-dev] [Keycloak Operator] Backup and restore

Sebastian Laskawiec slaskawi at redhat.com
Thu Oct 31 07:19:28 EDT 2019


A draft implementation has been pushed here:
https://github.com/keycloak/keycloak-operator/pull/57

It supports:
* One-time backups uploaded to S3
* Periodic backups uploaded to S3 (a rough CR sketch follows this list)
* One-time backups created in a local Persistent Volume
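
To give a rough idea of the CR shape (the field names below are illustrative
only - the PR itself is the source of truth), a periodic S3 backup could be
requested with something along these lines:

apiVersion: keycloak.org/v1alpha1
kind: KeycloakBackup
metadata:
  name: example-s3-backup
spec:
  instanceSelector:
    matchLabels:
      app: keycloak
  # Illustrative S3 section - the exact field names may differ in the PR
  aws:
    credentialsSecretName: s3-backup-credentials
    schedule: "0 2 * * *"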

Thanks,
Sebastian

On Tue, Oct 29, 2019 at 11:45 AM David Ffrench <dffrench at redhat.com> wrote:

> Thank you very much for the write-up, Sebastian.
>
> We need to meet all of the requirements listed by Dave Martin.
> Additionally, one more requirement is that it needs to work with OpenShift
> 3 (Kubernetes 1.11.x) and OpenShift 4. I am in favour of your last
> sentence: take the path of least resistance and implement backups the same
> way as the existing operator, but architect it so it can be somewhat
> extensible in the future. However, I am generally in favour of refactoring
> and only abstracting out when it is needed.
>
> DAVID FFRENCH
>
> Principal software engineer, CLOUD SERVICES
>
> Red Hat Waterford <https://www.redhat.com/>
>
> Communications House, Cork Road
>
> Waterford, Ireland
>
> dffrench at redhat.com
>
> On Mon, Oct 28, 2019 at 9:57 AM Sebastian Laskawiec <slaskawi at redhat.com>
> wrote:
>
>> Hey!
>>
>> Over the last few days I've been investigating different options for
>> performing backup and restore for the Keycloak Operator. I've been
>> discussing different parts of this functionality with some of you, and now
>> I'd like to get everybody on the same page.
>>
>> Grab a large cup of coffee and let's dig in!
>>
>> 1. Old Operator's behavior
>>
>> The old Keycloak Operator used a CronJob to schedule a database backup and
>> upload the result to AWS S3 [1]. However, it seems (at least I cannot find
>> it) that the Operator doesn't perform a restore operation.
>>
>> The biggest advantage of this approach is consistency (across the other
>> Operators created by the Integreatly Team [2]). But there are some
>> downsides as well - it's limited to AWS, it uploads large backup images
>> across the open Internet, and there's no way to enforce any retention
>> policy (we just upload the backup and forget about it).
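>>
>> For reference, that approach boils down to a CronJob along these lines (a
>> simplified sketch of my understanding - the image and Secret names are
>> made up, see [1] and [2] for the real thing):
>>
>> apiVersion: batch/v1beta1
>> kind: CronJob
>> metadata:
>>   name: keycloak-backup
>> spec:
>>   schedule: "0 2 * * *"
>>   jobTemplate:
>>     spec:
>>       template:
>>         spec:
>>           restartPolicy: OnFailure
>>           containers:
>>           - name: postgres-backup
>>             # hypothetical image that runs pg_dump and pushes the result to S3
>>             image: example.com/postgres-s3-backup:latest
>>             envFrom:
>>             - secretRef:
>>                 # AWS credentials, bucket name, database coordinates, etc.
>>                 name: s3-backup-credentials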
>>
>> 2. Related work
>>
>> There's ongoing work on Persistent Volume Snapshots [3], targeted at
>> Kubernetes 1.16, which probably means OpenShift 4.3+ [4]. This
>> functionality allows us to create a snapshot of a Persistent Volume, which
>> could be used as a backup. Later on, we could mount such a snapshot into
>> the PostgreSQL Pod. However, I'm not sure what happens to in-flight
>> transactions when we take such a snapshot of PostgreSQL. Once the snapshot
>> functionality is in, we can test it out.
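>>
>> For illustration, with the CSI snapshot API (the exact group/version and
>> field names depend on the snapshot CRDs installed in the cluster - this
>> uses the v1beta1 shape, and the names are made up), requesting a snapshot
>> of the database volume looks roughly like this:
>>
>> apiVersion: snapshot.storage.k8s.io/v1beta1
>> kind: VolumeSnapshot
>> metadata:
>>   name: keycloak-db-snapshot
>> spec:
>>   volumeSnapshotClassName: csi-snapclass        # cluster-specific class
>>   source:
>>     persistentVolumeClaimName: keycloak-postgresql-claim   # hypothetical PVC name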
>>
>> I asked the Storage SIG if there are any plans to create an automated tool
>> that makes creating backups for the whole cluster a bit easier, but I've
>> been told the Storage SIG will only create the building blocks (like the
>> snapshots) for it. Nothing more.
>>
>> The snapshot functionality could be taken one step further: we could
>> imagine creating a backup of the whole namespace (with all ConfigMaps,
>> Secrets, etc.). It seems there's no out-of-the-box tool that does this.
>> The closest is Velero [5] by Heptio/VMware. I've also heard (through
>> private channels) about companies selling closed-source products like that.
>>
>> Persistent Volumes in Kubernetes use Storage Classes [6] to indicate which
>> underlying storage mechanism to use. Some of them (like GlusterFS or
>> AzureDisk) may natively support backups. Externally configured backups
>> might also be tightly connected to budget (e.g. in AWS, the slower the
>> disk, the less you pay for it).
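>>
>> To illustrate the budget angle, a dedicated "cold" Storage Class for
>> backup volumes on AWS could look like the sketch below (names are made up;
>> sc1 is the cold-HDD EBS type). The backup PVC would then simply reference
>> storageClassName: backup-cold.
>>
>> apiVersion: storage.k8s.io/v1
>> kind: StorageClass
>> metadata:
>>   name: backup-cold
>> provisioner: kubernetes.io/aws-ebs
>> parameters:
>>   type: sc1   # cold HDD - slower, but noticeably cheaper than gp2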
>>
>> 3. An implementation idea with pg_dump
>>
>> I did some experiments with spinning up a new Pod (or a Job) with a brand
>> new Persistent Volume and using the pg_dumpall utility to back up
>> PostgreSQL [7]. We may also let the user decide on the storage class at
>> this point (e.g. use slow and cheap storage for backups).
>>
>> The idea is to act when a user creates a KeycloakBackup CR:
>>
>> apiVersion: keycloak.org/v1alpha1
>> kind: KeycloakBackup
>> metadata:
>>   name: example-keycloakbackup
>> spec:
>>   # This field will be used for restoring a backup; I will explain it
>>   # a bit later
>>   #restore: true
>>   instanceSelector:
>>     matchLabels:
>>       app: keycloak
>>
>> This triggers the Operator to create a Pod (or a Job) with a new
>> Persistent Volume mounted and use pg_dumpall to create a backup. Once the
>> backup is created, we leave the Persistent Volume in the cluster. Sending
>> it to external storage would be the user's responsibility. At this point
>> we also don't care about periodic backups. If someone wishes to create
>> them, they need to create a CronJob that creates KeycloakBackups on their
>> behalf (here's a link showing how to call the Kubernetes API from a
>> Job/Pod [8]). Once a user decides to restore a backup, they just set the
>> restore flag to `true`. Then the CR is in its terminal state - you can't
>> do anything with a restored backup.
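>>
>> A minimal sketch of the kind of Job the Operator could create for that
>> (the image, host and Secret names are assumptions, not the actual
>> implementation):
>>
>> apiVersion: batch/v1
>> kind: Job
>> metadata:
>>   name: example-keycloakbackup-job   # would be derived from the CR name
>> spec:
>>   template:
>>     spec:
>>       restartPolicy: Never
>>       containers:
>>       - name: pg-dump
>>         # any image that ships pg_dumpall
>>         image: registry.access.redhat.com/rhscl/postgresql-96-rhel7
>>         command:
>>         - /bin/sh
>>         - -c
>>         - pg_dumpall -h keycloak-postgresql -U postgres -f /backup/keycloak.sql
>>         env:
>>         - name: PGPASSWORD
>>           valueFrom:
>>             secretKeyRef:
>>               name: keycloak-db-secret    # hypothetical DB credentials Secret
>>               key: POSTGRES_PASSWORD
>>         volumeMounts:
>>         - name: backup
>>           mountPath: /backup
>>       volumes:
>>       - name: backup
>>         persistentVolumeClaim:
>>           claimName: example-keycloakbackup   # the freshly created PVC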
>>
>> This solution has some advantages - it creates a nice 1:1 mapping between
>> a CR and a backup. It also maintains this mapping with each restore.
>> Finally, we don't need to care about retention policy or scheduled backups
>> - it's the user's (or the K8s admin's) responsibility to do that. We just
>> create an additional Persistent Volume that contains a database backup. Of
>> course, the lack of a retention policy and scheduling might be considered
>> a drawback - that's a valid point of view.
>>
>> 4. Integreatly Team requirements
>>
>> @David Martin <davmarti at redhat.com> sent me a set of requirements for
>> the Keycloak Operator around backups:
>>
>> - The operator can do backups (scheduled and manually triggered)
>> - The backup process should push resources offsite. Otherwise I have to
>> code that bit. If it doesn't do this, why bother with any backup logic at
>> all
>> - The operator should make the configuration of this as easy as possible
>> e.g. allow a schedule to be configured, and a location/credentials to push
>> - The operator should make a restore easy to do e.g. point to offsite
>> location & credentials
>> - Not directly related to this thread, but mentioning it for the larger
>> picture: problems with the backup should trigger an alert in Prometheus (a
>> rough sketch of such an alert follows this list).
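>>
>> For the alerting point, assuming the Prometheus Operator and
>> kube-state-metrics are available (and with a made-up Job naming pattern),
>> a sketch could be a PrometheusRule like:
>>
>> apiVersion: monitoring.coreos.com/v1
>> kind: PrometheusRule
>> metadata:
>>   name: keycloak-backup-alerts
>> spec:
>>   groups:
>>   - name: keycloak-backup
>>     rules:
>>     - alert: KeycloakBackupFailed
>>       expr: kube_job_status_failed{job_name=~"keycloak-backup.*"} > 0
>>       for: 5m
>>       labels:
>>         severity: warning
>>       annotations:
>>         message: A Keycloak backup Job has failed Pods.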
>>
>> 5. Final thoughts
>>
>> Unfortunately, there's no ultimate solution for backing up the whole
>> namespace yet. We are at the point where all the necessary building blocks
>> are being built as we speak (like Persistent Volume Snapshots). But we're
>> months, if not years, away from a final solution.
>>
>> I like the backup idea I explained in #3. I believe it's very extensible,
>> but it doesn't fulfill most of the requirements from David's list.
>> However, we could do a few tricks to make the situation a bit better:
>> - we might implement a CronJob that creates KeycloakBackup CRs according
>> to a given schedule (see the sketch right after this list).
>> - we might reuse (or slightly modify) the Integreatly upload utility to
>> support Persistent Volumes that already contain a backup. In other words,
>> the utility would need to skip the pg_dump call.
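>>
>> A minimal sketch of such a scheduler (the image and ServiceAccount are
>> assumptions - any image that ships kubectl and an account with RBAC rights
>> to create KeycloakBackups will do):
>>
>> apiVersion: batch/v1beta1
>> kind: CronJob
>> metadata:
>>   name: keycloak-backup-scheduler
>> spec:
>>   schedule: "0 2 * * *"
>>   jobTemplate:
>>     spec:
>>       template:
>>         spec:
>>           serviceAccountName: keycloak-backup-creator
>>           restartPolicy: OnFailure
>>           containers:
>>           - name: create-backup-cr
>>             image: bitnami/kubectl:latest   # any kubectl-capable image
>>             command:
>>             - /bin/sh
>>             - -c
>>             - |
>>               cat <<EOF | kubectl create -f -
>>               apiVersion: keycloak.org/v1alpha1
>>               kind: KeycloakBackup
>>               metadata:
>>                 generateName: scheduled-backup-
>>               spec:
>>                 instanceSelector:
>>                   matchLabels:
>>                     app: keycloak
>>               EOF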
>>
>> Alternatively, we may take the path of least resistance and implement
>> backups the same way as in the other Operators, but separate the
>> implementation behind a clean interface (so that we could extend it in the
>> future).
>>
>> Thanks,
>> Sebastian
>>
>> [1]
>> https://github.com/integr8ly/keycloak-operator/blob/d4aa7f0fdcf765b578ed1107733e85b5aa17a1bb/pkg/keycloak/phaseHandler.go#L225
>> [2] https://github.com/search?q=integreatly%2Fbackup-container&type=Code
>> [3] https://kubernetes-csi.github.io/docs/snapshot-restore-feature.html
>> [4]
>> https://blog.openshift.com/wp-content/uploads/Red-Hat-OpenShift-4.0-Roadmap-Public-Feb-2019-Ali.pdf
>> [5] https://velero.io/
>> [6]
>> https://kubernetes.io/docs/concepts/storage/storage-classes/#the-storageclass-resource
>> [7]
>> https://github.com/slaskawi/keycloak-operator/blob/INTLY-3367-Backups/backup.yaml
>> [8]
>> https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#without-using-a-proxy
>>
>

