[jboss-jira] [JBoss JIRA] (WFWIP-207) UX: Force removal of Operator upon delete - do not hang due to finalizers

Ondrej Chaloupka (Jira) issues at jboss.org
Tue Sep 24 08:03:03 EDT 2019


    [ https://issues.jboss.org/browse/WFWIP-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788411#comment-13788411 ] 

Ondrej Chaloupka commented on WFWIP-207:
----------------------------------------

[~jmesnil] Could I ask for your feedback on this issue?

Currently, the operator starts transaction recovery in two cases:
* during scaledown (number of replicas decreases)
* when the {{wildflyserver}} CR is deleted (all resources belonging to the CR are to be removed).

Removal of a {{wildflyserver}} ({{oc delete wildflyserver --all}}) is guarded by finalizers. While a finalizer is set on the CR, the resource can't be removed. The operator is responsible for finishing all transactions through the recovery process and only then removing the finalizer.
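To make the finalizer flow concrete, here is a minimal Go sketch (Go because the operator is written in Go). It deliberately avoids the Kubernetes client libraries: the types, the finalizer name and the helpers are hypothetical simplifications, not the operator's code. It relies only on the general Kubernetes rule that an object marked for deletion stays in place until its finalizer list is empty, and shows why pods stuck in {{Pending}} (as in the log in the issue description) keep the finalizer in place so the delete never completes.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical finalizer name; the real operator defines its own.
const recoveryFinalizer = "example.org/tx-recovery"

// Pod is a simplified, hypothetical view of one WildFly pod.
type Pod struct {
	Name    string
	Phase   string // e.g. "Running" or "Pending"
	TxClean bool   // true once recovery found no unfinished transactions
}

// WildFlyServer is a simplified, hypothetical stand-in for the CR.
type WildFlyServer struct {
	Name              string
	Finalizers        []string
	DeletionRequested bool // stands in for metadata.deletionTimestamp being set
	Pods              []Pod
}

// recoverPods reports every pod that cannot yet be marked clean.
func recoverPods(pods []Pod) error {
	var problems []string
	for _, p := range pods {
		switch {
		case p.Phase != "Running":
			problems = append(problems, fmt.Sprintf("pod %q is in phase %s; recovery needs it fully started", p.Name, p.Phase))
		case !p.TxClean:
			problems = append(problems, fmt.Sprintf("pod %q still has unfinished transactions", p.Name))
		}
	}
	if len(problems) > 0 {
		return fmt.Errorf("found %d errors: %s", len(problems), strings.Join(problems, "; "))
	}
	return nil
}

func hasFinalizer(list []string, name string) bool {
	for _, f := range list {
		if f == name {
			return true
		}
	}
	return false
}

// withoutFinalizer returns a copy of list with name removed.
func withoutFinalizer(list []string, name string) []string {
	out := make([]string, 0, len(list))
	for _, f := range list {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

// reconcileDeletion models one reconcile pass for a CR marked for deletion.
// While recovery fails, the finalizer stays, so `oc delete` appears to hang.
// It returns true once nothing blocks removal any more.
func reconcileDeletion(s *WildFlyServer) (bool, error) {
	if !s.DeletionRequested {
		return false, nil
	}
	if hasFinalizer(s.Finalizers, recoveryFinalizer) {
		if err := recoverPods(s.Pods); err != nil {
			return false, err // keep the finalizer; a later reconcile retries
		}
		s.Finalizers = withoutFinalizer(s.Finalizers, recoveryFinalizer)
	}
	return len(s.Finalizers) == 0, nil
}

func main() {
	// Both pods stuck in Pending, as in the issue description.
	s := &WildFlyServer{
		Name:              "simple-jaxrs-operator",
		Finalizers:        []string{recoveryFinalizer},
		DeletionRequested: true,
		Pods: []Pod{
			{Name: "simple-jaxrs-operator-0", Phase: "Pending"},
			{Name: "simple-jaxrs-operator-1", Phase: "Pending"},
		},
	}
	removable, err := reconcileDeletion(s)
	fmt.Println(removable, err != nil) // prints "false true": deletion is blocked
}
```

Removing the finalizer entirely, as this issue asks, corresponds to the {{hasFinalizer}} branch never being taken: the object becomes removable on the first pass regardless of pod state.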

This issue (WFWIP-207) asks to remove the finalizers completely. That means {{oc delete wildflyserver --all}} would forcibly remove all data without any transactional checking.

My idea was that recovery should make even deletion safe from a transactional-consistency perspective. But if people understand {{oc delete wildflyserver --all}} as the equivalent of {{rm -rf}}, then maybe finalizers should not be part of the process.

In that case the documentation would need to state that removal is not transactionally safe unless the server is first scaled down to {{0}}. That means users who want recovery to clean up transactions would need to scale down first and only then delete.
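The procedure the documentation would have to describe (scale down to 0, let recovery finish, then delete) can be modelled the same way. This is again a hypothetical Go sketch, not the operator's code. It assumes that a delete without finalizers discards all pods immediately, and that scale-down removes a surplus pod only once recovery has left it with no in-doubt transactions.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified view of one WildFly pod.
type Pod struct {
	Name    string
	InDoubt int // unfinished transactions recorded by this pod
}

// Server is a hypothetical stand-in for the WildFlyServer CR.
type Server struct {
	Replicas int
	Pods     []Pod
}

// scaleDown sets the desired replica count. Assumption: a surplus pod is only
// removed once it holds no in-doubt transactions; otherwise it is kept and
// handled on a later pass.
func scaleDown(s *Server, replicas int) {
	s.Replicas = replicas
	var kept []Pod
	for i, p := range s.Pods {
		if i < replicas || p.InDoubt > 0 {
			kept = append(kept, p)
		}
	}
	s.Pods = kept
}

// deleteForcibly models `oc delete` without finalizers ("rm -rf"):
// all pods disappear and any in-doubt transactions are lost.
func deleteForcibly(s *Server) int {
	lost := 0
	for _, p := range s.Pods {
		lost += p.InDoubt
	}
	s.Pods = nil
	return lost
}

func newServer() *Server {
	return &Server{Replicas: 2, Pods: []Pod{{Name: "app-0", InDoubt: 1}, {Name: "app-1"}}}
}

func main() {
	// Deleting directly discards the in-doubt transaction held by app-0.
	fmt.Println("lost on direct delete:", deleteForcibly(newServer())) // 1

	// Scaling to 0 first: app-0 is held back until its transaction is recovered.
	s := newServer()
	scaleDown(s, 0)
	fmt.Println("pods left after first scale-down:", len(s.Pods)) // 1
	s.Pods[0].InDoubt = 0 // recovery finishes the transaction
	scaleDown(s, 0)
	fmt.Println("lost after scale-down, then delete:", deleteForcibly(s)) // 0
}
```

The trade-off is visible in the two paths: the direct delete never blocks but loses work, while the scale-down path is safe but may wait on recovery.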

I'm not sure whether this would be a bit confusing for users, but I understand that my "bulletproof" transactional-consistency point of view may not be the right approach.

> UX: Force removal of Operator upon delete - do not hang due to finalizers
> -------------------------------------------------------------------------
>
>                 Key: WFWIP-207
>                 URL: https://issues.jboss.org/browse/WFWIP-207
>             Project: WildFly WIP
>          Issue Type: Bug
>          Components: OpenShift
>            Reporter: Petr Kremensky
>            Assignee: Ondrej Chaloupka
>            Priority: Blocker
>              Labels: operator
>
> We ran into yet another use case where finalizers prevent users from deleting the project: the delete operation hangs.
> pods:
> {noformat}
> $ oc get all
> NAME                                    READY   STATUS             RESTARTS   AGE
> pod/simple-jaxrs-operator-0             0/1     ImagePullBackOff   0          9m11s
> pod/simple-jaxrs-operator-1             0/1     ImagePullBackOff   0          9m11s
> pod/wildfly-operator-686846d6fb-db9sj   1/1     Running
> $ oc delete wildflyserver simple-jaxrs-operator 
> wildflyserver.wildfly.org "simple-jaxrs-operator" deleted
> ... hangs forever     
> {noformat}
> operator log:
> {noformat}
> {"level":"info","ts":1569308322.2926116,"logger":"controller_wildflyserver","msg":"Reconciling WildFlyServer","Request.Namespace":"pkremens-namespace","Request.Name":"simple-jaxrs-operator"}
> {"level":"info","ts":1569308322.2927597,"logger":"controller_wildflyserver","msg":"WildflyServer is marked for deletion. Waiting for finalizers to clean the workspace","Request.Namespace":"pkremens-namespace","Request.Name":"simple-jaxrs-operator"}
> {"level":"info","ts":1569308322.2929516,"logger":"controller_wildflyserver","msg":"Transaction recovery scaledown processing","Request.Namespace":"pkremens-namespace","Request.Name":"simple-jaxrs-operator","Pod Name":"simple-jaxrs-operator-0","IP Address":"10.128.0.227","Pod State":"SCALING_DOWN_RECOVERY_INVESTIGATION","Pod Phase":"Pending"}
> {"level":"info","ts":1569308322.2931426,"logger":"controller_wildflyserver","msg":"Transaction recovery scaledown processing","Request.Namespace":"pkremens-namespace","Request.Name":"simple-jaxrs-operator","Pod Name":"simple-jaxrs-operator-1","IP Address":"10.128.0.226","Pod State":"SCALING_DOWN_RECOVERY_INVESTIGATION","Pod Phase":"Pending"}
> {"level":"error","ts":1569308322.294659,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"wildflyserver-controller","request":"pkremens-namespace/simple-jaxrs-operator","error":"Finalizer processing: failed transaction recovery for WildflyServer pkremens-namespace:simple-jaxrs-operator name Error: Found 2 errors:\n [[Pod 'simple-jaxrs-operator-0' / 'simple-jaxrs-operator' is in pending phase Pending. It will be hopefully started in a while. Transaction recovery needs the pod being fully started to be capable to mark it as clean for the scale down.]], [[Pod 'simple-jaxrs-operator-1' / 'simple-jaxrs-operator' is in pending phase Pending. It will be hopefully started in a while. Transaction recovery needs the pod being fully started to be capable to mark it as clean for the scale down.]],","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr at v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime at v0.1.12/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime at v0.1.12/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery at v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery at v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery at v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"}
> {noformat}
> This is a trade-off between safety and usability, but we believe these issues (the delete command hanging due to EAP7-1192) could be a serious usability problem for users.
> *actual*
>  * scale down can require manual user interaction forced by finalizers
>  * delete can hang, requiring manual user interaction (delete the deployment object, remove the finalizer from the operator CR, run delete again)
> *expected*
>  * scale down can require manual user interaction forced by finalizers
>  * delete should never hang; it should be treated like pulling the plug (rm -rf). If users need a graceful shutdown, they should properly scale down to 0 before deleting the project; this should be properly documented



--
This message was sent by Atlassian Jira
(v7.13.8#713008)


