Hey Radim,

Moving to dev mailing list.

Comments inlined.

Thanks,
Sebastian

On Tue, May 2, 2017 at 5:28 PM Radim Vansa <rvansa@redhat.com> wrote:
Hi Sebastian,

I am currently getting acquainted with OpenShift, so I have been reading
your blog posts about it. A couple of questions:

http://blog.infinispan.org/2016/10/openshift-and-node-affinity.html

- so you need to have a different deployment config for each rack/site?

Yes. A while ago I read an article about managing the scheduler using labels:
https://blog.openshift.com/deploying-applications-to-specific-nodes/

So I think it can be optimized to 1 DeploymentConfig + some magic in spec.template. But that's only my intuition. I haven't played with this yet.
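
Something like this is what I have in mind (untested sketch - the "rack" label, the app name and the image are just assumptions): a single DeploymentConfig whose pod template asks the scheduler to spread replicas across racks.

# Untested sketch - assumes nodes are labelled rack=rack1, rack=rack2, ...
apiVersion: v1
kind: DeploymentConfig
metadata:
  name: transactions-repository
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: transactions-repository
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: transactions-repository
              topologyKey: rack   # prefer spreading pods across different racks
      containers:
      - name: infinispan
        image: jboss/infinispan-server   # assumption - whatever image the deployment uses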
 
http://blog.infinispan.org/2017/03/checking-infinispan-cluster-health-and.html

maxUnavailable: 1 and maxSurge: 1 don't sound too good to me - if you
can't fit all the data into a single pod, you need to set maxUnavailable:
0 (so that no nodes are brought down before the rolling upgrade completes) and
maxSurge: 100% to have enough nodes started, plus some post-hook to make
sure all the data is in the new cluster before you bring the old one down. Am I
missing something?

Before answering those questions, let me show you two examples:
  • maxUnavailable: 1, maxSurge: 1
    • oc logs transactions-repository-2-deploy -f
      1. --> Scaling up transactions-repository-2 from 0 to 3, scaling down transactions-repository-1 from 3 to 0 (keep 2 pods available, don't exceed 4 pods)
      2.     Scaling transactions-repository-2 up to 1
      3.     Scaling transactions-repository-1 down to 2
      4.     Scaling transactions-repository-2 up to 2
      5.     Scaling transactions-repository-1 down to 1
      6.     Scaling transactions-repository-2 up to 3
      7.     Scaling transactions-repository-1 down to 0
      8. --> Success
  • maxUnavailable: 0, maxSurge: 100%
    • oc logs transactions-repository-3-deploy -f
      1. --> Scaling up transactions-repository-3 from 0 to 3, scaling down transactions-repository-2 from 3 to 0 (keep 3 pods available, don't exceed 6 pods)
      2.     Scaling transactions-repository-3 up to 3
      3.     Scaling transactions-repository-2 down to 1
      4.     Scaling transactions-repository-2 down to 0
      5. --> Success
So we are talking about a Kubernetes rolling update here: you have a new version of your deployment (e.g. with updated parameters, labels etc.) and you want to update that deployment in Kubernetes (don't confuse this with an Infinispan Rolling Upgrade, where the intention is to roll out a new Infinispan cluster).
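
For reference, the knobs being compared live in the strategy section of the DeploymentConfig, roughly like this (sketch only, with the values from the second run; timeoutSeconds is my assumption):

# Sketch of the relevant part of the DeploymentConfig (second run's values).
spec:
  strategy:
    type: Rolling
    rollingParams:
      maxUnavailable: 0   # never take an old pod down before its replacement is ready
      maxSurge: 100%      # allow a complete new cluster to start next to the old one
      timeoutSeconds: 600 # assumption - give new pods time to pass readiness checks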

The former approach (maxUnavailable: 1, maxSurge: 1) allocates an additional Infinispan node for greater cluster capacity and then scales the old cluster down. Scaling down sends a TERM signal [1] to the Pod so it gets a chance to shut down gracefully; as a side effect this also triggers a cluster rebalance (since one node leaves the cluster). We keep going like this until the old cluster has been replaced with the new one.

The latter approach spins a whole new cluster up first. Only then does Kubernetes send the TERM signal to all the old cluster members.

Both approaches should work if configured correctly (the former relies heavily on readiness probes, the latter on moving data off a node once it receives the termination signal). However, I would assume the latter generates much more network traffic in a short period of time, which I consider a bit riskier.
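
To make the first point a bit more concrete - the former approach only stays safe because a new pod does not count as available until its readiness probe reports the cluster as healthy, along these lines (the script path is a placeholder for whatever health check you use, e.g. the one from the blog post above):

# Sketch - the probe command is a placeholder for your actual health check.
spec:
  template:
    spec:
      containers:
      - name: infinispan
        readinessProbe:
          exec:
            command: ["/usr/local/bin/is_healthy.sh"]   # placeholder script
          initialDelaySeconds: 30
          periodSeconds: 10
          failureThreshold: 5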

Regarding a hook which ensures all the data has been migrated - I'm not sure how to build such a hook. The main idea is to keep the cluster operational so that none of the clients notice the rollout, which works like a charm with the former approach.
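
If we ever figure it out, the natural place to plug it in would probably be a post lifecycle hook on the strategy, something like the sketch below - the hard part is what the command should actually verify (the script here is made up):

# Sketch only - wait_for_rebalance.sh does not exist; it stands for whatever
# check would confirm that the data really made it to the new cluster.
spec:
  strategy:
    type: Rolling
    rollingParams:
      post:
        failurePolicy: Retry
        execNewPod:
          containerName: infinispan
          command: ["/usr/local/bin/wait_for_rebalance.sh"]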

[1] https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods


Radim

--
Radim Vansa <rvansa@redhat.com>
JBoss Performance Team

--

SEBASTIAN ŁASKAWIEC

INFINISPAN DEVELOPER