[infinispan-issues] [JBoss JIRA] (ISPN-6673) Implement Rolling Upgrades with Kubernetes

Sebastian Łaskawiec (JIRA) issues at jboss.org
Wed Jul 27 08:55:00 EDT 2016


    [ https://issues.jboss.org/browse/ISPN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271311#comment-13271311 ] 

Sebastian Łaskawiec edited comment on ISPN-6673 at 7/27/16 8:54 AM:
--------------------------------------------------------------------

The rolling update procedure for Kubernetes and OpenShift looks as follows:
# Create a new app for Infinispan (I'm using my own image with additional health and readiness checks) - {{slaskawi/infinispan-ru-1}}:
{code}
#!/bin/bash
# /opt/jboss/infinispan-server/bin/is_ready.sh
# Succeeds once no distributed cache reports a transitional rebalancing status.
for i in $(seq 1 10);
do
  sleep 1s
  /opt/jboss/infinispan-server/bin/ispn-cli.sh -c --controller=$(hostname -i):9990 '/subsystem=datagrid-infinispan/cache-container=clustered/distributed-cache=*:read-attribute(name=cache-rebalancing-status)' | awk '/result/{gsub("\"", "", $3); print $3}' | awk '{if(NR>1)print}' | grep -v 'PENDING\|IN_PROGRESS\|SUSPENDED'
  if [ $? -eq 0 ]; then
    exit 0
  fi
done
exit 1
{code}
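For reference, the awk stage of the pipeline pulls the third whitespace-separated field of each {{result}} line and strips the quotes. A minimal offline sketch (the sample line below is an assumption about the DMR response shape, not captured from a real server):
{code}
# Assumed shape of a single DMR response line; not real server output.
sample='"result" => "BALANCED"'
# Same extraction as in is_ready.sh: 3rd field of the matching line,
# with the double quotes removed.
status=$(printf '%s\n' "$sample" | awk '/result/{gsub("\"", "", $3); print $3}')
echo "$status"   # → BALANCED
{code}
In the wildcard query the script additionally drops the first matched line ({{awk '{if(NR>1)print}'}}), which appears to be the outer wrapper of the composite response.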
{code}
#!/bin/bash
# /opt/jboss/infinispan-server/bin/is_healthy.sh
# Succeeds once the server reports the "running" state.
for i in $(seq 1 10);
do
  sleep 1s
  /opt/jboss/infinispan-server/bin/ispn-cli.sh -c --controller=$(hostname -i):9990 '/:read-attribute(name=server-state)' | awk '/result/{gsub("\"", "", $3); print $3}' | grep running
  if [ $? -eq 0 ]; then
    exit 0
  fi
done
exit 1
{code}
Since the rebalance status might vary from run to run (imagine a node joining the cluster), there are two ways to deal with it: either use a wait loop as I did, or set {{successThreshold}} to a number larger than 1 in the deployment configuration.
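For the second option, the probe section would look something like this (the threshold value 3 is an arbitrary illustration, not a tested recommendation):
{code}
readinessProbe:
  exec:
    command: [/opt/jboss/infinispan-server/bin/is_ready.sh]
  initialDelaySeconds: 60
  periodSeconds: 10
  # Require several consecutive successful checks before the pod is
  # marked ready; 3 is an example value.
  successThreshold: 3
  failureThreshold: 3
{code}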
# Update the deployment configuration:
{code}
apiVersion: v1
kind: DeploymentConfig
metadata:
  name: infinispan-ru-1
  namespace: myproject
  selfLink: /oapi/v1/namespaces/myproject/deploymentconfigs/infinispan-ru-1
  uid: 6def5411-53e2-11e6-97aa-54ee751d46e3
  resourceVersion: '6570'
  generation: 28
  creationTimestamp: '2016-07-27T10:11:05Z'
  labels:
    app: infinispan-ru-1
  annotations:
    openshift.io/deployment.instantiated: 'true'
    openshift.io/generated-by: OpenShiftNewApp
spec:
  strategy:
    type: Rolling
    rollingParams:
      updatePeriodSeconds: 1
      intervalSeconds: 1
      timeoutSeconds: 600
      maxUnavailable: 0%
      maxSurge: 25%
    resources:
  triggers:
    -
      type: ConfigChange
    -
      type: ImageChange
      imageChangeParams:
        automatic: true
        containerNames:
          - infinispan-ru-1
        from:
          kind: ImageStreamTag
          namespace: myproject
          name: 'infinispan-ru-1:latest'
        lastTriggeredImage: 'slaskawi/infinispan-ru-1@sha256:6d2de3cad2970fcb1207df2b7f947a74c990f5be2e02bc9aaf9671098547bc82'
  replicas: 5
  test: false
  selector:
    app: infinispan-ru-1
    deploymentconfig: infinispan-ru-1
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: infinispan-ru-1
        deploymentconfig: infinispan-ru-1
      annotations:
        openshift.io/container.infinispan-ru-1.image.entrypoint: '["/bin/sh","-c","/opt/jboss/infinispan-server/bin/standalone.sh -c cloud.xml -Djboss.default.jgroups.stack=kubernetes \t-b `hostname -i` \t-bmanagement `hostname -i`   --debug"]'
        openshift.io/generated-by: OpenShiftNewApp
    spec:
      containers:
        -
          name: infinispan-ru-1
          image: 'slaskawi/infinispan-ru-1@sha256:6d2de3cad2970fcb1207df2b7f947a74c990f5be2e02bc9aaf9671098547bc82'
          ports:
            -
              containerPort: 8080
              protocol: TCP
            -
              containerPort: 8888
              protocol: TCP
            -
              containerPort: 8181
              protocol: TCP
            -
              containerPort: 9990
              protocol: TCP
            -
              containerPort: 11211
              protocol: TCP
            -
              containerPort: 11222
              protocol: TCP
          env:
            -
              name: OPENSHIFT_KUBE_PING_NAMESPACE
              value: myproject
          resources:
          livenessProbe:
            exec:
              command: [/opt/jboss/infinispan-server/bin/is_healthy.sh]
            initialDelaySeconds: 60
            timeoutSeconds: 180
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            exec:
              command: [/opt/jboss/infinispan-server/bin/is_ready.sh]
            initialDelaySeconds: 60
            timeoutSeconds: 180
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          imagePullPolicy: Always
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext:
status:
  latestVersion: 18
  observedGeneration: 28
  replicas: 5
  updatedReplicas: 5
  availableReplicas: 5
  details:
    causes:
      -
        type: ConfigChange
{code}
Key features:
** Use proper configuration for the liveness and readiness probes
** Use {{maxUnavailable: 0%}} and {{maxSurge: 25%}}, which means that OpenShift will first create some new nodes, wait for rebalancing, and only then start destroying the existing ones
# Redeploy the application (update the config, the image, whatever):
{code}
oc deploy infinispan-ru-1 --latest -n myproject
{code}
# Check if the number of entries is the same at the end of the procedure.
# Observations:
* It takes some time for a node to properly join the cluster. In a production configuration, the readiness probe should probably be required to pass more than once.
* Even though the readiness probe passes, it doesn't necessarily mean that the node has joined the cluster. During testing I once had a split brain (4 nodes vs 1 node). This is a very dangerous situation. Readiness and health checks should always validate that the number of nodes in the cluster is correct.
* The nodes are currently not killed properly (they should always perform a graceful shutdown).
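A rough sketch of the cluster-size validation suggested above. The {{members}} attribute, the shape of its output, and the expected replica count are all assumptions for illustration, not a tested check:
{code}
#!/bin/bash
# Hypothetical cluster-size check. A real version would read the
# members list over DMR, e.g. (attribute name is an assumption):
#   /opt/jboss/infinispan-server/bin/ispn-cli.sh -c \
#     --controller=$(hostname -i):9990 \
#     '/subsystem=datagrid-infinispan/cache-container=clustered:read-attribute(name=members)'
EXPECTED=5                                           # desired replica count
members='[node-0, node-1, node-2, node-3, node-4]'   # sample output (assumed)
# Count the comma-separated entries in the members list.
count=$(printf '%s\n' "$members" | tr ',' '\n' | wc -l)
if [ "$count" -eq "$EXPECTED" ]; then
  echo "cluster size ok: $count/$EXPECTED"
else
  echo "cluster size mismatch: $count/$EXPECTED" >&2
  exit 1
fi
{code}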



> Implement Rolling Upgrades with Kubernetes
> ------------------------------------------
>
>                 Key: ISPN-6673
>                 URL: https://issues.jboss.org/browse/ISPN-6673
>             Project: Infinispan
>          Issue Type: Feature Request
>          Components: Cloud Integrations
>            Reporter: Sebastian Łaskawiec
>            Assignee: Sebastian Łaskawiec
>
> There are 2 mechanisms which seem to do the same thing but are totally different:
> * [Kubernetes Rolling Update|http://kubernetes.io/docs/user-guide/rolling-updates/] - replaces Pods in a controllable fashion
> * [Infinispan Rolling Upgrade|http://infinispan.org/docs/stable/user_guide/user_guide.html#_Rolling_chapter] - a procedure for upgrading Infinispan or changing the configuration
> Kubernetes Rolling Updates can be used very easily for changing the configuration; however, if the changes are not runtime-compatible, one might lose data. A potential way to avoid this is to use a Cache Store. All other changes must be propagated using the Infinispan Rolling Upgrade procedure.



--
This message was sent by Atlassian JIRA
(v6.4.11#64026)


