Prerequisites

  1. Access to all nodes of the cluster through one of the following methods:
    - Rancher
    - SSH protocol
    - AWS Session Manager

  2. The K3s version tag you wish to upgrade to: https://github.com/k3s-io/k3s/releases

  3. The system-upgrade-controller file that will be used to upgrade the K3s cluster:
    https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.10.0/system-upgrade-controller.yaml

  4. The bundle file for the K3s upgrade in the air-gap environment

  5. Push every Docker image needed to install the new K3s version to the ECR gv-public Docker registry. See /wiki/spaces/GS/pages/293011459

Focus/Synergy services

Updates and custom settings are automatically applied to all backend services using Fleet as long as the cluster has access to the public internet and can connect to the management server.

In case there’s no internet connection or the management server is down, the cluster agent will keep trying to reach the management server until a connection can be established.

Upgrading K3s to 1.24

  1. Log in to Rancher or to one of the cluster's master nodes to use the kubectl CLI.

  2. List the node names and their K3s versions:

    kubectl get nodes
  3. Add the label k3s-upgrade=true to the nodes:
    Note: In a multi-node cluster, the --all flag applies this label to every node.

    kubectl label node --all k3s-upgrade=true
  4. Deploy the system-upgrade-controller:

    kubectl apply -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.10.0/system-upgrade-controller.yaml
  5. Create the upgrade-plan.yaml file.
    Note: The version key sets the K3s version that the cluster will be upgraded to.

    cat > upgrade-plan.yaml << EOF
    ---
    apiVersion: upgrade.cattle.io/v1
    kind: Plan
    metadata:
      name: k3s-latest
      namespace: system-upgrade
    spec:
      concurrency: 1
      version: v1.24.9+k3s2
      nodeSelector:
        matchExpressions:
          - {key: k3s-upgrade, operator: Exists}
      serviceAccountName: system-upgrade
      upgrade:
        image: docker.io/rancher/k3s-upgrade
    EOF
  6. Run the upgrade plan.
    The upgrade controller watches for this plan and runs the upgrade on the labeled nodes:

    kubectl apply -f upgrade-plan.yaml
  7. Once the plan has run, all pods restart and take a few minutes to recover.
    Check the status of all the pods:

    watch kubectl get pods -A
  8. Check if the K3s version has been upgraded:

    kubectl get nodes
  9. Delete the system-upgrade-controller:

    kubectl delete -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.10.0/system-upgrade-controller.yaml
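
The version check in step 8 can be scripted when a cluster has many nodes. The following is a minimal sketch, not part of the documented procedure: the function name nodes_pending_upgrade is hypothetical, and the target version is assumed to be the one set in upgrade-plan.yaml. It reads "node version" pairs and prints every node that is not yet on the target version.

```shell
# Hypothetical helper (not part of kubectl or K3s): reads "<node> <version>"
# pairs on stdin and prints each node whose version differs from the target.
nodes_pending_upgrade() {
  target="$1"
  while read -r name version; do
    # Skip blank lines.
    [ -n "$name" ] || continue
    [ "$version" = "$target" ] || printf '%s %s\n' "$name" "$version"
  done
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes --no-headers \
#     -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion \
#     | nodes_pending_upgrade v1.24.9+k3s2
```

An empty result means every node reports the target version.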

Demo Video

Here is the demo video that showcases the steps that need to be performed to upgrade K3s:

Screenshare - 2023-01-20 3_12_29 PM.mp4

Upgrading K3s - AirGap (Manual Approach)

  1. Open a shell session on each of the cluster nodes (VMs).

  2. Download the bundle file to every VM and extract it there: tar -xf gv-platform-$VERSION.tar

  3. Perform the following steps on each VM to upgrade K3s:

    $ mkdir -p /var/lib/rancher/k3s/agent/images/
    $ gunzip -c assets/k3s-airgap-images-amd64.tar.gz > /var/lib/rancher/k3s/agent/images/airgap-images.tar
    $ cp assets/k3s /usr/local/bin && chmod +x /usr/local/bin/k3s
  4. Restart the K3s service on each node.
    Master nodes:

    $ systemctl restart k3s.service

    Worker nodes:

    $ systemctl restart k3s-agent.service
  5. Wait a few minutes for the pods to recover:

    watch kubectl get pods -A
  6. Check the K3s version on every node:

    kubectl get nodes
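
Before running step 3 on a node, it can help to confirm that the extracted bundle actually contains the two files those commands read. A minimal sketch, assuming the bundle was extracted into the directory passed as the first argument; check_bundle_assets is a hypothetical name, not something shipped in the bundle.

```shell
# Hypothetical pre-flight check: confirms the extracted bundle contains the
# two files that the manual upgrade steps copy from. Reports each missing or
# empty file on stderr and returns 1 if any is absent.
check_bundle_assets() {
  bundle_dir="${1:-.}"
  missing=0
  for f in assets/k3s assets/k3s-airgap-images-amd64.tar.gz; do
    if [ ! -s "$bundle_dir/$f" ]; then
      echo "missing or empty: $bundle_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, from the directory where the bundle was extracted:
#   check_bundle_assets . && echo "bundle looks complete"
```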

Demo Video

Here is the demo video that showcases the steps that need to be performed to upgrade K3s in the Air Gap environment:

Screenshare - 2023-01-30 11_44_51 AM (1).mp4

Upgrading K3s to 1.26

For the Platform Team: Local Cluster K3s Upgrade

If you are upgrading K3s on the local cluster, you first need to remove the existing PodSecurityPolicy resources, because the PodSecurityPolicy API was removed in Kubernetes 1.25.

Only one such resource exists, under the aws-node-termination-handler chart.

  1. Patch the Helm chart to disable the PSP resource:

    kubectl patch helmchart aws-node-termination-handler -n kube-system --type='json' -p='[{"op": "add", "path": "/spec/set/rbac.pspEnabled", "value": "false"}]'
  2. The patch triggers the removal of the PSP resource.

In the local clusters, Traefik is deployed as a DaemonSet. When you follow the steps in Post Upgrade Patch, restart the DaemonSet instead of the deployment.

  • Deploy the system-upgrade-controller:

    kubectl apply -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.13.1/system-upgrade-controller.yaml
  • Create the upgrade plan.
    Note: The version key sets the K3s version that the cluster will be upgraded to.

    cat > upgrade-plan-server.yaml << EOF
    ---
    # Server plan
    apiVersion: upgrade.cattle.io/v1
    kind: Plan
    metadata:
      name: server-plan
      namespace: system-upgrade
    spec:
      concurrency: 1
      cordon: true
      nodeSelector:
        matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: In
          values:
          - "true"
      serviceAccountName: system-upgrade
      upgrade:
        image: rancher/k3s-upgrade
      version: v1.26.10+k3s1
    EOF

    If the cluster also has worker nodes, create the agent plan as well:

    cat > upgrade-plan-agent.yaml << EOF
    ---
    # Agent plan
    apiVersion: upgrade.cattle.io/v1
    kind: Plan
    metadata:
      name: agent-plan
      namespace: system-upgrade
    spec:
      concurrency: 1
      cordon: true
      nodeSelector:
        matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: DoesNotExist
      prepare:
        args:
        - prepare
        - server-plan
        image: rancher/k3s-upgrade
      serviceAccountName: system-upgrade
      upgrade:
        image: rancher/k3s-upgrade
      version: v1.26.10+k3s1
    EOF
  • Run the upgrade plan:

    kubectl apply -f upgrade-plan-server.yaml

    If the cluster has worker nodes, apply the agent plan as well:

    kubectl apply -f upgrade-plan-agent.yaml
  • Once the plan has run, all pods restart and take a few minutes to recover.
    Check the status of all the pods:

    watch kubectl get pods -A
  • Check if the K3s version has been upgraded:

    kubectl get nodes
  • Delete the system-upgrade-controller:

    kubectl delete -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.13.1/system-upgrade-controller.yaml

Reference: Apply upgrade: https://docs.k3s.io/upgrades/automated#install-the-system-upgrade-controller
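
Because the version key is the only value that changes between upgrades, the server plan can be generated from a single argument instead of being edited by hand. The following is a sketch under that assumption; render_server_plan is a hypothetical helper, and the YAML it prints is the server plan shown above with the version substituted.

```shell
# Hypothetical helper: prints the server plan from this page with the target
# K3s version taken from the first argument instead of being hard-coded.
render_server_plan() {
  k3s_version="$1"
  if [ -z "$k3s_version" ]; then
    echo "usage: render_server_plan <k3s-version>" >&2
    return 1
  fi
  cat << EOF
---
# Server plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: node-role.kubernetes.io/control-plane
      operator: In
      values:
      - "true"
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: ${k3s_version}
EOF
}

# Usage:
#   render_server_plan v1.26.10+k3s1 > upgrade-plan-server.yaml
#   kubectl apply -f upgrade-plan-server.yaml
```

The agent plan could be templated the same way. Keeping both plans on one version variable avoids upgrading servers and agents to different releases by mistake.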

Post Upgrade Patch

After the upgrade, Traefik has been seen to lose access to its resources. Follow these steps to apply the fix:

  • Run this patch to add traefik.io to the apiGroups of the ClusterRole traefik-kube-system:

    kubectl patch clusterrole traefik-kube-system -n kube-system --type='json' -p='[{"op": "add", "path": "/rules/-1/apiGroups/-", "value": "traefik.io"}]'
  • Add the missing CRDs

    kubectl apply -f https://assets.master.k3s.getvisibility.com/k3s/v1.26.10+k3s1/traefik-patch.yaml
  • Restart the Traefik deployment:

    kubectl rollout restart deployment traefik -n kube-system

If you cannot access Keycloak or the Product UI, the browser cache may be the cause. Try a private window in your browser.

Reference: https://github.com/k3s-io/k3s/issues/8755#issuecomment-1789526830
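
To confirm that the ClusterRole patch took effect, the API groups on the role can be listed and searched for traefik.io. This is a minimal sketch, assuming the role name used in the patch above; has_traefik_io_group is a hypothetical helper that only inspects text passed to it on stdin.

```shell
# Hypothetical helper: succeeds if "traefik.io" appears as a whole entry in a
# space-separated list of API groups read from stdin.
has_traefik_io_group() {
  # One group per line, then match the exact fixed string on a full line.
  tr ' ' '\n' | grep -qxF 'traefik.io'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get clusterrole traefik-kube-system \
#     -o jsonpath='{.rules[*].apiGroups[*]}' | has_traefik_io_group \
#     && echo "traefik.io present" || echo "traefik.io missing"
```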

Upgrading K3s - AirGap (Manual Approach)

Follow these steps to upgrade k3s: https://getvisibility.atlassian.net/wiki/spaces/GS/pages/179699758/K3s+-+Upgrade#Upgrading-K3s---AirGap-(Manual-Approach)

Post Upgrade Patch

After the upgrade, Traefik has been seen to lose access to its resources. Follow these steps to apply the fix:

  • Run this patch to add traefik.io to the apiGroups of the ClusterRole traefik-kube-system:

    kubectl patch clusterrole traefik-kube-system -n kube-system --type='json' -p='[{"op": "add", "path": "/rules/-1/apiGroups/-", "value": "traefik.io"}]'
  • Add the missing CRDs

    kubectl apply -f assets/traefik-patch.yaml
  • Restart the Traefik deployment:

    kubectl rollout restart deployment traefik -n kube-system

Reference: https://github.com/k3s-io/k3s/issues/8755#issuecomment-1789526830

If you cannot access Keycloak or the Product UI, the browser cache may be the cause. Try a private window in your browser.

Certificates

By default, certificates in K3s expire after 12 months. Any certificate that has expired, or that has fewer than 90 days left before it expires, is rotated when K3s is restarted.
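
To see whether a restart would rotate a given certificate, its expiry can be compared against the 90-day window. The following is a sketch rather than an official procedure: cert_in_rotation_window is a hypothetical helper, it relies on the openssl CLI being installed, and the certificate directory in the usage comment is the usual K3s location, which may differ if a custom data directory is configured.

```shell
# Hypothetical helper: succeeds if the certificate expires within the given
# number of days (default 90), i.e. it falls inside the rotation window.
# Returns 2 if the file cannot be read.
cert_in_rotation_window() {
  cert_file="$1"
  days="${2:-90}"
  [ -r "$cert_file" ] || return 2
  # openssl's -checkend exits 0 when the certificate is still valid after
  # the given number of seconds, so the result is inverted here.
  ! openssl x509 -checkend "$((days * 86400))" -noout -in "$cert_file" >/dev/null
}

# Usage on a server node (assumed default path; adjust if yours differs):
#   for c in /var/lib/rancher/k3s/server/tls/*.crt; do
#     cert_in_rotation_window "$c" && echo "rotates on next restart: $c"
#   done
```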
