v2.6 Upgrading EKS Cluster, Managed Nodes, and x-compute Nodes

Introduction

This document guides you through a step-by-step process of upgrading the EKS cluster, its node groups (managed nodes), and x-compute nodes (Exostellar nodes running your workloads).

As a running example, this guide upgrades an EKS cluster from version 1.30 to 1.31.

 

Note: EKS only allows upgrading one minor version at a time, e.g., 1.30 -> 1.31. To move across more than one version, say 1.29 -> 1.31, repeat this whole process once per hop, i.e., 1.29 -> 1.30 and then 1.30 -> 1.31.
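The hop-by-hop rule in the note above can be sketched as a small shell helper. The function name upgrade_path is hypothetical and only used for illustration; it assumes both versions share the same major version and are written as MAJOR.MINOR.

```shell
# Print every single-minor-version hop needed to get from one version to another.
# upgrade_path is a hypothetical helper, not part of any AWS or Exostellar tooling.
upgrade_path() {
  local from="$1" to="$2"
  local major="${from%%.*}"   # part before the first dot, e.g. "1"
  local minor="${from#*.}"    # part after the first dot, e.g. "29"
  local target="${to#*.}"
  while [ "$minor" -lt "$target" ]; do
    echo "${major}.${minor} -> ${major}.$((minor + 1))"
    minor=$((minor + 1))
  done
}
```

For example, upgrade_path 1.29 1.31 prints two lines, one per pass through this guide.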

 

Upgrade Cluster

Step 1: Upgrade the Control Plane

The EKS cluster control plane can be upgraded from the AWS console (UI), with eksctl (if the cluster was created with it), or with the AWS CLI. To use the console or eksctl, follow this guide: Update existing cluster to new Kubernetes version.

Using aws CLI, run the following command:

aws eks update-cluster-version \
  --name "my-exostellar-cluster" \
  --region "us-east-1" \
  --kubernetes-version "1.31"
Figure: Run EKS Upgrade Command

Wait until the version upgrade completes. The upgrade might take up to 10 minutes.

Figure: EKS Cluster Upgrade Successful
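Instead of refreshing the console, the wait can also be done from the terminal. The sketch below assumes the AWS CLI is configured for the account; wait_for_cluster is a hypothetical helper name. It relies on the aws eks wait cluster-active waiter, which polls until the cluster status is ACTIVE again, and then prints the control plane version.

```shell
# Block until the cluster is ACTIVE, then print its Kubernetes version.
# wait_for_cluster is a hypothetical helper; arguments are cluster name and region.
wait_for_cluster() {
  local cluster="$1" region="$2"
  # The waiter returns non-zero if the cluster does not become ACTIVE in time.
  aws eks wait cluster-active --name "$cluster" --region "$region" || return 1
  aws eks describe-cluster --name "$cluster" --region "$region" \
    --query 'cluster.version' --output text
}
```

With the example values from this guide, wait_for_cluster "my-exostellar-cluster" "us-east-1" should print 1.31 once the upgrade has finished.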

 

Step 2: Upgrade Managed Node Groups

After the EKS cluster's control plane is upgraded successfully, upgrade the managed nodes in the nodegroup(s).

Nodegroup(s) can be upgraded from the AWS console (UI), with eksctl (if the cluster was created with it), or with the AWS CLI. To use the console or eksctl, follow this guide: Update a managed node group for your cluster.

Using aws CLI, run the following command:

aws eks update-nodegroup-version \
  --cluster-name "my-exostellar-cluster" \
  --nodegroup-name "my-node-group" \
  --region "us-east-1" \
  --kubernetes-version "1.31"
Figure: Run EKS Nodegroup Upgrade Command

Wait until the node group’s version upgrade completes. For a node group with 3 nodes, the upgrade might take up to 20 minutes; the duration varies with the number of nodes.

Figure: EKS Nodegroup Upgrade Successful

This performs a rolling update of the nodes, creating new nodes and draining old ones.

If the cluster has multiple node groups, repeat this step for each of them.
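For clusters with several managed node groups, the repetition can be scripted. This is a minimal sketch, assuming every managed node group should move to the same version and that none of them needs extra flags (for example, a custom launch template); upgrade_all_nodegroups is a hypothetical helper name. Node groups are upgraded one at a time, waiting for each to become ACTIVE before starting the next.

```shell
# Upgrade every managed node group in a cluster, one after another.
# upgrade_all_nodegroups is a hypothetical helper; arguments: cluster, region, version.
upgrade_all_nodegroups() {
  local cluster="$1" region="$2" version="$3" ng
  # list-nodegroups returns the managed node group names; text output is whitespace-separated.
  for ng in $(aws eks list-nodegroups --cluster-name "$cluster" --region "$region" \
                --query 'nodegroups[]' --output text); do
    echo "Upgrading node group: $ng"
    aws eks update-nodegroup-version --cluster-name "$cluster" --nodegroup-name "$ng" \
      --region "$region" --kubernetes-version "$version" >/dev/null || return 1
    # Wait for this node group's rolling update to finish before moving on.
    aws eks wait nodegroup-active --cluster-name "$cluster" --nodegroup-name "$ng" \
      --region "$region" || return 1
  done
}
```

Example call: upgrade_all_nodegroups "my-exostellar-cluster" "us-east-1" "1.31". Upgrading sequentially keeps the amount of capacity being replaced at any moment small, at the cost of a longer total run time.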

 

Step 3: Upgrade the x-compute Nodes

x-compute nodes are the Exostellar nodes dynamically created by xKarpenter (Exo-Karpenter) based on your workloads' requirements.

If you don't have any workloads (i.e., no x-compute nodes) running, skip this step.

To upgrade the x-compute node(s):

  1. Edit the nodeImageName field in the ExoNodeClass resource used by the selected x-compute node/workload so that it matches the upgraded version. E.g., for a cluster upgraded to 1.31, set nodeImageName: "k8s-131".

    If the workload resource uses the default ExoNodePool default-pool created by the Terraform modules, edit the nodeImageName in the default ExoNodeClass default-class:

    kubectl patch exonodeclass/default-class \
      --type='json' \
      -p='[{"op": "replace", "path": "/spec/xcompute/nodeImageName", "value": "k8s-131"}]'

     

    Figure: Patch the nodeImageName field in ExoNodeClass default-class

 

  2. Observe the upgrade of the x-compute node(s):

    kubectl get nodes -l "eks.amazonaws.com/nodegroup=x-compute" -w

    Figure: Output of kubectl get nodes -w for the x-compute node(s)

  3. Verify that the workload pod is running on the new node (1.31):

    kubectl get pods -l app=nginx-1 -o wide

    Figure: Output of kubectl get pods -o wide for the workload pod
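The steps above can be double-checked with two read-only queries. This is a sketch, not part of the Exostellar tooling: the helper names are hypothetical, the field path is taken from the patch command in step 1, and the label selector is the one used in step 2.

```shell
# Print the nodeImageName currently set on an ExoNodeClass (defaults to default-class).
current_node_image() {
  local class="${1:-default-class}"
  kubectl get exonodeclass "$class" -o jsonpath='{.spec.xcompute.nodeImageName}'
}

# List the x-compute nodes together with the kubelet version each one reports.
xcompute_node_versions() {
  kubectl get nodes -l "eks.amazonaws.com/nodegroup=x-compute" --no-headers \
    -o custom-columns='NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion'
}
```

After the patch, current_node_image should print k8s-131, and once the migration is done, xcompute_node_versions should list only nodes on a v1.31.x kubelet.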



The upgrade might take up to 10 minutes, even with a single node.

After the nodeImageName in the ExoNodeClass is updated, xKarpenter starts reconciling in the background, but the activity only becomes visible in the kubectl get nodes -w output after a few minutes. Do not make any other changes to the ExoNodeClass until the workload has fully migrated to the new node.
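Because nothing is visible for the first few minutes, a polling loop can be more convenient than watching the node list by hand. The sketch below is an assumption-laden illustration: wait_for_xcompute_version is a hypothetical helper, the default timeout of 900 seconds and interval of 15 seconds are arbitrary choices, and it treats the upgrade as done once every x-compute node reports a kubelet version of the wanted minor release.

```shell
# Poll until all x-compute nodes report the wanted minor version, or time out.
# Arguments: wanted version (e.g. 1.31), timeout in seconds, poll interval in seconds.
wait_for_xcompute_version() {
  local want="$1" timeout="${2:-900}" interval="${3:-15}" elapsed=0 versions
  while [ "$elapsed" -lt "$timeout" ]; do
    versions="$(kubectl get nodes -l "eks.amazonaws.com/nodegroup=x-compute" \
      -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}')"
    # Done when at least one node exists and no node reports another version.
    if [ -n "$versions" ] && ! echo "$versions" | grep -qv "^v${want}\."; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}
```

Example call: wait_for_xcompute_version 1.31 && echo "x-compute nodes upgraded". The function only reads cluster state, so it does not interfere with the reconciliation.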

In the example above, the nodeImageName was updated at around 2025-08-08 21:06:24. The new node, though still NotReady, became visible at 2025-08-08 21:13:13, roughly 7 minutes later.
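The delay quoted above is simply the difference between the two timestamps. A small sketch of that arithmetic, using hypothetical helpers that take two-digit HH:MM:SS times from the same day:

```shell
# Convert a two-digit HH:MM:SS time to seconds since midnight.
to_seconds() {
  local h="${1%%:*}" rest="${1#*:}"
  local m="${rest%%:*}" s="${rest#*:}"
  # Strip one leading zero so values like 06 are not parsed as octal.
  echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Print the time elapsed between two HH:MM:SS times as minutes and seconds.
elapsed() {
  local d=$(( $(to_seconds "$2") - $(to_seconds "$1") ))
  echo "$((d / 60))m $((d % 60))s"
}
```

For the timestamps in this example, elapsed 21:06:24 21:13:13 prints 6m 49s.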

xKarpenter doesn’t perform a rolling update here, i.e., the workload pod is deleted on the old node (1.30) and then created on the new node (1.31), so there might be a few seconds of delay between the two.