Upgrading an AWS EKS cluster from Kubernetes 1.30 to 1.32 while maintaining Karpenter compatibility requires careful planning and execution. In this comprehensive guide, I'll walk you through the complete upgrade process based on real-world experience, highlighting critical pitfalls and best practices for achieving zero-downtime deployments.
Overview
This upgrade journey involves multiple components:
- EKS cluster upgrade (1.30 → 1.31 → 1.32)
- Karpenter upgrade (v0.37.6 → v1.0.9 → v1.2.3)
- Node group management and application migration
- EBS CSI driver configuration
Total Duration: Approximately 3–4 hours including validation and testing.
Pre-Upgrade Checklist
Before starting the upgrade process, ensure you have:
- Administrative access to your AWS account
- kubectl, helm, and eksctl properly configured
- Backup of critical workloads and configurations
- Maintenance window scheduled
- ArgoCD or deployment tool access for application management
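A quick pre-flight check along these lines can confirm tooling and cluster access before you begin (cluster name and namespace are placeholders; the Karpenter check assumes it runs in a namespace named karpenter):

```shell
# Verify tool versions
kubectl version --client
helm version --short
eksctl version

# Confirm the current control plane version and cluster status
aws eks describe-cluster --name <your-cluster> \
  --query 'cluster.{version:version,status:status}' --output table

# Record the running Karpenter image before touching anything
kubectl get deployment karpenter -n karpenter \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```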
Step-by-Step Upgrade Process
1. Upgrade Base Node Group (Duration: ~15 minutes)
Start by upgrading your existing base node group through the AWS Console:
- Navigate to EKS → Clusters → Your Cluster → Compute → Node Groups
- Select your base node group
- Click "Update" and select the latest AMI version
- Monitor the rolling update process
Best Practice: Bring node groups up to the latest AMI for the current Kubernetes version before upgrading the control plane, then upgrade them again afterward. The kubelet must never be newer than the API server, and keeping both sides current avoids version-skew issues.
2. Upgrade EKS Add-ons
Update all EKS add-ons to their latest compatible versions:
- CoreDNS
- kube-proxy
- VPC CNI
- AWS Load Balancer Controller (if installed)
Navigate to EKS → Clusters → Your Cluster → Add-ons and update each component individually.
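If you prefer the CLI, the same add-on checks and updates can be scripted; a sketch (add-on names as registered with EKS, placeholders for your values):

```shell
# List a compatible CoreDNS version for the target Kubernetes version
aws eks describe-addon-versions \
  --addon-name coredns \
  --kubernetes-version 1.31 \
  --query 'addons[0].addonVersions[0].addonVersion'

# Update an add-on to a specific version
aws eks update-addon \
  --cluster-name <your-cluster> \
  --addon-name coredns \
  --addon-version <compatible-version> \
  --resolve-conflicts OVERWRITE
```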
3. Upgrade EKS Cluster to v1.31 (Duration: ~15 minutes)
Perform the first cluster upgrade via AWS Console:
- Navigate to EKS → Clusters → Your Cluster
- Click "Update cluster version"
- Select Kubernetes version 1.31
- Monitor the upgrade progress
Critical Note: EKS upgrades can only increment by one minor version at a time.
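That rule is easy to enforce in automation; a minimal pure-shell sketch (hypothetical helper, not part of any AWS tooling) that refuses a skipped minor version:

```shell
#!/bin/sh
# check_upgrade_path: allow only a single minor-version hop (e.g. 1.30 -> 1.31).
check_upgrade_path() {
  cur_minor="${1#*.}"   # strip everything up to the first dot
  tgt_minor="${2#*.}"
  if [ "$((tgt_minor - cur_minor))" -eq 1 ]; then
    echo "ok: $1 -> $2"
  else
    echo "blocked: $1 -> $2 is not a single minor-version step"
  fi
}

check_upgrade_path "1.30" "1.31"   # ok
check_upgrade_path "1.30" "1.32"   # blocked
```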
4. Re-upgrade Base Node Group (Duration: ~15 minutes)
After the control plane upgrade, update the node group again to match the cluster version:
- Return to the Node Groups section
- Update the base node group to use 1.31-compatible AMIs
- Wait for the rolling update to complete
5. Create Temporary Node Group
Create a new managed node group to serve as a temporary landing zone during Karpenter upgrades:
# Create a temporary node group with specifications similar to your Karpenter nodes.
# Note: instance types and subnet IDs are space-separated lists in the AWS CLI.
aws eks create-nodegroup \
  --cluster-name <your-cluster> \
  --nodegroup-name temp-upgrade-nodes \
  --instance-types m5.large m5.xlarge \
  --node-role <your-node-role-arn> \
  --subnets <subnet-ids> \
  --scaling-config minSize=1,maxSize=10,desiredSize=3
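Creation is asynchronous, so it is worth blocking until the group is active before migrating anything onto it; something like (waiter name from the AWS CLI, placeholders as above):

```shell
# Block until the temporary node group reports ACTIVE
aws eks wait nodegroup-active \
  --cluster-name <your-cluster> \
  --nodegroup-name temp-upgrade-nodes

# Verify the new nodes registered with the cluster
kubectl get nodes -l eks.amazonaws.com/nodegroup=temp-upgrade-nodes
```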
6. Migrate ArgoCD or Deployment to Temporary Nodes
Temporarily move ArgoCD workloads to the new node group:
# Update the ArgoCD deployment spec
spec:
  template:
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: "temp-upgrade-nodes"
      tolerations:
        - key: "temporary-upgrade"
          value: "true"
          effect: "NoSchedule"
Important: Disable auto-sync in ArgoCD during this process to prevent configuration conflicts.
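One way to toggle auto-sync off is to clear the automated sync policy on the Application resource (the Application name is a placeholder; adjust to your app-of-apps layout):

```shell
# Disable automated sync on an Argo CD Application
kubectl patch application <your-app> -n argocd \
  --type merge -p '{"spec":{"syncPolicy":{"automated":null}}}'
```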
7. Upgrade Karpenter: v0.37.6 → v1.0.9
This is the most critical step. Karpenter v1.0 introduces breaking changes requiring careful migration.
Environment Setup
export AWS_PARTITION="aws"
export CLUSTER_NAME="<your-cluster-name>"
export AWS_REGION="<your-region>"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export KARPENTER_NAMESPACE="karpenter"
export KARPENTER_IAM_ROLE_ARN="arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-karpenter"
export KARPENTER_VERSION="1.0.9"
Update IAM Policies
Create and attach the new Karpenter v1 IAM policy:
# Create temporary v1 policy
POLICY_DOCUMENT=$(mktemp)
curl -fsSL https://raw.githubusercontent.com/aws/karpenter-provider-aws/13d6fc014ea59019b1c3b1953184efc41809df11/website/content/en/v1.0/upgrading/get-controller-policy.sh | sh | envsubst > ${POLICY_DOCUMENT}
POLICY_NAME="KarpenterControllerPolicy-${CLUSTER_NAME}-v1"
ROLE_NAME="${CLUSTER_NAME}-karpenter"
POLICY_ARN="$(aws iam create-policy --policy-name "${POLICY_NAME}" --policy-document "file://${POLICY_DOCUMENT}" | jq -r .Policy.Arn)"
aws iam attach-role-policy --role-name "${ROLE_NAME}" --policy-arn "${POLICY_ARN}"
Install v1 CRDs
helm upgrade --install karpenter-crd oci://public.ecr.aws/karpenter/karpenter-crd \
--version "${KARPENTER_VERSION}" \
--namespace "${KARPENTER_NAMESPACE}" \
--create-namespace \
--set webhook.enabled=true \
--set webhook.serviceName="karpenter" \
--set webhook.port=8443
Upgrade Karpenter Controller
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
--version ${KARPENTER_VERSION} \
--namespace "${KARPENTER_NAMESPACE}" \
--create-namespace \
--set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
--set settings.clusterName=${CLUSTER_NAME} \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi \
--wait
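Before proceeding, it is prudent to confirm the controller rolled out cleanly and the v1 CRDs are being served; a quick check might look like:

```shell
# Wait for the upgraded controller to become available
kubectl rollout status deployment/karpenter -n "${KARPENTER_NAMESPACE}" --timeout=5m

# Confirm the versions served on the Karpenter CRDs
kubectl get crd nodepools.karpenter.sh \
  -o jsonpath='{.spec.versions[*].name}'

# Scan recent controller logs for errors
kubectl logs -n "${KARPENTER_NAMESPACE}" deployment/karpenter --tail=20
```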
8. Migrate to New Karpenter v1 API
Delete the old Provisioner resources and create new NodePool and EC2NodeClass resources:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default-nodepool
spec:
  template:
    spec:
      taints:
        - key: karpenter-managed
          value: "true"
          effect: NoSchedule
      requirements:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-west-2a", "us-west-2b", "us-west-2c"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "t", "m", "r"]
        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["2", "4", "8", "16", "32"]
        - key: karpenter.k8s.aws/instance-memory
          operator: In
          values: ["8192", "10752", "16384", "32768"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default-ec2nodeclass
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default-ec2nodeclass
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    - id: "ami-06cc3e2ef40b89309"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "<your-cluster-name>"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "<your-cluster-name>"
  role: "KarpenterNodeRole-<your-cluster>"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        deleteOnTermination: true
  metadataOptions:
    httpEndpoint: enabled
  tags:
    Name: "Karpenter-managed-node"
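Applying and verifying the new resources (file names are illustrative):

```shell
kubectl apply -f nodepool.yaml -f ec2nodeclass.yaml

# Both resources should report Ready before any old nodes are drained
kubectl get nodepools.karpenter.sh,ec2nodeclasses.karpenter.k8s.aws
```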
9. Update CloudFormation Stack and Clean Up
TEMPOUT=$(mktemp)
curl -fsSL https://raw.githubusercontent.com/aws/karpenter-provider-aws/v"${KARPENTER_VERSION}"/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml > "${TEMPOUT}"
aws cloudformation deploy \
--stack-name "Karpenter-${CLUSTER_NAME}" \
--template-file "${TEMPOUT}" \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides "ClusterName=${CLUSTER_NAME}"
# Remove temporary IAM policy
ROLE_NAME="${CLUSTER_NAME}-karpenter"
POLICY_NAME="KarpenterControllerPolicy-${CLUSTER_NAME}-v1"
POLICY_ARN=$(aws iam list-policies --query "Policies[?PolicyName=='${POLICY_NAME}'].Arn" --output text)
aws iam detach-role-policy --role-name "${ROLE_NAME}" --policy-arn "${POLICY_ARN}"
aws iam delete-policy --policy-arn "${POLICY_ARN}"
10. Test Karpenter Functionality
Deploy a test workload that tolerates the Karpenter taint to confirm new nodes are provisioned:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-test
  template:
    metadata:
      labels:
        app: nginx-test
    spec:
      tolerations:
        - key: karpenter-managed
          value: "true"
          effect: NoSchedule
      containers:
        - name: nginx
          image: nginx:latest
          resources:
            requests:
              cpu: 1
              memory: 1Gi
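Scaling this test deployment beyond the temporary node group's spare capacity should cause Karpenter to create NodeClaims; you can watch that happen with something like:

```shell
kubectl apply -f nginx-test.yaml             # file name illustrative
kubectl get nodeclaims -w                    # new NodeClaims should appear
kubectl get pods -l app=nginx-test -o wide   # pods land on Karpenter nodes
```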
11. Upgrade EKS Cluster to v1.32 (Duration: ~15 minutes)
Perform the second cluster upgrade via AWS Console:
Navigate to EKS → Clusters → Your Cluster → Update to Kubernetes version 1.32 → Monitor the upgrade process.
12. Final Node Group Upgrade (Duration: ~15 minutes)
Update the base node group to match the 1.32 cluster version.
13. Upgrade Karpenter to v1.2.3
export KARPENTER_VERSION="1.2.3"
# Upgrade CRDs
helm upgrade --install karpenter-crd oci://public.ecr.aws/karpenter/karpenter-crd \
--version "${KARPENTER_VERSION}" \
--namespace "${KARPENTER_NAMESPACE}" \
--create-namespace \
--set webhook.enabled=true \
--set webhook.serviceName="karpenter" \
--set webhook.port=8443
# Upgrade Karpenter
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
--version ${KARPENTER_VERSION} \
--namespace "${KARPENTER_NAMESPACE}" \
--create-namespace \
--set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
--set settings.clusterName=${CLUSTER_NAME} \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi \
--wait
14. Migrate Applications Back to Karpenter
Move your applications from temporary nodes back to Karpenter-managed nodes:
- Update ArgoCD configurations to use Karpenter node selectors
- Migrate Jenkins to on-demand Karpenter nodes
- Clean up outdated nodeSelector and taint references in deployments
Common Issue: Some workloads may fail due to outdated node selectors referencing the temporary upgrade nodes. Clean these up manually:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"nodeSelector":null}}}}'
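Finding every deployment still pinned to the temporary group is easy to script. A sketch using jq over a saved `kubectl get deploy -A -o json` dump; the sample JSON below stands in for live output:

```shell
# Sample stand-in for `kubectl get deploy -A -o json`
cat > /tmp/deploys.json <<'EOF'
{"items":[
  {"metadata":{"name":"jenkins"},
   "spec":{"template":{"spec":{"nodeSelector":{"eks.amazonaws.com/nodegroup":"temp-upgrade-nodes"}}}}},
  {"metadata":{"name":"web"},
   "spec":{"template":{"spec":{}}}}
]}
EOF

# List deployments whose nodeSelector still targets the temporary node group
jq -r '.items[]
  | select(.spec.template.spec.nodeSelector["eks.amazonaws.com/nodegroup"]? == "temp-upgrade-nodes")
  | .metadata.name' /tmp/deploys.json
```

Here only jenkins is printed; each name it emits is a deployment that still needs the patch above.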
15. Clean Up Temporary Resources
aws eks delete-nodegroup \
--cluster-name <your-cluster> \
--nodegroup-name temp-upgrade-nodes
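Deletion is asynchronous; a corresponding waiter exists if you want to block until it completes:

```shell
aws eks wait nodegroup-deleted \
  --cluster-name <your-cluster> \
  --nodegroup-name temp-upgrade-nodes
```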
16. Final Validation
- Test your CI/CD pipelines
- Verify all services are running on Karpenter-managed nodes
- Check application functionality and performance
- Monitor logs for any errors or warnings
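A few spot checks that cover these points (label keys follow Karpenter's and EKS's documented conventions; treat this as a sketch):

```shell
# All nodes and which pool/group owns them
kubectl get nodes -L karpenter.sh/nodepool -L eks.amazonaws.com/nodegroup

# Nothing should be stuck Pending
kubectl get pods -A --field-selector=status.phase=Pending

# Recent warnings across the cluster
kubectl get events -A --field-selector=type=Warning --sort-by=.lastTimestamp | tail -20
```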
Critical Pitfall: Jenkins Volume Attachment Issue
During our upgrade, Jenkins encountered a persistent volume attachment error:
Error: jenkins-748597fb85-nxtrm.1841c67a5fceb9b9
FailedAttachVolume
AttachVolume.Attach failed for volume "pvc-828eb-4246-9a69-498401ed6a2e":
volume attachment is being deleted
Resolution: EBS CSI Driver Service Account
- Check existing service accounts:

eksctl get iamserviceaccount --cluster <your-cluster> --region <your-region>

- Create an IAM role for the EBS CSI driver:

eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster <your-cluster> \
  --region <your-region> \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --override-existing-serviceaccounts \
  --approve \
  --role-only \
  --role-name AmazonEKS_EBS_CSI_DriverRole

- Annotate the service account with the new role:

export SERVICE_ACCOUNT_ROLE_ARN=$(aws iam get-role --role-name AmazonEKS_EBS_CSI_DriverRole | jq -r '.Role.Arn')
kubectl annotate serviceaccount ebs-csi-controller-sa \
  -n kube-system \
  eks.amazonaws.com/role-arn=${SERVICE_ACCOUNT_ROLE_ARN}

- Restart the EBS CSI controller:

kubectl rollout restart deployment ebs-csi-controller -n kube-system
Best Practices
- Staged Approach: Upgrade incrementally (1.30 → 1.31 → 1.32) and test each stage thoroughly before proceeding.
- Temporary Infrastructure: Create temporary node groups for critical workload migration; this provides a safety net during Karpenter upgrades.
- Application Management: Disable ArgoCD auto-sync during upgrades, and clean up legacy node selectors and taints post-migration.
- Volume Management: Ensure the EBS CSI driver has proper IAM permissions, and test persistent volume functionality after upgrades.
- Monitoring and Validation: Monitor each step closely, validate functionality at each stage, and keep rollback plans ready.
Common Issues
- CronJobs Issues: May need to be deleted and recreated due to node selector conflicts.
- Volume Attachment Delays: EBS CSI driver permissions are critical.
- Karpenter API Changes: v1.0 introduces breaking changes requiring complete resource recreation.
- Node Selector Cleanup: Manual cleanup required for legacy references.
Conclusion
Upgrading EKS to Kubernetes 1.32 with Karpenter compatibility requires careful orchestration but is achievable with zero downtime when following these practices. The key is taking a staged approach, maintaining temporary infrastructure for workload migration, and thoroughly testing each component.
The most critical aspects are:
- Proper Karpenter v1 migration with new API resources
- EBS CSI driver configuration for persistent volumes
- Systematic application migration with cleanup of legacy configurations
By following this guide, you can successfully upgrade your EKS infrastructure while maintaining service availability and taking advantage of the latest Kubernetes and Karpenter features.
Total Upgrade Time: 3–4 hours including validation
Downtime: Zero with proper planning and execution