Driver Upgrade Guide
This guide walks through upgrading AMD GPU drivers on worker nodes.
Overview
The upgrade process involves:
- Verifying current installation 
- Updating the driver version 
- Managing workloads 
- Updating node labels 
- Performing the upgrade 
Step-by-Step Upgrade Process
1. Check Current Driver Version
Verify the existing driver version label on your worker nodes:
kubectl get node <worker-node> -o yaml
Look for the label in this format:
kmm.node.kubernetes.io/version-module.<deviceconfig-namespace>.<deviceconfig-name>=<version>
Example:
kmm.node.kubernetes.io/version-module.kube-amd-gpu.test-device-config=6.1.3
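To avoid scanning the full YAML, the version label can be pulled out of the node's label list directly. The following is a minimal sketch, assuming `kubectl` is already configured for the cluster; the helper name `node_version_labels` is hypothetical and not part of the operator:

```shell
# Hypothetical helper: print the KMM version-module labels of one node,
# one label per line. Assumes kubectl can reach the cluster.
node_version_labels() {
  node="$1"
  # --show-labels appends a comma-separated LABELS column as the last field.
  kubectl get node "$node" --show-labels --no-headers \
    | awk '{print $NF}' \
    | tr ',' '\n' \
    | grep '^kmm\.node\.kubernetes\.io/version-module\.'
}

# Usage: node_version_labels <worker-node>
```

For a node carrying the example label above, this should print that single `key=value` line.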
2. Update DeviceConfig
Update the driversVersion field in your DeviceConfig:
kubectl edit deviceconfigs <config-name> -n kube-amd-gpu
The operator will automatically:
- Look for the new driver image in the registry 
- Build the image if it doesn’t exist 
- Push the built image to your specified registry 
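For scripted upgrades, the version change in this step could be applied with `kubectl patch` instead of the interactive `kubectl edit`. This is only a sketch: it assumes the version field sits at `spec.driver.version` in your DeviceConfig, which you should confirm against your installed CRD (for example with `kubectl explain deviceconfigs.spec`) before using it. The helper name `set_driver_version` is hypothetical:

```shell
# Hypothetical helper: set the driver version on a DeviceConfig non-interactively.
# ASSUMPTION: the field path is spec.driver.version; verify it against your CRD.
set_driver_version() {
  name="$1"
  version="$2"
  ns="${3:-kube-amd-gpu}"   # namespace used throughout this guide
  kubectl patch deviceconfigs "$name" -n "$ns" --type merge \
    -p "{\"spec\":{\"driver\":{\"version\":\"$version\"}}}"
}

# Usage: set_driver_version <config-name> <new-version>
```

A merge patch only touches the named field, so the rest of the DeviceConfig spec is left as it was.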
Image Tag Format
The operator uses specific tag formats based on the OS:
| OS | Tag Format | Example | 
|---|---|---|
| Ubuntu | | |
Warning: If a node’s Ready status changes during the upgrade (Ready → NotReady → Ready) before its driver version label is updated, the old driver is not reinstalled. Complete the remaining upgrade steps on these nodes to install the new driver.
3. Stop Workloads
Stop all workloads using the AMD GPU driver on the target node before proceeding.
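To find what is still running on the target node before stopping it, the pods scheduled there can be listed with a field selector. A small sketch, assuming cluster-wide read access to pods; the helper name `pods_on_node` is hypothetical, and deciding which of the listed pods actually use the GPU is left to you:

```shell
# Hypothetical helper: list every pod scheduled on the given node,
# across all namespaces, so GPU workloads can be identified and stopped.
pods_on_node() {
  node="$1"
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$node"
}

# Usage: pods_on_node <worker-node>
```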
4. Update Node Labels
You have two options for updating node labels:
Option A: Direct Update (Recommended)
If no additional maintenance is needed, directly update the version label:
# Old label format:
kmm.node.kubernetes.io/version-module.<namespace>.<config-name>=<old-version>
# New label format:
kmm.node.kubernetes.io/version-module.<namespace>.<config-name>=<new-version>
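The direct update can be done with a single `kubectl label` call. Because the label already exists with the old value, `kubectl` refuses to change it unless `--overwrite` is passed. A minimal sketch; the helper name `update_version_label` and its argument order are illustrative only:

```shell
# Hypothetical helper: replace the existing version label in one step.
# --overwrite is required because the label key is already set on the node.
update_version_label() {
  node="$1"
  ns="$2"
  config="$3"
  version="$4"
  kubectl label node "$node" --overwrite \
    "kmm.node.kubernetes.io/version-module.${ns}.${config}=${version}"
}

# Usage: update_version_label <worker-node> <namespace> <config-name> <new-version>
```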
Option B: Remove and Add (If maintenance is needed)
- Remove old version label: 
kubectl label node <worker-node> \
  kmm.node.kubernetes.io/version-module.<namespace>.<config-name>-
- Perform required maintenance 
- Add new version label: 
kubectl label node <worker-node> \
  kmm.node.kubernetes.io/version-module.<namespace>.<config-name>=<new-version>
5. Restart Workloads
After the new driver is installed successfully, restart your GPU workloads on the upgraded node.
Verification
To verify the upgrade:
- Check node labels: 
kubectl get node <worker-node> --show-labels | grep kmm.node.kubernetes.io
- Verify driver version: 
kubectl get deviceconfigs <config-name> -n kube-amd-gpu -o yaml
- Check driver status: 
kubectl get deviceconfigs <config-name> -n kube-amd-gpu -o jsonpath='{.status}'
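When several nodes are upgraded, a label selector gives a quick view of which ones already carry the new version label. A sketch under the same assumptions as the commands above (working `kubectl` context); the helper name `nodes_with_version` is hypothetical:

```shell
# Hypothetical helper: list the nodes whose version label equals the given version.
nodes_with_version() {
  ns="$1"
  config="$2"
  version="$3"
  kubectl get nodes -o name \
    -l "kmm.node.kubernetes.io/version-module.${ns}.${config}=${version}"
}

# Usage: nodes_with_version <namespace> <config-name> <new-version>
```

Nodes missing from the output either still carry the old label or have no version label at all.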