AMD GPU Device Plugin for Kubernetes#
The AMD GPU Device Plugin for Kubernetes enables the use of AMD GPUs as schedulable resources in Kubernetes clusters. This plugin allows you to run GPU-accelerated workloads such as machine learning, scientific computing, and visualization applications on Kubernetes.
Features#
- Implements the Kubernetes Device Plugin API for AMD GPUs
- Exposes AMD GPUs as amd.com/gpu resources in Kubernetes
- Provides automated node labeling with detailed GPU properties (device ID, VRAM, compute units, etc.)
- Enables fine-grained GPU allocation for containers
System Requirements#
- Kubernetes: v1.18 or higher
- AMD GPUs: ROCm-capable AMD GPU hardware
- GPU Drivers: AMD GPU drivers or ROCm stack installed on worker nodes
See the ROCm System Requirements for detailed hardware compatibility information.
Quick Start#
The device plugin must run on every node equipped with AMD GPUs, and the simplest way to arrange that is with a Kubernetes DaemonSet. A pre-built Docker image is available on DockerHub, and a predefined YAML file named k8s-ds-amdgpu-dp.yaml is included in this repository.
Create a DaemonSet in your Kubernetes cluster with the following command:
kubectl create -f k8s-ds-amdgpu-dp.yaml
Alternatively, you can pull directly from the web:
kubectl create -f https://raw.githubusercontent.com/ROCm/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml
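For reference, the DaemonSet amounts to running the plugin container on every node and giving it access to the kubelet's device-plugin socket directory. The condensed sketch below illustrates the shape of such a manifest; the names, image tag, and mount paths are assumptions, so treat k8s-ds-amdgpu-dp.yaml in the repository as the authoritative version.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: amdgpu-device-plugin-daemonset   # illustrative name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: amdgpu-dp-ds
  template:
    metadata:
      labels:
        name: amdgpu-dp-ds
    spec:
      containers:
      - name: amdgpu-dp-cntr
        image: rocm/k8s-device-plugin    # pre-built image on DockerHub
        volumeMounts:
        # The kubelet discovers device plugins through a Unix socket in this directory.
        - name: dp
          mountPath: /var/lib/kubelet/device-plugins
        # /sys lets the plugin enumerate the AMD GPUs present on the node.
        - name: sys
          mountPath: /sys
      volumes:
      - name: dp
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: sys
        hostPath:
          path: /sys
```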
Deploy the Node Labeler (Optional)#
For enhanced GPU discovery and scheduling, deploy the AMD GPU Node Labeler:
kubectl create -f k8s-ds-amdgpu-labeller.yaml
This will automatically label nodes with GPU-specific information such as VRAM size, compute units, and device IDs.
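Once the node labeler has run, those labels can steer scheduling through an ordinary nodeSelector. A minimal sketch, assuming a hypothetical label key amd.com/gpu.vram with value 16G (run kubectl get nodes --show-labels to see the keys actually applied in your cluster):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-selective-pod
spec:
  # Only schedule onto nodes whose GPUs carry this label; the key and
  # value here are assumptions for illustration.
  nodeSelector:
    amd.com/gpu.vram: 16G
  containers:
  - name: workload
    image: busybox             # placeholder image
    command: ["sleep", "infinity"]
```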
Verify Installation#
After deploying the device plugin, verify that your AMD GPUs are properly recognized as schedulable resources:
# List all nodes with their AMD GPU capacity
kubectl get nodes -o custom-columns=NAME:.metadata.name,GPU:".status.capacity.amd\.com/gpu"
NAME          GPU
k8s-node-01   8
Example Workload#
You can restrict workloads to nodes with GPUs by adding resources.limits to the pod definition. An example pod definition is provided in example/pod/pytorch.yaml. Create the pod by running:
kubectl create -f https://raw.githubusercontent.com/ROCm/k8s-device-plugin/master/example/pod/pytorch.yaml
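The essential part of that manifest is a resources.limits entry requesting an amd.com/gpu. A minimal sketch (the pod name, container name, and image below are illustrative, not the contents of pytorch.yaml):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-example        # illustrative name
spec:
  containers:
  - name: gpu-container
    image: rocm/pytorch        # illustrative image
    resources:
      limits:
        amd.com/gpu: 1         # request one AMD GPU for this container
```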
Check the pod status with:
kubectl describe pods
After the pod is running, view the benchmark results with:
kubectl logs pytorch-gpu-pod-example
Contributing#
We welcome contributions to this project! Please refer to the Development Guidelines for details on how to get involved.