AMD GPU Operator Documentation
The AMD GPU Operator simplifies the deployment and management of AMD Instinct GPU accelerators in Kubernetes clusters. It handles the configuration and operation of GPU-accelerated workloads, including machine learning, generative AI, and other GPU-intensive applications.
Features
- Automated driver installation and management 
- Easy deployment of the AMD GPU device plugin 
- Metrics collection and export 
- Support for Vanilla Kubernetes 
- Simplified GPU resource allocation for containers 
- Automatic worker node labeling for GPU-enabled nodes 
Compatibility
- Kubernetes: 1.29.0 
- AMD GPU DKMS driver: refer to the ROCm documentation for the compatibility matrix.
Prerequisites
- Helm v3.2.0+ 
- kubectl CLI tool configured to access your cluster
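The prerequisites above can be checked before installing. The following is a minimal sketch, not part of the operator itself: the `version_ge` helper and the `MIN_HELM` variable are hypothetical names introduced for illustration, and the comparison assumes a `sort` that supports `-V` (GNU coreutils or a compatible implementation).

```shell
#!/usr/bin/env bash
# Sketch of a prerequisite check. version_ge and MIN_HELM are illustrative
# names, not part of the AMD GPU Operator or Helm.

# version_ge A B: succeeds when version A is greater than or equal to B.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

MIN_HELM="3.2.0"

if command -v helm >/dev/null 2>&1; then
  # `helm version --short` prints a string such as v3.14.0+g3fc9f4b;
  # strip the leading "v" and the "+<commit>" suffix before comparing.
  helm_ver="$(helm version --short | sed -E 's/^v//; s/\+.*$//')"
  if version_ge "$helm_ver" "$MIN_HELM"; then
    echo "helm $helm_ver meets the minimum ($MIN_HELM)"
  else
    echo "helm $helm_ver is older than $MIN_HELM"
  fi
else
  echo "helm not found on PATH"
fi

if command -v kubectl >/dev/null 2>&1; then
  # cluster-info only succeeds when kubectl can reach the API server.
  if kubectl cluster-info --request-timeout=5s >/dev/null 2>&1; then
    echo "kubectl can reach the cluster"
  else
    echo "kubectl is installed but cannot reach a cluster"
  fi
else
  echo "kubectl not found on PATH"
fi
```

The script only reports what it finds. It does not check the Kubernetes server version or the driver compatibility matrix.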
Quick Start
- Add the Helm repository: 
helm repo add rocm https://rocm.github.io/gpu-operator
helm repo update
- Install the AMD GPU Operator: 
helm install amd-gpu-operator rocm/gpu-operator-charts --namespace kube-amd-gpu --create-namespace
- Verify the installation: 
kubectl get pods -n kube-amd-gpu
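For scripted installs, the verification step can be turned into a blocking check. The sketch below wraps `kubectl wait` in two hypothetical helper functions (`wait_for_operator` and `show_operator_pods` are names chosen here, not commands shipped with the operator). It assumes the `kube-amd-gpu` namespace from the install command above and an arbitrary default timeout of 300 seconds. Note that `kubectl wait --all` also waits on pods that have already run to completion, which never report Ready, so the selection may need narrowing in clusters where such pods exist.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the default timeout are assumptions.

NAMESPACE="kube-amd-gpu"   # namespace used by the helm install command

# Block until every pod in the namespace reports Ready, or the timeout expires.
# Usage: wait_for_operator [timeout]   (timeout defaults to 300s)
wait_for_operator() {
  kubectl wait --for=condition=Ready pods --all \
    -n "$NAMESPACE" --timeout="${1:-300s}"
}

# List the operator pods, equivalent to the verification step above.
show_operator_pods() {
  kubectl get pods -n "$NAMESPACE"
}

# Example (requires access to a cluster with the operator installed):
#   wait_for_operator 120s && show_operator_pods
```

A non-zero exit status from `wait_for_operator` means at least one pod did not become Ready within the timeout. In that case `kubectl describe pod` on the affected pod is a reasonable next step.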
Support
For bugs and feature requests, please file an issue on our GitHub Issues page.