AMD GPU Operator Documentation#

The AMD GPU Operator simplifies the deployment and management of AMD Instinct GPU accelerators within Kubernetes clusters. It streamlines the configuration and operation of GPU-accelerated workloads, including machine learning, generative AI, and other GPU-intensive applications.

Features#

  • Automated driver installation and management

  • Easy deployment of the AMD GPU device plugin

  • Metrics collection and export

  • Support for both vanilla Kubernetes and OpenShift environments

  • Simplified GPU resource allocation for containers

  • Automatic worker node labeling for GPU-enabled nodes
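
To illustrate what GPU resource allocation for containers looks like in practice, here is a minimal sketch that builds a Pod manifest requesting one GPU. It assumes the device plugin advertises GPUs under the extended resource name `amd.com/gpu` (confirm the name on your cluster with `kubectl describe node`). The pod name and the image `example.com/rocm-workload:latest` are hypothetical placeholders, not values from this project.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest whose single container requests `gpus` AMD GPUs."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # placeholder image, replace with your own
                    # Extended resources are requested under `limits`.
                    # "amd.com/gpu" is the assumed resource name.
                    "resources": {"limits": {"amd.com/gpu": gpus}},
                }
            ],
        },
    }


manifest = gpu_pod_manifest("gpu-smoke-test", "example.com/rocm-workload:latest")
print(json.dumps(manifest, indent=2))
```

Because `kubectl` and `oc` accept JSON as well as YAML, the printed manifest can be saved to a file and applied with `kubectl apply -f`.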

Compatibility#

Supported Hardware#

GPUs                    Status
AMD Instinct™ MI300X    ✅ Supported
AMD Instinct™ MI250     ✅ Supported
AMD Instinct™ MI210     ✅ Supported

OS & Platform Support Matrix#

The matrix below lists the supported operating systems and the corresponding Kubernetes and Red Hat OpenShift versions that have been validated to work. Support for more operating systems and newer Kubernetes versions will be added with each release of the AMD GPU Operator and Metrics Exporter.

Operating System                           Kubernetes      Red Hat OpenShift
Ubuntu 22.04 LTS                           1.29 to 1.31    -
Ubuntu 24.04 LTS                           1.29 to 1.31    -
Red Hat Enterprise Linux CoreOS (RHCOS)    -               4.16 to 4.17
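
The matrix above can also be encoded as data for a quick pre-flight check. The sketch below is illustrative only: the names `VALIDATED`, `parse_minor`, and `is_validated` are hypothetical helpers, not part of the operator, and the ranges are copied by hand from the table, so they need updating with each release.

```python
from typing import Tuple

# Validated ranges copied from the matrix above: (minimum, maximum), inclusive.
VALIDATED = {
    "Ubuntu 22.04 LTS": {"kubernetes": ((1, 29), (1, 31))},
    "Ubuntu 24.04 LTS": {"kubernetes": ((1, 29), (1, 31))},
    "RHCOS": {"openshift": ((4, 16), (4, 17))},
}


def parse_minor(version: str) -> Tuple[int, int]:
    """Turn '1.30' or 'v1.30.4' into (1, 30); the patch level is ignored."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def is_validated(os_name: str, platform: str, version: str) -> bool:
    """Return True if the OS/platform/version combination appears in the matrix."""
    ranges = VALIDATED.get(os_name, {})
    if platform not in ranges:
        return False
    low, high = ranges[platform]
    return low <= parse_minor(version) <= high


print(is_validated("Ubuntu 22.04 LTS", "kubernetes", "v1.30.4"))
print(is_validated("RHCOS", "openshift", "4.18"))
```

A combination that is absent from the matrix is reported as not validated, which is not the same as known to be broken; it only means the pairing has not been tested for this release.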

Please refer to the ROCm documentation for the compatibility matrix for the AMD GPU DKMS driver.

Prerequisites#

  • Helm v3.2.0+

  • kubectl or oc CLI tool configured to access your cluster
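
The prerequisites above can be checked with a short script before installing. This is a rough sketch rather than an official tool: the helper names are hypothetical, and it assumes `helm version --short` prints a string such as `v3.14.0+g3fc9f4b`. It only confirms that `kubectl` or `oc` is on the `PATH`; it does not verify that the CLI can actually reach your cluster.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MINIMUM_HELM = (3, 2, 0)  # from the prerequisite "Helm v3.2.0+"


def parse_helm_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from text such as 'v3.14.0+g3fc9f4b'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def helm_meets_minimum(text: str) -> bool:
    """Return True if the version found in `text` is at least MINIMUM_HELM."""
    version = parse_helm_version(text)
    return version is not None and version >= MINIMUM_HELM


def check_prerequisites() -> dict:
    """Report whether Helm is new enough and which cluster CLI was found, if any."""
    report = {
        "helm": False,
        "cluster_cli": shutil.which("kubectl") or shutil.which("oc"),
    }
    if shutil.which("helm"):
        result = subprocess.run(
            ["helm", "version", "--short"],
            capture_output=True,
            text=True,
            check=False,
        )
        report["helm"] = helm_meets_minimum(result.stdout)
    return report
```

Calling `check_prerequisites()` returns a dictionary in which `helm` is `True` only when a sufficiently new Helm was found and `cluster_cli` is the path of the first CLI located, or `None` if neither is installed.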

Support#

For bugs and feature requests, please file an issue on our GitHub Issues page.