AMD GPU Operator Documentation

The AMD GPU Operator simplifies the deployment and management of AMD Instinct GPU accelerators in Kubernetes clusters. It streamlines the configuration and operation of GPU-accelerated workloads, including machine learning, generative AI, and other GPU-intensive applications.

Features

Automated driver installation and management
Easy deployment of the AMD GPU device plugin
Metrics collection and export
Support for both vanilla Kubernetes and OpenShift environments
Simplified GPU resource allocation for containers
Automatic worker node labeling for GPU-enabled nodes
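
As a rough illustration of the resource allocation feature listed above, the sketch below builds a Pod manifest that requests one GPU. It assumes the device plugin advertises GPUs under the extended resource name `amd.com/gpu`; confirm the name on your cluster with `kubectl describe node` before relying on it. The pod name, container name, and image are hypothetical placeholders, not values from this documentation.

```python
import json


def gpu_pod_manifest(name, image, gpus=1, resource_name="amd.com/gpu"):
    """Return a Pod manifest (as a dict) whose single container requests `gpus` GPUs.

    `resource_name` is an assumption: it should match whatever extended
    resource the device plugin registers on the node.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "workload",  # hypothetical container name
                    "image": image,
                    # Extended resources are requested under `limits`;
                    # Kubernetes defaults the request to the same value.
                    "resources": {"limits": {resource_name: gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, for illustration only.
    manifest = gpu_pod_manifest("gpu-test", "example.com/rocm-app:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f <file>`, since kubectl accepts JSON as well as YAML.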

Compatibility

Supported Hardware

GPU	Status
AMD Instinct™ MI300X	✅ Supported
AMD Instinct™ MI250	✅ Supported
AMD Instinct™ MI210	✅ Supported

OS & Platform Support Matrix

The matrix below lists the supported operating systems and the corresponding Kubernetes or Red Hat OpenShift versions that have been validated to work. We will continue to add more operating systems and newer Kubernetes versions with each release of the AMD GPU Operator and Metrics Exporter.

Operating System	Kubernetes	Red Hat OpenShift
Ubuntu 22.04 LTS	1.29 to 1.31	-
Ubuntu 24.04 LTS	1.29 to 1.31	-
Red Hat Enterprise Linux CoreOS (RHCOS)	-	4.16 to 4.17

Please refer to the ROCm documentation for the compatibility matrix of the AMD GPU DKMS driver.
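
The OS and platform matrix above can be encoded as a small lookup for pre-flight checks. The following is a minimal sketch, not part of the operator: the version ranges are copied from the matrix, while the dictionary layout, the `RHCOS` key, and the function names are illustrative assumptions.

```python
import re

# Validated (lowest, highest) major.minor versions, inclusive,
# copied from the support matrix in this document.
VALIDATED = {
    "Ubuntu 22.04 LTS": {"kubernetes": ((1, 29), (1, 31))},
    "Ubuntu 24.04 LTS": {"kubernetes": ((1, 29), (1, 31))},
    "RHCOS": {"openshift": ((4, 16), (4, 17))},
}


def major_minor(version):
    """Parse strings such as 'v1.30.2' or '4.17' into (major, minor)."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_validated(os_name, platform, version):
    """Return True if the OS/platform/version combination is in the matrix."""
    ranges = VALIDATED.get(os_name, {})
    if platform not in ranges:
        return False
    low, high = ranges[platform]
    return low <= major_minor(version) <= high


if __name__ == "__main__":
    print(is_validated("Ubuntu 22.04 LTS", "kubernetes", "v1.30.2"))
    print(is_validated("RHCOS", "openshift", "4.15.0"))
```

A combination that is missing from the table is reported as not validated, which is not the same as known to be broken.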

Prerequisites

Helm v3.2.0+
kubectl or oc CLI tool configured to access your cluster
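
The prerequisites above can be checked before installation with a short script. This is a sketch under a few assumptions: it relies on `helm version --short` printing a string such as `v3.14.0+g...`, it only checks that `kubectl` or `oc` is on the PATH (not that it is configured for your cluster), and the function names are illustrative.

```python
import re
import shutil
import subprocess

MIN_HELM = (3, 2, 0)  # Helm v3.2.0+, per the prerequisites list


def parse_semver(text):
    """Extract the first MAJOR.MINOR.PATCH triple from `text` as a tuple of ints."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def helm_ok(version_output, minimum=MIN_HELM):
    """Return True if the reported Helm version is at least `minimum`."""
    return parse_semver(version_output) >= minimum


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    helm = shutil.which("helm")
    if helm is None:
        problems.append("helm not found on PATH")
    else:
        result = subprocess.run(
            [helm, "version", "--short"],
            capture_output=True, text=True, check=False,
        )
        try:
            if not helm_ok(result.stdout):
                problems.append(f"helm is older than v3.2.0: {result.stdout.strip()}")
        except ValueError:
            problems.append("could not parse the output of 'helm version --short'")
    if shutil.which("kubectl") is None and shutil.which("oc") is None:
        problems.append("neither kubectl nor oc found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("PROBLEM:", issue)
    if not issues:
        print("All prerequisite checks passed")
```

To confirm that the CLI can actually reach the cluster, a follow-up command such as `kubectl get nodes` is still needed.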

Support

For bugs and feature requests, please file an issue on our GitHub Issues page.