Usage Guide#

This guide provides information on how to use the AMD GPU Operator in your Kubernetes environment.

Creating a GPU-enabled Pod#

To create a pod that uses a GPU, specify the GPU resource in your pod specification:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: gpu-container
      image: rocm/pytorch:latest
      resources:
        limits:
          amd.com/gpu: 1 # requesting 1 GPU

Save this YAML to a file (e.g., gpu-pod.yaml) and create the pod:

kubectl apply -f gpu-pod.yaml

Checking GPU Status#

To check the status of GPUs in your cluster:

kubectl get nodes -o custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'amd\.com/gpu'

Using amd-smi#

To run amd-smi in a pod:

  • Create a YAML file named amd-smi.yaml:

apiVersion: v1
kind: Pod
metadata:
 name: amd-smi
spec:
 containers:
 - image: docker.io/rocm/pytorch:latest
   name: amd-smi
   command: ["/bin/bash"]
   args: ["-c","amd-smi version && amd-smi monitor -ptum"]
   resources:
    limits:
      amd.com/gpu: 1
    requests:
      amd.com/gpu: 1
 restartPolicy: Never
  • Create the pod:

kubectl create -f amd-smi.yaml
  • Check the logs and verify the output amd-smi reflects the expected ROCm version and GPU presence:

kubectl logs amd-smi
AMDSMI Tool: 24.6.2+2b02a07 | AMDSMI Library version: 24.6.2.0 | ROCm version: 6.2.2
GPU  POWER  GPU_TEMP  MEM_TEMP  GFX_UTIL  GFX_CLOCK  MEM_UTIL  MEM_CLOCK
  0  126 W     40 °C     32 °C       1 %    182 MHz       0 %    900 MHz

Using rocminfo#

To run rocminfo in a pod:

  • Create a YAML file named rocminfo.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: rocminfo
spec:
  containers:
  - image: rocm/pytorch:latest
    name: rocminfo
    command: ["/bin/sh","-c"]
    args: ["rocminfo"]
    resources:
      limits:
        amd.com/gpu: 1
  restartPolicy: Never
  • Create the pod:

kubectl create -f rocminfo.yaml
  • Check the logs and verify the output:

kubectl logs rocminfo

Configuring GPU Resources#

Configuration parameters are documented in the Custom Resource Installation Guide