System Administrators

AMD SMI
The AMD System Management Interface (AMD SMI) library offers a unified tool for managing and monitoring GPUs, particularly in high-performance computing environments. It provides a user-space interface that allows applications to control GPU operations, monitor performance, and retrieve information about the system’s drivers and GPUs.

MI300X System Acceptance Tests
Verify that server systems equipped with AMD Instinct MI300X GPU accelerators function correctly and perform optimally.

GPU Operator
The AMD GPU Operator simplifies the deployment and management of AMD Instinct GPU accelerators within Kubernetes clusters.

Device Plugin
The Kubernetes (k8s) device plugin registers AMD GPUs with a container cluster so that workloads can request them.

Device Metrics Exporter
The AMD Device Metrics Exporter enables Prometheus-format metrics collection for AMD GPUs in HPC and AI environments.
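
As a rough illustration of what "Prometheus-format" means for a consumer, the sketch below parses a small text payload in the Prometheus exposition format into (name, labels, value) tuples. The metric name `example_gpu_temperature_celsius`, the `gpu_id` label, and the sample values are hypothetical placeholders, not metrics the exporter is known to emit, and the parser is a simplification that ignores timestamps and escaped characters in label values.

```python
import re

# Hypothetical payload: the metric name, label, and values are made up
# for illustration and are not taken from the AMD Device Metrics Exporter.
SAMPLE = """\
# HELP example_gpu_temperature_celsius Hypothetical GPU temperature.
# TYPE example_gpu_temperature_celsius gauge
example_gpu_temperature_celsius{gpu_id="0"} 41.0
example_gpu_temperature_celsius{gpu_id="1"} 43.5
"""

# metric_name, optional {labels}, then the sample value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In practice a Prometheus server scrapes and parses this format itself; a hand-written parser like this is mainly useful for quick spot checks of an exporter endpoint.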