Release Notes#
v1.2.1#
Release Highlights#
Prometheus Service Monitor
Easy integration with Prometheus Operator
K8s Toleration and Selector
Adds the ability to set tolerations and a nodeSelector during helm install
Custom Prefix for Exporter
Adds the flexibility to set a custom prefix, through the ConfigMap CommonConfig, to better identify AMD GPUs in multi-cluster deployments
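As a rough illustration of the first two highlights, the sketch below builds a Prometheus Operator ServiceMonitor manifest and a pod scheduling fragment as plain Python dictionaries. The `monitoring.coreos.com/v1` ServiceMonitor fields and the Kubernetes `nodeSelector` and `tolerations` fields are standard. Every concrete value here (resource name, namespace, labels, port name, taint key) is a hypothetical placeholder and not taken from this project's Helm chart, so check the chart's own values for the real keys.

```python
import json


def build_service_monitor(name, namespace, match_labels, port_name, interval="30s"):
    """Return a Prometheus Operator ServiceMonitor manifest as a plain dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Selects the Service(s) whose endpoints Prometheus should scrape.
            "selector": {"matchLabels": dict(match_labels)},
            # "port" refers to a named port on the selected Service.
            "endpoints": [{"port": port_name, "interval": interval}],
        },
    }


def build_scheduling(node_selector, tolerations):
    """Return the standard Kubernetes pod-spec scheduling fields."""
    return {
        "nodeSelector": dict(node_selector),
        "tolerations": [dict(t) for t in tolerations],
    }


# Hypothetical values, for illustration only.
service_monitor = build_service_monitor(
    name="gpu-metrics-exporter",
    namespace="monitoring",
    match_labels={"app": "gpu-metrics-exporter"},
    port_name="metrics",
)
scheduling = build_scheduling(
    node_selector={"example.com/gpu-node": "true"},
    tolerations=[
        {"key": "example.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
    ],
)

print(json.dumps(service_monitor, indent=2))
print(json.dumps(scheduling, indent=2))
```

The printed JSON is also valid YAML, so it can be saved to a file and applied or adapted as needed.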
Platform Support#
ROCm 6.3.x
v1.2.0#
Release Highlights#
GPU Health Monitoring
Real-time health checks via metrics exporter
Works with the Kubernetes Device Plugin to automatically remove unhealthy GPUs from the compute node's schedulable resources
Customizable health thresholds via K8s ConfigMaps
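The idea behind threshold-based health checks can be sketched in a few lines of Python. This is only a conceptual model, not the exporter's or the device plugin's implementation: the metric name `uncorrectable_ecc_errors`, the GPU identifiers, and the threshold values are all hypothetical. A GPU is treated as unhealthy when any reading exceeds its configured limit, and only the remaining GPUs stay schedulable.

```python
def unhealthy_gpus(readings, thresholds):
    """Return a sorted list of GPU ids with any metric above its threshold.

    readings:   {gpu_id: {metric_name: value}}
    thresholds: {metric_name: maximum allowed value}
    """
    bad = []
    for gpu_id, metrics in sorted(readings.items()):
        for metric, limit in thresholds.items():
            # A missing metric is treated as 0, i.e. not exceeding the limit.
            if metrics.get(metric, 0) > limit:
                bad.append(gpu_id)
                break
    return bad


def schedulable_gpus(readings, thresholds):
    """Return the GPU ids that remain schedulable after the health check."""
    bad = set(unhealthy_gpus(readings, thresholds))
    return [gpu_id for gpu_id in sorted(readings) if gpu_id not in bad]


# Hypothetical thresholds and readings, for illustration only.
thresholds = {"uncorrectable_ecc_errors": 0}
readings = {
    "gpu0": {"uncorrectable_ecc_errors": 0},
    "gpu1": {"uncorrectable_ecc_errors": 2},
}

print(unhealthy_gpus(readings, thresholds))    # ['gpu1']
print(schedulable_gpus(readings, thresholds))  # ['gpu0']
```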
Platform Support#
ROCm 6.3.x
v1.1.0#
Platform Support#
ROCm 6.3.x
v1.0.0#
Release Highlights#
GPU Metrics Exporter for Prometheus
Real-time metrics exporter for GPU MI platforms.
Platform Support#
ROCm 6.2.x