Release Notes#
v1.2.1#
Release Highlights#
Prometheus Service Monitor
Easy integration with the Prometheus Operator through a ServiceMonitor resource
K8s Toleration and Selector
Tolerations and a nodeSelector can now be set during helm install
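As a rough illustration of the ServiceMonitor highlight, a Prometheus Operator `ServiceMonitor` that selects an exporter Service might look like the sketch below. The resource kind and fields are the standard Prometheus Operator API. Every name, namespace, label, and port name is a hypothetical placeholder, not a value taken from this project's chart.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: gpu-metrics-exporter          # hypothetical name
  namespace: monitoring               # hypothetical namespace
  labels:
    release: prometheus               # assumed to match the Prometheus serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - gpu-monitoring                # hypothetical namespace of the exporter Service
  selector:
    matchLabels:
      app: gpu-metrics-exporter       # hypothetical label on the exporter Service
  endpoints:
    - port: metrics                   # hypothetical Service port name
      path: /metrics
      interval: 30s
```

The toleration and selector highlight refers to standard Kubernetes pod scheduling fields. These notes do not state the chart's values keys for them, so the fragment below shows only the generic Kubernetes shape, with a hypothetical node label and taint key.

```yaml
# Standard Kubernetes pod-spec scheduling fields.
nodeSelector:
  example.com/gpu: "true"             # hypothetical node label
tolerations:
  - key: example.com/gpu              # hypothetical taint key
    operator: Exists
    effect: NoSchedule
```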
Platform Support#
ROCm 6.3.x
v1.2.0#
Release Highlights#
GPU Health Monitoring
Real-time health checks via metrics exporter
Works with the Kubernetes Device Plugin to automatically remove unhealthy GPUs from a compute node's schedulable resources
Customizable health thresholds via K8s ConfigMaps
Platform Support#
ROCm 6.3.x
v1.1.0#
Platform Support#
ROCm 6.3.x
v1.0.0#
Release Highlights#
GPU Metrics Exporter for Prometheus
Real-time metrics exporter for MI GPU platforms
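For a Prometheus server that is not managed by an operator, a static scrape job is one way to collect the exporter's metrics. The sketch below uses standard Prometheus configuration fields. The job name, host, and port are hypothetical placeholders and should be replaced with the address where the exporter actually listens.

```yaml
scrape_configs:
  - job_name: gpu-metrics-exporter            # hypothetical job name
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets:
          - gpu-node-1.example.com:5000       # hypothetical host and port
```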
Platform Support#
ROCm 6.2.x