Release Notes#

v1.2.1#

Release Highlights#

  • Prometheus Service Monitor

    • Easy integration with Prometheus Operator

  • K8s Toleration and Selector

    • Tolerations and a nodeSelector can now be set during helm install

  • Custom Prefix for Exporter

    • Adds the flexibility to set a custom prefix, through configmap CommonConfig, to better identify AMD GPUs in multi-cluster deployments
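
As a rough illustration of why a custom prefix helps in a multi-cluster setup, the sketch below filters Prometheus text-format samples by metric-name prefix. The prefix `clusterA_`, the metric names, and the helper `filter_by_prefix` are hypothetical and are not taken from the exporter; the sketch also assumes sample lines carry no trailing timestamp.

```python
def filter_by_prefix(exposition_text, prefix):
    """Return {series: value} for samples whose metric name starts with prefix.

    Assumes the Prometheus text exposition format without timestamps,
    so the last space-separated token on each sample line is the value.
    """
    samples = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        # The metric name is everything before the optional label set.
        name = series.split("{", 1)[0]
        if name.startswith(prefix):
            samples[series] = float(value)
    return samples


# Hypothetical scrape output: two prefixed GPU samples and one unrelated sample.
SAMPLE = """\
# HELP clusterA_gpu_temperature hypothetical metric, for illustration only
# TYPE clusterA_gpu_temperature gauge
clusterA_gpu_temperature{gpu_id="0"} 41
clusterA_gpu_temperature{gpu_id="1"} 43.5
node_cpu_seconds_total{cpu="0"} 1234.5
"""

if __name__ == "__main__":
    for series, value in filter_by_prefix(SAMPLE, "clusterA_").items():
        print(series, value)
```

With a distinct prefix per cluster, a consumer can separate GPU metrics by origin using the metric name alone, without relying on labels added at scrape time.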

Platform Support#

ROCm 6.3.x

v1.2.0#

Release Highlights#

  • GPU Health Monitoring

    • Real-time health checks via metrics exporter

    • Works with the Kubernetes Device Plugin to automatically remove unhealthy GPUs from a compute node's schedulable resources

    • Customizable health thresholds via K8s ConfigMaps

Platform Support#

ROCm 6.3.x

v1.1.0#

Platform Support#

ROCm 6.3.x

v1.0.0#

Release Highlights#

  • GPU Metrics Exporter for Prometheus

    • Real-time metrics exporter for GPU MI platforms

Platform Support#

ROCm 6.2.x