Kubernetes configuration

Kubernetes configuration#

When deploying AMD Device Metrics Exporter on Kubernetes, a ConfigMap is deployed in the exporter namespace.

Configuration parameters#

  • ServerPort: this field is ignored when Device Metrics Exporter is deployed by the GPU Operator to avoid conflicts with the service node port config.

  • GPUConfig:

    • Fields: An array of strings specifying what metrics field to be exported.

    • Labels: CARD_MODEL, GPU_UUID and SERIAL_NUMBER are always set and cannot be removed. Labels supported are available in the provided example configmap.yml.

    • CustomLabels: A map of user-defined labels and their values. Users can set up to 10 custom labels. From the GPUMetricLabel list, only CLUSTER_NAME is allowed to be set in CustomLabels. Any other labels from this list cannot be set. Users can define other custom labels outside of this restriction. These labels will be exported with every metric, ensuring consistent metadata across all metrics.

Setting custom values#

To use a custom configuration when deploying the Metrics Exporter:

  1. Create a ConfigMap based on the provided example configmap.yml

  2. Change the configMap property in values.yaml to configmap.yml

  3. Run helm install:

helm repo add exporter https://rocm.github.io/device-metrics-exporter
helm repo update
helm install exporter exporter/device-metrics-exporter-charts --namespace kube-amd-gpu --create-namespace --version=v1.2.1 -f values.yaml

Device Metrics Exporter polls for configuration changes every minute, so updates take effect without container restarts.