Prometheus Integration with Metrics Exporter for OpenShift Environments#
The AMD GPU Operator integrates with Prometheus to enable monitoring of GPU metrics across the Kubernetes cluster. Metrics exposed by the Device Metrics Exporter can be automatically discovered and scraped by Prometheus through the creation of a ServiceMonitor resource.
Prometheus integration is managed via the ServiceMonitor configuration in the DeviceConfig Custom Resource (CR). When enabled, the operator automatically creates a ServiceMonitor tailored to the metrics exported by the Device Metrics Exporter. The integration supports various authentication and authorization methods, including Bearer Tokens and mutual TLS (mTLS), providing flexibility to accommodate different security requirements.
OpenShift ships with its own integrated Prometheus stack, which this guide uses instead of the separate Prometheus Operator that vanilla Kubernetes environments rely on. Additionally, OpenShift natively supports Perses for dashboards instead of Grafana, which is covered in our vanilla Kubernetes deployment guide.
Prerequisites#
Before enabling Prometheus integration, ensure you have:

- Enabled and configured the openshift-user-workload-monitoring stack.
- Labeled the kube-amd-gpu namespace with openshift.io/cluster-monitoring=true.
- Enabled the Device Metrics Exporter in your GPU Operator deployment.
- (Optional) Properly configured kube-rbac-proxy in the DeviceConfig CR if the exporter endpoint is protected.

The AMD GPU Operator relies on the ServiceMonitor CRD being available in order to create the ServiceMonitor CR when enabled. This CRD is installed by the Prometheus Operator, which is part of the OpenShift monitoring stack.
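If you are unsure whether this CRD is present on your cluster, you can check for it before enabling the integration:

```bash
# Verify that the ServiceMonitor CRD is available
oc get crd servicemonitors.monitoring.coreos.com
```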
Configuring OpenShift for user workload monitoring#
Red Hat provides documentation for configuring user workload monitoring on OpenShift; please follow the documentation here: https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html-multi/monitoring/index#user-workload-monitoring-first-steps
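As a rough sketch of what that documentation walks through, user workload monitoring is enabled via the cluster-monitoring-config ConfigMap (defer to the linked Red Hat documentation for the authoritative steps):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # Enables the user workload monitoring stack (prometheus-user-workload pods)
    enableUserWorkload: true
```

Once user workload monitoring is running, apply the namespace label listed in the prerequisites:

```bash
# Label the GPU Operator namespace so the monitoring stack discovers it
oc label namespace kube-amd-gpu openshift.io/cluster-monitoring=true
```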
DeviceConfig Configuration#
To integrate Prometheus, configure the following section in the DeviceConfig CR under metricsExporter.prometheus.serviceMonitor:
```yaml
metricsExporter:
  enable: true
  prometheus:
    serviceMonitor:
      enable: true
      interval: "60s"                # Scrape frequency
      attachMetadata:
        node: true
      honorLabels: false
      honorTimestamps: true
      labels:
        release: prometheus-operator # Prometheus release label for target discovery
```
- enable: Enable or disable Prometheus ServiceMonitor creation.
- interval: Frequency at which Prometheus scrapes metrics (e.g., "30s", "1m"). Defaults to the interval configured in the Prometheus global scope.
- attachMetadata.node: Attaches node metadata to discovered targets.
- honorLabels: Retain scraped metric labels over the target labels if conflicts arise.
- honorTimestamps: Retain timestamps from scraped metrics.
- labels: Custom labels added to the ServiceMonitor to facilitate Prometheus discovery.
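For context, the metricsExporter block above sits under spec in the DeviceConfig CR. A minimal end-to-end sketch, assuming the CR name test-deviceconfig and the default kube-amd-gpu namespace (adjust both, and verify the apiVersion against your installed CRD):

```yaml
apiVersion: amd.com/v1alpha1
kind: DeviceConfig
metadata:
  name: test-deviceconfig
  namespace: kube-amd-gpu
spec:
  metricsExporter:
    enable: true
    prometheus:
      serviceMonitor:
        enable: true
        interval: "60s"
        attachMetadata:
          node: true
```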
Authentication and TLS Options#
The ServiceMonitor configuration supports various authentication and security methods for secure metrics collection:
```yaml
metricsExporter:
  prometheus:
    serviceMonitor:
      enable: true
      bearerTokenFile: "/var/run/secrets/kubernetes.io/serviceaccount/token" # Deprecated
      authorization:
        credentials:
          name: metrics-token
          key: token
      tlsConfig:
        insecureSkipVerify: false
        serverName: metrics-server.example.com
        ca:
          configMap:
            name: server-ca
            key: ca.crt
        cert:
          secret:
            name: prometheus-client-cert
            key: tls.crt
        keySecret:
          name: prometheus-client-cert
          key: tls.key
```
- bearerTokenFile: (Deprecated) Path to a file containing the bearer token for authentication. Retained for legacy use cases; use the authorization block instead to pass tokens.
- authorization: Configures token-based authorization; references a token stored in a Kubernetes Secret.
- tlsConfig: Configures TLS for secure connections:
  - insecureSkipVerify: When true, skips certificate verification (not recommended for production).
  - serverName: Server name used for certificate validation.
  - ca: ConfigMap containing the CA certificate for server verification.
  - cert: Secret containing the client certificate for mTLS.
  - keySecret: Secret containing the client key for mTLS.
  - caFile / certFile / keyFile: File-based equivalents for certificates/keys mounted in the Prometheus pod.
These options allow secure metrics collection from AMD Device Metrics Exporter endpoints that are protected by the kube-rbac-proxy sidecar for authentication/authorization.
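The Secret and ConfigMap objects referenced above must exist in the same namespace as the ServiceMonitor (the GPU Operator namespace). A sketch of creating them with the names used in the example, assuming the certificate and key files are available locally and you supply the token value:

```bash
# CA certificate Prometheus uses to verify the exporter endpoint
oc -n kube-amd-gpu create configmap server-ca --from-file=ca.crt=./ca.crt

# Client certificate and key for mTLS (creates keys tls.crt and tls.key)
oc -n kube-amd-gpu create secret tls prometheus-client-cert --cert=./tls.crt --key=./tls.key

# Bearer token referenced by the authorization block
oc -n kube-amd-gpu create secret generic metrics-token --from-literal=token="<bearer-token>"
```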
Accessing Metrics with the OpenShift Integrated Prometheus#
Upon applying the DeviceConfig with the correct settings, the GPU Operator automatically:

- Deploys the ServiceMonitor resource in the GPU Operator namespace.
- Sets the required labels and namespace selectors in the ServiceMonitor CR for Prometheus discovery.
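You can confirm the ServiceMonitor exists before checking the Prometheus targets:

```bash
# The ServiceMonitor is created in the GPU Operator namespace (kube-amd-gpu by default)
oc -n kube-amd-gpu get servicemonitors.monitoring.coreos.com
```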
After the ServiceMonitor is deployed, Prometheus automatically begins scraping metrics. Verify the integration in the OpenShift console by navigating to the "Targets" page under the Observe tab in the left-hand navigation. Your Device Metrics Exporter should appear as a healthy target, shown in the Status column as 'Up' with a green checkmark.
To query specific metrics, use the Metrics page under the same Observe tab.
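For example, a query along these lines displays GPU utilization. The metric and label names below are illustrative, so check the exporter's /metrics output (or the metric autocomplete in the console) for the exact names exposed by your Device Metrics Exporter version:

```
# GPU utilization grouped by the pod consuming the GPU (names are assumptions)
sum by (pod) (gpu_gpu_util)
```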
Using the device-metrics-exporter with Perses-based integrated OpenShift Dashboards#
TODO
The pod Label Conflict#
When Prometheus scrapes targets defined by a ServiceMonitor, it automatically attaches labels to the metrics based on the target's metadata. One such label is pod, which identifies the Pod being scraped (in this case, the metrics exporter Pod itself).
This creates a conflict:

- Exporter Metric Label: pod="<workload-pod-name>" (indicates the actual GPU user)
- Prometheus Target Label: pod="<metrics-exporter-pod-name>" (indicates the source of the metric)
Solution 1: honorLabels: true (Default)#
To ensure the Grafana dashboards function correctly by using the workload pod name, the ServiceMonitor created by the GPU Operator needs to prioritize the labels coming directly from the metrics exporter over the labels added by Prometheus during the scrape.
This is achieved by setting honorLabels: true in the ServiceMonitor configuration within the DeviceConfig. This is the default setting in the GPU Operator.
```yaml
# Example DeviceConfig snippet
spec:
  metricsExporter:
    prometheus:
      serviceMonitor:
        enable: true
        # honorLabels defaults to true, ensuring exporter's 'pod' label is kept
        # honorLabels: true
        # ... other ServiceMonitor settings
```
Important: For this to work, the device-metrics-exporter must actually be exporting the pod label, which typically only happens when a workload is actively using the GPU on that node. If no workload is present, the pod label might be missing from the metric, and the dashboards might not display data as expected for that specific GPU/node.
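To check whether the exporter is currently emitting a workload pod label, you can inspect its metrics endpoint directly. A sketch assuming the default exporter port of 5000 and a placeholder service name (adjust both for your deployment):

```bash
# Port-forward to the metrics exporter Service and look for 'pod=' labels on GPU metrics
oc -n kube-amd-gpu port-forward svc/<metrics-exporter-service> 5000:5000 &
curl -s http://localhost:5000/metrics | grep 'pod=' | head
```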
Solution 2: Relabeling#
An alternative approach is to use Prometheus relabeling rules within the ServiceMonitor definition. This allows you to explicitly handle the conflicting pod label added by Prometheus.
You can rename the Prometheus-added pod label (identifying the exporter pod) to something else (e.g., exporter_pod) and then drop the original pod label added by Prometheus. This prevents the conflict and ensures the pod label from the exporter (identifying the workload) is the only one present on the final ingested metric.
Add the following relabelings to your ServiceMonitor configuration in the DeviceConfig:
```yaml
# Example DeviceConfig snippet
spec:
  metricsExporter:
    prometheus:
      serviceMonitor:
        enable: true
        honorLabels: false # Must be false if using relabeling to preserve exporter_pod
        relabelings:
          # Rename the Prometheus-added 'pod' label to 'exporter_pod'
          - sourceLabels: [pod]
            targetLabel: exporter_pod
            action: replace
            regex: (.*)
            replacement: $1
          # Drop the Prometheus-added 'pod' label to avoid conflict
          - action: labeldrop
            regex: pod
        # ... other ServiceMonitor settings
```
This method explicitly resolves the conflict by manipulating the labels before ingestion, ensuring the pod label always refers to the workload pod as intended by the device-metrics-exporter.
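With this relabeling in place, queries can distinguish the workload from the scraper; for example (metric name illustrative):

```
# 'pod' identifies the GPU workload, 'exporter_pod' identifies the exporter Pod that was scraped
sum by (pod, exporter_pod) (gpu_gpu_util)
```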
Conclusion#
The AMD GPU Operator provides native support for Prometheus integration, simplifying GPU monitoring and alerting within Kubernetes clusters. By configuring the DeviceConfig CR, you can manage GPU metrics collection tailored to your requirements and preferences. For more detailed configuration guidance, refer to the example scenarios README.