Developer Guide#
This guide provides information for developers who want to contribute to or modify the AMD GPU Operator.
Warning
This project is not yet ready to accept commits from external developers.
Prerequisites#
Go v1.20 (due to open issues with Go v1.21 or v1.22)
Docker
Kubernetes cluster (v1.29.0+) or OpenShift (4.16+)
kubectl or oc CLI tool configured to access your cluster
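The prerequisites above can be checked before starting. The following is a minimal sketch, not part of the project: the helper names `missing_tools` and `is_go_1_20` are hypothetical, and the Go check assumes the usual `go version go1.20.x <os>/<arch>` output format.

```shell
# Report any required command-line tools that are missing from PATH.
# Prints one line per missing tool; returns non-zero if any are missing.
missing_tools() {
    mt_status=0
    for mt_tool in "$@"; do
        if ! command -v "$mt_tool" >/dev/null 2>&1; then
            echo "missing: $mt_tool"
            mt_status=1
        fi
    done
    return "$mt_status"
}

# Succeed only if a version token such as "go1.20.5" belongs to the 1.20 series.
is_go_1_20() {
    case "$1" in
        go1.20|go1.20.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on a development machine (left commented out on purpose):
# missing_tools go docker kubectl || echo "install the tools listed above"
# is_go_1_20 "$(go version | awk '{print $3}')" || echo "Go 1.20.x is expected"
```

On OpenShift, replace `kubectl` with `oc` in the tool list.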
Development Environment Setup#
Install Helm:
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
For alternative installation methods, refer to the Helm Official Website.
Install Helmify:
Download the release binary from the Helmify GitHub releases page, unpack it, and move it to a directory on your
PATH.
Clone the repository:
git clone https://github.com/ROCm/gpu-operator.git
cd gpu-operator
(Optional) Set up a local Docker registry if you want to build and host container images locally:
docker run -d -p 5000:5000 --name registry registry:latest
Modify the registry-related variables in the Makefile:
DOCKER_REGISTRY: Set to localhost:5000 for local development, or your preferred registry
IMAGE_NAME: Set to rocm/gpu-operator
IMAGE_TAG: Set as needed (e.g., v1.0.0 or latest)
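To see how the three variables relate, the sketch below composes them into a full image reference. The helper `image_ref` is hypothetical, and the `<registry>/<name>:<tag>` layout is an assumption about how the Makefile joins the values, not something confirmed here.

```shell
# Compose a full image reference from the three registry-related variables.
# Assumption: the Makefile joins them as <registry>/<name>:<tag>.
image_ref() {
    printf '%s/%s:%s\n' "$1" "$2" "$3"
}

image_ref localhost:5000 rocm/gpu-operator v1.0.0
# prints "localhost:5000/rocm/gpu-operator:v1.0.0"

# GNU make lets command-line assignments take precedence over ordinary
# assignments inside a Makefile, so the same values can usually be passed
# per invocation instead of editing the file (left commented out on purpose):
# make docker-build DOCKER_REGISTRY=localhost:5000 IMAGE_NAME=rocm/gpu-operator IMAGE_TAG=v1.0.0
```

Passing the values on the command line keeps the working tree clean, which may be preferable when switching between registries.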
Compile the project:
make
This generates the base YAML files for the CRDs and builds the controller images, the Helm charts, and the OpenShift OLM bundle.
(Optional) Run a specific make target:
Run make docker/shell to build and attach to a container with the build environment configured.
Run make <specific target> within the container to execute that target.
Build and push the AMD GPU Operator image:
make docker-build
make docker-push
Note: If you’re using a remote registry that requires authentication, ensure you’ve logged in using docker login before pushing.
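The build and push steps can be chained so that the push only runs after a successful build. This is a small sketch with hypothetical helpers (`run`, `build_and_push`) and a hypothetical `DRY_RUN` switch that prints the commands instead of executing them.

```shell
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Build the image, then push it only if the build succeeded.
build_and_push() {
    run make docker-build && run make docker-push
}

# Example usage (left commented out on purpose):
# DRY_RUN=1 build_and_push   # preview the commands
# build_and_push             # build and push for real
```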
Generate Helm charts:
For vanilla Kubernetes:
make helm
For OpenShift:
OPENSHIFT=1 make helm
Check the Makefile help message for more options:
make help
Running Tests#
The e2e tests require a running Kubernetes cluster. Before running them, make sure the kubeconfig file at ~/.kube/config gives kubectl and helm access to the cluster. The e2e test cases deploy the Operator to your cluster and then run against it.
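A quick preflight check can catch a missing kubeconfig before the tests start. The sketch below uses hypothetical helper names and assumes the simple case where `KUBECONFIG`, if set, holds a single path; kubectl and helm both honor that variable and otherwise fall back to ~/.kube/config.

```shell
# Resolve the kubeconfig path for the simple case:
# use $KUBECONFIG if set, otherwise fall back to ~/.kube/config.
# A colon-separated KUBECONFIG list is not handled in this sketch.
kubeconfig_path() {
    printf '%s\n' "${KUBECONFIG:-$HOME/.kube/config}"
}

# Return success only if the resolved kubeconfig exists and is readable.
kubeconfig_ready() {
    kr_cfg=$(kubeconfig_path)
    if [ -r "$kr_cfg" ]; then
        return 0
    fi
    echo "kubeconfig not readable: $kr_cfg" >&2
    return 1
}

# Example usage (left commented out on purpose):
# kubeconfig_ready && make e2e
```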
To run the e2e tests:
make e2e
To run e2e tests with a specific Helm chart:
make e2e GPU_OPERATOR_CHART="path to helm chart"
To run only the e2e tests:
make -C tests/e2e # run e2e tests only
GPU Operator E2E Tests#
The tests/k8s-e2e/ directory contains an e2e test suite that installs the GPU Operator via Helm and verifies metrics and health. Tests run against a live Kubernetes cluster.
Prerequisites#
A running Kubernetes cluster with at least one AMD GPU node
kubectl configured (~/.kube/config or a custom kubeconfig)
Docker (to build the test runner image)
Test runner image#
docker build -t gpu-op-k8s-e2e:latest -f tests/k8s-e2e/Dockerfile.e2e tests/k8s-e2e/
Running tests#
Full install + verify + teardown#
Pass the Helm chart as a local directory path (the helm-charts-k8s/ directory in the repository root) or, if the chart is published to a registry, as an OCI/repo reference:
docker run --rm \
-v /path/to/kubeconfig:/kubeconfig:ro \
-v /path/to/gpu-operator/helm-charts-k8s:/helm-charts:ro \
gpu-op-k8s-e2e:latest \
-kubeconfig /kubeconfig \
-operatorchart /helm-charts \
-operatortag v1.5.0 \
-test.timeout 60m
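Host paths given to `docker run -v` are safest when absolute, because Docker treats a bare name as a named volume rather than a bind mount. The helpers below (`abs_dir` and `abs_file`, both hypothetical) are one way to resolve relative paths first; the `./kubeconfig` path in the usage comment is a placeholder.

```shell
# Print the absolute path of an existing directory; fail if it does not exist.
abs_dir() {
    (CDPATH= cd -- "$1" 2>/dev/null && pwd)
}

# Print the absolute path of an existing regular file; fail if it does not exist.
abs_file() {
    [ -f "$1" ] || return 1
    af_dir=$(abs_dir "$(dirname "$1")") || return 1
    # Strip a trailing slash so a file directly under / is printed as /name.
    printf '%s/%s\n' "${af_dir%/}" "$(basename "$1")"
}

# Example usage with the flags shown above (left commented out on purpose):
# docker run --rm \
#   -v "$(abs_file ./kubeconfig)":/kubeconfig:ro \
#   -v "$(abs_dir ./helm-charts-k8s)":/helm-charts:ro \
#   gpu-op-k8s-e2e:latest \
#   -kubeconfig /kubeconfig -operatorchart /helm-charts
```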
Verify only (pre-deployed cluster)#
docker run --rm -v /path/to/kubeconfig:/kubeconfig:ro \
gpu-op-k8s-e2e:latest \
-kubeconfig /kubeconfig -existing \
-check.f 'TestOp010|TestOp020|TestOp030|TestOp040|TestOp050|TestOp060|TestOp065|TestOp070' \
-test.timeout 30m
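The `-check.f` value above is a regular expression that joins test names with `|`. When the list changes often, it can be built from separate arguments. `join_tests` is a hypothetical helper for illustration.

```shell
# Join test names into a single alternation pattern: a b c -> a|b|c
join_tests() {
    jt_pattern=""
    for jt_name in "$@"; do
        # Add a "|" separator only when the pattern already has content.
        jt_pattern="${jt_pattern:+$jt_pattern|}$jt_name"
    done
    printf '%s\n' "$jt_pattern"
}

join_tests TestOp010 TestOp020 TestOp030
# prints "TestOp010|TestOp020|TestOp030"

# Example usage with the verify-only command (left commented out on purpose):
# docker run --rm -v /path/to/kubeconfig:/kubeconfig:ro \
#   gpu-op-k8s-e2e:latest \
#   -kubeconfig /kubeconfig -existing \
#   -check.f "$(join_tests TestOp010 TestOp020)" \
#   -test.timeout 30m
```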
Using make#
# Full install+verify+teardown
make -C tests/k8s-e2e all KUBECONFIG=/path/to/kubeconfig OPERATOR_TAG=v1.5.0
# Verify only (pre-deployed)
make -C tests/k8s-e2e verify KUBECONFIG=/path/to/kubeconfig
Common flags#
| Flag | Default | Description |
|---|---|---|
| | | Path to kubeconfig |
| | OCI registry chart | GPU Operator Helm chart (OCI ref or local path) |
| | | GPU Operator chart version |
| | | Kubernetes namespace |
| | | Skip install/teardown; verify only against a pre-deployed cluster |
| | | Skip teardown after tests (leave operator installed) |
| | (none) | Extra helm |
| | (all) | Regex filter for test names (gocheck syntax) |
| | | Overall test timeout |
Creating a Pull Request#
Fork the repository on GitHub.
Create a new branch for your changes.
Make your changes and commit them with clear, descriptive commit messages.
Push your changes to your fork.
Create a pull request against the main repository.
Please ensure your code follows our coding standards and includes appropriate tests.
Build Documentation Website Locally#
Install mkdocs
python3 -m pip install mkdocs
Build the website
cd docs
python3 -m mkdocs build
Serve the website locally
python3 -m mkdocs serve --dev-addr localhost:2345
The local docs website will dynamically update as changes are made to the Markdown docs.