Air-gapped Installation Guide for OpenShift Environments
This guide explains how to install the AMD GPU Operator in an air-gapped environment, where the OpenShift cluster has no external network connectivity. The procedure assumes that a system with internet access is available while the images are built and mirrored. The OpenShift internal registry is used here for convenience; the procedure should be similar for external registries such as Quay and Docker Hub, although individual steps may differ. Currently, GPU Operator installation in an air-gapped environment is supported only with a precompiled driver. To build (precompile) the driver, one system (it can be in a staging environment) must have internet access during the image creation and mirroring process.
Prerequisites
OpenShift 4.16+
Internal registry is configured; see https://instinct.docs.amd.com/projects/gpu-operator/en/latest/installation/openshift-olm.html#configure-internal-registry for details.
Internet access during the operator installation, driver compilation, and image import processes.
NFD, KMM and GPU Operator installed via OperatorHub
Required Images
The following images must be mirrored to your internal registry; see step 2.A of this guide for details.
rocm/k8s-device-plugin:rhubi-latest
rocm/k8s-node-labeller:rhubi-latest
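The relationship between these external image names and the internal registry references used later in this guide can be sketched with a small shell helper. This is only an illustration: the function name `internal_ref` and the variable names are hypothetical, while the registry host and namespace are the ones that appear in this guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of the GPU Operator): map an external image
# reference to the internal registry reference used later in this guide.
REGISTRY="image-registry.openshift-image-registry.svc:5000"
NAMESPACE="kube-amd-gpu"

internal_ref() {
    # Drop everything up to the last "/" (for example "rocm/"), keep "name:tag".
    printf '%s/%s/%s\n' "$REGISTRY" "$NAMESPACE" "${1##*/}"
}

for image in rocm/k8s-device-plugin:rhubi-latest rocm/k8s-node-labeller:rhubi-latest; do
    internal_ref "$image"
done
```

The two printed references are the values that the device plugin and node labeller image fields take in the final DeviceConfig.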
Installation Steps
1. Build precompiled driver image
Because this image is built in situ, the procedure differs from the one used for the other GPU Operator component images, such as the node labeller and device plugin.
A. Start from a basic DeviceConfig Custom Resource (CR). Creating it triggers a build and places the precompiled driver in the default imagestream location (image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/amdgpu_kmod).
apiVersion: amd.com/v1alpha1
kind: DeviceConfig
metadata:
  name: devconf
  namespace: kube-amd-gpu
spec:
  driver:
    enable: true
    version: "6.4.1"
  devicePlugin:
    devicePluginImage: rocm/k8s-device-plugin:rhubi-latest
    nodeLabellerImage: rocm/k8s-device-plugin:labeller-rhubi-latest
  selector:
    feature.node.kubernetes.io/amd-gpu: "true"
B. Create the CR to trigger the build process.
$ oc create -f myDeviceConfig.y -n kube-amd-gpu
deviceconfig.amd.com/devconf created
C. Watch the build process until it completes.
$ oc get pods -n kube-amd-gpu | grep build
devconf-build-trzb6-build 1/1 Running 0 12s
# observe the build using the oc logs command
$ oc logs devconf-build-trzb6-build -n kube-amd-gpu
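When several pods run in the namespace, it can help to reduce the `oc get pods` listing to just the build pods and their status. The following is a minimal sketch; the function name `build_pod_status` is hypothetical, and it assumes the default column order (NAME, READY, STATUS, ...) shown in the sample output above.

```shell
#!/bin/sh
# Hypothetical filter: read `oc get pods` output on stdin and print
# "<pod> <status>" for every pod whose name ends in "-build".
build_pod_status() {
    awk '$1 ~ /-build$/ { print $1, $3 }'
}

# Sample input copied from the output shown above. On a live cluster, use:
#   oc get pods -n kube-amd-gpu --no-headers | build_pod_status
build_pod_status <<'EOF'
devconf-build-trzb6-build 1/1 Running 0 12s
EOF
```

Rerunning the filter until the build pod no longer reports Running is one simple way to tell when to move on to the verification step.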
D. Once the build is complete, verify that the precompiled image is located in the internal registry.
$ oc get is -n kube-amd-gpu
NAME          IMAGE REPOSITORY                                                            TAGS                                            UPDATED
amdgpu_kmod   image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/amdgpu_kmod   coreos-9.6-5.14.0-570.19.1.el9_6.x86_64-6.4.1   3 days ago
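The tag in the sample output appears to combine the CoreOS version, the kernel version, and the driver version. The sketch below composes a tag in that shape so it can be compared with what `oc get is` reports. The pattern is inferred from this single sample and is an assumption, not a documented format, and the function name `driver_tag` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compose a driver image tag in the shape seen in the
# sample output, assumed to be coreos-<OS version>-<kernel>-<driver version>.
driver_tag() {
    printf 'coreos-%s-%s-%s\n' "$1" "$2" "$3"
}

# Values taken from the sample output above.
driver_tag 9.6 5.14.0-570.19.1.el9_6.x86_64 6.4.1
```

On a GPU node, `uname -r` reports the kernel version to pass as the second argument.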
2. Import required images
A. Import the node-labeller and device-plugin images from Docker Hub into your internal registry.
$ oc import-image rocm/k8s-device-plugin:rhubi-latest -n kube-amd-gpu --confirm
$ oc import-image rocm/k8s-node-labeller:rhubi-latest -n kube-amd-gpu --confirm
B. Once imported, verify that the required images are located in the internal registry.
$ oc get is -n kube-amd-gpu
NAME                IMAGE REPOSITORY                                                                  TAGS                                            UPDATED
amdgpu_kmod         image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/amdgpu_kmod         coreos-9.6-5.14.0-570.19.1.el9_6.x86_64-6.4.1   3 days ago
k8s-device-plugin   image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/k8s-device-plugin   rhubi-latest                                    2 hours ago
k8s-node-labeller   image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/k8s-node-labeller   rhubi-latest                                    2 hours ago
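Before disconnecting the cluster, it is worth confirming that none of the three image streams is missing. The check can be sketched as a filter over the `oc get is` output; the function name `missing_imagestreams` is hypothetical, and the list of required names is taken from the sample output above.

```shell
#!/bin/sh
# Hypothetical check: read `oc get is` output on stdin and print the name of
# every required image stream that does not appear in the first column.
missing_imagestreams() {
    awk -v required="amdgpu_kmod k8s-device-plugin k8s-node-labeller" '
        { seen[$1] = 1 }
        END {
            n = split(required, names, " ")
            for (i = 1; i <= n; i++)
                if (!(names[i] in seen)) print names[i]
        }'
}

# Sample input: the state after step 1, before the images are imported.
# On a live cluster, use:
#   oc get is -n kube-amd-gpu | missing_imagestreams
missing_imagestreams <<'EOF'
NAME IMAGE REPOSITORY TAGS UPDATED
amdgpu_kmod image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/amdgpu_kmod coreos-9.6-5.14.0-570.19.1.el9_6.x86_64-6.4.1 3 days ago
EOF
```

For this sample the sketch prints `k8s-device-plugin` and `k8s-node-labeller`; empty output means all three image streams are present.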
3. Deployment of DeviceConfig in disconnected environment
A. Once all the required images and the precompiled driver are present in the internal registry, deploy the modified DeviceConfig. Note: the image fields point to the internal registry instead of the external rocm repository.
apiVersion: amd.com/v1alpha1
kind: DeviceConfig
metadata:
  name: devconf
  namespace: kube-amd-gpu
spec:
  driver:
    image: image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/amdgpu_kmod
    enable: true
    version: "6.4.1"
  devicePlugin:
    devicePluginImage: image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/k8s-device-plugin:rhubi-latest
    nodeLabellerImage: image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/k8s-node-labeller:rhubi-latest
  selector:
    feature.node.kubernetes.io/amd-gpu: "true"
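A leftover external image reference is an easy mistake to make when editing the DeviceConfig, and in a disconnected cluster it would leave the pods unable to pull their images. The sketch below lists any image field that does not point at the internal registry. The function name `external_images` is hypothetical, and it checks only the three image fields used in this guide.

```shell
#!/bin/sh
# Hypothetical check: read a DeviceConfig on stdin and print every image field
# (image, devicePluginImage, nodeLabellerImage) that does not reference the
# internal registry. Prints nothing when all fields are internal.
external_images() {
    sed -n -E 's/^[[:space:]]*((image|devicePluginImage|nodeLabellerImage):.*)$/\1/p' \
        | grep -v 'image-registry.openshift-image-registry.svc:5000/' || true
}

# Sample input: a fragment that still carries one external reference.
external_images <<'EOF'
spec:
  driver:
    image: image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/amdgpu_kmod
  devicePlugin:
    devicePluginImage: rocm/k8s-device-plugin:rhubi-latest
EOF
```

For this sample the sketch prints the `devicePluginImage` line. Running it against the DeviceConfig file before creating the CR with `oc create -f` should produce no output.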