Support for Container Device Interface
Overview
The Container Device Interface (CDI) is a standardized specification for exposing specialized hardware devices, such as AMD GPUs, to containers in the same way across different container runtimes.
CDI eliminates the need for runtime-specific hooks or shims, like amd-container-runtime, by allowing container runtimes to natively understand and inject device resources into containers.
The amd-ctk tool provides commands to generate and manage CDI specifications for AMD GPU devices on your system.
Prerequisites
Before using CDI with AMD GPUs, ensure:
AMD GPU drivers are properly installed on the host system
The amd-ctk tool is installed
Your container runtime supports CDI
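The first two prerequisites can be spot-checked from a shell. The sketch below is only illustrative: the helper names check_command and check_path are hypothetical, and it assumes the AMD GPU driver exposes the usual /dev/kfd and /dev/dri device nodes. It does not check CDI support in the container runtime, which depends on the runtime and its version.

```shell
# Sketch: report whether the host-side prerequisites appear to be met.
# Assumption: the AMD GPU driver exposes /dev/kfd and /dev/dri.

# Print "ok" if the named command is on PATH, otherwise "missing".
check_command() {
  if command -v "$1" >/dev/null 2>&1; then echo "ok"; else echo "missing"; fi
}

# Print "ok" if the path exists, otherwise "missing".
check_path() {
  if [ -e "$1" ]; then echo "ok"; else echo "missing"; fi
}

echo "amd-ctk on PATH: $(check_command amd-ctk)"
echo "/dev/kfd:        $(check_path /dev/kfd)"
echo "/dev/dri:        $(check_path /dev/dri)"
```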
Generating CDI Specifications
To generate a CDI specification for AMD GPUs on your system, run:
sudo amd-ctk cdi generate
This command:
Scans the system for available AMD GPU devices
Creates a CDI specification file at /etc/cdi/amd.json
Defines device nodes, mount points, and environment variables needed for each GPU
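To see which devices a generated specification defines, you can list their fully qualified names. The following is a rough sketch rather than a supported tool: list_cdi_devices is a hypothetical helper, and it assumes the file follows the general CDI layout (a top-level "kind" plus a "devices" array whose entries each carry a "name") and that no other key is spelled exactly "name". A JSON-aware tool would be more robust for real use.

```shell
# Sketch: print fully qualified CDI device names (kind=name) from a spec file.
# Assumption: "kind" appears once and "name" appears only on device entries.
list_cdi_devices() {
  cdi_kind=$(sed -n 's/.*"kind"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1)
  grep -o '"name"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
    sed 's/.*"\([^"]*\)"$/\1/' |
    while read -r cdi_name; do
      echo "${cdi_kind}=${cdi_name}"
    done
}

# Only inspect the default spec if it exists and is readable.
if [ -r /etc/cdi/amd.json ]; then
  list_cdi_devices /etc/cdi/amd.json
fi
```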
Custom Output Location
To generate the specification in a different location, use the --output flag:
amd-ctk cdi generate --output /path/to/custom/amd.json
Validating CDI Specifications
To verify that your CDI specification matches the actual GPU hardware on the system, run:
sudo amd-ctk cdi validate
This command:
Reads the CDI specification from /etc/cdi/amd.json
Scans the system for available AMD GPU devices
Verifies that the devices defined in the specification accurately reflect the hardware present on the host
Custom Specification Path
To validate a specification at a different location, use the --path flag:
amd-ctk cdi validate --path /path/to/custom/amd.json
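The --output and --path flags can be combined to generate a specification in a custom location and immediately validate that same file. This is a minimal sketch: generate_and_validate is a hypothetical helper, creating the parent directory first is a precaution rather than a documented requirement, and it assumes amd-ctk exits with a non-zero status when a step fails.

```shell
# Sketch: generate a spec at a custom path, then validate that same file.
# Assumption: amd-ctk returns a non-zero exit status on failure.
generate_and_validate() {
  # Make sure the target directory exists before writing the spec.
  mkdir -p "$(dirname "$1")" || return 1
  amd-ctk cdi generate --output "$1" && amd-ctk cdi validate --path "$1"
}
```

For example, `generate_and_validate "$HOME/cdi/amd.json"` would write and check a spec in a user-owned directory without sudo.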
Note
The amd-ctk tool needs permission to read and write CDI specification files. The default location (/etc/cdi) requires elevated privileges, so sudo is typically needed.
To operate on a user-owned location instead (using --output for generation or --path for validation), you can omit sudo, provided the user has the necessary read/write permissions for that location.
When using a custom output location, ensure your container runtime is configured to read CDI specifications from that directory. Most runtimes default to /etc/cdi and /var/run/cdi.
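A quick way to see which specification files a runtime would find is to list the CDI directories. The sketch below uses a hypothetical helper, find_cdi_specs, that looks for .json and .yaml files in the directories you pass, falling back to /etc/cdi and /var/run/cdi. Whether your runtime actually reads those directories depends on its configuration.

```shell
# Sketch: list CDI spec files (.json, .yaml) in the given directories.
# With no arguments, fall back to /etc/cdi and /var/run/cdi.
find_cdi_specs() {
  if [ "$#" -eq 0 ]; then
    set -- /etc/cdi /var/run/cdi
  fi
  for cdi_dir in "$@"; do
    # Skip directories that do not exist on this host.
    [ -d "$cdi_dir" ] || continue
    find "$cdi_dir" -maxdepth 1 -type f \( -name '*.json' -o -name '*.yaml' \) | sort
  done
}

find_cdi_specs
```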
Important
Regenerate the CDI specification whenever you:
Add or remove GPU devices
Modify GPU partitioning or configuration
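One way to act on this after a hardware change is to validate first and regenerate only if validation fails. The sketch below is illustrative: refresh_cdi_spec is a hypothetical helper, and it assumes that amd-ctk cdi validate exits with a non-zero status when the specification and hardware do not match.

```shell
# Sketch: regenerate the default spec only when validation fails.
# Assumption: `amd-ctk cdi validate` exits non-zero on a mismatch.
refresh_cdi_spec() {
  if sudo amd-ctk cdi validate; then
    echo "CDI spec matches the current hardware"
  else
    echo "CDI spec is out of date; regenerating"
    sudo amd-ctk cdi generate
  fi
}
```

Calling `refresh_cdi_spec` after adding, removing, or repartitioning GPUs keeps /etc/cdi/amd.json in step with the host.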
Troubleshooting
Containers Cannot Access GPUs
If containers do not see the expected GPU devices:
Validate the specification:
sudo amd-ctk cdi validate
If validation fails, the CDI specification does not match the actual hardware, and you may need to regenerate the specification.
Verify runtime configuration:
Ensure your container runtime is configured to read CDI specifications from the directory containing amd.json. Check the runtime’s CDI configuration settings.
Check file permissions:
ls -l /etc/cdi/amd.json
The file should be readable by the container runtime process. If you’re using a custom location, ensure the permissions allow the runtime to access it.
Regenerate if hardware changed:
If you’ve added, removed, or reconfigured GPUs, regenerate the specification:
sudo amd-ctk cdi generate
Verify device names:
Ensure you’re using the correct CDI device names (e.g., amd.com/gpu=0) when requesting devices.
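To make the device-name step concrete, the sketch below builds a CDI device name from a GPU index and prints a container command that requests it. The helper names are hypothetical, the command is printed rather than executed, and it assumes a Docker release with CDI support enabled. The image rocm/rocm-terminal and the rocm-smi command inside it are placeholders for whatever workload you run.

```shell
# Sketch: build the fully qualified CDI device name for a GPU index.
cdi_device() {
  echo "amd.com/gpu=$1"
}

# Print (rather than run) a docker command that requests the given GPU.
# Assumptions: Docker has CDI support enabled; the image is a placeholder.
print_run_command() {
  echo "docker run --rm --device $(cdi_device "$1") rocm/rocm-terminal rocm-smi"
}

print_run_command 0
```

If the printed command fails with an unknown-device error when you run it, compare the requested name against the device names defined in the specification.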
Validation Errors
If amd-ctk cdi validate reports errors:
Check that GPU devices are properly detected by the system (verify with rocm-smi, amd-smi, or similar tools)
Ensure GPU drivers are correctly installed
Regenerate the specification to reflect the current system state