Prerequisites#

Before proceeding with the prerequisites in this section, ensure that the system has been properly installed and is free of visible damage. As the system administrator, refer to the manufacturer’s installation guide to verify that the system is correctly installed in a rack with sufficient cooling, is connected to the required power, and is accessible from the network.

If errors or warnings are found during setup, you might need to troubleshoot the issue by, for example, reseating auxiliary PCBs, checking internal cable connections, or monitoring the situation. For debugging support, contact your system manufacturer.

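This section describes the server settings to configure before testing: system BIOS settings, the supported operating system, GRUB settings, operating system settings, system firmware, and the ROCm installation.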

Note

Before changing any system settings or running tests, record the existing production system settings so that the system can be returned to its original configuration.
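As one way to record OS-level settings, the sketch below copies a set of readable files into a snapshot directory. The function name `snapshot_settings`, the directory name, and the file list in the example are illustrative assumptions, not part of any ROCm tooling. BIOS settings are not captured this way and still need the manufacturer's export tool or manual notes.

```shell
# Copy each readable file into a snapshot directory for later comparison.
# Usage: snapshot_settings DEST FILE...
snapshot_settings() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for src in "$@"; do
        if [ -r "$src" ]; then
            # Flatten the path into a file name: /etc/default/grub -> etc_default_grub
            name=$(printf '%s' "$src" | sed 's|^/||; s|/|_|g')
            cat "$src" > "$dest/$name" || return 1
        else
            echo "skipped (not readable): $src" >&2
        fi
    done
}

# Example call (the file list is an assumption; extend it for your system):
# snapshot_settings ./settings-before-test /proc/cmdline /etc/default/grub
```

Keep the snapshot directory somewhere that survives a reboot, since several of the settings in this section take effect only after the system restarts.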

System BIOS settings#

Some server manufacturers offer tools that allow the current BIOS configuration settings to be exported to a file, modified with needed changes, and loaded back to the system. If the server manufacturer doesn’t offer such a tool, the BIOS settings will need to be reviewed and updated manually from the BIOS setup interface before booting the OS.

Before validating a system with AMD EPYC™ 9004-series processors and AMI System BIOS, refer to the recommended system BIOS settings for MI300X to ensure the system BIOS is set up correctly for maximum performance. These settings should be set as default values in the system BIOS. Analogous settings could be applied similarly on systems from non-AMI System BIOS providers. For systems with Intel processors, some settings might not apply or might be unavailable.

Supported operating systems#

Refer to the list of Linux Supported operating systems and ensure that one of them is installed on your system. Other distributions might be unable to run ROCm or complete the validation tests in this document.

To obtain and validate the Linux distribution information for systems with the OS already installed, refer to the ROCm installation prerequisites.
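For a quick look at what is installed, a minimal sketch such as the following prints the machine architecture and the distribution ID and version. It assumes the distribution provides an os-release file at `/etc/os-release`; the helper name `os_id_and_version` is hypothetical, and the ROCm installation prerequisites remain the authoritative procedure.

```shell
# Print "<ID> <VERSION_ID>" from an os-release style file.
# Usage: os_id_and_version [FILE]   (default: /etc/os-release)
os_id_and_version() {
    file=${1:-/etc/os-release}
    [ -r "$file" ] || return 1
    # Source the file in a subshell so its variables do not leak into the caller.
    ( . "$file" && printf '%s %s\n' "$ID" "$VERSION_ID" )
}

uname -m    # machine architecture, for example x86_64
os_id_and_version || echo "no readable /etc/os-release found" >&2
```

Compare the printed ID and version against the supported operating systems list by hand; the sketch does not encode that list.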

GRUB settings#

GRUB, or GNU Grand Unified Bootloader, is a boot loader and boot manager for Linux that allows the operator to select which operating system and kernel configuration to use when booting the system. MI300X-based servers require additional strings to be appended to the Linux kernel command line. This is done in the GRUB configuration file, as described in the recommended GRUB settings for MI300X. After updating GRUB and rebooting the system, check the GRUB configuration file again before proceeding to confirm that the changes are in place.
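In addition to reviewing the configuration file, it can help to confirm that the running kernel actually received the parameters. The sketch below checks a command-line string for a list of expected parameters. Note that it reads `/proc/cmdline` rather than the GRUB configuration file, the helper name `cmdline_has_params` is hypothetical, and the parameter values in the example are placeholders: take the real list from the recommended GRUB settings for MI300X.

```shell
# Return 0 if every expected parameter appears as a whole word in the command line.
# Usage: cmdline_has_params "CMDLINE" PARAM...
cmdline_has_params() {
    cmdline=$1
    shift
    for param in "$@"; do
        # Pad with spaces so that only whole, space-delimited words match.
        case " $cmdline " in
            *" $param "*) ;;
            *) echo "missing kernel parameter: $param" >&2; return 1 ;;
        esac
    done
}

# Example: placeholder parameters, replace with the recommended settings.
if [ -r /proc/cmdline ]; then
    cmdline_has_params "$(cat /proc/cmdline)" pci=realloc=off iommu=pt ||
        echo "review the GRUB configuration and reboot" >&2
fi
```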

Operating system settings#

To ensure the system is operating at maximum performance prior to running the validations and performance tests in this document, the operator should ensure that power gating is disabled, NUMA configuration is set appropriately, and specific environment variables are exported as outlined in the Operating system settings for MI300X. For illustration purposes, this document uses Ubuntu 22.04 for commands and output unless otherwise specified.
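As an illustration of checking one of these settings, the sketch below reports whether kernel NUMA auto-balancing is active by reading `/proc/sys/kernel/numa_balancing`, where `0` means disabled. It assumes the kernel exposes that file; the helper name `numa_balancing_state` is hypothetical. Which state is appropriate, and how to check power gating and the environment variables, should be taken from the Operating system settings for MI300X.

```shell
# Report the NUMA auto-balancing state from a sysctl file.
# Usage: numa_balancing_state [FILE]   (default: /proc/sys/kernel/numa_balancing)
numa_balancing_state() {
    file=${1:-/proc/sys/kernel/numa_balancing}
    [ -r "$file" ] || { echo "unknown"; return 1; }
    case "$(cat "$file")" in
        0) echo "disabled" ;;
        *) echo "enabled" ;;   # any non-zero value means some balancing mode is on
    esac
}

echo "NUMA auto-balancing: $(numa_balancing_state)"
```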

Updating system firmware#

Contact your system manufacturer to ensure that the system under test is running the latest firmware versions. Systems with older firmware versions might not be fully validated, and performance or functionality could be sub-optimal.

ROCm installation#

Once the system is properly configured, ROCm software can be installed. Prior to validating the system, ensure that ROCm version 6.2 or later is installed. For maximum performance and functionality, it’s recommended to always install the latest ROCm version on the system.

Refer to ROCm installation for Linux for the available options to install ROCm on your system. For operators new to ROCm, see the Quick start installation guide for your supported distribution. Once ROCm is installed, follow the Post-installation instructions. To troubleshoot issues encountered when installing ROCm tools or libraries, see the Installation troubleshooting guide.

After installation, run the following command to check the ROCm version installed on the system:

cat /opt/rocm/.info/version

Example output:

6.2.0-66
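To turn this into a pass/fail check against the 6.2 minimum, a minimal sketch follows. It assumes the version string has the `major.minor.patch-build` shape shown in the example output; the function name `rocm_version_at_least` is hypothetical and not part of ROCm.

```shell
# Return 0 if VERSION (for example "6.2.0-66") is at least MIN_MAJOR.MIN_MINOR.
# Usage: rocm_version_at_least VERSION MIN_MAJOR MIN_MINOR
rocm_version_at_least() {
    version=$1
    min_major=$2
    min_minor=$3
    case "$version" in
        *.*) ;;
        *) echo "unrecognized version: $version" >&2; return 2 ;;
    esac
    major=${version%%.*}    # text before the first dot
    rest=${version#*.}      # text after the first dot
    minor=${rest%%.*}       # text up to the next dot
    for part in "$major" "$minor"; do
        case "$part" in
            ''|*[!0-9]*) echo "unrecognized version: $version" >&2; return 2 ;;
        esac
    done
    # Compare numerically so that, for example, 6.10 counts as newer than 6.2.
    [ "$major" -gt "$min_major" ] ||
        { [ "$major" -eq "$min_major" ] && [ "$minor" -ge "$min_minor" ]; }
}

if [ -r /opt/rocm/.info/version ]; then
    if rocm_version_at_least "$(cat /opt/rocm/.info/version)" 6 2; then
        echo "ROCm version meets the 6.2 minimum"
    else
        echo "ROCm version is older than 6.2 or could not be parsed" >&2
    fi
fi
```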

Note

Contact your system manufacturer's support representative to ensure that the installed ROCm version is compatible with the system firmware.