AMD Instinct MI300X#

The AMD Instinct™ MI300X is a high-performance GPU accelerator designed for AI, HPC, and demanding workloads. This document provides a comprehensive overview of MI300X-specific requirements, specifications, and acceptance testing criteria.

Overview#

The MI300X platform uses a Universal Baseboard (UBB 2.0) configuration that hosts 8 AMD Instinct MI300X OAM (OCP Accelerator Module) accelerators with a total of 1.5TB of HBM3 memory.

Each MI300X GPU features a multi-chiplet design with 8 XCDs (Accelerator Complex Dies) and 192GB of HBM3 memory per accelerator. It uses AMD’s CDNA 3 architecture with fully meshed Infinity Fabric™ connectivity between accelerators.
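The 1.5TB platform total follows directly from the per-accelerator capacity. A minimal shell sketch of the arithmetic, assuming binary units (1TB = 1024GB); the variable names are illustrative only:

```shell
# Total HBM3 on one platform: 8 accelerators x 192 GB each.
gpus=8
hbm_gb_per_gpu=192
total_gb=$((gpus * hbm_gb_per_gpu))

# Convert to TB with awk, since POSIX shell arithmetic is integer-only.
total_tb=$(awk -v gb="$total_gb" 'BEGIN { printf "%.1f", gb / 1024 }')

echo "${total_gb} GB = ${total_tb} TB"
```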

System Requirements#

Operating System Support#

For the most up-to-date information on supported operating systems and distributions, please refer to the official ROCm documentation:

ROCm System Requirements - Supported Distributions

Note

The ROCm documentation is the single source of truth for supported versions, distribution compatibility, and required dependencies for the ROCm toolkit.

GPU Identification#

All MI300X GPUs should appear in lspci output:

lspci | grep MI300X

Expected output example:

05:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300X]
26:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300X]
46:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300X]
65:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300X]
85:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300X]
a6:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300X]
c6:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300X]
e5:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300X]
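Counting the matching lines is less error-prone than reading the listing by eye. The following is a small sketch, not an official tool: the function names `count_mi300x` and `check_gpu_count` are hypothetical, and the expected count defaults to the 8 accelerators of a single UBB platform.

```shell
# Count lines mentioning MI300X in lspci-style text read from stdin.
count_mi300x() {
    grep -c 'MI300X'
}

# Succeed only if the number of MI300X lines equals the expected count
# (first argument, default 8).
check_gpu_count() {
    expected="${1:-8}"
    found="$(count_mi300x)"
    if [ "$found" -eq "$expected" ]; then
        echo "PASS: found $found of $expected MI300X GPUs"
    else
        echo "FAIL: found $found of $expected MI300X GPUs"
        return 1
    fi
}

# Usage on a live system:
#   lspci | check_gpu_count 8
```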

Acceptance Criteria#

The MI300X system acceptance process is designed to validate that your system is operating correctly and meets all performance specifications. Users are expected to step through the validation guides in sequence to ensure comprehensive system verification.

System Acceptance Process#

  1. Prerequisites Validation - Ensure all system requirements and dependencies are met

  2. Basic Health Checks - Verify hardware detection and basic system health

  3. System Validation - Conduct comprehensive stress testing and qualification

  4. Performance Benchmarks - Validate compute, memory, and interconnect performance
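Because the stages are meant to run in order, a driver script can stop at the first failure instead of continuing onto an unhealthy system. This is only a sketch of that control flow: `run_stages` and the stage function names in the usage comment are hypothetical placeholders, and each stage is assumed to be a shell function or command that returns non-zero on failure.

```shell
# Run each named stage in order; stop and return non-zero at the first failure.
run_stages() {
    for stage in "$@"; do
        echo "== $stage =="
        if ! "$stage"; then
            echo "STOP: $stage failed"
            return 1
        fi
    done
    echo "All stages passed"
}

# Usage, with stage functions you define yourself (names are placeholders):
#   run_stages prerequisites health_checks system_validation benchmarks
```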

The system is accepted when all criteria below are successfully validated:

Prerequisites Validation#

Ensure all system requirements are met before proceeding with validation. See the Prerequisites documentation and System setup for more details.

  • ✅ Supported operating system version installed

  • ✅ Compatible ROCm version installed

  • ✅ System manufacturer compatibility verified

  • ✅ All required dependencies installed

Basic Health Checks#

These checks ensure fundamental system health and proper GPU detection. For detailed procedures, see Health Checks.

Test: Check OS distribution
Command: cat /etc/os-release
Pass: OS version listed in the compatibility matrix
Fail: Otherwise

Test: Check kernel boot arguments
Command: cat /proc/cmdline
Pass: Contains pci=realloc=off, amd_iommu=on or intel_iommu=on, and iommu=pt
Fail: Otherwise

Test: Check for driver errors
Command: sudo dmesg -T | grep amdgpu | grep -i error
Pass: No output
Fail: Errors reported

Test: Check available memory
Command: lsmem | grep "Total online memory"
Pass: 1.5T or more
Fail: Less than 1.5T

Test: Check GPU presence
Command: lspci | grep MI300X
Pass: All 8 GPUs found
Fail: Otherwise

Test: Check GPU link speed and width
Command: sudo lspci -d 1002:74a1 -vvv | grep -e DevSta -e LnkSta
Pass: Speed 32GT/s, width x16, no FatalErr+
Fail: Otherwise

Test: Monitor utilization metrics
Command: amd-smi monitor -putm
Pass: Idle metrics as specified
Fail: Otherwise

Test: Check system kernel logs for errors
Command: sudo dmesg -T | grep -Ei 'error|warn|fail|exception'
Pass: No output
Fail: Otherwise
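Two of these checks are easy to misread by eye, so a scripted version can help. The sketch below is illustrative only, and the function names are hypothetical. It uses `pci=realloc=off`, the spelling the Linux kernel accepts for that boot parameter, and it treats only the `FatalErr+` flag as a failure (`NonFatalErr+` is not counted), which is one reading of the criterion above.

```shell
# Check a kernel command line (passed as a single string) for required arguments.
check_cmdline() {
    cmdline=" $1 "
    status=0
    for arg in pci=realloc=off iommu=pt; do
        case "$cmdline" in
            *" $arg "*) ;;
            *) echo "FAIL: missing $arg"; status=1 ;;
        esac
    done
    case "$cmdline" in
        *" amd_iommu=on "* | *" intel_iommu=on "*) ;;
        *) echo "FAIL: missing amd_iommu=on or intel_iommu=on"; status=1 ;;
    esac
    [ "$status" -eq 0 ] && echo "PASS: kernel boot arguments present"
    return "$status"
}

# Check DevSta/LnkSta lines (stdin) for 32GT/s, x16, and no FatalErr+ flag.
check_link_lines() {
    awk '
        /LnkSta:/ {
            links++
            if (index($0, "Speed 32GT/s") == 0 || index($0, "Width x16") == 0) bad++
        }
        /DevSta:/ && /(^|[^A-Za-z])FatalErr[+]/ { bad++ }
        END {
            if (links == 0 || bad > 0) { print "FAIL: link check"; exit 1 }
            print "PASS: " links " link(s) at 32GT/s x16"
        }
    '
}

# Usage on a live system:
#   check_cmdline "$(cat /proc/cmdline)"
#   sudo lspci -d 1002:74a1 -vvv | grep -e DevSta -e LnkSta | check_link_lines
```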

System Validation#

Comprehensive validation ensures system stability under load. For detailed procedures, see System Validation.

Test: Compute/GPU properties
Command: rvs -c ${RVS_CONF}/gpup_single.conf
Pass: All GPUs listed with no errors
Fail: Missing GPUs or errors

Test: GPU stress test (GST)
Command: rvs -c ${RVS_CONF}/MI300X/gst_single.conf
Pass: met: TRUE in logs
Fail: Target GFLOP/s not met

Test: Input energy delay product (IET)
Command: rvs -c ${RVS_CONF}/MI300X/iet_single.conf
Pass: met: TRUE for all actions
Fail: Otherwise

Test: Memory test (MEM)
Command: rvs -c ${RVS_CONF}/mem.conf -l mem.txt
Pass: All tests passed; bandwidth ~2TB/s
Fail: Any test failed or low bandwidth

Test: PCIe bandwidth benchmark (PEBB)
Command: rvs -c ${RVS_CONF}/MI300X/pebb_single.conf
Pass: All distances and bandwidths displayed
Fail: Missing data

Test: PCIe qualification tool (PEQT)
Command: rvs -c ${RVS_CONF}/peqt_single.conf
Pass: All actions true
Fail: Otherwise

Test: P2P benchmark and qualification tool (PBQT)
Command: rvs -c ${RVS_CONF}/pbqt_single.conf
Pass: peers:true lines and non-zero throughput
Fail: Otherwise
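For the GST and IET runs, the pass condition is the `met: TRUE` marker in the output. A rough sketch of an automated scan follows, assuming the RVS output is piped in on stdin and that a missed target is reported as `met: FALSE` (that failure string is an assumption, not taken from this page). The function name `check_rvs_met` is hypothetical.

```shell
# Scan RVS output on stdin: pass only if at least one "met: TRUE" line
# is present and no "met: FALSE" line appears (failure string assumed).
check_rvs_met() {
    awk '
        /met: TRUE/  { ok++ }
        /met: FALSE/ { bad++ }
        END {
            if (ok > 0 && bad == 0) {
                printf "PASS: %d result line(s) met target\n", ok
                exit 0
            }
            printf "FAIL: %d met, %d not met\n", ok, bad
            exit 1
        }
    '
}

# Usage on a live system:
#   rvs -c ${RVS_CONF}/MI300X/gst_single.conf | check_rvs_met
```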

Performance Benchmarks#

Performance validation ensures the system meets MI300X specifications. For detailed procedures, see Performance Benchmarking.

Command: TransferBench a2a

Pass: ≥ 32.9 GB/s

Command: TransferBench p2p

| Test | Pass criteria |
|---|---|
| UniDir | ≥ 33.9 GB/s |
| BiDir | ≥ 43.9 GB/s |

Command: TransferBench example.cfg

| Test | Pass criteria |
|---|---|
| Test 1 | ≥ 47.1 GB/s |
| Test 2 | ≥ 48.4 GB/s |
| Test 3 | ≥ 31.9 GB/s (0→1), ≥ 38.9 GB/s (1→0) |
| Test 4 | ≥ 1264 GB/s |
| Test 5 | N/A (GPU validation) |
| Test 6 | ≥ 48.6 GB/s |
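The TransferBench thresholds are fractional, so a plain shell integer comparison will not work. One possible helper uses awk for the floating-point comparison, as sketched below. The function names are hypothetical, and the measured values in the usage comment are made-up placeholders that you would replace with the figures from your own TransferBench output.

```shell
# Return success if measured ($1) is greater than or equal to threshold ($2).
# awk handles the floating-point comparison that POSIX shell cannot do.
meets_min() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 >= t + 0) }'
}

# Report PASS/FAIL for one named bandwidth result, in GB/s.
# Arguments: label, measured value, threshold.
check_bandwidth() {
    if meets_min "$2" "$3"; then
        echo "PASS: $1 $2 GB/s (threshold $3 GB/s)"
    else
        echo "FAIL: $1 $2 GB/s (threshold $3 GB/s)"
        return 1
    fi
}

# Usage (measured values are placeholders, not real results):
#   check_bandwidth "a2a" 35.1 32.9
#   check_bandwidth "p2p UniDir" 34.0 33.9
#   check_bandwidth "p2p BiDir" 44.2 43.9
```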

Command: build/all_reduce_perf -b 8 -e 8G -f 2 -g 8

Pass: ≥ 304 GB/s

Command:
rocblas-bench -f gemm \
  -r s -m 4000 \
  --lda 4000 --ldb 4000 --ldc 4000 \
  --transposeA N --transposeB T

Pass: ≥ 94100 GFLOPS (94.1 TFLOPS)

Command:
rocblas-bench -f gemm_strided_batched_ex \
  --transposeA N --transposeB T \
  -m 1024 -n 2048 -k 512 \
  --a_type h --lda 1024 --stride_a 4096 \
  --b_type h --ldb 2048 --stride_b 4096 \
  --c_type s --ldc 1024 --stride_c 2097152 \
  --d_type s --ldd 1024 --stride_d 2097152 \
  --compute_type s \
  --alpha 1.1 --beta 1 \
  --batch_count 5

Pass: ≥ 130600 GFLOPS (130.6 TFLOPS)

Command:
rocblas-bench -f gemm_strided_batched_ex \
  --transposeA N --transposeB T \
  -m 1024 -n 2048 -k 512 \
  --a_type i8_r --lda 1024 --stride_a 4096 \
  --b_type i8_r --ldb 2048 --stride_b 4096 \
  --c_type i32_r --ldc 1024 --stride_c 2097152 \
  --d_type i32_r --ldd 1024 --stride_d 2097152 \
  --compute_type i32_r \
  --alpha 1.1 --beta 1 \
  --batch_count 5

Pass: ≥ 162700 GFLOPS (162.7 TFLOPS)
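rocblas-bench reports throughput in GFLOPS, so the three GEMM thresholds above are compared as GFLOPS (94100 GFLOPS is 94.1 TFLOPS). The sketch below shows one way to extract and compare the figure. It assumes the benchmark prints a comma-separated header row containing a `rocblas-Gflops` column followed by one row of results; verify this against your rocBLAS version. The function name `check_gemm_gflops` is hypothetical.

```shell
# Read rocblas-bench output on stdin and compare the reported GFLOPS
# with a minimum threshold given as the first argument (in GFLOPS).
check_gemm_gflops() {
    awk -F, -v min="$1" '
        # Skip lines until the CSV header naming the rocblas-Gflops column.
        col == 0 {
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/[ \t\r]/, "", name)
                if (name == "rocblas-Gflops") col = i
            }
            next
        }
        # The first line after the header holds the measured values.
        !seen { gflops = $col + 0; seen = 1 }
        END {
            if (!seen) { print "FAIL: no rocblas-Gflops value found"; exit 1 }
            verdict = (gflops >= min + 0) ? "PASS" : "FAIL"
            printf "%s: %.1f GFLOPS (%.2f TFLOPS), threshold %s GFLOPS\n", verdict, gflops, gflops / 1000, min
            exit (verdict == "PASS" ? 0 : 1)
        }
    '
}

# Usage on a live system (append the benchmark arguments shown above):
#   rocblas-bench -f gemm ... | check_gemm_gflops 94100
```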

Command: mpiexec -n 8 wrapper.sh

| Copy # | Threshold (MB/s) |
|---|---|
| 1 | ≥ 4,177,285 |
| 2 | ≥ 4,067,069 |
| 3 | ≥ 3,920,853 |
| 4 | ≥ 3,885,301 |
| 5 | ≥ 3,660,781 |

Pass: Every copy meets or exceeds its threshold above