AMD Instinct MI300A

The AMD Instinct™ MI300A is a data-center Accelerated Processing Unit (APU) that integrates AMD “Zen 4” CPU cores and CDNA 3 GPU compute dies on a single package with unified HBM3 memory. This document provides MI300A-specific prerequisites, health checks, validation steps, and performance acceptance criteria.

Overview

The MI300A is built on the CDNA 3 architecture (gfx942) and combines CPU and GPU compute dies sharing a single coherent pool of 128 GB HBM3 per APU. Unlike discrete OAM accelerators, MI300A platforms are vendor-defined; a typical qualified configuration hosts 4 MI300A APUs per node.

System requirements

Operating system support

For the most up-to-date information on supported operating systems and distributions, refer to the official ROCm documentation:

ROCm System Requirements - Supported Distributions

Note

The ROCm documentation is the single source of truth for supported versions, distribution compatibility, and required dependencies for the ROCm toolkit.

For BIOS, IOMMU, transparent hugepages, NUMA, and OS-level tuning that applies to all AMD Instinct hosts, see BIOS settings, OS tuning, and Kernel parameters. MI300A requires a Linux kernel that supports “Zen 4” (≥ 5.18 recommended).

GPU identification

All MI300A APUs (PCI vendor:device 1002:74a0) should appear in lspci output:

sudo lspci -d 1002:74a0

Expected output example:

0000:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A]
0001:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A]
0002:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A]
0003:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A]
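
The presence check above can be scripted by counting matching lines. The following is a minimal sketch: the helper names `count_mi300a` and `check_apu_count` are hypothetical, and the default of 4 APUs only reflects the typical configuration described in the overview, so adjust it for your platform.

```shell
# Count MI300A entries in lspci output read from stdin.
# Hypothetical helper; matches the device name shown in the example output.
count_mi300a() {
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  grep -c 'Instinct MI300A' || true
}

# Return 0 if the number of APUs seen on stdin equals the expected count
# (first argument, default 4).
check_apu_count() {
  expected="${1:-4}"
  found="$(count_mi300a)"
  if [ "$found" -eq "$expected" ]; then
    echo "PASS: $found MI300A APUs found"
  else
    echo "FAIL: expected $expected MI300A APUs, found $found"
    return 1
  fi
}
```

Typical usage would be `sudo lspci -d 1002:74a0 | check_apu_count 4`.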

Acceptance criteria

The MI300A system acceptance process validates that the platform is correctly configured, stable, and performing to expectations. Follow the sequence: Prerequisites → Basic Health Checks → System Validation → Performance Benchmarks.

System acceptance process

  1. Prerequisites validation - Ensure all system requirements and dependencies are met

  2. Basic health checks - Verify hardware detection and basic system health

  3. System validation - Conduct comprehensive stress testing and qualification

  4. Performance benchmarks - Validate compute, memory, and interconnect performance

The system is accepted when all criteria below are successfully validated.

Prerequisites validation

Ensure all system requirements are met before proceeding with validation. See the Prerequisites documentation and System setup for more details.

  • ✅ Supported operating system version installed with kernel ≥ 5.18 (Zen 4 support)

  • ✅ Compatible ROCm version installed (verify: cat /opt/rocm/.info/version); see the ROCm System Requirements for the current supported version matrix

  • ✅ BIOS configured per BIOS settings, with MI300A-specific values per platform vendor (IOMMU off, memory interleaving, NPS)

  • ✅ Required kernel parameters present: pci=realloc=off transparent_hugepage=always numa_balancing=disable

  • ✅ Sysctl tunings applied: vm.compaction_proactiveness=20, vm.max_map_count increased per ROCm guide

  • ✅ Environment variables (where applicable):

    • HSA_OVERRIDE_CPU_AFFINITY_DEBUG=0

    • GPU_MAX_ALLOC_PERCENT and GPU_SINGLE_ALLOC_PERCENT tuned per workload

  • ✅ At least 4 × 128 GB = 512 GB of unified HBM3 visible to the OS (note: on MI300A, the HBM3 pool is shared between the CPU and GPU)

  • ✅ Latest applicable firmware applied consistently across nodes

  • ✅ ROCm Validation Suite (RVS) installed
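
Two of the prerequisites above, the minimum kernel version and the required kernel parameters, lend themselves to a scripted check. This is a sketch under stated assumptions: the function names `version_ge` and `check_cmdline` are hypothetical, and `version_ge` relies on the `-V` (version sort) option of GNU `sort`.

```shell
# Return 0 if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Return 0 if every required parameter appears as a whole word in the
# kernel command line passed as $1; otherwise print each missing parameter.
check_cmdline() {
  cmdline=" $1 "
  status=0
  for param in pci=realloc=off transparent_hugepage=always numa_balancing=disable; do
    case "$cmdline" in
      *" $param "*) ;;
      *) echo "missing kernel parameter: $param"; status=1 ;;
    esac
  done
  return "$status"
}
```

On a live system these could be combined as `version_ge "$(uname -r)" 5.18 && check_cmdline "$(cat /proc/cmdline)"`.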

Basic health checks

These checks ensure fundamental system health and proper APU detection. For detailed procedures, see Health Checks.

Test: Check OS distribution
Command: cat /etc/os-release
Pass: OS version listed in compatibility matrix
Fail: Otherwise

Test: Check kernel boot arguments
Command: cat /proc/cmdline
Pass: Contains pci=realloc=off transparent_hugepage=always numa_balancing=disable
Fail: Any required parameter is missing

Test: Check for driver errors
Command: sudo dmesg -T | grep amdgpu | grep -i error
Pass: No output
Fail: Errors reported

Test: Check available memory
Command: lsmem | grep "Total online memory"
Pass: ≥ 4 × 128 GB = 512 GB unified HBM3 visible to the OS
Fail: Less than 512 GB visible to the OS

Test: Check GPU presence
Command: sudo lspci -d 1002:74a0
Pass: 4 MI300A APUs found
Fail: Otherwise

Test: Check GPU link speed and width
Command: sudo lspci -d 1002:74a0 -vvv | grep -e DevSta -e LnkSta
Pass: Speed PCIe Gen 5 (32 GT/s), width x16, no FatalErr+
Fail: Otherwise

Test: Monitor utilization metrics
Command: amd-smi monitor -putm
Pass: Idle metrics as specified
Fail: Otherwise

Test: Check system kernel logs for errors
Command: sudo dmesg -T | grep -Ei 'error|warn|fail|exception'
Pass: No output
Fail: Otherwise
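
The link speed and width check above can be automated by parsing the `lspci -vvv` output. The sketch below uses a hypothetical function name, `check_pcie_links`, and assumes `LnkSta` lines of the form `LnkSta: Speed 32GT/s, Width x16` (some lspci builds append `(ok)` after each field). It matches `FatalErr+` only as a separate field, because a plain substring match would also hit `NonFatalErr+`.

```shell
# Read `lspci -vvv` output on stdin and verify link and device status.
# Assumes lines such as "LnkSta: Speed 32GT/s, Width x16".
check_pcie_links() {
  awk '
    # "LnkSta:" with the colon excludes the LnkSta2 lines.
    /LnkSta:/ {
      links++
      if ($0 !~ /Speed 32GT\/s/ || $0 !~ /Width x16/) { bad++; print "FAIL: " $0 }
    }
    # Match FatalErr+ only as its own field so NonFatalErr+ is not counted.
    /DevSta:/ && /(^|[ \t])FatalErr\+/ { bad++; print "FAIL: " $0 }
    END {
      if (links == 0) { print "FAIL: no LnkSta lines found"; exit 1 }
      if (bad > 0) exit 1
      print "PASS: " links " link(s) at 32GT/s x16, no FatalErr+"
    }
  '
}
```

Typical usage would be `sudo lspci -d 1002:74a0 -vvv | check_pcie_links`.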

System validation

Comprehensive validation ensures system stability under load. For detailed procedures, see System Validation.

Test: Compute/GPU properties
Command: rvs -c ${RVS_CONF}/gpup_single.conf
Pass: All APUs listed with no errors
Fail: Missing APUs or errors

Test: GPU stress test (GST)
Command: rvs -c ${RVS_CONF}/MI300A/gst_single.conf
Pass: met: TRUE in logs
Fail: Target GFLOP/s not met

Test: Input energy delay product (IET)
Command: rvs -c ${RVS_CONF}/MI300A/iet_single.conf
Pass: met: TRUE for all actions
Fail: Otherwise

Test: Memory test (MEM)
Command: rvs -c ${RVS_CONF}/mem.conf -l mem.txt
Pass: All tests passed; bandwidth ≥ 2.0 TB/s per APU
Fail: Any test failed or low bandwidth

Test: PCIe bandwidth benchmark (PEBB)
Command: rvs -c ${RVS_CONF}/MI300A/pebb_single.conf
Pass: All distances and bandwidths displayed
Fail: Missing data

Test: PCIe qualification tool (PEQT)
Command: rvs -c ${RVS_CONF}/peqt_single.conf
Pass: All actions true
Fail: Otherwise

Test: P2P benchmark and qualification tool (PBQT)
Command: rvs -c ${RVS_CONF}/pbqt_single.conf
Pass: peers:true lines and non-zero throughput across all xGMI peers
Fail: Otherwise
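
For the GST and IET tests, the "met: TRUE" criterion can be checked mechanically. The following sketch assumes that RVS prints results containing the literal strings `met: TRUE` and `met: FALSE`; the function name `rvs_targets_met` is hypothetical, and the exact log layout may differ between RVS versions.

```shell
# Read RVS output on stdin. Pass only if at least one result line reports
# "met: TRUE" and no line reports "met: FALSE".
# Assumption: results are printed as "met: TRUE" / "met: FALSE".
rvs_targets_met() {
  awk '
    /met: TRUE/  { ok++ }
    /met: FALSE/ { failed++ }
    END {
      printf "met: TRUE=%d, met: FALSE=%d\n", ok, failed
      if (ok > 0 && failed == 0) exit 0
      exit 1
    }
  '
}
```

One possible invocation is `rvs -c ${RVS_CONF}/MI300A/gst_single.conf | rvs_targets_met`. Treating a log with no `met:` lines as a failure is a deliberate choice here, so that a run that aborts early is not reported as a pass.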

Performance benchmarks

Performance validation ensures the system meets MI300A specifications. For detailed procedures, see Performance Benchmarking.

Command: TransferBench a2a

Pass: ≥ 700 GB/s aggregate

Command: TransferBench p2p

Test: UniDir
Pass: ≥ 80 GB/s

Test: BiDir
Pass: ≥ 155 GB/s

Command: build/all_reduce_perf -b 8 -e 8G -f 2 -g 4

Pass: ≥ 230 GB/s busbw (peak, at 8 GiB message size)
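
When comparing against the busbw threshold above, it helps to keep the relationship between algorithm bandwidth and bus bandwidth in mind. Assuming `all_reduce_perf` comes from rccl-tests and follows the nccl-tests convention, all-reduce bus bandwidth is `algbw × 2(n − 1)/n` for `n` ranks, which is a factor of 1.5 for 4 APUs. The helper name `allreduce_busbw` below is hypothetical.

```shell
# Convert all-reduce algorithm bandwidth (GB/s) to bus bandwidth for n ranks:
#   busbw = algbw * 2 * (n - 1) / n
# Arguments: $1 = algbw in GB/s, $2 = number of ranks.
allreduce_busbw() {
  awk -v algbw="$1" -v n="$2" 'BEGIN { printf "%.1f\n", algbw * 2 * (n - 1) / n }'
}
```

For example, `allreduce_busbw 160 4` prints `240.0`, so under this convention a 230 GB/s busbw target on 4 APUs corresponds to an algbw of roughly 153 GB/s.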

Command: rocblas-bench (see code block below)
rocblas-bench -f gemm \
  -r s -m 4000 -n 4000 -k 4000 \
  --lda 4000 --ldb 4000 --ldc 4000 \
  --transposeA N --transposeB T

Pass: ≥ 60 TFLOPS per APU
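
To relate the benchmark output to this threshold: one m × n × k GEMM performs 2·m·n·k floating-point operations, so the 4000 × 4000 × 4000 case above is 1.28 × 10^11 operations per call. The sketch below assumes that the measured throughput is available as a GFLOPS value (rocblas-bench is expected to report a Gflops column, but verify this against your version); the helper names are hypothetical.

```shell
# Floating-point operations for one m x n x k GEMM: 2 * m * n * k.
gemm_flops() {
  awk -v m="$1" -v n="$2" -v k="$3" 'BEGIN { printf "%.0f\n", 2 * m * n * k }'
}

# Return 0 if a measured GFLOPS value ($1) meets a TFLOPS threshold
# ($2, default 60).
meets_tflops() {
  awk -v gflops="$1" -v min="${2:-60}" \
    'BEGIN { if (gflops / 1000 >= min) exit 0; exit 1 }'
}
```

For example, `gemm_flops 4000 4000 4000` prints `128000000000`, and `meets_tflops 61250` succeeds because 61,250 GFLOPS is 61.25 TFLOPS.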

Command: mpiexec -n 4 wrapper.sh

Kernel thresholds (MB/s):

  • Copy: ≥ 2,900,000

  • Mul: ≥ 3,000,000

  • Add: ≥ 3,250,000

  • Triad: ≥ 3,250,000

  • Dot: ≥ 2,200,000
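
The kernel thresholds above can be checked with a small filter. This is only a sketch: it assumes the benchmark results have been reduced to lines of the form `<kernel> <MB/s>` (for example `Copy 3012345.6`), which may require adapting the field numbers to the actual output of `wrapper.sh`. The function name `check_stream_thresholds` is hypothetical.

```shell
# Read "<kernel> <MB/s>" pairs on stdin and compare each value against the
# thresholds listed above. The input format is an assumption.
check_stream_thresholds() {
  awk '
    BEGIN {
      min["Copy"] = 2900000; min["Mul"] = 3000000; min["Add"] = 3250000
      min["Triad"] = 3250000; min["Dot"] = 2200000
    }
    ($1 in min) {
      seen[$1] = 1
      if ($2 + 0 < min[$1]) {
        printf "FAIL: %s %s < %d MB/s\n", $1, $2, min[$1]
        bad++
      }
    }
    END {
      # A kernel with no result at all also counts as a failure.
      for (k in min) if (!(k in seen)) { printf "FAIL: no result for %s\n", k; bad++ }
      exit (bad > 0)
    }
  '
}
```

The function exits with status 0 only when all five kernels are present and each meets its threshold.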