AMD Instinct MI300A#
The AMD Instinct™ MI300A is a data-center Accelerated Processing Unit (APU) that integrates AMD “Zen 4” CPU cores and CDNA 3 GPU compute dies on a single package with unified HBM3 memory. This document provides MI300A-specific prerequisites, health checks, validation steps, and performance acceptance criteria.
Overview#
The MI300A is built on the CDNA 3 architecture (gfx942) and combines CPU and GPU compute dies sharing a single coherent pool of 128 GB HBM3 per APU. Unlike discrete OAM accelerators, MI300A platforms are vendor-defined; a typical qualified configuration hosts 4 MI300A APUs per node.
System requirements#
Operating system support#
For the most up-to-date information on supported operating systems and distributions, refer to the official ROCm documentation:
ROCm System Requirements - Supported Distributions
Note
The ROCm documentation is the single source of truth for supported versions, distribution compatibility, and required dependencies for the ROCm toolkit.
For BIOS, IOMMU, transparent hugepages, NUMA, and OS-level tuning that applies to all AMD Instinct hosts, see BIOS settings, OS tuning, and Kernel parameters. MI300A requires a Linux kernel that supports “Zen 4” (≥ 5.18 recommended).
GPU identification#
All MI300A APUs (PCI vendor:device 1002:74a0) should appear in lspci output:
sudo lspci -d 1002:74a0
Expected output example:
0000:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A]
0001:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A]
0002:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A]
0003:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A]
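The detection check above can be scripted so that it fails when an APU is missing. The following is a minimal sketch, not an official tool: the function names `count_mi300a` and `check_apu_count` are hypothetical, and the expected count of 4 is taken from the typical configuration described in the overview.

```shell
#!/usr/bin/env bash
# count_mi300a: read `lspci -d 1002:74a0` output on stdin and print the number
# of devices. The -d filter already restricts the listing to MI300A, so every
# non-empty line is counted regardless of how the description text is rendered.
count_mi300a() {
  grep -c '[^[:space:]]' || true   # grep exits 1 on zero matches; still prints 0
}

# check_apu_count EXPECTED: succeed only if exactly EXPECTED devices are listed.
check_apu_count() {
  local expected="$1" found
  found="$(count_mi300a)"
  if [ "$found" -eq "$expected" ]; then
    echo "PASS: $found MI300A APUs found"
  else
    echo "FAIL: expected $expected MI300A APUs, found $found"
    return 1
  fi
}
```

On a node, this could be used as `sudo lspci -d 1002:74a0 | check_apu_count 4`.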
Acceptance criteria#
The MI300A system acceptance process validates that the platform is correctly configured, stable, and performing to expectations. Follow the sequence: Prerequisites → Basic Health Checks → System Validation → Performance Benchmarks.
System acceptance process#
1. Prerequisites validation - Ensure all system requirements and dependencies are met
2. Basic health checks - Verify hardware detection and basic system health
3. System validation - Conduct comprehensive stress testing and qualification
4. Performance benchmarks - Validate compute, memory, and interconnect performance
The system is accepted when all criteria below are successfully validated.
Prerequisites validation#
Ensure all system requirements are met before proceeding with validation. See the Prerequisites documentation and System setup for more details.
✅ Supported operating system version installed with kernel ≥ 5.18 (Zen 4 support)
✅ Compatible ROCm version installed (verify: cat /opt/rocm/.info/version); see the ROCm System Requirements for the current supported version matrix
✅ BIOS configured per BIOS settings, with MI300A-specific values per platform vendor (IOMMU off, memory interleaving, NPS)
✅ Required kernel parameters present: pci=realloc=off transparent_hugepage=always numa_balancing=disable
✅ Sysctl tunings applied: vm.compaction_proactiveness=20, vm.max_map_count increased per ROCm guide
✅ Environment variables (where applicable): HSA_OVERRIDE_CPU_AFFINITY_DEBUG=0; GPU_MAX_ALLOC_PERCENT and GPU_SINGLE_ALLOC_PERCENT tuned per workload
✅ Minimum 4 × 128 GB = 512 GB of unified HBM3 visible to the OS as usable host memory (note: MI300A’s HBM3 is unified with the CPU)
✅ Latest applicable firmware applied consistently across nodes
✅ ROCm Validation Suite (RVS) installed
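Two of the items above, the kernel version and the kernel parameters, lend themselves to a scripted check. The sketch below assumes GNU `sort -V` is available; the helper names `kernel_at_least` and `missing_kernel_params` are hypothetical. Vendor kernels with backported hardware support may report a lower version number and still be supported, so treat a version failure as a prompt to consult the ROCm compatibility matrix rather than as a hard error.

```shell
#!/usr/bin/env bash
# kernel_at_least MIN [VERSION]: succeed if VERSION (default: `uname -r`) >= MIN.
kernel_at_least() {
  local min="$1" cur="${2:-$(uname -r)}"
  cur="${cur%%-*}"   # drop the distribution suffix, e.g. "-91-generic"
  # sort -V orders version strings; MIN sorting first (or tying) means cur >= MIN.
  [ "$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n 1)" = "$min" ]
}

# missing_kernel_params CMDLINE PARAM...: print each PARAM absent from CMDLINE.
missing_kernel_params() {
  local cmdline="$1" p
  shift
  for p in "$@"; do
    case " $cmdline " in
      *" $p "*) ;;          # present as a whole, space-delimited word
      *) echo "$p" ;;       # not found on the command line
    esac
  done
}
```

For example, `kernel_at_least 5.18 || echo "kernel older than 5.18"` and `missing_kernel_params "$(cat /proc/cmdline)" pci=realloc=off transparent_hugepage=always numa_balancing=disable` print nothing when both prerequisites are met.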
Basic health checks#
These checks ensure fundamental system health and proper APU detection. For detailed procedures, see Health Checks.
| Test | Command | Pass/Fail criteria |
|---|---|---|
|  |  | Pass: OS version listed in compatibility matrix |
|  |  | Pass: Contains |
|  |  | Pass: Null |
|  |  | Pass: ≥ 4 × 128 GB = 512 GB unified HBM3 visible to the OS |
|  |  | Pass: 4 MI300A APUs found |
|  |  | Pass: Speed PCIe Gen 5 (32 GT/s), width |
|  |  | Pass: Idle metrics as specified |
|  |  | Pass: Null |
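The PCIe link-speed criterion can be checked by parsing the `LnkSta` lines of `lspci -vv`. This is a sketch under assumptions: the function name `check_link_speed` is hypothetical, the speed is assumed to be printed as a token such as `32GT/s` after the word `Speed`, and the link width is not checked because the expected width is not stated here.

```shell
#!/usr/bin/env bash
# check_link_speed EXPECTED: read `lspci -vv` output on stdin and fail if any
# LnkSta line reports a speed other than EXPECTED (for example "32GT/s").
check_link_speed() {
  awk -v want="$1" '
    /LnkSta:/ {
      seen++
      speed = ""
      # The speed is the token that follows the word "Speed".
      for (i = 1; i < NF; i++) if ($i == "Speed") { speed = $(i + 1); break }
      sub(/,$/, "", speed)   # drop a trailing comma, if present
      if (speed != want) { bad++; print "FAIL: link at " speed ", expected " want }
    }
    END {
      if (seen == 0) { print "FAIL: no LnkSta lines found"; exit 1 }
      if (bad > 0) exit 1
      print "PASS: " seen " link(s) at " want
    }'
}
```

A possible invocation is `sudo lspci -vv -d 1002:74a0 | check_link_speed 32GT/s`; root privileges are usually needed for `lspci` to print link status.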
System validation#
Comprehensive validation ensures system stability under load. For detailed procedures, see System Validation.
| Test | Command | Pass/Fail criteria |
|---|---|---|
|  |  | Pass: All APUs listed with no errors |
|  |  | Pass: |
|  |  | Pass: |
|  |  | Pass: All tests passed; bandwidth ≥ 2.0 TB/s per APU |
|  |  | Pass: All distances and bandwidths displayed |
|  |  | Pass: All actions true |
|  |  | Pass: |
Performance benchmarks#
Performance validation ensures the system meets MI300A specifications. For detailed procedures, see Performance Benchmarking.
TransferBench a2a
Pass: ≥ 700 GB/s aggregate
TransferBench p2p
| Test | Pass criteria |
|---|---|
| UniDir | ≥ 80 GB/s |
| BiDir | ≥ 155 GB/s |
build/all_reduce_perf -b 8 -e 8G -f 2 -g 4
Pass: ≥ 230 GB/s busbw (peak, at 8 GiB message size)
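The peak bus bandwidth can be extracted from the benchmark output rather than read by eye. The sketch below assumes the usual rccl-tests table layout, in which data rows start with the message size in bytes and the out-of-place `busbw` value is the eighth column; verify this against the header printed by your build. The function names `peak_busbw` and `check_busbw` are hypothetical.

```shell
#!/usr/bin/env bash
# peak_busbw: read all_reduce_perf output on stdin and print the largest
# out-of-place busbw value (assumed to be column 8 of each data row).
peak_busbw() {
  awk '
    /^#/ || NF < 8 { next }                              # comments, short lines
    $1 ~ /^[0-9]+$/ && $8 + 0 > max { max = $8 + 0 }     # data rows only
    END { printf "%.2f\n", max }'
}

# check_busbw MIN: succeed if the peak busbw read from stdin is at least MIN GB/s.
check_busbw() {
  local min="$1" peak
  peak="$(peak_busbw)"
  if awk -v p="$peak" -v m="$min" 'BEGIN { exit !(p >= m) }'; then
    echo "PASS: peak busbw $peak GB/s"
  else
    echo "FAIL: peak busbw $peak GB/s is below $min GB/s"
    return 1
  fi
}
```

For example: `build/all_reduce_perf -b 8 -e 8G -f 2 -g 4 | check_busbw 230`.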
rocblas-bench (see code block below)
rocblas-bench -f gemm \
  -r s -m 4000 -n 4000 -k 4000 \
  --lda 4000 --ldb 4000 --ldc 4000 \
  --transposeA N --transposeB T
Pass: ≥ 60 TFLOPS per APU
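To relate the threshold to a measured run time, recall that an m × n × k GEMM performs 2·m·n·k floating-point operations, which is 1.28 × 10^11 for the 4000-cubed problem above. The helper below is a hypothetical sketch (`gemm_gflops` is not part of rocBLAS) that converts a time in microseconds to GFLOP/s; 60 TFLOPS equals 60,000 GFLOP/s.

```shell
#!/usr/bin/env bash
# gemm_gflops M N K USEC: GFLOP/s for one M x N x K GEMM that took USEC microseconds.
# FLOPs = 2*M*N*K; dividing by (USEC * 1e-6 s) and by 1e9 gives 2*M*N*K / (USEC * 1000).
gemm_gflops() {
  awk -v m="$1" -v n="$2" -v k="$3" -v us="$4" \
    'BEGIN { printf "%.1f\n", (2 * m * n * k) / (us * 1000) }'
}
```

With these dimensions, a run time of 2000 microseconds corresponds to 64000.0 GFLOP/s, so the 60 TFLOPS criterion requires each GEMM to complete in roughly 2133 microseconds or less.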
mpiexec -n 4 wrapper.sh
| Kernel | Threshold (MB/s) |
|---|---|
| Copy | ≥ 2,900,000 |
| Mul | ≥ 3,000,000 |
| Add | ≥ 3,250,000 |
| Triad | ≥ 3,250,000 |
| Dot | ≥ 2,200,000 |
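The per-kernel thresholds can be applied automatically to the benchmark output. This sketch assumes each result row begins with the kernel name followed by the bandwidth in MB/s, as in the standard BabelStream result table; the function name `check_babelstream` is hypothetical. With four ranks, every rank's rows are checked.

```shell
#!/usr/bin/env bash
# check_babelstream: read benchmark output on stdin and compare every result
# row against the minimum MB/s for its kernel.
check_babelstream() {
  awk '
    BEGIN {
      # Minimum MB/s per kernel, copied from the table above.
      min["Copy"] = 2900000; min["Mul"] = 3000000; min["Add"] = 3250000
      min["Triad"] = 3250000; min["Dot"] = 2200000
    }
    ($1 in min) && $2 ~ /^[0-9.]+$/ {
      seen++
      if ($2 + 0 < min[$1]) { bad++; print "FAIL: " $1 " " $2 " MB/s < " min[$1] }
    }
    END {
      if (seen == 0) { print "FAIL: no result rows found"; exit 1 }
      if (bad > 0) exit 1
      print "PASS: " seen " result row(s) meet their thresholds"
    }'
}
```

For example: `mpiexec -n 4 wrapper.sh | check_babelstream`.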