AMD Instinct MI325X#
The AMD Instinct™ MI325X is a high-performance GPU accelerator designed for AI, HPC, and demanding workloads. This document provides a comprehensive overview of MI325X-specific requirements, specifications, and acceptance testing criteria.
Overview#
The MI325X platform uses a Universal Baseboard (UBB 2.0) configuration that hosts 8 AMD Instinct MI325X OAM (OCP Accelerator Module) accelerators with a total of 2 TB of HBM3E memory.
Each MI325X GPU features a multi-chiplet design with 8 XCDs (Accelerator Complex Dies) and 256 GB of HBM3E memory per accelerator. It is built on AMD’s CDNA 3 architecture, with fully meshed Infinity Fabric™ connectivity between accelerators.
System Requirements#
Operating System Support#
For the most up-to-date information on supported operating systems and distributions, please refer to the official ROCm documentation:
ROCm System Requirements - Supported Distributions
Note
The ROCm documentation is the single source of truth for supported versions, distribution compatibility, and required dependencies for the ROCm toolkit.
GPU Identification#
All MI325X GPUs (device ID 1002:74a5) should appear in the lspci output:
sudo lspci -d 1002:74a5
Expected output example:
05:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a5
26:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a5
46:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a5
65:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a5
85:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a5
a6:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a5
c6:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a5
e5:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a5
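The check above can be scripted. The following is a minimal sketch, assuming the 8-accelerator count from the platform overview; the function names `count_mi325x` and `check_mi325x_count` are hypothetical and not part of any AMD tool. Both read lspci output from standard input, so they can be tried against saved output as well as a live system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count MI325X entries in lspci output read from stdin.
# Matches the "Processing accelerators" class text and the 74a5 device ID
# shown in the expected output above.
count_mi325x() {
    # grep -c exits non-zero when nothing matches; keep the "0" it prints.
    grep -c 'Processing accelerators.*74a5' || true
}

# Hypothetical helper: succeed only when exactly 8 accelerators are listed.
check_mi325x_count() {
    local found
    found=$(count_mi325x)
    if [ "$found" -eq 8 ]; then
        echo "PASS: found $found of 8 MI325X GPUs"
    else
        echo "FAIL: found $found of 8 MI325X GPUs"
        return 1
    fi
}

# Usage on a live system:
#   sudo lspci -d 1002:74a5 | check_mi325x_count
```

A non-zero return value makes the helper easy to chain into a larger acceptance script.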
Acceptance Criteria#
The MI325X system acceptance process is designed to validate that your system is operating correctly and meets all performance specifications. Users are expected to step through the validation guides in sequence to ensure comprehensive system verification.
System Acceptance Process#
Prerequisites Validation - Ensure all system requirements and dependencies are met
Basic Health Checks - Verify hardware detection and basic system health
System Validation - Conduct comprehensive stress testing and qualification
Performance Benchmarks - Validate compute, memory, and interconnect performance
The system is accepted when all criteria below are successfully validated:
Prerequisites Validation#
Ensure all system requirements are met before proceeding with validation. See the Prerequisites documentation and System setup for more details.
✅ Supported operating system version installed
✅ ROCm 6.3.2 or later installed
✅ System manufacturer compatibility verified
✅ All required dependencies installed
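The ROCm version item in the checklist above can be verified with a version-aware comparison. This is a sketch under stated assumptions: the helper names are hypothetical, GNU `sort -V` is assumed to be available, and the version file location `/opt/rocm/.info/version` is an assumption about a default ROCm installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
# Relies on GNU sort's -V (version sort) to order the two strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: check a ROCm version string against the 6.3.2 minimum.
# Any build suffix after a dash (for example "6.3.2-66") is stripped first.
check_rocm_version() {
    local installed="${1%%-*}"
    if version_ge "$installed" "6.3.2"; then
        echo "PASS: ROCm $installed"
    else
        echo "FAIL: ROCm $installed is older than 6.3.2"
        return 1
    fi
}

# Usage, assuming the version file is at /opt/rocm/.info/version:
#   check_rocm_version "$(cat /opt/rocm/.info/version)"
```

Version sort is used rather than a plain string comparison so that a release such as 6.10.0 is ordered after 6.3.2.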
Basic Health Checks#
These checks ensure fundamental system health and proper GPU detection. For detailed procedures, see Health Checks.
| Test | Command | Pass/Fail criteria |
|---|---|---|
| | | Pass: OS version listed in compatibility matrix |
| | | Pass: Contains |
| | | Pass: 2.5T or more |
| | | Pass: All 8 GPUs found |
| | | Pass: Speed 32GT/s, width |
| | | Pass: Idle metrics as specified |
| | | Pass: Null |
System Validation#
Comprehensive validation ensures system stability under load. For detailed procedures, see System Validation.
| Test | Command | Pass/Fail criteria |
|---|---|---|
| | | Pass: All GPUs listed with no errors |
| | | Pass: |
| | | Pass: |
| | | Pass: All tests passed; bandwidth ~2TB/s |
| | | Pass: All distances and bandwidths displayed |
| | | Pass: All actions true |
| | | Pass: |
Performance Benchmarks#
Performance validation ensures the system meets MI325X specifications. For detailed procedures, see Performance Benchmarking.
TransferBench a2a
Pass: ≥ 32.9 GB/s
TransferBench p2p
| Test | Pass Criteria |
|---|---|
| UniDir | ≥ 33.9 GB/s |
| BiDir | ≥ 43.9 GB/s |
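Bandwidth thresholds such as the p2p criteria above are fractional, so a plain shell integer test cannot compare them. The sketch below delegates the comparison to awk; the helper names are hypothetical, and the measured values in the usage comment are placeholders rather than real results.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when measured value $1 is at least minimum $2.
# awk performs the floating-point comparison that [ ] cannot.
meets_minimum() {
    awk -v got="$1" -v min="$2" 'BEGIN { exit !(got + 0 >= min + 0) }'
}

# Hypothetical helper: report one result line and return non-zero on failure.
check_bandwidth() {
    local label="$1" measured="$2" minimum="$3"
    if meets_minimum "$measured" "$minimum"; then
        echo "PASS: $label $measured GB/s (minimum $minimum)"
    else
        echo "FAIL: $label $measured GB/s (minimum $minimum)"
        return 1
    fi
}

# Usage with the p2p thresholds; replace the placeholder measurements with
# figures read from the TransferBench output:
#   check_bandwidth "UniDir" 34.2 33.9
#   check_bandwidth "BiDir" 44.0 43.9
```

The same helper applies to any other threshold in this section that is expressed in GB/s.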
TransferBench example.cfg
| Test | Pass Criteria |
|---|---|
| Test 1 | ≥ 47.1 GB/s |
| Test 2 | ≥ 48.4 GB/s |
| Test 3 | ≥ 31.9 GB/s (0→1), ≥ 38.9 GB/s (1→0) |
| Test 4 | ≥ 1264 GB/s |
| Test 5 | N/A (GPU validation) |
| Test 6 | ≥ 48.6 GB/s |
build/all_reduce_perf -b 8 -e 8G -f 2 -g 8
Pass: ≥ 304 GB/s
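The all_reduce_perf result can be checked from saved output. This sketch makes two assumptions that the text above does not confirm: that the 304 GB/s criterion refers to the out-of-place bus bandwidth at the largest message size, and that data rows carry that figure in the 8th column, as in recent rccl-tests releases. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the out-of-place bus bandwidth (GB/s) of the
# last data row in all_reduce_perf output read from stdin.
# Assumption: data rows start with the message size in bytes, are printed in
# increasing size order, and hold the out-of-place busbw in column 8.
largest_busbw() {
    awk '$1 ~ /^[0-9]+$/ { busbw = $8 } END { if (busbw != "") print busbw }'
}

# Hypothetical helper: compare that figure against the 304 GB/s minimum.
check_all_reduce() {
    local busbw
    busbw=$(largest_busbw)
    if [ -n "$busbw" ] && awk -v b="$busbw" 'BEGIN { exit !(b + 0 >= 304) }'; then
        echo "PASS: busbw $busbw GB/s"
    else
        echo "FAIL: busbw ${busbw:-not found} GB/s"
        return 1
    fi
}

# Usage:
#   build/all_reduce_perf -b 8 -e 8G -f 2 -g 8 | check_all_reduce
```

If the column layout differs in the installed rccl-tests version, adjust the field number in `largest_busbw` accordingly.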
rocblas-bench -f gemm \
-r s -m 4000 \
--lda 4000 --ldb 4000 --ldc 4000 \
--transposeA N --transposeB T
Pass: ≥ 94100 GFLOPS (94.1 TFLOPS)
rocblas-bench -f gemm_strided_batched_ex \
--transposeA N --transposeB T \
-m 1024 -n 2048 -k 512 \
--a_type h --lda 1024 --stride_a 4096 \
--b_type h --ldb 2048 --stride_b 4096 \
--c_type s --ldc 1024 --stride_c 2097152 \
--d_type s --ldd 1024 --stride_d 2097152 \
--compute_type s \
--alpha 1.1 --beta 1 \
--batch_count 5
Pass: ≥ 130600 GFLOPS (130.6 TFLOPS)
rocblas-bench -f gemm_strided_batched_ex \
--transposeA N --transposeB T \
-m 1024 -n 2048 -k 512 \
--a_type i8_r --lda 1024 --stride_a 4096 \
--b_type i8_r --ldb 2048 --stride_b 4096 \
--c_type i32_r --ldc 1024 --stride_c 2097152 \
--d_type i32_r --ldd 1024 --stride_d 2097152 \
--compute_type i32_r \
--alpha 1.1 --beta 1 \
--batch_count 5
Pass: ≥ 162700 GFLOPS (162.7 TFLOPS)
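rocblas-bench reports throughput in a `rocblas-Gflops` column, so a figure such as 94100 corresponds to 94.1 TFLOPS. The sketch below extracts that column and converts it. It assumes the results are printed as a comma-separated header line followed by a line of values; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the rocblas-Gflops value from rocblas-bench
# output read from stdin.
# Assumption: a comma-separated header line containing "rocblas-Gflops" is
# followed directly by the matching line of values.
extract_gflops() {
    awk -F',' '
        col && !printed { v = $col; gsub(/[ \t]/, "", v); print v; printed = 1 }
        !col {
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/[ \t]/, "", name)
                if (name == "rocblas-Gflops") col = i
            }
        }
    '
}

# Hypothetical helper: convert a GFLOPS figure to TFLOPS with one decimal.
gflops_to_tflops() {
    awk -v g="$1" 'BEGIN { printf "%.1f\n", g / 1000 }'
}

# Usage:
#   gflops=$(rocblas-bench -f gemm -r s -m 4000 \
#       --lda 4000 --ldb 4000 --ldc 4000 \
#       --transposeA N --transposeB T | extract_gflops)
#   echo "$(gflops_to_tflops "$gflops") TFLOPS"
```

Looking the column up by name rather than by position keeps the helper usable across the gemm and gemm_strided_batched_ex runs, whose output columns differ.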