AMD Instinct MI250 / MI250X#
The AMD Instinct™ MI250 is a data-center OAM-form-factor GPU. This document provides MI250-specific prerequisites, health checks, validation steps, and performance acceptance criteria. It also applies to the AMD Instinct™ MI250X, which shares the same CDNA 2 (gfx90a) OAM platform and acceptance criteria; MI250X-specific differences are noted inline.
Overview#
The AMD Instinct MI250 brings the second-generation CDNA architecture to an OCP Accelerator Module (OAM) form factor purpose-built for HPC and large-scale AI training. Each MI250 packages two Graphics Compute Dies (GCDs) in a single OAM. Each GCD presents 104 CUs (110 on the MI250X) with Matrix Core technology and 64 GB of HBM2e memory at up to 1.6 TB/s, for a combined 128 GB and 3.2 TB/s per OAM. The two GCDs on an OAM are linked by a high-bandwidth on-package AMD Infinity Fabric™ interconnect, and each OAM exposes additional xGMI ports for direct GPU-to-GPU connectivity across a 4-OAM all-to-all mesh. A typical qualified configuration hosts 4 MI250 OAMs (8 GCDs total) per node.
ROCm tools enumerate each GCD as an independent GPU, so a 4-OAM node reports 8 GPUs with 64 GB of HBM2e each (128 GB per OAM). GPUs are connected to each other and to the host CPUs through AMD Infinity Fabric™ (xGMI).
The MI250X is the higher-performance variant of the same CDNA 2 (gfx90a) OAM platform and is validated using the criteria in this document. It powers large supercomputers such as Frontier and LUMI. MI250X reference deployments commonly use an 8-OAM (16-GCD) node topology; scale the per-node GCD counts in the commands below accordingly (for example, -g 16 for RCCL and mpiexec -n 16 for BabelStream on an 8-OAM node). MI250X also shares the MI250 PCI vendor:device ID (1002:740c).
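The scaling rule above follows from two GCDs per OAM. A minimal sketch of that arithmetic, assuming a POSIX shell; the helper name `gcds_per_node` and the variable `NUM_GCDS` are hypothetical and not part of any ROCm tool:

```shell
# Hypothetical helper: number of GCDs on a node with the given OAM count.
# Each MI250 / MI250X OAM carries two GCDs.
gcds_per_node() {
    oams=$1
    echo $((oams * 2))
}

# Derive the per-node GCD count used by the benchmark commands.
NUM_GCDS=$(gcds_per_node 4)    # 4-OAM node; pass 8 for an 8-OAM node
echo "RCCL: -g ${NUM_GCDS}; BabelStream: mpiexec -n ${NUM_GCDS}"
```

Deriving the count from the OAM count in one place keeps the RCCL and BabelStream invocations consistent when the same script runs on both node types.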
System requirements#
Operating system support#
For the most up-to-date information on supported operating systems and distributions, refer to the official ROCm documentation:
ROCm System Requirements - Supported Distributions
Note
The ROCm documentation is the single source of truth for supported versions, distribution compatibility, and required dependencies for the ROCm toolkit.
For BIOS, NUMA, and OS-level tuning that applies to all AMD Instinct hosts, see BIOS settings and OS tuning.
GPU identification#
All MI250 GCDs (PCI vendor:device 1002:740c) should appear in lspci output. On a fully populated 4-OAM MI250 platform you should see 8 GCD entries (2 per OAM):
sudo lspci -d 1002:740c
Expected output example:
0000:11:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran (rev 01)
0000:14:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran (rev 01)
0000:32:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran (rev 01)
0000:35:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran (rev 01)
0000:8e:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran (rev 01)
0000:93:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran (rev 01)
0000:ae:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran (rev 01)
0000:b3:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran (rev 01)
The 8 GCDs are paired by OAM (e.g. 11+14 are the two GCDs on one OAM, 32+35 on the next, and so on). Same-OAM GCD pairs are connected by a high-bandwidth on-package link, while cross-OAM connectivity uses external xGMI ports in a 4-OAM all-to-all mesh.
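The count check can be scripted. The sketch below assumes a POSIX shell and matches on the `Aldebaran` string shown in the example output; the function name `check_gcd_count` is hypothetical. It reads lspci output from standard input so that it can be exercised without hardware:

```shell
# Hypothetical check: count Aldebaran entries in lspci output read from
# stdin and compare the count with the expected number of GCDs.
check_gcd_count() {
    expected=$1
    # grep -c prints 0 and exits non-zero when nothing matches.
    found=$(grep -c 'Aldebaran' || true)
    if [ "$found" -eq "$expected" ]; then
        echo "PASS: $found GCDs found"
    else
        echo "FAIL: expected $expected GCDs, found $found"
        return 1
    fi
}

# Demonstration on two lines taken from the example output above.
check_gcd_count 2 <<'EOF'
0000:11:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran (rev 01)
0000:14:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran (rev 01)
EOF

# On a live 4-OAM node:
#   lspci -d 1002:740c | check_gcd_count 8
```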
Acceptance criteria#
The MI250 system acceptance process validates that the platform is correctly configured, stable, and performing to expectations. Follow the sequence: Prerequisites → Basic Health Checks → System Validation → Performance Benchmarks.
System acceptance process#
Prerequisites validation - Ensure all system requirements and dependencies are met
Basic health checks - Verify hardware detection and basic system health
System validation - Conduct comprehensive stress testing and qualification
Performance benchmarks - Validate compute, memory, and interconnect performance
The system is accepted when all criteria below are successfully validated.
Prerequisites validation#
Ensure all system requirements are met before proceeding with validation. See the Prerequisites documentation and System setup for more details.
✅ Supported operating system version installed
✅ Compatible ROCm version installed (verify: cat /opt/rocm/.info/version); see the ROCm System Requirements for the current supported version matrix
✅ BIOS configured per BIOS settings, with MI250-specific values per platform vendor
✅ Required kernel parameters present: pci=realloc=off iommu=pt
✅ Minimum 1 TB system memory available
✅ Latest applicable firmware applied consistently across nodes
✅ ROCm Validation Suite (RVS) installed
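Two items in the checklist above, the kernel parameters and the memory floor, lend themselves to a scripted check. This is a minimal sketch assuming a POSIX shell on Linux; the function names are hypothetical, and the 1000 GiB floor in the usage lines is an assumed margin (MemTotal reports somewhat less than the installed capacity), not a value from this document:

```shell
# Succeed if every required parameter appears in the given kernel command line.
has_kernel_params() {
    cmdline=$1
    for param in pci=realloc=off iommu=pt; do
        case " $cmdline " in
            *" $param "*) ;;
            *) echo "missing kernel parameter: $param"; return 1 ;;
        esac
    done
}

# Succeed if MemTotal (in kB, as reported by /proc/meminfo) reaches
# the given minimum in GiB.
has_min_memory() {
    mem_kb=$1
    min_gib=$2
    [ "$mem_kb" -ge $((min_gib * 1024 * 1024)) ]
}

# Demonstration with a made-up command line.
has_kernel_params "root=/dev/sda1 pci=realloc=off iommu=pt" && echo "kernel parameters OK"

# On a live node (the 1000 GiB floor is an assumption):
#   has_kernel_params "$(cat /proc/cmdline)"
#   has_min_memory "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" 1000
```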
Basic health checks#
These checks ensure fundamental system health and proper GPU detection. For detailed procedures, see Health Checks.
| Test | Command | Pass/Fail criteria |
|---|---|---|
|  |  | Pass: OS version listed in compatibility matrix |
|  |  | Pass: Contains |
|  |  | Pass: Null |
|  |  | Pass: ≥ 1 TB |
|  |  | Pass: 8 MI250 GCDs found |
|  |  | Pass: Speed PCIe Gen 4 (16 GT/s), width |
|  |  | Pass: Idle metrics as specified |
|  |  | Pass: Null |
System validation#
Comprehensive validation ensures system stability under load. For detailed procedures, see System Validation.
| Test | Command | Pass/Fail criteria |
|---|---|---|
|  |  | Pass: All GCDs listed with no errors |
|  |  | Pass: |
|  |  | Pass: |
|  |  | Pass: All tests passed; bandwidth ≥ 1050 GB/s per GCD |
|  |  | Pass: All distances and bandwidths displayed |
|  |  | Pass: All actions true |
|  |  | Pass: |
Performance benchmarks#
Performance validation ensures the system meets MI250 specifications. For detailed procedures, see Performance Benchmarking.
TransferBench a2a
Pass: ≥ 800 GB/s aggregate
TransferBench p2p
| Test | Pass criteria |
|---|---|
| UniDir | ≥ 30 GB/s |
| BiDir | ≥ 55 GB/s |
build/all_reduce_perf -b 8 -e 8G -f 2 -g 8
Pass: ≥ 125 GB/s busbw (peak, at 8 GiB message size)
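The busbw figure is derived rather than measured directly. For all-reduce, nccl-tests and rccl-tests define it as algbw × 2(n − 1)/n, where algbw is the message size divided by the elapsed time and n is the number of ranks. The sketch below reproduces that arithmetic, assuming a POSIX shell with awk; the function name is hypothetical and the timing in the example is made up for illustration:

```shell
# Bus bandwidth in GB/s for an all-reduce, following the nccl-tests /
# rccl-tests definition: busbw = algbw * 2 * (n - 1) / n,
# with algbw = bytes / time.
allreduce_busbw() {
    bytes=$1
    time_us=$2
    ranks=$3
    awk -v b="$bytes" -v t="$time_us" -v n="$ranks" 'BEGIN {
        algbw = b / (t * 1e-6) / 1e9          # GB/s
        printf "%.2f\n", algbw * 2 * (n - 1) / n
    }'
}

# 8 GiB message across 8 GCDs with a hypothetical 100 ms completion time.
allreduce_busbw 8589934592 100000 8
```

With 8 ranks the factor is 1.75, so the same measured time yields a higher busbw than algbw; compare the threshold against the busbw column, not algbw.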
rocblas-bench (see code block below)
rocblas-bench -f gemm \
    -r s -m 4000 -n 4000 -k 4000 \
    --lda 4000 --ldb 4000 --ldc 4000 \
    --transposeA N --transposeB T
Pass: ≥ 30 TFLOPS per GCD
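To relate the threshold to a runtime: an m × n × k GEMM performs 2·m·n·k floating-point operations, which is 1.28 × 10^11 for the 4000-cubed case above, so 30 TFLOPS corresponds to roughly 4.27 ms per call. A minimal sketch of the conversion, assuming a POSIX shell with awk; the function name is hypothetical and the timing in the example is illustrative, not a measurement:

```shell
# Convert a GEMM runtime into TFLOPS using the standard operation count
# 2 * m * n * k.
gemm_tflops() {
    m=$1
    n=$2
    k=$3
    time_us=$4
    awk -v m="$m" -v n="$n" -v k="$k" -v t="$time_us" 'BEGIN {
        printf "%.2f\n", 2 * m * n * k / (t * 1e-6) / 1e12
    }'
}

# A 4000 x 4000 x 4000 GEMM that takes a hypothetical 4000 us.
gemm_tflops 4000 4000 4000 4000
```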
mpiexec -n 8 wrapper.sh
| Kernel | Threshold (MB/s) |
|---|---|
| Copy | ≥ 1,200,000 |
| Mul | ≥ 1,200,000 |
| Add | ≥ 1,100,000 |
| Triad | ≥ 1,100,000 |
| Dot | ≥ 1,200,000 |
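The per-kernel thresholds can be applied to benchmark output automatically. The sketch below assumes a POSIX shell with awk and assumes that each result row starts with the kernel name followed by the bandwidth in MB/s, as in BabelStream's default report; verify this against the output your wrapper produces. The function name is hypothetical and the sample numbers are made up:

```shell
# Check BabelStream-style result rows read from stdin against the
# thresholds in the table above. Every row for a known kernel must pass,
# so output from several ranks can be concatenated.
check_babelstream() {
    awk '
        BEGIN {
            min["Copy"] = 1200000; min["Mul"]   = 1200000
            min["Add"]  = 1100000; min["Triad"] = 1100000
            min["Dot"]  = 1200000
        }
        ($1 in min) {
            seen++
            if ($2 + 0 < min[$1]) {
                printf "FAIL: %s %.0f MB/s < %d MB/s\n", $1, $2, min[$1]
                failed = 1
            }
        }
        END {
            if (seen == 0) { print "FAIL: no kernel rows found"; exit 1 }
            if (failed) exit 1
            print "PASS: all kernel rows meet their thresholds"
        }
    '
}

# Demonstration with made-up numbers in the assumed row layout.
check_babelstream <<'EOF'
Function    MBytes/sec  Min (sec)   Max         Average
Copy        1310000.0   0.00041     0.00043     0.00042
Mul         1305000.0   0.00041     0.00043     0.00042
Add         1180000.0   0.00068     0.00070     0.00069
Triad       1175000.0   0.00068     0.00070     0.00069
Dot         1250000.0   0.00043     0.00045     0.00044
EOF
```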