AMD Instinct MI210#

The AMD Instinct™ MI210 GPU is a mainstream HPC and AI PCIe-form-factor accelerator. This document provides MI210-specific prerequisites, health checks, validation steps, and performance acceptance criteria.

Overview#

The AMD Instinct MI210 brings second-generation CDNA architecture to a standard full-height, full-length, dual-slot PCIe® add-in card aimed at single-server HPC and AI deployments. Each MI210 provides 104 compute units, 64 GB of HBM2e memory at up to 1.6 TB/s of memory bandwidth, and up to three AMD Infinity Fabric™ links that enable direct GPU-to-GPU connectivity in dual- and quad-GPU hive configurations. The card is passively cooled with a 300 W TDP and supports PCIe® Gen4 host connectivity.

The MI210 is built on the AMD CDNA 2 architecture (gfx90a). Unlike the OAM-based AMD Instinct MI250 and MI250X, the MI210 is PCIe-attached: GPU-to-GPU traffic uses AMD Infinity Fabric™ links when an Infinity Fabric bridge is installed and otherwise traverses host PCIe.

System requirements#

Operating system support#

For the most up-to-date information on supported operating systems and distributions, see the official ROCm documentation:

ROCm System Requirements - Supported Distributions

Note

The ROCm documentation is the single source of truth for supported versions, distribution compatibility, and required dependencies for the ROCm toolkit.

For BIOS, NUMA, and OS-level tuning that applies to all AMD Instinct hosts, see BIOS settings and OS tuning. MI210 systems share the general OS and IOMMU guidance documented for other CDNA 2 platforms but might differ in BIOS power and xGMI topology settings; consult your platform vendor’s BIOS guide for MI210-specific values.

GPU identification#

All MI210 GPUs (PCI vendor:device 1002:740f) should appear in lspci output:

sudo lspci -d 1002:740f

Example output for an eight-GPU system:

03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
27:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
43:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
63:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
83:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
a3:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
c3:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
e3:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
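The lspci listing above can be turned into a scripted count check. The following is a minimal sketch, not part of the official tooling: `count_mi210` and `check_mi210_count` are hypothetical helper names, and the expected GPU count is an assumption you supply for your platform.

```shell
# Hypothetical helpers: count MI210 devices in lspci output read from stdin.

count_mi210() {
    # grep -c prints the number of matching lines; "|| true" keeps a zero
    # count from being reported as a command failure.
    grep -c 'Instinct MI210' || true
}

check_mi210_count() {
    # $1 = expected number of GPUs (platform-specific assumption).
    local expected="$1"
    local found
    found="$(count_mi210)"
    if [ "$found" -eq "$expected" ]; then
        echo "PASS: found ${found} MI210 GPU(s)"
    else
        echo "FAIL: found ${found} MI210 GPU(s), expected ${expected}"
        return 1
    fi
}
```

For example, on a system expected to hold eight cards: `sudo lspci -d 1002:740f | check_mi210_count 8`.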

Acceptance criteria#

The MI210 system acceptance process validates that the platform is correctly configured, stable, and performing to expectations. Follow the sequence: Prerequisites → Basic Health Checks → System Validation → Performance Benchmarks.

System acceptance process#

  1. Prerequisites validation - Ensure all system requirements and dependencies are met

  2. Basic health checks - Verify hardware detection and basic system health

  3. System validation - Conduct comprehensive stress testing and qualification

  4. Performance benchmarks - Validate compute, memory, and interconnect performance

The system is accepted when all criteria below are successfully validated.

Prerequisites validation#

Ensure all system requirements are met before proceeding with validation. See the Prerequisites documentation and System setup for more details.

  • ✅ Supported operating system version installed

  • ✅ Compatible ROCm version installed

  • ✅ BIOS configured per BIOS settings, with MI210-specific values per platform vendor

  • ✅ Required kernel parameters present: pci=realloc=off, pci=bfsort, iommu=pt, and amd_iommu=on (or intel_iommu=on on Intel hosts) — see Kernel Parameters

  • ✅ Minimum 512G system memory available

  • ✅ Latest applicable firmware applied consistently across nodes

  • ✅ ROCm Validation Suite (RVS) installed
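The kernel parameter requirement in the list above lends itself to an automated check. The sketch below assumes parameters are space-separated as in /proc/cmdline; `check_cmdline` is a hypothetical helper name, not a ROCm utility.

```shell
# Hypothetical helper: verify the required boot parameters in a kernel
# command line passed as a single string argument.
check_cmdline() {
    # Pad with spaces so each parameter can be matched as a whole word
    # (for example, "iommu=pt" must not match inside "amd_iommu=pt").
    local cmdline=" $1 "
    local missing=0
    local param
    for param in pci=realloc=off pci=bfsort iommu=pt; do
        case "$cmdline" in
            *" $param "*) ;;
            *) echo "missing: $param"; missing=1 ;;
        esac
    done
    # Either vendor-specific IOMMU switch satisfies the requirement.
    case "$cmdline" in
        *" amd_iommu=on "*|*" intel_iommu=on "*) ;;
        *) echo "missing: amd_iommu=on or intel_iommu=on"; missing=1 ;;
    esac
    return "$missing"
}
```

A typical invocation would be `check_cmdline "$(cat /proc/cmdline)"`, which prints one line per missing parameter and returns non-zero if any are absent.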

Basic health checks#

These checks ensure fundamental system health and proper GPU detection. For detailed procedures, see Health Checks.

Each check lists the test, the command to run, and the pass/fail criteria.

Check OS distribution
Command: cat /etc/os-release
Pass: OS version listed in the compatibility matrix
Fail: Otherwise

Check kernel boot arguments
Command: cat /proc/cmdline
Pass: Contains pci=realloc=off, pci=bfsort, iommu=pt, and either amd_iommu=on or intel_iommu=on
Fail: Otherwise

Check for driver errors
Command: sudo dmesg -T | grep amdgpu | grep -i error
Pass: No output
Fail: Errors reported

Check available memory
Command: lsmem | grep "Total online memory"
Pass: ≥ 512G
Fail: Less than 512G

Check GPU presence
Command: sudo lspci -d 1002:740f
Pass: Expected number of MI210 GPUs found (for example, 4 in a quad-GPU system)
Fail: Otherwise

Check GPU link speed and width
Command: sudo lspci -d 1002:740f -vvv | grep -e DevSta -e LnkSta
Pass: Speed 16GT/s, width x16, no FatalErr+
Fail: Otherwise

Monitor utilization metrics
Command: amd-smi monitor -putm
Pass: Idle metrics as specified
Fail: Otherwise

Check system kernel logs for errors
Command: sudo dmesg -T | grep -Ei 'error|warn|fail|exception'
Pass: No output
Fail: Otherwise
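The link speed and width criterion can be checked mechanically rather than by eye. The following sketch assumes the usual `lspci -vvv` layout, in which each device reports one `LnkSta:` line containing `Speed ...` and `Width ...` and one `DevSta:` line containing the `FatalErr` flag; `check_pcie_links` is a hypothetical helper name.

```shell
# Hypothetical helper: read "lspci -vvv" output on stdin and verify that every
# LnkSta line reports 16GT/s x16 and that no DevSta line reports FatalErr+.
check_pcie_links() {
    awk '
        /LnkSta:/ {
            links++
            if ($0 !~ /Speed 16GT\/s/ || $0 !~ /Width x16/) {
                bad++
                print "FAIL: " $0
            }
        }
        /DevSta:/ && /FatalErr\+/ {
            bad++
            print "FAIL: " $0
        }
        END {
            if (links == 0) { print "FAIL: no LnkSta lines found"; exit 1 }
            if (bad > 0) exit 1
            print "PASS: " links " link(s) at 16GT/s x16, no fatal errors"
        }
    '
}
```

One way to use it is `sudo lspci -d 1002:740f -vvv | check_pcie_links`. A link that has trained down to a lower speed or narrower width is printed with a FAIL prefix and the function returns non-zero.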

System validation#

Comprehensive validation ensures system stability under load. For detailed procedures, see System Validation.

Each test lists the RVS module, the command to run, and the pass/fail criteria.

Compute/GPU properties
Command: rvs -c ${RVS_CONF}/gpup_single.conf
Pass: All GPUs listed with no errors
Fail: Missing GPUs or errors

GPU stress test (GST)
Command: rvs -c ${RVS_CONF}/MI210/gst_single.conf
Pass: met: TRUE in logs
Fail: Target GFLOP/s not met

Input energy delay product (IET)
Command: rvs -c ${RVS_CONF}/MI210/iet_single.conf
Pass: met: TRUE for all actions
Fail: Otherwise

Memory test (MEM)
Command: rvs -c ${RVS_CONF}/mem.conf -l mem.txt
Pass: All tests passed; bandwidth approximately 1.1 TB/s per GPU
Fail: Any test failed or low bandwidth

PCIe bandwidth benchmark (PEBB)
Command: rvs -c ${RVS_CONF}/MI210/pebb_single.conf
Pass: All distances and bandwidths displayed
Fail: Missing data

PCIe qualification tool (PEQT)
Command: rvs -c ${RVS_CONF}/peqt_single.conf
Pass: All actions true
Fail: Otherwise

P2P benchmark and qualification tool (PBQT)
Command: rvs -c ${RVS_CONF}/pbqt_single.conf
Pass: peers:true lines and non-zero throughput
Fail: Otherwise
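The "met: TRUE" criteria for GST and IET can be scanned for in a saved log. This sketch assumes only that result lines contain the text `met: TRUE` or `met: FALSE`, as the criteria above suggest; the rest of the RVS log format is not assumed, and `check_rvs_met` is a hypothetical helper name.

```shell
# Hypothetical helper: count "met: TRUE" and "met: FALSE" lines in a saved
# RVS log and pass only if at least one TRUE and no FALSE lines are present.
check_rvs_met() {
    local log="$1"
    local passed failed
    if [ ! -r "$log" ]; then
        echo "cannot read log: $log"
        return 2
    fi
    # grep -c exits non-zero on zero matches, so "|| true" keeps the count.
    passed="$(grep -ci 'met: *true' "$log" || true)"
    failed="$(grep -ci 'met: *false' "$log" || true)"
    echo "met TRUE: ${passed}, met FALSE: ${failed}"
    [ "$passed" -gt 0 ] && [ "$failed" -eq 0 ]
}
```

For instance, capture a run with `rvs -c ${RVS_CONF}/MI210/gst_single.conf | tee gst.log` and then run `check_rvs_met gst.log`. Treat the result as a first-pass filter and still review the log for other reported errors.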

Performance benchmarks#

Performance validation ensures the system meets MI210 specifications. For detailed procedures, see Performance Benchmarking.

Command: TransferBench a2a

Pass: ≥ 80 GB/s per GPU aggregate

Command: TransferBench p2p

Pass criteria by test:

  • UniDir: ≥ 35 GB/s per same-socket peer-pair

  • BiDir: ≥ 65 GB/s per same-socket peer-pair (combined)

Command: build/all_reduce_perf -b 8 -e 8G -f 2 -g <N>

Pass criteria by configuration:

  • -g 4 (single-socket quad): ≥ 30 GB/s avg bus bandwidth

  • -g 8 (dual-socket, cross-socket ring): ≥ 8 GB/s avg bus bandwidth
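The thresholds above are stated in bus bandwidth, not algorithm bandwidth. As described in the NCCL/RCCL tests performance notes, for all-reduce the two are related by busbw = algbw × 2(n − 1)/n, where n is the number of ranks. The sketch below reproduces that arithmetic; `allreduce_busbw` is a hypothetical helper name and the input values are illustrative, not measurements.

```shell
# Hypothetical helper: convert all-reduce algorithm bandwidth (GB/s) to bus
# bandwidth (GB/s) using busbw = algbw * 2 * (n - 1) / n.
allreduce_busbw() {
    local algbw="$1"   # algorithm bandwidth in GB/s
    local ranks="$2"   # number of ranks (GPUs) in the collective
    awk -v a="$algbw" -v n="$ranks" \
        'BEGIN { printf "%.2f\n", a * 2 * (n - 1) / n }'
}
```

With four ranks the factor is 1.5, so `allreduce_busbw 20 4` prints 30.00: an algorithm bandwidth of 20 GB/s sits exactly at the quad-GPU threshold.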

Command: rocblas-bench (single-precision GEMM)
rocblas-bench -f gemm \
  -r s -m 4000 -n 4000 -k 4000 \
  --lda 4000 --ldb 4000 --ldc 4000 \
  --transposeA N --transposeB T

Pass: ≥ 28000 GFLOPS

Command (BabelStream): mpiexec -n 4 wrapper.sh

Pass thresholds by kernel (MB/s):

  • Copy: ≥ 1,230,000

  • Mul: ≥ 1,225,000

  • Add: ≥ 1,115,000

  • Triad: ≥ 1,115,000

  • Dot: ≥ 1,170,000
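The per-kernel thresholds can be checked against captured benchmark output. This sketch assumes each result row begins with the kernel name followed by the measured MB/s value, as in the usual BabelStream summary table; `check_babelstream` is a hypothetical helper name, and rows from every rank are checked individually.

```shell
# Hypothetical helper: read benchmark output on stdin and compare each
# "<Kernel> <MB/s> ..." row against the thresholds listed above.
check_babelstream() {
    awk '
        BEGIN {
            min["Copy"] = 1230000; min["Mul"] = 1225000
            min["Add"] = 1115000;  min["Triad"] = 1115000
            min["Dot"] = 1170000
        }
        # Only rows whose first field is a known kernel and whose second
        # field is numeric are treated as results.
        ($1 in min) && ($2 ~ /^[0-9.]+$/) {
            seen++
            if ($2 + 0 < min[$1]) {
                bad++
                printf "FAIL: %s %.0f < %d MB/s\n", $1, $2, min[$1]
            }
        }
        END {
            if (seen == 0) { print "FAIL: no kernel rows found"; exit 1 }
            if (bad > 0) exit 1
            print "PASS: " seen " kernel result(s) met thresholds"
        }
    '
}
```

A possible invocation is `mpiexec -n 4 wrapper.sh | check_babelstream`. If your build prints results in a different layout, adjust the field positions accordingly.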