---
myst:
    html_meta:
        "description": "MI210 GPU system acceptance guide: prerequisites, health checks, system validation, and performance benchmarks for HPC and AI deployments."
        "keywords": "AMD Instinct MI210, GPU acceptance testing, ROCm, HPC, AI, PCIe GPU, system validation, health checks, BabelStream, rocBLAS, RCCL, TransferBench, CDNA2"
---
# AMD Instinct MI210

The AMD Instinct™ MI210 GPU is a PCIe® form-factor accelerator for mainstream HPC and AI workloads. This document provides MI210-specific prerequisites, health checks, validation steps, and performance acceptance criteria.

## Overview

The AMD Instinct MI210 brings second-generation CDNA architecture to a standard full-height, full-length, dual-slot PCIe® add-in card aimed at single-server HPC and AI deployments. Each MI210 provides 104 compute units, 64 GB of HBM2e memory at up to 1.6 TB/s of memory bandwidth, and up to three AMD Infinity Fabric™ links that enable direct GPU-to-GPU connectivity in dual- and quad-GPU hive configurations. The card is passively cooled with a 300 W TDP and supports PCIe® Gen4 host connectivity.

The MI210 is built on the AMD CDNA 2 architecture (gfx90a). Unlike the OAM-based AMD Instinct MI250 and MI250X, MI210 deployments are PCIe-attached: GPU-to-GPU traffic uses AMD Infinity Fabric™ links when an Infinity Fabric bridge is installed and traverses host PCIe otherwise.

- **[MI210 Product Page](https://www.amd.com/en/products/accelerators/instinct/mi200/mi210.html)**
- **[MI200 Series Data Sheet](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instinct-mi200-datasheet.pdf)**
- **[MI200 Series Microarchitecture](https://instinct.docs.amd.com/latest/gpu-arch/mi250.html)**

## System requirements

### Operating system support

For the most up-to-date information on supported operating systems and distributions, see the official ROCm documentation:

[ROCm System Requirements - Supported Distributions](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-distributions)

```{note}
[ROCm docs](https://rocm.docs.amd.com) is the single source of truth for supported versions, distribution compatibility, and required dependencies for the ROCm toolkit.
```

For BIOS, NUMA, and OS-level tuning that applies to all AMD Instinct hosts, see [BIOS settings](../common/bios-settings.md) and [OS tuning](../common/os-tuning.md). MI210 systems share the general OS and IOMMU guidance documented for other CDNA 2 platforms but might differ in BIOS power and xGMI topology settings; consult your platform vendor's BIOS guide for MI210-specific values.

### GPU identification

All MI210 GPUs (PCI vendor:device `1002:740f`) should appear in `lspci` output:

```bash
sudo lspci -d 1002:740f
```

Example output on a system with eight MI210 GPUs:

```text
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
27:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
43:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
63:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
83:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
a3:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
c3:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
e3:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
```
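
The presence check can be scripted by counting the device lines that `lspci` returns. The following is a minimal sketch: the helper name `check_gpu_count` and the expected count passed as its argument are illustrative assumptions, not part of any AMD tool.

```bash
# Hypothetical helper: reads `lspci -d 1002:740f` output on stdin and
# compares the number of device lines with the expected GPU count.
check_gpu_count() {
  local expected="$1"
  local found
  # The -d filter already restricts output to MI210 devices, so every
  # non-empty line is one GPU. grep -c exits 1 on zero matches.
  found=$(grep -c . || true)
  if [ "$found" -eq "$expected" ]; then
    echo "PASS: $found MI210 GPUs found"
  else
    echo "FAIL: expected $expected MI210 GPUs, found $found"
    return 1
  fi
}

# Usage on a live system (assumes eight GPUs are installed):
#   sudo lspci -d 1002:740f | check_gpu_count 8
```

Counting lines rather than matching the device name keeps the check working on hosts whose PCI ID database does not yet label the device as `Instinct MI210`.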

## Acceptance criteria

The MI210 system acceptance process validates that the platform is correctly configured, stable, and performing to expectations. Follow the sequence: Prerequisites → Basic Health Checks → System Validation → Performance Benchmarks.

### System acceptance process

1. **[Prerequisites validation](#prerequisites-validation)** - Ensure all system requirements and dependencies are met
2. **[Basic health checks](#basic-health-checks)** - Verify hardware detection and basic system health
3. **[System validation](#system-validation)** - Conduct comprehensive stress testing and qualification
4. **[Performance benchmarks](#performance-benchmarks)** - Validate compute, memory, and interconnect performance

The system is accepted when all criteria below are successfully validated.

### Prerequisites validation

Ensure all system requirements are met before proceeding with validation. See the [Prerequisites documentation](../common/prerequisites.md) and [System setup](../common/system-setup.md) for more details.

- ✅ Supported operating system version installed
- ✅ Compatible ROCm version installed
- ✅ BIOS configured per [BIOS settings](../common/bios-settings.md), with MI210-specific values per platform vendor
- ✅ Required kernel parameters present: `pci=realloc=off`, `pci=bfsort`, `iommu=pt`, and `amd_iommu=on` (or `intel_iommu=on` on Intel hosts); see [Kernel Parameters](../common/kernel-parameters.md)
- ✅ At least 512 GB of system memory available
- ✅ Latest applicable firmware applied consistently across nodes
- ✅ ROCm Validation Suite (RVS) installed

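The kernel parameter requirement in the list above lends itself to a scripted check. This is a minimal sketch under the assumption that parameters are separated by single spaces, as in `/proc/cmdline`; the helper name `check_kernel_params` is hypothetical.

```bash
# Hypothetical helper: checks a kernel command line (first argument) for
# the required parameters. Prints each missing parameter and returns
# non-zero if any is absent.
check_kernel_params() {
  local cmdline="$1"
  local missing=0
  local param
  for param in pci=realloc=off pci=bfsort iommu=pt; do
    # Pad with spaces so that only whole parameters match
    # (for example, amd_iommu=pt must not satisfy iommu=pt).
    case " $cmdline " in
      *" $param "*) ;;
      *) echo "missing: $param"; missing=1 ;;
    esac
  done
  # Either vendor switch is acceptable, depending on the host CPU.
  case " $cmdline " in
    *" amd_iommu=on "*|*" intel_iommu=on "*) ;;
    *) echo "missing: amd_iommu=on or intel_iommu=on"; missing=1 ;;
  esac
  return "$missing"
}

# Usage on a live system:
#   check_kernel_params "$(cat /proc/cmdline)" && echo "PASS"
```
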
### Basic health checks

These checks ensure fundamental system health and proper GPU detection. For detailed procedures, see [Health Checks](../common/health-checks.md).

| Test | Command | Pass/Fail criteria |
|------|---------|-------------------|
| [Check OS distribution](../common/health-checks.md#check-os-distribution) | `cat /etc/os-release` | **Pass**: OS version listed in compatibility matrix<br>**Fail**: Otherwise |
| [Check kernel boot arguments](../common/health-checks.md#check-kernel-boot-arguments) | `cat /proc/cmdline` | **Pass**: Contains `pci=realloc=off`, `pci=bfsort`, `iommu=pt`, and `amd_iommu=on` or `intel_iommu=on`<br>**Fail**: Otherwise |
| [Check for driver errors](../common/health-checks.md#check-for-driver-errors) | `sudo dmesg -T \| grep amdgpu \| grep -i error` | **Pass**: No output<br>**Fail**: Errors reported |
| [Check available memory](../common/health-checks.md#check-for-available-system-memory) | `lsmem \| grep "Total online memory"` | **Pass**: ≥ 512G<br>**Fail**: Less than 512G |
| [Check GPU presence](../common/health-checks.md#check-gpu-presence) | `sudo lspci -d 1002:740f` | **Pass**: Number of MI210 GPUs listed matches the installed count (for example, 4 in a quad-GPU system)<br>**Fail**: Otherwise |
| [Check GPU link speed and width](../common/health-checks.md#check-gpu-pcie-bus-link-speed-and-width) | `sudo lspci -d 1002:740f -vvv \| grep -e DevSta -e LnkSta` | **Pass**: Speed 16GT/s, width `x16`, no `FatalErr+`<br>**Fail**: Otherwise |
| [Monitor utilization metrics](../common/health-checks.md#monitor-utilization-metrics) | `amd-smi monitor -putm` | **Pass**: Idle metrics match the values given in the linked procedure<br>**Fail**: Otherwise |
| [Check system kernel logs for errors](../common/health-checks.md#check-system-kernel-logs) | `sudo dmesg -T \| grep -Ei 'error\|warn\|fail\|exception'` | **Pass**: No output<br>**Fail**: Otherwise |

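The link speed and width criterion can be evaluated mechanically from the `lspci` output. The sketch below assumes the usual `LnkSta:` and `DevSta:` line layout printed by `lspci -vvv`; the helper name `check_pcie_links` is hypothetical.

```bash
# Hypothetical helper: reads the DevSta/LnkSta lines from
#   sudo lspci -d 1002:740f -vvv | grep -e DevSta -e LnkSta
# on stdin and applies the pass criteria from the table above.
check_pcie_links() {
  awk '
    { sub(/^[ \t]+/, "") }   # drop lspci indentation
    /LnkSta:/ {
      links++
      if ($0 !~ /Speed 16GT\/s/ || $0 !~ /Width x16([^0-9]|$)/) {
        print "FAIL: " $0; bad = 1
      }
    }
    # Require whitespace before FatalErr+ so that NonFatalErr+ is not
    # mistaken for a fatal error.
    /DevSta:/ && /(^|[ \t])FatalErr\+/ {
      print "FAIL: " $0; bad = 1
    }
    END {
      if (links == 0) { print "FAIL: no LnkSta lines found"; exit 1 }
      if (bad) exit 1
      print "PASS: " links " link(s) at 16GT/s x16"
    }
  '
}

# Usage on a live system:
#   sudo lspci -d 1002:740f -vvv | grep -e DevSta -e LnkSta | check_pcie_links
```

Note that a plain `grep 'FatalErr+'` would also match `NonFatalErr+`, which is why the sketch anchors the pattern on the preceding whitespace.
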
### System validation

Comprehensive validation ensures system stability under load. For detailed procedures, see [System Validation](../common/system-validation.md).

| Test | Command | Pass/Fail criteria |
|------|---------|-------------------|
| [Compute/GPU properties](../common/system-validation.md#gpu-properties) | `rvs -c ${RVS_CONF}/gpup_single.conf` | **Pass**: All GPUs listed with no errors<br>**Fail**: Missing GPUs or errors |
| [GPU stress test (GST)](../common/system-validation.md#gpu-stress-test) | `rvs -c ${RVS_CONF}/MI210/gst_single.conf` | **Pass**: `met: TRUE` in logs<br>**Fail**: Target GFLOP/s not met |
| [Input energy delay product (IET)](../common/system-validation.md#input-energy-delay-product) | `rvs -c ${RVS_CONF}/MI210/iet_single.conf` | **Pass**: `met: TRUE` for all actions<br>**Fail**: Otherwise |
| [Memory test (MEM)](../common/system-validation.md#mem) | `rvs -c ${RVS_CONF}/mem.conf -l mem.txt` | **Pass**: All tests passed; bandwidth approximately 1.1 TB/s per GPU<br>**Fail**: Any test failed or bandwidth well below 1.1 TB/s |
| [PCIe bandwidth benchmark (PEBB)](../common/system-validation.md#pcie-bandwidth-benchmark) | `rvs -c ${RVS_CONF}/MI210/pebb_single.conf` | **Pass**: All distances and bandwidths displayed<br>**Fail**: Missing data |
| [PCIe qualification tool (PEQT)](../common/system-validation.md#pcie-qualification-tool) | `rvs -c ${RVS_CONF}/peqt_single.conf` | **Pass**: All actions true<br>**Fail**: Otherwise |
| [P2P benchmark and qualification tool (PBQT)](../common/system-validation.md#p2p-benchmark-and-qualification-tool) | `rvs -c ${RVS_CONF}/pbqt_single.conf` | **Pass**: `peers:true` lines and non-zero throughput<br>**Fail**: Otherwise |

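For the tests whose pass criterion is `met: TRUE`, the log can be scanned automatically. This is a sketch only: it assumes that RVS reports a missed target with the string `met: FALSE`, which should be confirmed against the logs of your RVS version. The helper name `check_rvs_met` is hypothetical.

```bash
# Hypothetical helper: reads an RVS log on stdin. Passes only if at least
# one "met: TRUE" result is present and no "met: FALSE" result appears.
# ASSUMPTION: a missed target is logged as "met: FALSE".
check_rvs_met() {
  local log
  log=$(cat)
  if printf '%s\n' "$log" | grep -q 'met: FALSE'; then
    echo "FAIL: at least one action did not meet its target"
    return 1
  fi
  if printf '%s\n' "$log" | grep -q 'met: TRUE'; then
    echo "PASS: all reported actions met their targets"
    return 0
  fi
  echo "FAIL: no met: TRUE result found"
  return 1
}

# Usage on a live system:
#   rvs -c ${RVS_CONF}/MI210/gst_single.conf | check_rvs_met
```

Treating a log with no `met:` result at all as a failure guards against a run that aborted before producing results.
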
### Performance benchmarks

Performance validation ensures the system meets MI210 specifications. For detailed procedures, see [Performance Benchmarking](../common/system-validation.md#performance-benchmarking).

:::{card} Command: `TransferBench a2a`
[TransferBench all-to-all](../common/system-validation.md#transferbench)
^^^
**Pass:** ≥ 80 GB/s aggregate per GPU
+++
**Fail:** otherwise
:::

:::{card} Command: `TransferBench p2p`
[TransferBench peer-to-peer](../common/system-validation.md#transferbench)
^^^

| Test | Pass Criteria |
|------|--------------|
| UniDir | ≥ 35 GB/s per same-socket peer-pair |
| BiDir | ≥ 65 GB/s per same-socket peer-pair (combined) |

+++
**Fail:** otherwise
:::

:::{card} Command: `build/all_reduce_perf -b 8 -e 8G -f 2 -g <N>`
[RCCL Allreduce](../common/system-validation.md#rccl-allreduce)
^^^

| Config | Pass Criteria |
|--------|--------------|
| `-g 4` (single-socket quad) | ≥ 30 GB/s avg bus bandwidth |
| `-g 8` (dual-socket, cross-socket ring) | ≥ 8 GB/s avg bus bandwidth |

+++
**Fail:** otherwise
:::
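
The RCCL criterion can be checked against the summary that `all_reduce_perf` prints at the end of a run. The sketch below assumes the summary contains a line of the form `# Avg bus bandwidth    : <value>`; the helper name `check_rccl_busbw` and its threshold argument are hypothetical.

```bash
# Hypothetical helper: reads all_reduce_perf output on stdin and compares
# the reported average bus bandwidth (GB/s) with a minimum (first argument).
# ASSUMPTION: the summary line contains "Avg bus bandwidth" and ends with
# the numeric value.
check_rccl_busbw() {
  local min="$1"
  awk -v min="$min" '
    /Avg bus bandwidth/ { bw = $NF; seen = 1 }
    END {
      if (!seen) { print "FAIL: no Avg bus bandwidth line found"; exit 1 }
      if (bw + 0 >= min + 0) {
        print "PASS: " bw " GB/s >= " min " GB/s"
      } else {
        print "FAIL: " bw " GB/s < " min " GB/s"
        exit 1
      }
    }
  '
}

# Usage on a live system (single-socket quad, 30 GB/s threshold):
#   build/all_reduce_perf -b 8 -e 8G -f 2 -g 4 | check_rccl_busbw 30
```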

:::{card} Command: `rocblas-bench` (see code block below)
[rocBLAS FP32](../common/system-validation.md#rocblas-gemm-benchmarks)
^^^

```bash
rocblas-bench -f gemm \
  -r s -m 4000 -n 4000 -k 4000 \
  --lda 4000 --ldb 4000 --ldc 4000 \
  --transposeA N --transposeB T
```

**Pass:** ≥ 28,000 GFLOPS
+++
**Fail:** otherwise
:::
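
The rocBLAS threshold can be applied to the benchmark output with a small parser. This sketch assumes that `rocblas-bench` prints a comma-separated header containing a `rocblas-Gflops` column followed by a row of values; verify the column name against your rocBLAS version. The helper name `check_rocblas_gflops` is hypothetical.

```bash
# Hypothetical helper: reads rocblas-bench output on stdin and compares the
# reported GFLOPS with a minimum (first argument).
# ASSUMPTION: results are CSV with a header column named "rocblas-Gflops".
check_rocblas_gflops() {
  local min="$1"
  awk -F',' -v min="$min" '
    # First line after the header holds the measured values.
    col && !got { v = $col; gsub(/[ \t\r]/, "", v); gflops = v; got = 1 }
    # Locate the header and remember the index of the GFLOPS column.
    !col {
      for (i = 1; i <= NF; i++) {
        name = $i; gsub(/[ \t\r]/, "", name)
        if (name == "rocblas-Gflops") col = i
      }
    }
    END {
      if (!got) { print "FAIL: no rocblas-Gflops result found"; exit 1 }
      if (gflops + 0 >= min + 0) {
        print "PASS: " gflops " GFLOPS >= " min " GFLOPS"
      } else {
        print "FAIL: " gflops " GFLOPS < " min " GFLOPS"
        exit 1
      }
    }
  '
}

# Usage on a live system, with the rocblas-bench command shown above:
#   rocblas-bench -f gemm -r s -m 4000 -n 4000 -k 4000 \
#     --lda 4000 --ldb 4000 --ldc 4000 \
#     --transposeA N --transposeB T | check_rocblas_gflops 28000
```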

:::{card} Command: `mpiexec -n 4 wrapper.sh`
[BabelStream](../common/system-validation.md#babelstream)
^^^

| Kernel | Threshold (MB/s) |
|--------|-----------------|
| Copy  | ≥ 1,230,000 |
| Mul   | ≥ 1,225,000 |
| Add   | ≥ 1,115,000 |
| Triad | ≥ 1,115,000 |
| Dot   | ≥ 1,170,000 |

+++
**Fail:** otherwise
:::
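
The BabelStream thresholds in the table above can be applied to the benchmark output as follows. This sketch assumes each result line starts with the kernel name followed by the bandwidth in MB/s, as in the `Function  MBytes/sec ...` table that BabelStream prints; the helper name `check_babelstream` is hypothetical.

```bash
# Hypothetical helper: reads BabelStream output on stdin and checks every
# kernel result line against the thresholds from the table above.
# ASSUMPTION: result lines have the form "<Kernel> <MBytes/sec> ...".
check_babelstream() {
  awk '
    BEGIN {
      min["Copy"] = 1230000; min["Mul"] = 1225000; min["Add"] = 1115000
      min["Triad"] = 1115000; min["Dot"] = 1170000
    }
    ($1 in min) {
      seen[$1] = 1
      if ($2 + 0 < min[$1]) {
        printf "FAIL: %s %.0f MB/s < %d MB/s\n", $1, $2, min[$1]
        bad = 1
      }
    }
    END {
      for (k in min) if (!(k in seen)) {
        print "FAIL: no " k " result found"; bad = 1
      }
      if (bad) exit 1
      print "PASS: all BabelStream kernels meet their thresholds"
    }
  '
}

# Usage on a live system:
#   mpiexec -n 4 wrapper.sh | check_babelstream
```

Because `mpiexec -n 4` starts four ranks, the output is expected to contain one set of results per rank; the sketch checks every result line, so a single slow GPU fails the test.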
