AMD Instinct MI100#

The AMD Instinct™ MI100 is a data-center compute GPU in a PCIe form factor. This document provides MI100-specific prerequisites, health checks, validation steps, and performance acceptance criteria.

Overview#

The AMD Instinct MI100 introduces the first-generation CDNA architecture in a standard full-height, full-length, dual-slot PCIe® add-in card aimed at HPC and accelerated computing workloads. Each MI100 provides 120 compute units with Matrix Core technology, 32 GB of HBM2 memory at up to 1.2 TB/s, and AMD Infinity Fabric™ link support for direct GPU-to-GPU connectivity in 2- and 4-GPU hive configurations. The card is passively cooled with a 300 W TDP and supports PCIe® Gen4 host connectivity.

The MI100's CDNA architecture is identified as gfx908. Its Infinity Fabric™ topology tops out at 4 GPUs per hive, so the validation reference configuration for this document is a single 4-GPU MI100 hive with Infinity Fabric™ bridges providing direct GPU-to-GPU connectivity across all peers. Larger deployments (for example, dual-socket servers with two 4-GPU hives for 8 MI100s total) are common. In those systems, cross-hive traffic traverses the host PCIe fabric, and the per-hive criteria below apply to each hive independently.

System requirements#

Operating system support#

For the most up-to-date information on supported operating systems and distributions, refer to the official ROCm documentation:

ROCm System Requirements - Supported Distributions

Note

The ROCm documentation is the single source of truth for supported versions, distribution compatibility, and required dependencies for the ROCm toolkit.

For BIOS, NUMA, and OS-level tuning that applies to all AMD Instinct hosts, see BIOS settings and OS tuning.

GPU identification#

All MI100 GPUs (PCI vendor:device 1002:738c) should appear in lspci output:

sudo lspci -d 1002:738c

Expected output example (4-GPU MI100 hive):

1d:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Arcturus GL-XL [Instinct MI100] (rev 01)
20:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Arcturus GL-XL [Instinct MI100] (rev 01)
23:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Arcturus GL-XL [Instinct MI100] (rev 01)
26:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Arcturus GL-XL [Instinct MI100] (rev 01)
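
To turn this check into a scripted pass/fail, the lspci output can be counted and compared against the expected number of GPUs. The following is a minimal sketch; the function names `count_mi100` and `check_mi100_count` are illustrative helpers, not part of ROCm or pciutils.

```shell
# Count PCI device lines (for example "1d:00.0 Display controller: ...") on stdin.
# The optional 4-hex-digit prefix covers lspci output that includes a PCI domain.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the helper usable
# in scripts that run with "set -e".
count_mi100() {
    grep -Ec '^([0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7] ' || true
}

# Compare the number of devices found on stdin with the expected count ($1).
check_mi100_count() {
    expected=$1
    found=$(count_mi100)
    if [ "$found" -eq "$expected" ]; then
        echo "PASS: $found MI100 GPUs found"
    else
        echo "FAIL: expected $expected MI100 GPUs, found $found"
        return 1
    fi
}
```

A possible invocation for one hive is `sudo lspci -d 1002:738c | check_mi100_count 4`.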

Acceptance criteria#

The MI100 system acceptance process validates that the platform is correctly configured, stable, and performing to expectations. Follow the sequence: Prerequisites → Basic Health Checks → System Validation → Performance Benchmarks.

System acceptance process#

  1. Prerequisites validation - Ensure all system requirements and dependencies are met

  2. Basic health checks - Verify hardware detection and basic system health

  3. System validation - Conduct comprehensive stress testing and qualification

  4. Performance benchmarks - Validate compute, memory, and interconnect performance

The system is accepted when all criteria below are successfully validated.

Prerequisites validation#

Ensure all system requirements are met before proceeding with validation. See the Prerequisites documentation and System setup for more details.

  • ✅ Supported operating system version installed

  • ✅ Compatible ROCm version installed

  • ✅ BIOS configured per BIOS settings, with MI100-specific values per platform vendor

  • ✅ Required kernel parameters present: pci=realloc=off, pci=bfsort, iommu=pt, and amd_iommu=on (or intel_iommu=on on Intel hosts); see Kernel Parameters

  • ✅ Minimum 256G system memory available

  • ✅ Latest applicable firmware applied consistently across nodes

  • ✅ ROCm Validation Suite (RVS) installed
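
The kernel-parameter item in this list lends itself to a scripted check. The sketch below tests a command-line string for each required parameter and accepts either IOMMU vendor option; `check_cmdline` is a hypothetical helper name. Parameters are matched as whole space-delimited words, so `iommu=pt` is not satisfied by `amd_iommu=...`.

```shell
# check_cmdline "KERNEL COMMAND LINE": report any required parameter that is absent.
check_cmdline() {
    cmdline=" $1 "          # pad with spaces so every parameter is space-delimited
    status=0
    for param in pci=realloc=off pci=bfsort iommu=pt; do
        case "$cmdline" in
            *" $param "*) ;;                               # present
            *) echo "MISSING: $param"; status=1 ;;
        esac
    done
    # Either vendor-specific IOMMU switch satisfies the requirement.
    case "$cmdline" in
        *" amd_iommu=on "*|*" intel_iommu=on "*) ;;
        *) echo "MISSING: amd_iommu=on or intel_iommu=on"; status=1 ;;
    esac
    return $status
}
```

On a live host this could be run as `check_cmdline "$(cat /proc/cmdline)" && echo PASS`.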

Basic health checks#

These checks ensure fundamental system health and proper GPU detection. For detailed procedures, see Health Checks.

Test: Check OS distribution
Command: cat /etc/os-release
Pass: OS version listed in compatibility matrix
Fail: Otherwise

Test: Check kernel boot arguments
Command: cat /proc/cmdline
Pass: Contains pci=realloc=off, pci=bfsort, iommu=pt, and amd_iommu=on or intel_iommu=on
Fail: Otherwise

Test: Check for driver errors
Command: sudo dmesg -T | grep amdgpu | grep -i error
Pass: No output
Fail: Errors reported

Test: Check available memory
Command: lsmem | grep "Total online memory"
Pass: ≥ 256G
Fail: Less than 256G

Test: Check GPU presence
Command: sudo lspci -d 1002:738c
Pass: 4 MI100 GPUs found (per hive)
Fail: Otherwise

Test: Check GPU link speed and width
Command: sudo lspci -d 1002:738c -vvv | grep -e DevSta -e LnkSta
Pass: Speed 16GT/s, width x16, no FatalErr+
Fail: Otherwise

Test: Monitor utilization metrics
Command: amd-smi monitor -putm
Pass: Idle metrics as specified
Fail: Otherwise

Test: Check system kernel logs for errors
Command: sudo dmesg -T | grep -iE 'error|warn|fail|exception'
Pass: No output
Fail: Otherwise
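
The link speed and width criterion can be evaluated mechanically from the filtered lspci output. The sketch below assumes the usual `lspci -vvv` layout, in which `LnkSta:` lines carry `Speed ...` and `Width ...` fields and `DevSta:` lines carry error flags; `check_pcie_links` is an illustrative name. Note that a plain substring match on `FatalErr+` would also match `NonFatalErr+`, so the flag is matched as a separate word.

```shell
# Read "lspci -vvv" output (or just its DevSta/LnkSta lines) on stdin.
# Matching "LnkSta:" with the colon skips the unrelated "LnkSta2:" lines.
check_pcie_links() {
    awk '
        /LnkSta:/ {
            links++
            if (index($0, "Speed 16GT/s") == 0 || index($0, "Width x16") == 0) bad++
        }
        # Require whitespace (or line start) before the flag so that
        # "NonFatalErr+" is not mistaken for "FatalErr+".
        /DevSta:/ && /(^|[ \t])FatalErr[+]/ { bad++ }
        END {
            if (links > 0 && bad == 0) {
                print "PASS: " links " link(s) at 16GT/s x16, no FatalErr+"
            } else {
                print "FAIL: " (bad + 0) " problem(s) across " (links + 0) " link(s)"
                exit 1
            }
        }
    '
}
```

Example usage: `sudo lspci -d 1002:738c -vvv | check_pcie_links`.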

System validation#

Comprehensive validation ensures system stability under load. For detailed procedures, see System Validation.

Test: Compute/GPU properties
Command: rvs -c ${RVS_CONF}/gpup_single.conf
Pass: All GPUs listed with no errors
Fail: Missing GPUs or errors

Test: GPU stress test (GST)
Command: rvs -c ${RVS_CONF}/MI100/gst_single.conf
Pass: met: TRUE in logs
Fail: Target GFLOP/s not met

Test: Input energy delay product (IET)
Command: rvs -c ${RVS_CONF}/MI100/iet_single.conf
Pass: met: TRUE for all actions
Fail: Otherwise

Test: Memory test (MEM)
Command: rvs -c ${RVS_CONF}/mem.conf -l mem.txt
Pass: All tests passed; bandwidth ≥ 800 GB/s per GPU
Fail: Any test failed or low bandwidth

Test: PCIe bandwidth benchmark (PEBB)
Command: rvs -c ${RVS_CONF}/MI100/pebb_single.conf
Pass: All distances and bandwidths displayed
Fail: Missing data

Test: PCIe qualification tool (PEQT)
Command: rvs -c ${RVS_CONF}/peqt_single.conf
Pass: All actions true
Fail: Otherwise

Test: P2P benchmark and qualification tool (PBQT)
Command: rvs -c ${RVS_CONF}/pbqt_single.conf
Pass: peers:true lines and non-zero throughput
Fail: Otherwise
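
For the GST and IET criteria, the `met: TRUE` condition can be checked by scanning the captured RVS output. This is a minimal sketch that assumes only what the criteria above state, namely that result lines contain the text `met: TRUE` or `met: FALSE`; the rest of the log layout is not relied upon, and `check_rvs_met` is a hypothetical helper name.

```shell
# Read RVS output on stdin. Pass only if at least one "met: TRUE" line is
# present and no "met: FALSE" line appears (matching is case-insensitive).
check_rvs_met() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -Eiq 'met: *false'; then
        echo "FAIL: at least one action reported met: FALSE"
        return 1
    fi
    if printf '%s\n' "$log" | grep -Eiq 'met: *true'; then
        echo "PASS: all reported actions met their targets"
        return 0
    fi
    echo "FAIL: no met: TRUE/FALSE lines found"
    return 1
}
```

Example usage: `rvs -c ${RVS_CONF}/MI100/gst_single.conf | check_rvs_met`.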

Note

The reference configuration for this document is a single 4-GPU MI100 hive with AMD Infinity Fabric™ bridges installed, so intra-hive PBQT and TransferBench numbers reflect XGMI throughput. On systems without bridges, P2P traffic traverses the host PCIe fabric and these thresholds will not be met.

Performance benchmarks#

Performance validation ensures the system meets MI100 specifications. For detailed procedures, see Performance Benchmarking.

Command: TransferBench a2a

Pass: ≥ 270 GB/s aggregate

Command: TransferBench p2p

Test      Pass criteria
UniDir    ≥ 30 GB/s
BiDir     ≥ 57 GB/s
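
Because the TransferBench output layout is not described here, the sketch below does not parse it. It only compares a measured value, read off the benchmark output by the operator, against a minimum. `check_min` is an illustrative helper, and the measured numbers in the usage line are placeholders rather than real results.

```shell
# check_min LABEL MEASURED MINIMUM
# Prints PASS or FAIL and returns non-zero when MEASURED is below MINIMUM.
# awk performs the comparison because POSIX shell arithmetic is integer-only.
check_min() {
    awk -v label="$1" -v got="$2" -v min="$3" 'BEGIN {
        if (got + 0 >= min + 0) { print "PASS: " label " " got " >= " min; exit 0 }
        print "FAIL: " label " " got " < " min
        exit 1
    }'
}
```

For example, with placeholder measurements: `check_min "a2a" 281.4 270 && check_min "p2p UniDir" 31.5 30 && check_min "p2p BiDir" 58.2 57`.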

Command: build/all_reduce_perf -b 8 -e 8G -f 2 -g 4

Pass: ≥ 72 GB/s busbw (peak, at 8 GiB message size)
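
The peak bus bandwidth can be extracted from the benchmark output rather than read by eye. The sketch below assumes the column layout commonly printed by nccl-tests and rccl-tests (size, count, type, redop, root, then out-of-place time, algbw, busbw), which puts the out-of-place busbw in column 8; verify this against the header printed by your build. `peak_busbw` and `check_allreduce` are hypothetical helper names.

```shell
# Print the largest out-of-place busbw (column 8) among data rows on stdin.
# Data rows start with the message size in bytes; header lines start with "#".
peak_busbw() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 8 { if ($8 + 0 > max) max = $8 + 0 }
         END { printf "%.2f\n", max + 0 }'
}

# check_allreduce [MIN_GBPS]: compare the peak busbw with the threshold (default 72).
check_allreduce() {
    min=${1:-72}
    peak=$(peak_busbw)
    if awk -v p="$peak" -v m="$min" 'BEGIN { exit !(p + 0 >= m + 0) }'; then
        echo "PASS: peak busbw $peak GB/s >= $min GB/s"
    else
        echo "FAIL: peak busbw $peak GB/s < $min GB/s"
        return 1
    fi
}
```

Example usage: `build/all_reduce_perf -b 8 -e 8G -f 2 -g 4 | check_allreduce 72`.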

Command: rocblas-bench (single-precision GEMM, 4000 × 4000 × 4000):
rocblas-bench -f gemm \
  -r s -m 4000 -n 4000 -k 4000 \
  --lda 4000 --ldb 4000 --ldc 4000 \
  --transposeA N --transposeB T

Pass: ≥ 28 TFLOPS per GPU
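
As a cross-check on the reported figure, throughput can be recomputed from the kernel time. A GEMM of size M × N × K performs 2·M·N·K floating-point operations, so the 4000 × 4000 × 4000 case is 1.28 × 10¹¹ operations, and the 28 TFLOPS threshold corresponds to a kernel time of at most about 4.57 ms. The sketch below assumes the time is available in microseconds; `gemm_tflops` is an illustrative helper, not a rocBLAS tool.

```shell
# gemm_tflops M N K TIME_US
# TFLOPS = (2 * M * N * K operations) / (time in seconds) / 1e12
gemm_tflops() {
    awk -v m="$1" -v n="$2" -v k="$3" -v us="$4" 'BEGIN {
        flops = 2 * m * n * k
        printf "%.2f\n", flops / (us * 1e-6) / 1e12
    }'
}
```

For instance, `gemm_tflops 4000 4000 4000 4000` evaluates a 4000 µs kernel time, which works out to 32.00 TFLOPS and would meet the threshold.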

Command: mpiexec -n 4 wrapper.sh

Kernel    Threshold (MB/s)
Copy      ≥ 940,000
Mul       ≥ 940,000
Add       ≥ 910,000
Triad     ≥ 910,000
Dot       ≥ 950,000
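
The per-kernel thresholds can be applied to the benchmark output with a small filter. The kernel names match those of BabelStream, so the sketch below assumes BabelStream-style result rows of the form `Copy <MBytes/sec> ...`, with the kernel name in the first column and the bandwidth in the second. The contents of wrapper.sh are not described here, and `check_babelstream` is a hypothetical helper name. With four ranks, every rank's rows are checked.

```shell
# Read benchmark output on stdin and compare each kernel row with its threshold.
check_babelstream() {
    awk '
        BEGIN {
            # Minimum bandwidth per kernel in MB/s, taken from the table above.
            min["Copy"] = 940000; min["Mul"] = 940000
            min["Add"] = 910000;  min["Triad"] = 910000
            min["Dot"] = 950000
        }
        # Only rows whose first field is a known kernel and second field is numeric.
        ($1 in min) && $2 ~ /^[0-9.]+$/ {
            seen++
            if ($2 + 0 < min[$1]) {
                print "FAIL: " $1 " " $2 " MB/s < " min[$1]
                bad++
            }
        }
        END {
            if (seen == 0) { print "FAIL: no kernel result rows found"; exit 1 }
            if (bad > 0) exit 1
            print "PASS: " seen " kernel result(s) meet thresholds"
        }
    '
}
```

Example usage: `mpiexec -n 4 wrapper.sh | check_babelstream`.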