AMD Instinct MI100#
The AMD Instinct™ MI100 is a data-center compute GPU in a PCIe form factor. This document provides MI100-specific prerequisites, health checks, validation steps, and performance acceptance criteria.
Overview#
The AMD Instinct MI100 introduces the first-generation CDNA architecture in a standard full-height, full-length, dual-slot PCIe® add-in card aimed at HPC and accelerated computing workloads. Each MI100 provides 120 compute units with Matrix Core technology, 32 GB of HBM2 memory at up to 1.2 TB/s, and AMD Infinity Fabric™ link support for direct GPU-to-GPU connectivity in 2- and 4-GPU hive configurations. The card is passively cooled with a 300 W TDP and supports PCIe® Gen4 host connectivity.
The MI100's CDNA architecture is identified as gfx908. Its Infinity Fabric™ topology tops out at 4 GPUs per hive, so the validation reference configuration for this document is a single 4-GPU MI100 hive with Infinity Fabric™ bridges providing direct GPU-to-GPU connectivity across all peers. Larger deployments are common (for example, dual-socket servers with two 4-GPU hives, for 8 MI100s total); in those systems, cross-hive traffic traverses the host PCIe fabric and the per-hive criteria below apply to each hive independently.
System requirements#
Operating system support#
For the most up-to-date information on supported operating systems and distributions, refer to the official ROCm documentation:
ROCm System Requirements - Supported Distributions
Note
The ROCm documentation is the single source of truth for supported versions, distribution compatibility, and required dependencies for the ROCm toolkit.
For BIOS, NUMA, and OS-level tuning that applies to all AMD Instinct hosts, see BIOS settings and OS tuning.
GPU identification#
All MI100 GPUs (PCI vendor:device 1002:738c) should appear in lspci output:
sudo lspci -d 1002:738c
Expected output example (4-GPU MI100 hive):
1d:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Arcturus GL-XL [Instinct MI100] (rev 01)
20:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Arcturus GL-XL [Instinct MI100] (rev 01)
23:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Arcturus GL-XL [Instinct MI100] (rev 01)
26:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Arcturus GL-XL [Instinct MI100] (rev 01)
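The GPU count check above can be scripted. The following is a minimal sketch, assuming the `lspci -d 1002:738c` output has the same shape as the example shown; the function names `parse_lspci_devices` and `hive_is_complete` are hypothetical helpers, not part of any AMD tool.

```python
import re

# PCI vendor:device ID for the MI100, taken from the text above.
MI100_PCI_ID = "1002:738c"

# Matches a PCI address such as "1d:00.0" or "0000:1d:00.0" at the start of a line.
_PCI_ADDR_RE = re.compile(r"^\s*((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")


def parse_lspci_devices(lspci_output: str) -> list[str]:
    """Return the PCI addresses found in `lspci -d <id>` output, in order."""
    addresses = []
    for line in lspci_output.splitlines():
        match = _PCI_ADDR_RE.match(line)
        if match:
            addresses.append(match.group(1))
    return addresses


def hive_is_complete(lspci_output: str, expected_gpus: int = 4) -> bool:
    """True when exactly `expected_gpus` devices are listed (4 per hive here)."""
    return len(parse_lspci_devices(lspci_output)) == expected_gpus
```

On a live host, the input could come from `subprocess.run(["lspci", "-d", MI100_PCI_ID], capture_output=True, text=True).stdout`. For a server with two hives, pass `expected_gpus=8`.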
Acceptance criteria#
The MI100 system acceptance process validates that the platform is correctly configured, stable, and performing to expectations. Follow the sequence: Prerequisites → Basic Health Checks → System Validation → Performance Benchmarks.
System acceptance process#
1. Prerequisites validation - Ensure all system requirements and dependencies are met
2. Basic health checks - Verify hardware detection and basic system health
3. System validation - Conduct comprehensive stress testing and qualification
4. Performance benchmarks - Validate compute, memory, and interconnect performance
The system is accepted when all criteria below are successfully validated.
Prerequisites validation#
Ensure all system requirements are met before proceeding with validation. See the Prerequisites documentation and System setup for more details.
✅ Supported operating system version installed
✅ Compatible ROCm version installed
✅ BIOS configured per BIOS settings, with MI100-specific values per platform vendor
✅ Required kernel parameters present: pci=realloc=off, pci=bfsort, iommu=pt, and amd_iommu=on (or intel_iommu=on on Intel hosts); see Kernel Parameters
✅ Minimum 256G system memory available
✅ Latest applicable firmware applied consistently across nodes
✅ ROCm Validation Suite (RVS) installed
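The kernel parameter item in the checklist above lends itself to an automated check. This is a minimal sketch, assuming each parameter appears on the kernel command line as its own whitespace-separated token exactly as written in the checklist; `missing_kernel_params` is a hypothetical helper name.

```python
# Parameters required on every host, as listed in the checklist above.
REQUIRED_PARAMS = ["pci=realloc=off", "pci=bfsort", "iommu=pt"]

# One of these must be present, depending on the host CPU vendor.
IOMMU_ENABLE_PARAMS = {"amd_iommu=on", "intel_iommu=on"}


def missing_kernel_params(cmdline: str) -> list[str]:
    """Return the required parameters that are absent from a kernel command line."""
    tokens = set(cmdline.split())
    missing = [param for param in REQUIRED_PARAMS if param not in tokens]
    if not tokens & IOMMU_ENABLE_PARAMS:
        missing.append("amd_iommu=on or intel_iommu=on")
    return missing
```

On a running system the command line can be read from `/proc/cmdline`, for example `missing_kernel_params(open("/proc/cmdline").read())`; an empty list means the item passes. A host that combines options into a single `pci=` token would need a more lenient comparison than this sketch provides.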
Basic health checks#
These checks ensure fundamental system health and proper GPU detection. For detailed procedures, see Health Checks.
| Test | Command | Pass/Fail criteria |
|---|---|---|
|  |  | Pass: OS version listed in compatibility matrix |
|  |  | Pass: Contains |
|  |  | Pass: Null |
|  |  | Pass: ≥ 256G |
|  |  | Pass: 4 MI100 GPUs found (per hive) |
|  |  | Pass: Speed 16GT/s, width |
|  |  | Pass: Idle metrics as specified |
|  |  | Pass: Null |
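The PCIe link criterion in the table above (Speed 16GT/s) can be checked from link status text. The sketch below assumes the input is `lspci -vv` output containing `LnkSta:` lines, which is an assumption since the table does not show the command. It reports the negotiated width but only enforces the speed, because the required width is not stated in the criterion. The names `parse_link_status` and `links_at_speed` are hypothetical.

```python
import re

# Captures speed (GT/s) and lane width from a line such as
# "LnkSta: Speed 16GT/s (ok), Width x16 (ok)".
_LNKSTA_RE = re.compile(r"LnkSta:\s*Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def parse_link_status(lspci_vv_output: str) -> list[tuple[float, int]]:
    """Return (speed_gts, width) for every LnkSta line in the text."""
    return [
        (float(speed), int(width))
        for speed, width in _LNKSTA_RE.findall(lspci_vv_output)
    ]


def links_at_speed(lspci_vv_output: str, required_gts: float = 16.0) -> bool:
    """True when at least one link is found and all links reach the required speed."""
    links = parse_link_status(lspci_vv_output)
    return bool(links) and all(speed >= required_gts for speed, _ in links)
```

Returning `False` when no `LnkSta:` line is found is a deliberate choice here, so that missing input is not mistaken for a pass.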
System validation#
Comprehensive validation ensures system stability under load. For detailed procedures, see System Validation.
| Test | Command | Pass/Fail criteria |
|---|---|---|
|  |  | Pass: All GPUs listed with no errors |
|  |  | Pass: |
|  |  | Pass: |
|  |  | Pass: All tests passed; bandwidth ≥ 800 GB/s per GPU |
|  |  | Pass: All distances and bandwidths displayed |
|  |  | Pass: All actions true |
|  |  | Pass: |
Note
The reference configuration for this document is a single 4-GPU MI100 hive with AMD Infinity Fabric™ bridges installed, so intra-hive PBQT and TransferBench numbers reflect XGMI throughput. On systems without bridges, P2P traffic traverses the host PCIe fabric and these thresholds will not be met.
Performance benchmarks#
Performance validation ensures the system meets MI100 specifications. For detailed procedures, see Performance Benchmarking.
TransferBench a2a
Pass: ≥ 270 GB/s aggregate
TransferBench p2p
| Test | Pass criteria |
|---|---|
| UniDir | ≥ 30 GB/s |
| BiDir | ≥ 57 GB/s |
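The TransferBench criteria above can be compared against measured values with a small helper. This is a sketch only: the dictionary keys and the `failed_checks` function are hypothetical names, and the measured values must be extracted from the TransferBench output separately.

```python
# Minimum acceptable bandwidth in GB/s, copied from the criteria above.
# The key names are illustrative and do not come from TransferBench.
TRANSFERBENCH_THRESHOLDS_GBPS = {
    "a2a_aggregate": 270.0,
    "p2p_unidir": 30.0,
    "p2p_bidir": 57.0,
}


def failed_checks(measured_gbps: dict[str, float]) -> list[str]:
    """Return the names of checks that are missing or below their threshold."""
    failures = []
    for name, minimum in TRANSFERBENCH_THRESHOLDS_GBPS.items():
        value = measured_gbps.get(name)
        if value is None or value < minimum:
            failures.append(name)
    return failures
```

A missing measurement is treated as a failure, so an incomplete run cannot pass by omission.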
build/all_reduce_perf -b 8 -e 8G -f 2 -g 4
Pass: ≥ 72 GB/s busbw (peak, at 8 GiB message size)
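To interpret the busbw figure, it helps to relate it to the raw algorithm bandwidth. The sketch below assumes that rccl-tests follows the definition documented for nccl-tests, where all-reduce bus bandwidth equals algorithm bandwidth multiplied by 2(n-1)/n for n ranks. The function names are hypothetical.

```python
def allreduce_algbw_gbps(message_bytes: int, seconds: float) -> float:
    """Algorithm bandwidth: message size divided by elapsed time, in GB/s (1e9 bytes)."""
    return message_bytes / seconds / 1e9


def allreduce_busbw_gbps(algbw_gbps: float, num_ranks: int) -> float:
    """Bus bandwidth for all-reduce under the nccl-tests convention."""
    if num_ranks < 2:
        raise ValueError("all-reduce bus bandwidth needs at least 2 ranks")
    return algbw_gbps * 2 * (num_ranks - 1) / num_ranks


def required_algbw_gbps(busbw_threshold_gbps: float, num_ranks: int) -> float:
    """Invert the formula: the algorithm bandwidth needed to reach a busbw threshold."""
    if num_ranks < 2:
        raise ValueError("all-reduce bus bandwidth needs at least 2 ranks")
    return busbw_threshold_gbps * num_ranks / (2 * (num_ranks - 1))
```

For the 4-GPU hive used here the factor is 1.5, so under this convention the 72 GB/s busbw threshold corresponds to an algorithm bandwidth of 48 GB/s.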
rocblas-bench (see code block below)
rocblas-bench -f gemm \
    -r s -m 4000 -n 4000 -k 4000 \
    --lda 4000 --ldb 4000 --ldc 4000 \
    --transposeA N --transposeB T
Pass: ≥ 28 TFLOPS per GPU
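The TFLOPS criterion can be cross-checked from the kernel time, using the standard count of 2·m·n·k floating-point operations for a GEMM. This is a minimal sketch with hypothetical function names; it assumes the elapsed time per GEMM has already been read from the benchmark output and converted to seconds.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPS for one m x n x k GEMM that took `seconds` to run."""
    return 2.0 * m * n * k / seconds / 1e12


def max_seconds_for(m: int, n: int, k: int, tflops: float) -> float:
    """Longest allowed time per GEMM that still meets a TFLOPS threshold."""
    return 2.0 * m * n * k / (tflops * 1e12)


# Problem size and threshold from the command and criterion above.
M = N = K = 4000
THRESHOLD_TFLOPS = 28.0
```

For the 4000 × 4000 × 4000 problem, each GEMM performs 1.28 × 10^11 operations, so meeting 28 TFLOPS requires each call to finish in about 4.57 ms, which is what `max_seconds_for(M, N, K, THRESHOLD_TFLOPS)` returns.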
mpiexec -n 4 wrapper.sh
| Kernel | Threshold (MB/s) |
|---|---|
| Copy | ≥ 940,000 |
| Mul | ≥ 940,000 |
| Add | ≥ 910,000 |
| Triad | ≥ 910,000 |
| Dot | ≥ 950,000 |
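The per-kernel thresholds above can be applied to captured benchmark output. The sketch below assumes that `wrapper.sh` runs a BabelStream-style benchmark on each rank and that every rank prints result lines of the form `<Kernel> <MBytes/sec> ...`. Both points are assumptions, since the contents of `wrapper.sh` are not shown here. Function names are hypothetical.

```python
# Minimum bandwidth per kernel in MB/s, copied from the table above.
STREAM_THRESHOLDS_MBPS = {
    "Copy": 940_000.0,
    "Mul": 940_000.0,
    "Add": 910_000.0,
    "Triad": 910_000.0,
    "Dot": 950_000.0,
}


def worst_bandwidths(output: str) -> dict[str, float]:
    """Return the lowest MB/s seen for each kernel across all ranks' output."""
    worst: dict[str, float] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] not in STREAM_THRESHOLDS_MBPS:
            continue
        try:
            value = float(fields[1])
        except ValueError:
            continue  # Not a result line (for example, a header).
        worst[fields[0]] = min(value, worst.get(fields[0], value))
    return worst


def failing_kernels(output: str) -> list[str]:
    """Return kernels that are missing or whose slowest rank is below threshold."""
    worst = worst_bandwidths(output)
    return [
        kernel
        for kernel, minimum in STREAM_THRESHOLDS_MBPS.items()
        if worst.get(kernel, 0.0) < minimum
    ]
```

Taking the minimum across ranks means a single underperforming GPU fails the check, which matches a per-GPU acceptance criterion.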