AMD SMI CLI Tool - Usage Guide#
Overview#
The AMD SMI tool is a command-line utility that uses the AMD SMI Library APIs to monitor and configure AMD GPUs on Linux host systems. It is used to monitor the status of AMD GPUs in virtualization environments and gives host administrators comprehensive GPU management capabilities. The tool outputs GPU and driver information as plain text, JSON, or CSV, and can either print it to the console or save it to a specified output file.
Return Codes#
The AMD SMI CLI tool uses specific return codes to indicate the status of command execution:
| Return Code | Description |
|---|---|
| 0 | Success. Does not display a message |
| -1 | Invalid Command “Command [command_user_wrote] is invalid. Run ‘help’ for more info.” |
| -2 | Invalid Parameter “Parameter [command_user_wrote] is invalid. Run ‘help’ for more info.” |
| -3 | Device Not Found “Device [index_from_list\|BDF\|UUID inputted by user] cannot be found on the system. Run ‘help’ for more info.” |
| -4 | Invalid File Path “Path [path_user_wrote] cannot be found.” |
| -5 | Invalid Parameter Value “Value [value_user_wrote] is not of valid type or format. Run ‘help’ for more info.” |
| -6 | Missing Parameter Value “Parameter [parameter_which_requires_a_value] requires a value. Run ‘help’ for more info.” |
| -7 | Command Not Supported “Command [command_user_wrote] is not supported on the system. Run ‘help’ for more info.” |
| -8 | Parameter Not Supported “Parameter [parameter_user_wrote] is not supported on the system. Run ‘help’ for more info.” |
| -9 | Required command “Command [command_user_wrote] requires a target argument. Run ‘help’ for more info.” |
| -10 | Invalid subcommand “Command [command_user_wrote] is invalid. Must receive valid AMD-SMI Command first. Run ‘help’ for more info.” |
| -11 | Permission Denied “Permission denied. This action requires elevated privileges.” |
| -100 | Unknown Error “An unknown error has occurred. Run ‘help’ for more info.” |
Library error codes (-1014 to -1001): SMI-LIB Error “SMI-LIB has returned error [smi_lib_error_code] - [smi_lib_error_code_string]”
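For scripts that wrap the tool, the return codes listed above can be kept in a small lookup. The sketch below is only an illustration: the names `RETURN_CODES` and `describe_return_code` are hypothetical, and it assumes the caller already holds the signed code as documented here. This guide does not state how the negative values surface as a process exit status, and Linux exit statuses are limited to 8 bits, so that mapping is left out.

```python
# Return codes transcribed from this guide; names here are hypothetical.
RETURN_CODES = {
    0: "Success",
    -1: "Invalid Command",
    -2: "Invalid Parameter",
    -3: "Device Not Found",
    -4: "Invalid File Path",
    -5: "Invalid Parameter Value",
    -6: "Missing Parameter Value",
    -7: "Command Not Supported",
    -8: "Parameter Not Supported",
    -9: "Required command",
    -10: "Invalid subcommand",
    -11: "Permission Denied",
    -100: "Unknown Error",
}


def describe_return_code(code: int) -> str:
    """Return the documented meaning of a signed AMD SMI CLI return code."""
    if code in RETURN_CODES:
        return RETURN_CODES[code]
    # The guide reserves -1014 through -1001 for library errors.
    if -1014 <= code <= -1001:
        return "SMI-LIB Error"
    return "Undocumented return code"


print(describe_return_code(-3))
```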
Commands#
Commands take arguments that help to specify the type of information to be displayed. Note that some commands such as help, list and version do not have arguments. The commands and respective arguments that they accept are described as follows:
help Display information about the tool.
version Display information about current version of the library and the tool.
list (discovery) Lists all GPUs and VFs on the system and their most basic general information.
static Gets static information about the specified GPU or VF. If no target is specified, returns information for all GPUs on the system.
GPU Parameters:
--gpu=<gpu_index from list, gpu_bdf, gpu_uuid>: Parameters for a specific GPU
  --asic: All asic information.
  --bus: All bus information.
  --vbios: All video bios information (if available).
  --board: All board information.
  --limit: All limit metric values (i.e. power and thermal limits).
  --driver: Displays driver version.
  --ras: Displays ras features information.
  --dfc-ucode: All dfc ucode table information.
  --fb-info: All fb information.
  --num-vf: Displays number of supported and enabled VFs.
  --vram: All vram information.
  --cache: All cache info.
  --partition: All partition information.
  --ifwi: All IFWI/video bios information.
  --numa: All NUMA information.
VF Parameters:
--vf=<gpu_index:vf_index from list, vf_bdf, vf_uuid>: Gets general information about the specified VF (e.g. timeslice, fb info)
firmware (ucode) Gets firmware information about the specified GPU or VF. If no target is specified, returns information for all GPUs on the system.
GPU Parameters:
--gpu=<gpu_index from list, gpu_bdf, gpu_uuid>: Parameters for a specific GPU
  --fw-list: All firmware list information.
  --error-records: All error records information.
VF Parameters:
--vf=<gpu_index:vf_index from list, vf_bdf, vf_uuid>: Parameters for a specific VF
  --fw-list: All firmware list information.
bad-pages
--gpu=<gpu_index from list, gpu_bdf, gpu_uuid>: Gets bad page information about the specified GPU. If no argument is provided, returns information for all GPUs on the system.
metric Gets metric information about the specified GPU or VF. If no target is specified, returns information for all GPUs on the system.
GPU Parameters:
--gpu=<gpu_index from list, gpu_bdf, gpu_uuid>: Parameters for a specific GPU
  --usage: All usage information.
  --power: All power readings information.
  --clock: All frequency sensor readings.
  --temperature: All thermal sensor readings.
  --ecc: All ecc information.
  --ecc-block: Number of ECC errors per block.
  --pcie: Current pcie information.
  --energy: Amount of energy consumed.
VF Parameters:
--vf=<gpu_index:vf_index from list, vf_bdf, vf_uuid>: Parameters for a specific VF
  --schedule: All scheduling info.
  --guard: All guard information.
  --guest-data: All guest data information.
  --per-partition: Per-partition metrics information.
Note: When using the --csv format modifier with the metric command, only one argument is supported per command (e.g., metric --usage). For all other formats (plain text and json), multiple arguments are supported. The per-partition command does not support the --csv format modifier.
event
--gpu=<gpu_index from list, gpu_bdf, gpu_uuid>: Displays event information for GPU. If no argument is provided, returns event information for all GPUs on the system.
Note: The watch, watch_time and iterations modifiers are not supported for the event command.
topology
--gpu=<gpu_index from list, gpu_bdf, gpu_uuid>: Displays link topology information. If no argument is provided, returns information for all GPUs on the system.
Topology arguments for the GPU are the following:
  --weight: Current weight information.
  --hops: Current hops information.
  --fb-sharing: Current framebuffer sharing information.
  --link-type: Link type information.
  --coherent: Cache coherent information.
  --atomics: 32 and 64-bit atomic link capability information.
  --bi-dir: bi-directional link capability information.
  --dma: dma link capability information.
  --link-status: Link status information.
Note: The topology command does not support the --csv format modifier.
xgmi
--gpu=<gpu_index from list, gpu_bdf, gpu_uuid>: Displays XGMI capabilities, framebuffer sharing and metric information. If no argument is provided, returns information for all GPUs on the system.
XGMI arguments for the GPU are the following:
  --caps: XGMI capabilities.
  --fb-sharing: Framebuffer sharing for each mode.
  --metric: Metric XGMI information.
  --source-status: Port status information.
  --link-status: Link status information.
Note: The xgmi command does not support the --csv format modifier.
reset Reset or cleanup operations for GPUs and VFs. Available only in plain text. If no target is provided, returns tool exception.
GPU Parameters:
--gpu=<gpu_index from list, gpu_bdf, gpu_uuid>: Parameters for a specific GPU
  --gpureset: Reset GPU.
VF Parameters:
--vf=<gpu_index:vf_index from list, vf_bdf, vf_uuid>: Parameters for a specific VF (requires SR-IOV)
  --vf-fb: Cleanup VF FB for the specified VF.
set
--gpu=<gpu_index from list, gpu_bdf, gpu_uuid>: Set options for devices. If no argument is provided, returns tool exception. Available only in plain text.
Set arguments for the GPU are the following:
  --memory-partition=<AmdSmiMemoryPartitionSetting>: Sets memory partition mode. Run amd-smi partition to list memory-partition modes supported on current platform.
  --accelerator-partition=<profile_index>: Sets accelerator partition mode to a mode based on profile_index from partition command. Run amd-smi partition to list accelerator partition modes supported on current platform.
  --power-cap=<power_cap_value>: Sets the power cap to the provided cap value. Note: Cap value must be between the minimum (min_power_cap) and maximum (max_power_cap) power cap values which can be retrieved from amd-smi static --limit command.
  --xgmi-plpd=<policy>: Sets XGMI Per-Link Power Down (PLPD) to enabled or disabled.
  --num-vf=<number_of_vfs>: Sets the number of Virtual Functions (VFs) to be enabled on the specified GPU. The number must be within the supported range for the GPU. Use amd-smi static --gpu=<gpu> --num-vf to check current VF configuration and supported limits.
  --soc-pstate=<pstate_level>: Sets the SOC (System on Chip) performance state level to control power and performance characteristics.
  --xgmi --fb-sharing-mode=<AmdSmiXgmiFbSharingMode>: Sets framebuffer sharing mode from list [“MODE_1”, “MODE_2”, “MODE_4”, “MODE_8”] where, MODE_X represents that X GPUs will be in the same group, linked together: MODE_1 (one GPU in a group), MODE_2 (two GPUs in a group), MODE_4 (four GPUs in a group), MODE_8 (eight GPUs in a group). All possible configurations can be seen by running the amd-smi xgmi command, not all of them are supported on all systems.
monitor
--gpu=<gpu_index from list, gpu_bdf, gpu_uuid>: Monitor a target device for the specified arguments. If no arguments are provided, all arguments will be enabled. Use the watch arguments to run continuously.
Monitor arguments for the GPU are the following:
  --gfx: Monitor graphics utilization (%) and clock (MHz).
  --mem: Monitor memory utilization (%) and clock (MHz).
  --encoder: Monitor encoder utilization (%) and clock (MHz).
  --ecc: Monitor ECC single bit, ECC double bit.
  --pcie: Monitor PCIe bandwidth in Mb/s and PCIe replay error count.
  --power-usage: Monitor power usage in Watts.
  --temperature: Monitor temperature in Celsius.
  --decoder: Monitor decoder utilization (%) and clock (MHz).
partition
--gpu=<gpu_index from list, gpu_bdf, gpu_uuid>: Displays capabilities and current information for memory and accelerator partition. If no argument is provided, returns information for all GPUs on the system.
Partition arguments for the GPU are the following:
  --current: Current memory and accelerator partition information.
  --memory: Memory partition information.
  --accelerator: Accelerator partition information.
  --global: Global partition configuration settings.
Note: The partition command does not support the --csv or --json format modifiers.
ras Retrieves RAS (Reliability, Availability, Serviceability) error information. (MI300 host systems only, human-readable output only)
RAS arguments (mutually exclusive):
  --cper --severity=<fatal|nonfatal-uncorrected|nonfatal-corrected|all> [--folder=FOLDER] [--file-limit=NUMBER] [--follow]: Get CPER (Common Platform Error Record) entries based on severity level. Supports GPU filtering with --gpu. Optional folder saves error files. File limit controls maximum saved files. Follow enables continuous monitoring.
  --afid --cper-file=FILE: Extract AFID (AMD Field ID) list from existing CPER file. GPU filtering not supported.
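The output-format restrictions noted in the command reference above can be gathered into one pre-flight check. This is a minimal sketch, not part of the tool: `UNSUPPORTED_FORMATS` and `format_problems` are hypothetical names, and the rules are only those stated in this section (set, reset and ras are plain text only; topology and xgmi reject CSV; partition rejects CSV and JSON; metric accepts a single argument with CSV, and per-partition rejects CSV).

```python
# Restrictions transcribed from the notes in this section; names are hypothetical.
UNSUPPORTED_FORMATS = {
    "topology": {"csv"},
    "xgmi": {"csv"},
    "partition": {"csv", "json"},
    "set": {"csv", "json"},
    "reset": {"csv", "json"},
    "ras": {"csv", "json"},
}


def format_problems(command, arguments, output_format):
    """List documented format restrictions that a planned invocation violates.

    arguments: option names without the leading "--", excluding the
    --gpu/--vf target selectors. output_format: "text", "json" or "csv".
    """
    problems = []
    if output_format in UNSUPPORTED_FORMATS.get(command, set()):
        problems.append(f"{command} does not support {output_format} output")
    if command == "metric" and output_format == "csv":
        if "per-partition" in arguments:
            problems.append("per-partition does not support csv output")
        if len(arguments) > 1:
            problems.append("metric accepts only one argument with csv output")
    return problems


print(format_problems("metric", ["usage", "power"], "csv"))
```

An empty list means none of the documented restrictions apply; it does not guarantee that the command is supported on a given system.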
Basic Usage#
Command Syntax#
sudo amd-smi <command> <options>
<command> is the primary command to execute. It must be the first argument after amd-smi.
<options> can include subcommands, modifiers, or other arguments relevant to the specified command.
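When the tool is driven from a script, the same ordering (command first, then options) can be enforced by building the argument list in one place. The helper below is a hypothetical sketch; `build_argv` is not part of AMD SMI, and `sudo` is deliberately left out so the caller decides how privileges are obtained.

```python
def build_argv(command, gpu=None, arguments=(), output_format=None):
    """Assemble an argument list in the order: amd-smi, command, options."""
    argv = ["amd-smi", command]          # the command must come first
    if gpu is not None:
        argv.append(f"--gpu={gpu}")      # index, BDF or UUID as shown by list
    argv.extend(f"--{name}" for name in arguments)
    if output_format in ("json", "csv"):
        argv.append(f"--{output_format}")
    return argv


print(build_argv("static", gpu=0, arguments=["asic", "limit"]))
```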
Getting Help#
To get detailed information about the available commands and options, run the help command. It gives an overview of the tool’s functionality and usage instructions. Run the tool either without arguments or with the help command.
$ sudo amd-smi help
Copyright 2023-2025 Advanced Micro Devices, Inc. All rights reserved.
usage: amd-smi help
AMD System Management Interface | AMD SMI tool version 29.0.0
AMD-SMI Commands:
Descriptions:
version Display version information
list List GPU information
static Gets static information about the specified GPU
metric Gets metric information about the specified GPU
monitor Monitor metrics for target devices
bad-pages Gets bad page information about the specified GPU
event Displays event information for the given GPU
firmware Gets firmware information about the specified GPU
set Set options for devices
reset Reset options for devices
xgmi Displays xgmi information of the devices
topology Displays topology information of the devices
partition Displays partition information of the devices
ras Displays ras information of the devices
The help message shows which commands are supported on the system, with a short description of each.
To see the help for a specific command, append --help to that command. For example, to get help for the “list” command:
$ sudo amd-smi list --help
Copyright 2023-2025 Advanced Micro Devices, Inc. All rights reserved.
usage: amd-smi list [-h | --help] [--json | --csv] [--file FILE] [-g | --gpu [GPU ...]]
List all GPUs and VFs on the system and their most basic general information.
If no GPU is specified, returns basic information for all GPUs on the system.
List arguments:
Description:
-h, --help show this help message and exit
-g, --gpu [GPU ...] Select a GPU ID, BDF or UUID, if not selected it will return for all GPUs
Command Modifiers:
Description:
--json Displays output in JSON format (human readable by default).
--csv Displays output in CSV format (human readable by default).
--file FILE Saves output into a file on the provided path (stdout by default).
Output Formats#
The AMD SMI CLI tool supports three output formats:
Human-readable (Default)#
$ sudo amd-smi list
GPU: 0
BDF: 0000:0c:00.0
UUID: 67ff74a1-0000-1000-8081-b5b9fd6edd00
VF: 0
BDF: 0000:0c:02.0
UUID: 670074a1-0000-1000-8081-b5b9fd6edd00
JSON Format#
$ sudo amd-smi list --json
[
{
"gpu": 0,
"bdf": "0000:0c:00.0",
"uuid": "67ff74a1-0000-1000-8081-b5b9fd6edd00",
"vfs": [
{
"vf": 0,
"bdf": "0000:0c:02.0",
"uuid": "670074a1-0000-1000-8081-b5b9fd6edd00"
}
]
}
]
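The JSON form is the easiest one to consume from a script. As a minimal sketch, the snippet below walks the sample shown above and builds the `gpu_index:vf_index` strings that the `--vf` option accepts. The function name `vf_targets` is hypothetical, and the field names are taken from this sample only.

```python
import json

# Sample copied from the `amd-smi list --json` output above.
SAMPLE = """
[
  {
    "gpu": 0,
    "bdf": "0000:0c:00.0",
    "uuid": "67ff74a1-0000-1000-8081-b5b9fd6edd00",
    "vfs": [
      {
        "vf": 0,
        "bdf": "0000:0c:02.0",
        "uuid": "670074a1-0000-1000-8081-b5b9fd6edd00"
      }
    ]
  }
]
"""


def vf_targets(list_json: str):
    """Return "gpu_index:vf_index" strings for every VF in the list output."""
    targets = []
    for gpu in json.loads(list_json):
        for vf in gpu.get("vfs", []):   # a GPU without VFs contributes nothing
            targets.append(f"{gpu['gpu']}:{vf['vf']}")
    return targets


print(vf_targets(SAMPLE))  # ['0:0']
```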
CSV Format#
$ sudo amd-smi list --csv
gpu,gpu_bdf,gpu_uuid,vf,vf_bdf,vf_uuid
0,0000:0c:00.0,67ff74a1-0000-1000-8081-b5b9fd6edd00,0,0000:0c:02.0,670074a1-0000-1000-8081-b5b9fd6edd00
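CSV output maps directly onto Python's standard `csv` module. The sketch below parses the sample above into one dictionary per row; `read_list_csv` is a hypothetical helper, and the column names are those of this sample. Every value comes back as a string, so numeric columns need an explicit conversion.

```python
import csv
import io

# Sample copied from the `amd-smi list --csv` output above.
SAMPLE_CSV = (
    "gpu,gpu_bdf,gpu_uuid,vf,vf_bdf,vf_uuid\n"
    "0,0000:0c:00.0,67ff74a1-0000-1000-8081-b5b9fd6edd00,"
    "0,0000:0c:02.0,670074a1-0000-1000-8081-b5b9fd6edd00\n"
)


def read_list_csv(text):
    """Parse `amd-smi list --csv` text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_list_csv(SAMPLE_CSV)
print(rows[0]["vf_bdf"])  # 0000:0c:02.0
```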
Saving Output to File#
All outputs can be saved to a file using the --file parameter:
sudo amd-smi list --file=output.txt
Command Examples with Sample Outputs#
1. Version Information#
$ sudo amd-smi version
Output:
VERSION:
TOOL_NAME: AMD SMI tool
TOOL_VERSION: 29.0.0
LIB_VERSION: 35.0.0
2. Static Information#
Get all static information for GPU 0:
$ sudo amd-smi static --gpu=0
Output:
GPU: 0
ASIC:
MARKET_NAME: MI300X
VENDOR_ID: 0x1002
VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
SUBVENDOR_ID: 0x1002
DEVICE_ID: 0x74A1
SUBSYSTEM_ID: 0x74A1
REV_ID: 0x0
ASIC_SERIAL: 0xF33397508B72EAAF
OAM_ID: 2
NUM_OF_COMPUTE_UNITS: 304
BUS:
BDF: 0000:0c:00.0
MAX_PCIE_WIDTH: 16
MAX_PCIE_SPEED: 32 GT/s
PCIE_INTERFACE_VERSION: Gen 5
SLOT_TYPE: OAM
MAX_PCIE_INTERFACE_VERSION: Gen 5
VBIOS:
NAME: AMD MI300X_PRODUCTION_1VF
BUILD_DATE: 2024/03/15 14:30
PART_NUMBER: 113-MI3PRD-001
VERSION: 022.040.003.036.000001
BOARD:
MODEL_NUMBER: 102-G30201-0B
PRODUCT_SERIAL: PCB068560-0020
FRU_ID: 113-AMDG302010B14
PRODUCT_NAME: Instinct MI300X
MANUFACTURER_NAME: AMD
LIMIT:
MAX_POWER: 750 W
MIN_POWER: 100 W
SOCKET_POWER: 750 W
SLOWDOWN_EDGE_TEMPERATURE: N/A C
SLOWDOWN_HOTSPOT_TEMPERATURE: 100 C
SLOWDOWN_MEM_TEMPERATURE: 95 C
SHUTDOWN_EDGE_TEMPERATURE: N/A C
SHUTDOWN_HOTSPOT_TEMPERATURE: 110 C
SHUTDOWN_MEM_TEMPERATURE: 105 C
VRAM:
TYPE: HBM3
VENDOR: HYNIX
SIZE: 196592 MB
BIT_WIDTH: 8192
MAX_BANDWIDTH: 5300 GB/s
Get specific static information:
$ sudo amd-smi static --gpu=0 --asic --limit
Output:
GPU: 0
ASIC:
MARKET_NAME: AMD Instinct MI350X
VENDOR_ID: 0x1002
VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
SUBVENDOR_ID: 0x1002
DEVICE_ID: 0x75A0
SUBSYSTEM_ID: 0x75A0
REV_ID: 0x0
ASIC_SERIAL: 0xBA6524093004B8B2
OAM_ID: 0
NUM_OF_COMPUTE_UNITS: 256
LIMIT:
MAX_POWER: 1000 W
MIN_POWER: 0 W
SOCKET_POWER: 1000 W
SLOWDOWN_EDGE_TEMPERATURE: N/A
SLOWDOWN_HOTSPOT_TEMPERATURE: 100 C
SLOWDOWN_MEM_TEMPERATURE: 115 C
SHUTDOWN_EDGE_TEMPERATURE: N/A
SHUTDOWN_HOTSPOT_TEMPERATURE: 110 C
SHUTDOWN_MEM_TEMPERATURE: 120 C
3. Metric Information#
Get usage metrics:
$ sudo amd-smi metric --gpu=0 --usage
Output:
GPU: 0
USAGE:
GFX_ACTIVITY: 45 %
UMC_ACTIVITY: 12 %
MM_ACTIVITY: 3 %
VCN_ACTIVITY: [ 0 %, 2 %, 0 %, 1 % ]
JPEG_ACTIVITY: [ 0 %, 0 %, 1 %, 0 %, 0 %, 0 %, 0 %, 0 % ]
Get power metrics:
$ sudo amd-smi metric --gpu=0 --power
Output:
GPU: 0
POWER:
SOCKET_POWER: 320 W
GFX_VOLTAGE: 875 mV
SOC_VOLTAGE: 950 mV
MEM_VOLTAGE: 1250 mV
POWER_MANAGEMENT: ENABLED
Get temperature metrics:
$ sudo amd-smi metric --gpu=0 --temperature
Output:
GPU: 0
TEMPERATURE:
EDGE: 65 C
HOTSPOT: 75 C
MEM: 68 C
Get clock information:
$ sudo amd-smi metric --gpu=0 --clock
Output:
GPU: 0
CLOCK:
GFX:
CLK: 1800 MHz
MIN_CLK: 500 MHz
MAX_CLK: 2100 MHz
CLK_LOCKED: DISABLED
DEEP_SLEEP: DISABLED
MEM:
CLK: 1600 MHz
MIN_CLK: 400 MHz
MAX_CLK: 1600 MHz
CLK_LOCKED: DISABLED
DEEP_SLEEP: DISABLED
4. Firmware Information#
$ sudo amd-smi firmware --gpu=0
Output:
GPU: 0
FW_LIST:
FW_0:
FW_ID: SMU
FW_VERSION: 0.85.117.1
FW_1:
FW_ID: CP_MEC_JT1
FW_VERSION: 0x80b8
FW_2:
FW_ID: CP_MEC1
FW_VERSION: 0x80b8
FW_3:
FW_ID: RLC
FW_VERSION: 0x45
FW_4:
FW_ID: SDMA0
FW_VERSION: 0x18
FW_5:
FW_ID: SDMA1
FW_VERSION: 0x18
FW_6:
FW_ID: SDMA2
FW_VERSION: 0x18
FW_7:
FW_ID: SDMA3
FW_VERSION: 0x18
FW_8:
FW_ID: RLC_V
FW_VERSION: 0x1a
FW_9:
FW_ID: MMSCH
FW_VERSION: 8.0.19
FW_10:
FW_ID: PSP_SYSDRV
FW_VERSION: 0.36.2.5a
FW_11:
FW_ID: PSP_SOSDRV
FW_VERSION: 0.36.2.5a
FW_12:
FW_ID: PSP_KEYDB
FW_VERSION: 5.0.36.0
FW_13:
FW_ID: DFC
FW_VERSION: 0.1.0.1
FW_14:
FW_ID: PSP_BL
FW_VERSION: 0.a1.2.1e
FW_15:
FW_ID: REG_ACCESS_WHITELIST
FW_VERSION: c.2.36.0
FW_16:
FW_ID: P2S_TABLE
FW_VERSION: 0x50101
FW_17:
FW_ID: PSP_SOC
FW_VERSION: 0.36.2.5a
FW_18:
FW_ID: PSP_DBG
FW_VERSION: 0.36.2.5a
FW_19:
FW_ID: PSP_INTF
FW_VERSION: 0.36.2.5a
FW_20:
FW_ID: PSP_RAS
FW_VERSION: 0.36.2.5a
ERROR_RECORDS:
5. Bad Pages Information#
$ sudo amd-smi bad-pages --gpu=0
Output:
GPU: 0
BAD_PAGE_1:
RETIRED_BAD_PAGE: 0x7FFF12345000
TIMESTAMP: 01/10/2025:08/41/33
MEM_CHANNEL: 2
MCUMC_ID: 1
BAD_PAGE_2:
RETIRED_BAD_PAGE: 0x7FFF12346000
TIMESTAMP: 03/10/2025:06/11/13
MEM_CHANNEL: 3
MCUMC_ID: 1
6. Event Information#
$ sudo amd-smi event --gpu=0
Output:
EVENT_INFO:
GPU: 0
MESSAGE: Temperature threshold exceeded
CATEGORY: THERMAL
DATE: 2025-10-08:11:23:07.505
GPU: 0
MESSAGE: ECC single bit error corrected
CATEGORY: ECC
DATE: 2025-10-09:10:34:25.237
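The plain-text event output can be grouped into records for filtering or sorting. The following is a sketch under assumptions: `parse_events` is a hypothetical helper, each record is taken to start at a `GPU:` line, and the `DATE` layout is inferred from the sample above rather than from a format specification.

```python
from datetime import datetime

# Layout inferred from the sample DATE values above (an assumption).
DATE_FORMAT = "%Y-%m-%d:%H:%M:%S.%f"

SAMPLE = """EVENT_INFO:
GPU: 0
MESSAGE: Temperature threshold exceeded
CATEGORY: THERMAL
DATE: 2025-10-08:11:23:07.505
GPU: 0
MESSAGE: ECC single bit error corrected
CATEGORY: ECC
DATE: 2025-10-09:10:34:25.237
"""


def parse_events(text):
    """Group "KEY: value" lines into one dict per event."""
    events = []
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue                  # title lines such as "EVENT_INFO:"
        if key == "GPU":
            events.append({})         # a GPU line opens a new record
        if events:
            events[-1][key] = value
    for event in events:
        if "DATE" in event:
            event["DATE"] = datetime.strptime(event["DATE"], DATE_FORMAT)
    return events


for event in parse_events(SAMPLE):
    print(event["DATE"].isoformat(), event["CATEGORY"])
```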
7. Topology Information#
$ sudo amd-smi topology --weight
Output:
WEIGHT_TABLE:
0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
0000:0c:00.0 0 15 15 15 15 15 15 15
0000:22:00.0 15 0 15 15 15 15 15 15
0000:38:00.0 15 15 0 15 15 15 15 15
0000:5c:00.0 15 15 15 0 15 15 15 15
0000:9f:00.0 15 15 15 15 0 15 15 15
0000:af:00.0 15 15 15 15 15 0 15 15
0000:bf:00.0 15 15 15 15 15 15 0 15
0000:df:00.0 15 15 15 15 15 15 15 0
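Because the topology command does not support CSV, a script has to read the weight table from plain text or JSON. The sketch below handles the plain-text layout shown above, assuming a title line, then a header line of column BDFs, then one row per source BDF. `parse_weight_table` is a hypothetical name, and the sample is trimmed to three GPUs for brevity.

```python
# Trimmed to three GPUs; same layout as the WEIGHT_TABLE output above.
SAMPLE = """WEIGHT_TABLE:
             0000:0c:00.0 0000:22:00.0 0000:38:00.0
0000:0c:00.0 0 15 15
0000:22:00.0 15 0 15
0000:38:00.0 15 15 0
"""


def parse_weight_table(text):
    """Parse the table into {source_bdf: {destination_bdf: weight}}."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    # lines[0] is the "WEIGHT_TABLE:" title; lines[1] holds the column BDFs.
    header = lines[1]
    table = {}
    for row in lines[2:]:
        source, weights = row[0], row[1:]
        table[source] = {dest: int(w) for dest, w in zip(header, weights)}
    return table


weights = parse_weight_table(SAMPLE)
print(weights["0000:0c:00.0"]["0000:22:00.0"])  # 15
```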
8. XGMI Information#
$ sudo amd-smi xgmi --caps
Output:
XGMI_CONFIGURATION_SUPPORT_CAPABILITY:
MODE_1 MODE_2 MODE_4 MODE_8 MODE_CUSTOM
0000:0c:00.0 SUPPORTED SUPPORTED SUPPORTED SUPPORTED SUPPORTED
0000:22:00.0 SUPPORTED SUPPORTED SUPPORTED SUPPORTED SUPPORTED
0000:38:00.0 SUPPORTED SUPPORTED SUPPORTED SUPPORTED SUPPORTED
0000:5c:00.0 SUPPORTED SUPPORTED SUPPORTED SUPPORTED SUPPORTED
0000:9f:00.0 SUPPORTED SUPPORTED SUPPORTED SUPPORTED SUPPORTED
0000:af:00.0 SUPPORTED SUPPORTED SUPPORTED SUPPORTED SUPPORTED
0000:bf:00.0 SUPPORTED SUPPORTED SUPPORTED SUPPORTED SUPPORTED
0000:df:00.0 SUPPORTED SUPPORTED SUPPORTED SUPPORTED SUPPORTED
9. Partition Information#
$ sudo amd-smi partition --gpu=0 --current
Output:
GPU: 0
PARTITION:
ACCELERATOR_PARTITION: SPX
MEMORY_PARTITION: NPS1
PARTITION_ID: 0
Get per-partition metrics:
$ sudo amd-smi metric --vf=0:0 --per-partition
Output:
GPU: 0
VF: 0
PER_PARTITION:
AID_0:
CLK_VCLK: 29 MHz
CLK_VCLK_MIN_LIMIT: 914 MHz
CLK_VCLK_MAX_LIMIT: 1333 MHz
CLK_DCLK_LIMIT: 22 MHz
CLK_DCLK_MIN_LIMIT: 711 MHz
CLK_DCLK_MAX_LIMIT: 1142 MHz
CLK_SCLK_LIMIT: 28 MHz
CLK_SCLK_MIN_LIMIT: 888 MHz
CLK_SCLK_MAX_LIMIT: 1142 MHz
VCN_ACTIVITY: 0 %
JPEG_ACTIVITY: [0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %]
AID_1:
CLK_VCLK: 29 MHz
CLK_VCLK_MIN_LIMIT: 914 MHz
CLK_VCLK_MAX_LIMIT: 1333 MHz
CLK_DCLK_LIMIT: 22 MHz
CLK_DCLK_MIN_LIMIT: 711 MHz
CLK_DCLK_MAX_LIMIT: 1142 MHz
CLK_SCLK_LIMIT: 28 MHz
CLK_SCLK_MIN_LIMIT: 888 MHz
CLK_SCLK_MAX_LIMIT: 1142 MHz
VCN_ACTIVITY: 0 %
JPEG_ACTIVITY: [0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %]
AID_2:
CLK_VCLK: 29 MHz
CLK_VCLK_MIN_LIMIT: 914 MHz
CLK_VCLK_MAX_LIMIT: 1333 MHz
CLK_DCLK_LIMIT: 22 MHz
CLK_DCLK_MIN_LIMIT: 711 MHz
CLK_DCLK_MAX_LIMIT: 1142 MHz
CLK_SCLK_LIMIT: 28 MHz
CLK_SCLK_MIN_LIMIT: 888 MHz
CLK_SCLK_MAX_LIMIT: 1142 MHz
VCN_ACTIVITY: 0 %
JPEG_ACTIVITY: [0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %]
AID_3:
CLK_VCLK: 29 MHz
CLK_VCLK_MIN_LIMIT: 914 MHz
CLK_VCLK_MAX_LIMIT: 1333 MHz
CLK_DCLK_LIMIT: 22 MHz
CLK_DCLK_MIN_LIMIT: 711 MHz
CLK_DCLK_MAX_LIMIT: 1142 MHz
CLK_SCLK_LIMIT: 28 MHz
CLK_SCLK_MIN_LIMIT: 888 MHz
CLK_SCLK_MAX_LIMIT: 1142 MHz
VCN_ACTIVITY: 0 %
JPEG_ACTIVITY: [0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %]
XCP_0:
GFX_CLK: [132 MHz]
GFX_MIN_CLK: [500 MHz]
GFX_MAX_CLK: [2100 MHz]
GFX_CLK_LOCKED: [DISABLED]
GFX_USAGE: [0 %]
XCP_1:
GFX_CLK: [132 MHz]
GFX_MIN_CLK: [500 MHz]
GFX_MAX_CLK: [2100 MHz]
GFX_CLK_LOCKED: [DISABLED]
GFX_USAGE: [0 %]
XCP_2:
GFX_CLK: [132 MHz]
GFX_MIN_CLK: [500 MHz]
GFX_MAX_CLK: [2100 MHz]
GFX_CLK_LOCKED: [DISABLED]
GFX_USAGE: [0 %]
XCP_3:
GFX_CLK: [132 MHz]
GFX_MIN_CLK: [500 MHz]
GFX_MAX_CLK: [2100 MHz]
GFX_CLK_LOCKED: [DISABLED]
GFX_USAGE: [0 %]
XCP_4:
GFX_CLK: [132 MHz]
GFX_MIN_CLK: [500 MHz]
GFX_MAX_CLK: [2100 MHz]
GFX_CLK_LOCKED: [DISABLED]
GFX_USAGE: [0 %]
XCP_5:
GFX_CLK: [132 MHz]
GFX_MIN_CLK: [500 MHz]
GFX_MAX_CLK: [2100 MHz]
GFX_CLK_LOCKED: [DISABLED]
GFX_USAGE: [0 %]
XCP_6:
GFX_CLK: [132 MHz]
GFX_MIN_CLK: [500 MHz]
GFX_MAX_CLK: [2100 MHz]
GFX_CLK_LOCKED: [DISABLED]
GFX_USAGE: [0 %]
XCP_7:
GFX_CLK: [132 MHz]
GFX_MIN_CLK: [500 MHz]
GFX_MAX_CLK: [2100 MHz]
GFX_CLK_LOCKED: [DISABLED]
GFX_USAGE: [0 %]
Get global partition configuration:
$ sudo amd-smi partition --global
Output:
GLOBAL_PARTITION_CONFIG:
GPU ACCELERATOR_TYPE SUPPORTED_VF_MODE MEMORY_PARTITION_CAPS
0 SPX 1 NPS1
DPX 2 NPS1
QPX 4 NPS1,NPS4
CPX 1,2,4,8 NPS1,NPS4
1 SPX 1 NPS1
DPX 2 NPS1
QPX 4 NPS1,NPS4
CPX 1,2,4,8 NPS1,NPS4
2 SPX 1 NPS1
DPX 2 NPS1
QPX 4 NPS1,NPS4
CPX 1,2,4,8 NPS1,NPS4
3 SPX 1 NPS1
DPX 2 NPS1
QPX 4 NPS1,NPS4
CPX 1,2,4,8 NPS1,NPS4
4 SPX 1 NPS1
DPX 2 NPS1
QPX 4 NPS1,NPS4
CPX 1,2,4,8 NPS1,NPS4
5 SPX 1 NPS1
DPX 2 NPS1
QPX 4 NPS1,NPS4
CPX 1,2,4,8 NPS1,NPS4
6 SPX 1 NPS1
DPX 2 NPS1
QPX 4 NPS1,NPS4
CPX 1,2,4,8 NPS1,NPS4
7 SPX 1 NPS1
DPX 2 NPS1
QPX 4 NPS1,NPS4
CPX 1,2,4,8 NPS1,NPS4
10. Monitor Command (Continuous Monitoring)#
$ sudo amd-smi monitor
Output (updates every second):
GPU POWER HOTSPOT_TEMP MEM_TEMP GFX_UTIL GFX_CLOCK MEM_UTIL MEM_CLOCK ENC_UTIL VCLK DEC_UTIL DCLK CORRECTABLE_ECC UNCORRECTABLE_ECC PCIE_REPLAY PCIE_BW
0 156 W 38 C 33 C 0 % 132 MHz 0 % 900 MHz 0 % 29 MHz 0 % 22 MHz 0 0 0 18 Mb/s
1 153 W 38 C 31 C 0 % 132 MHz 0 % 900 MHz 0 % 29 MHz 0 % 22 MHz 0 0 0 44 Mb/s
2 149 W 35 C 30 C 0 % 132 MHz 0 % 900 MHz 0 % 29 MHz 0 % 22 MHz 0 0 0 18 Mb/s
3 140 W 36 C 31 C 0 % 132 MHz 0 % 900 MHz 0 % 29 MHz 0 % 22 MHz 0 0 0 18 Mb/s
4 149 W 33 C 31 C 0 % 132 MHz 0 % 900 MHz 0 % 29 MHz 0 % 22 MHz 0 0 0 18 Mb/s
5 151 W 39 C 33 C 0 % 132 MHz 0 % 900 MHz 0 % 29 MHz 0 % 22 MHz 0 0 0 41 Mb/s
6 140 W 35 C 31 C 0 % 132 MHz 0 % 900 MHz 0 % 29 MHz 0 % 22 MHz 0 0 0 18 Mb/s
7 140 W 38 C 33 C 0 % 132 MHz 0 % 900 MHz 0 % 29 MHz 0 % 22 MHz 0 0 0 18 Mb/s
11. Set Commands#
Set power cap:
$ sudo amd-smi set --gpu=0 --power-cap=600
Output:
GPU: 0
POWER_CAP: Successfully set power cap to 600 W
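The power cap must lie between the minimum and maximum values reported by `amd-smi static --limit`, so a script can check a requested value before calling `set`. The sketch below assumes that the `MIN_POWER` and `MAX_POWER` fields in the sample static output correspond to those bounds; this guide does not state that mapping explicitly. The helper names are hypothetical.

```python
import re

# LIMIT lines copied from the sample `static --gpu=0` output in this guide.
LIMIT_TEXT = """
LIMIT:
MAX_POWER: 750 W
MIN_POWER: 100 W
SOCKET_POWER: 750 W
"""


def power_cap_range(limit_text):
    """Extract (min_watts, max_watts) from plain-text LIMIT output.

    Assumes MIN_POWER/MAX_POWER are the power cap bounds.
    """
    found = dict(re.findall(r"(MIN_POWER|MAX_POWER):\s*(\d+)\s*W", limit_text))
    return int(found["MIN_POWER"]), int(found["MAX_POWER"])


def is_valid_power_cap(cap_watts, limit_text):
    """Return True when the requested cap lies within the reported range."""
    low, high = power_cap_range(limit_text)
    return low <= cap_watts <= high


print(is_valid_power_cap(600, LIMIT_TEXT))  # True
```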
Set memory partition:
$ sudo amd-smi set --memory-partition=NPS2
Output:
Setting memory-partition in progress. This may take a while...
GPU: 0
MEMORY_PARTITION: Successfully set memory partition to NPS2
GPU: 1
MEMORY_PARTITION: Successfully set memory partition to NPS2
GPU: 2
MEMORY_PARTITION: Successfully set memory partition to NPS2
GPU: 3
MEMORY_PARTITION: Successfully set memory partition to NPS2
GPU: 4
MEMORY_PARTITION: Successfully set memory partition to NPS2
GPU: 5
MEMORY_PARTITION: Successfully set memory partition to NPS2
GPU: 6
MEMORY_PARTITION: Successfully set memory partition to NPS2
GPU: 7
MEMORY_PARTITION: Successfully set memory partition to NPS2
Set number of VFs enabled:
$ sudo amd-smi set --num-vf=1
Output:
GPU: 0
NUM_VF_ENABLED: Successfully set enabled VFs to 1
GPU: 1
NUM_VF_ENABLED: Successfully set enabled VFs to 1
GPU: 2
NUM_VF_ENABLED: Successfully set enabled VFs to 1
GPU: 3
NUM_VF_ENABLED: Successfully set enabled VFs to 1
GPU: 4
NUM_VF_ENABLED: Successfully set enabled VFs to 1
GPU: 5
NUM_VF_ENABLED: Successfully set enabled VFs to 1
GPU: 6
NUM_VF_ENABLED: Successfully set enabled VFs to 1
GPU: 7
NUM_VF_ENABLED: Successfully set enabled VFs to 1
Set XGMI Per-Link Power Down policy:
$ sudo amd-smi set --xgmi-plpd=0
Output:
GPU: 0
DPM_POLICY: Successfully set xgmi per-link power down policy to 0
GPU: 1
DPM_POLICY: Successfully set xgmi per-link power down policy to 0
GPU: 2
DPM_POLICY: Successfully set xgmi per-link power down policy to 0
GPU: 3
DPM_POLICY: Successfully set xgmi per-link power down policy to 0
GPU: 4
DPM_POLICY: Successfully set xgmi per-link power down policy to 0
GPU: 5
DPM_POLICY: Successfully set xgmi per-link power down policy to 0
GPU: 6
DPM_POLICY: Successfully set xgmi per-link power down policy to 0
GPU: 7
DPM_POLICY: Successfully set xgmi per-link power down policy to 0
Set SOC Pstate:
$ sudo amd-smi set --soc-pstate=0
Output:
GPU: 0
SOC_PSTATE: Successfully set dpm soc pstate policy to 0
GPU: 1
SOC_PSTATE: Successfully set dpm soc pstate policy to 0
GPU: 2
SOC_PSTATE: Successfully set dpm soc pstate policy to 0
GPU: 3
SOC_PSTATE: Successfully set dpm soc pstate policy to 0
GPU: 4
SOC_PSTATE: Successfully set dpm soc pstate policy to 0
GPU: 5
SOC_PSTATE: Successfully set dpm soc pstate policy to 0
GPU: 6
SOC_PSTATE: Successfully set dpm soc pstate policy to 0
GPU: 7
SOC_PSTATE: Successfully set dpm soc pstate policy to 0
Set XGMI FB Sharing Mode:
$ sudo amd-smi set --xgmi --fb-sharing-mode=MODE_1
Output:
XGMI FB_SHARING_MODE: Successfully set mode to MODE_1 for the given group/s
12. Reset Commands#
Reset GPU:
$ sudo amd-smi reset --gpureset
Output:
GPU: 0
GPU_RESET: Successfully reset GPU
GPU: 1
GPU_RESET: Successfully reset GPU
GPU: 2
GPU_RESET: Successfully reset GPU
GPU: 3
GPU_RESET: Successfully reset GPU
GPU: 4
GPU_RESET: Successfully reset GPU
GPU: 5
GPU_RESET: Successfully reset GPU
GPU: 6
GPU_RESET: Successfully reset GPU
GPU: 7
GPU_RESET: Successfully reset GPU
Clean VF framebuffer:
$ sudo amd-smi reset --vf=0:0 --vf-fb
Output:
Successfully reset vf fb for vf with id: 0:0
13. RAS Error Information (MI300 Host Systems Only)#
Get CPER entries with fatal severity:
$ sudo amd-smi ras --cper --severity=fatal --folder=/tmp/ras_logs
Output:
timestamp gpu_id severity file_name list of afids
07/10/2025 09:03:09 0 fatal fatal-1.cper 24
07/10/2025 09:04:15 1 fatal fatal-2.cper 24 29
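Since the ras command produces human-readable output only, a script that needs the AFIDs per record has to split the table itself. This is a rough sketch based solely on the sample rows above: it assumes whitespace-separated columns, a two-token timestamp, and file names without spaces. `parse_cper_rows` is a hypothetical helper.

```python
# Sample copied from the CPER output above.
SAMPLE = """timestamp gpu_id severity file_name list of afids
07/10/2025 09:03:09 0 fatal fatal-1.cper 24
07/10/2025 09:04:15 1 fatal fatal-2.cper 24 29
"""


def parse_cper_rows(text):
    """Split each data row into timestamp, GPU, severity, file and AFIDs."""
    rows = []
    for line in text.splitlines()[1:]:      # skip the header line
        parts = line.split()
        if len(parts) < 5:
            continue                        # blank or incomplete line
        date, time, gpu_id, severity, file_name, *afids = parts
        rows.append({
            "timestamp": f"{date} {time}",
            "gpu_id": int(gpu_id),
            "severity": severity,
            "file_name": file_name,
            "afids": [int(a) for a in afids],
        })
    return rows


for row in parse_cper_rows(SAMPLE):
    print(row["file_name"], row["afids"])
```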
Get all CPER entries with continuous monitoring:
$ sudo amd-smi ras --cper --severity=all --folder=/tmp/ras_logs --file-limit=10 --follow
Extract AFID from existing CPER file:
$ sudo amd-smi ras --afid --cper-file=/tmp/ras_logs/fatal-2.cper
Output:
24 29
14. JSON and CSV Format Examples#
JSON format for metrics:
$ sudo amd-smi metric --pcie --gpu=0 --json
Output:
[
{
"gpu": 0,
"pcie": {
"width": 16,
"speed": {
"value": 32,
"unit": "GT/s"
},
"bandwidth": {
"value": 18,
"unit": "Mb/s"
},
"replay_count": 0,
"l0_to_recovery_count": 0,
"replay_roll_over_count": 0,
"nak_sent_count": 0,
"nak_received_count": 0
}
}
]
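In the JSON above, some measurements are nested objects with `value` and `unit` keys while others are plain numbers. A small flattening step, sketched below, makes both kinds uniform for logging or comparison. `flatten_measurements` is a hypothetical helper, and the structure is assumed to match this sample.

```python
import json

# Sample copied from the `metric --pcie --gpu=0 --json` output above.
SAMPLE = """
[
  {
    "gpu": 0,
    "pcie": {
      "width": 16,
      "speed": {"value": 32, "unit": "GT/s"},
      "bandwidth": {"value": 18, "unit": "Mb/s"},
      "replay_count": 0,
      "l0_to_recovery_count": 0,
      "replay_roll_over_count": 0,
      "nak_sent_count": 0,
      "nak_received_count": 0
    }
  }
]
"""


def flatten_measurements(node, prefix=""):
    """Flatten nested dicts; {"value", "unit"} pairs become "value unit"."""
    flat = {}
    for key, value in node.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict) and set(value) == {"value", "unit"}:
            flat[name] = f"{value['value']} {value['unit']}"
        elif isinstance(value, dict):
            flat.update(flatten_measurements(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


flat = flatten_measurements(json.loads(SAMPLE)[0])
print(flat["pcie.speed"])  # 32 GT/s
```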
CSV format for metrics:
$ sudo amd-smi metric --pcie --gpu=0 --csv
Output:
gpu,pcie_current_width,pcie_current_speed,pcie_current_bandwidth,pcie_replay_count,pcie_l0_to_recovery_count,pcie_replay_roll_over_count,pcie_nak_sent_count,pcie_nak_received_count
0,16,32,18,0,0,0,0,0
Use Case Scenarios#
This section provides practical workflows for common administrative tasks using amd-smi.
Memory Partition Management#
Scenario: Configure memory partitioning modes
# Step 1: Check current memory partition configuration
$ sudo amd-smi partition --gpu=0 --current
# Step 2: List available memory partition modes
$ sudo amd-smi partition --gpu=0 --memory
# Step 3: Set memory partition to NPS2
$ sudo amd-smi set --memory-partition=NPS2
# Step 4: Verify the partition change
$ sudo amd-smi partition --gpu=0 --current
Accelerator Partition Configuration#
Scenario: Set up accelerator partitioning modes
# Step 1: Check available accelerator partition profiles
$ sudo amd-smi partition --gpu=0 --accelerator
# Step 2: View current accelerator partition setting
$ sudo amd-smi partition --gpu=0 --current
# Step 3: Set accelerator partition to profile 2 (example)
$ sudo amd-smi set --accelerator-partition=2
# Step 4: Verify the partition configuration
$ sudo amd-smi partition --gpu=0 --current
XGMI Framebuffer Sharing Setup#
Scenario: Configure framebuffer sharing
# Step 1: Check XGMI capabilities and current configuration
$ sudo amd-smi xgmi --caps
$ sudo amd-smi xgmi --fb-sharing
# Step 2: View topology to understand GPU connections
$ sudo amd-smi topology
# Step 3: Set framebuffer sharing mode for 2-GPU group
$ sudo amd-smi set --xgmi --fb-sharing-mode=MODE_2
# Step 4: Verify the framebuffer sharing configuration
$ sudo amd-smi topology --fb-sharing
System Health Monitoring#
Scenario: Perform comprehensive system health check
# Step 1: Check for bad pages and ECC errors
$ sudo amd-smi bad-pages
$ sudo amd-smi metric --ecc
# Step 2: Monitor temperatures and power consumption
$ sudo amd-smi metric --temperature
$ sudo amd-smi metric --power
# Step 3: Check thermal and power limits
$ sudo amd-smi static --limit
# Step 4: Monitor real-time performance metrics (updates continuously)
$ sudo amd-smi monitor --gpu=0 --temperature --power-usage --ecc --watch=1
Performance Analysis Workflow#
Scenario: Analyze GPU performance and utilization patterns
# Step 1: Get baseline static information
$ sudo amd-smi static --gpu=0 --asic --vram --cache
# Step 2: Monitor GPU utilization over time (updates every second for 60 seconds)
$ sudo amd-smi monitor --gpu=0 --gfx --mem --power-usage --watch=1 --watch-time=60
# Step 3: Check detailed metrics for bottleneck analysis
$ sudo amd-smi metric --gpu=0 --usage --clock --temperature --pcie
# Step 4: Export performance data for analysis
$ sudo amd-smi metric --gpu=0 --usage --clock --power --json > performance_data.json
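As an alternative to shell redirection, the export step can be done from Python so the data is parsed immediately. The wrapper below is a sketch, not a supported interface: `run_json` is a hypothetical name, it assumes the tool writes its JSON to standard output, and it omits `sudo`, so it must run in a context that already has the required privileges. The `runner` parameter exists so the function can be exercised without a GPU.

```python
import json
import subprocess


def run_json(arguments, runner=subprocess.run):
    """Run amd-smi with --json appended and return the parsed output.

    check=True makes subprocess.run raise CalledProcessError when the
    tool exits with a non-zero status.
    """
    argv = ["amd-smi", *arguments, "--json"]
    completed = runner(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


# Example call (requires amd-smi on the PATH and sufficient privileges):
# data = run_json(["metric", "--gpu=0", "--usage", "--clock", "--power"])
```

Keeping the parsed data in memory avoids a second pass over `performance_data.json` when the values are only needed for a single analysis run.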
VF Management in SR-IOV Environment#
Scenario: Manage Virtual Functions for GPU virtualization
# Step 1: List all GPUs and available VFs
$ sudo amd-smi list
# Step 2: Check VF capabilities and current configuration
$ sudo amd-smi static --gpu=0 --num-vf
# Step 3: Configure the number of VFs (if modification needed)
$ sudo amd-smi set --gpu=0 --num-vf=8
# Step 4: Verify VF configuration after change
$ sudo amd-smi static --gpu=0 --num-vf
# Step 5: Monitor VF performance metrics
$ sudo amd-smi metric --vf=0:0 --schedule --guard
Firmware and Driver Validation#
Scenario: Validate firmware versions and driver compatibility
# Step 1: Check current firmware versions
$ sudo amd-smi firmware --gpu=0 --fw-list
# Step 2: Verify driver version and compatibility
$ sudo amd-smi static --gpu=0 --driver
# Step 3: Check for firmware error records
$ sudo amd-smi firmware --gpu=0 --error-records
RAS Error Monitoring (MI300 Host Systems)#
Scenario: Monitor and analyze hardware reliability errors
# Step 1: Check for critical fatal errors and save to files
$ sudo amd-smi ras --cper --severity=fatal --folder=/var/log/gpu_errors --file-limit=50
# Step 2: Monitor all error types with continuous monitoring
$ sudo amd-smi ras --cper --severity=all --folder=/var/log/gpu_errors --follow --file-limit=100
# Step 3: Analyze existing error files to extract AFIDs (AMD Field IDs)
$ sudo amd-smi ras --afid --cper-file=/var/log/gpu_errors/fatal-1.cper
# Step 4: Monitor specific GPU for non-fatal corrected errors
$ sudo amd-smi ras --gpu=0 --cper --severity=nonfatal-corrected --folder=/var/log/gpu0_errors