AMD SMI Python API reference#
The Python interface consists of Python function declarations that directly call the C interface. Clients can use it to build applications in Python.
Requirements#
Python 3.10+ (64-bit)
Overview#
Folder structure#
| File Name | Note |
|---|---|
| __init__.py | Python package initialization file |
| amdsmi_interface.py | Amdsmi library Python interface |
| amdsmi_wrapper.py | Python wrapper around amdsmi binary |
| amdsmi_exception.py | Amdsmi exceptions Python file |
| README.md | Documentation |
Build steps#
Navigate to the project's root folder and run the following Makefile command:
make package
The build process creates the folder build/amdsmi/package/BUILD_MODE/amdsmi, where BUILD_MODE is either Release or Debug.
The folder will contain the following files:
__init__.py
amdsmi_interface.py
amdsmi_wrapper.py
amdsmi_exception.py
libamdsmi.so
README.md
Amdsmi usage#
The generated amdsmi folder should be copied and placed next to the importing script. It should be imported as:
from amdsmi import *
try:
    amdsmi_init()
    # amdsmi calls ...
except AmdSmiException as e:
    print(e)
finally:
    try:
        amdsmi_shut_down()
    except AmdSmiException as e:
        print(e)
To initialize the amdsmi library, amdsmi_init() must be called before any other amdsmi call.
To close the connection to the driver, amdsmi_shut_down() must be the last call.
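The init-first, shut-down-last rule can be kept in one place with a context manager. The sketch below is only an illustration and not part of amdsmi: smi_session is a hypothetical helper that takes the init and shutdown callables as arguments, so the pattern can be shown without the library installed. With the package available, one would presumably pass amdsmi_init and amdsmi_shut_down.

```python
from contextlib import contextmanager


@contextmanager
def smi_session(init, shut_down):
    """Hypothetical helper: run init() first and shut_down() last.

    init and shut_down are plain callables (for example amdsmi_init and
    amdsmi_shut_down). This helper is an assumption, not an amdsmi API.
    """
    init()
    try:
        yield
    finally:
        # Runs even when the body of the with-block raises.
        shut_down()


# Stand-in callables that only record the order of calls.
calls = []
with smi_session(lambda: calls.append("init"), lambda: calls.append("shut_down")):
    calls.append("query")
print(calls)
```

One design difference from the try/finally form: if init itself raises, this sketch does not call shut_down. Whether that is the desired behavior depends on how the library treats a shutdown after a failed init, which is not stated here.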
Amdsmi Exceptions#
All exceptions are defined in the amdsmi_exception.py file.
Exceptions that can be thrown are:
AmdSmiException: base smi exception class
AmdSmiLibraryException: derives from the base AmdSmiException class and represents errors that can occur in smi-lib. When this exception is thrown, err_code and err_info are set. err_code is an integer that corresponds to errors that can occur in smi-lib, and err_info is a string that explains the error that occurred. Example:
try:
    amdsmi_init()
    processors = amdsmi_get_processor_handles()
    num_of_GPUs = len(processors)
    if num_of_GPUs == 0:
        print("No GPUs on machine")
except AmdSmiLibraryException as e:
    # err_code and err_info are set on AmdSmiLibraryException
    print("Error code: {}".format(e.err_code))
    if e.err_code == AmdSmiRetCode.AMDSMI_STATUS_RETRY:
        print("Error info: {}".format(e.err_info))
finally:
    try:
        amdsmi_shut_down()
    except AmdSmiException as e:
        print(e)
AmdSmiRetryException: derives from the AmdSmiLibraryException class and signals that the processor is busy and the call should be retried.
AmdSmiTimeoutException: derives from the AmdSmiLibraryException class and indicates that the call timed out.
AmdSmiParameterException: derives from the base AmdSmiException class and represents errors related to invalid parameters passed to functions. When this exception is thrown, err_msg is set and explains the actual and expected types of the parameters.
AmdSmiBdfFormatException: derives from the base AmdSmiException class and represents an invalid BDF format.
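Since AmdSmiRetryException signals that a call should be retried, a small retry wrapper is one way to act on it. The following is a minimal sketch under stated assumptions: call_with_retry is a hypothetical helper, not an amdsmi function, and BusyError is a stand-in class used so the example runs without the library. In real use the retry_on argument would presumably be AmdSmiRetryException.

```python
import time


def call_with_retry(func, retry_on, attempts=3, delay_s=0.0):
    """Call func(); if it raises retry_on, wait and try again.

    Hypothetical helper, not part of amdsmi. retry_on is the exception
    class that means "busy, try again".
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay_s)


class BusyError(Exception):
    """Stand-in for AmdSmiRetryException so the sketch runs without amdsmi."""


state = {"calls": 0}


def flaky_query():
    # Fails twice, then succeeds, to exercise the retry loop.
    state["calls"] += 1
    if state["calls"] < 3:
        raise BusyError("processor busy")
    return "ok"


result = call_with_retry(flaky_query, BusyError, attempts=5)
print(result, state["calls"])
```

The attempt count and delay are arbitrary choices for illustration; suitable values depend on the workload.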
Amdsmi API#
amdsmi_init#
Description: Initialize smi lib and connect to driver
Input parameters: init_flags (optional; if no value is provided, the default value is AMDSMI_INIT_ALL_PROCESSORS)
init_flags is AmdSmiInitFlags enum:
| Field | Description |
|---|---|
| | all processors |
| | amd cpus |
| | amd gpus |
| | non amd cpus |
| | non amd gpus |
| | amd apus |
Output: None
Exceptions that can be thrown by amdsmi_init function:
AmdSmiLibraryException
Example:
try:
    amdsmi_init()
    # continue with amdsmi
except AmdSmiException as e:
    print("Init failed")
    print(e)
amdsmi_shut_down#
Description: Finalize and close connection to driver
Input parameters: None
Output: None
Exceptions that can be thrown by amdsmi_shut_down function:
AmdSmiLibraryException
Example:
try:
    amdsmi_shut_down()
except AmdSmiException as e:
    print("Fini failed")
    print(e)
amdsmi_get_processor_handles#
Description: Returns list of GPU device handle objects on current machine
Input parameters: None
Output: List of GPU device handles
Exceptions that can be thrown by amdsmi_get_processor_handles function:
AmdSmiLibraryException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            print(amdsmi_get_gpu_device_uuid(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_processor_handle_from_bdf#
Description: Returns processor handle from the given BDF
Input parameters: bdf string in form of either <domain>:<bus>:<device>.<function> or <bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long from 0000-FFFF interval
<bus> is 2 hex digits long from 00-FF interval
<device> is 2 hex digits long from 00-1F interval
<function> is 1 hex digit long from 0-7 interval
Output: processor handle object
Exceptions that can be thrown by amdsmi_get_processor_handle_from_bdf function:
AmdSmiLibraryException
AmdSmiBdfFormatException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    print(amdsmi_get_gpu_device_uuid(processor))
except AmdSmiException as e:
    print(e)
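The BDF format rules above can be checked before a string is handed to the library. The sketch below is a hedged illustration of those rules only: parse_bdf is a hypothetical helper, not an amdsmi function, and reporting a missing domain as 0 is an assumption of this sketch rather than documented library behavior.

```python
import re

# <domain> is optional; <device> is range-checked separately (00-1F).
_BDF_RE = re.compile(
    r"(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])"
)


def parse_bdf(bdf):
    """Split a BDF string into (domain, bus, device, function) integers.

    Hypothetical helper, not part of amdsmi. Raises ValueError when the
    string does not follow the documented format. A missing domain is
    reported as 0, which is an assumption of this sketch.
    """
    match = _BDF_RE.fullmatch(bdf)
    if match is None:
        raise ValueError("malformed BDF: {!r}".format(bdf))
    domain, bus, device, function = match.groups()
    device_num = int(device, 16)
    if device_num > 0x1F:
        raise ValueError("device out of range 00-1F: {!r}".format(bdf))
    return (int(domain or "0", 16), int(bus, 16), device_num, int(function, 16))


print(parse_bdf("0000:23:00.0"))
print(parse_bdf("23:00.0"))
```

A check like this would catch a mistyped separator such as "0000:23.00.0" locally, before any library call is made.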
amdsmi_get_gpu_device_bdf#
Description: Returns BDF of the given device
Input parameters:
GPU device for which to query
Output: BDF string in form of <domain>:<bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long from 0000-FFFF interval
<bus> is 2 hex digits long from 00-FF interval
<device> is 2 hex digits long from 00-1F interval
<function> is 1 hex digit long from 0-7 interval
Exceptions that can be thrown by amdsmi_get_gpu_device_bdf function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    print("GPU bdf:", amdsmi_get_gpu_device_bdf(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_index_from_processor_handle#
Description: Returns the index of the given processor handle
Input parameters:
GPU device for which to query
Output: GPU device index
Exceptions that can be thrown by amdsmi_get_index_from_processor_handle function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            print("Processor's index:", amdsmi_get_index_from_processor_handle(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_processor_handle_from_index#
Description: Returns the processor handle from the given processor index
Input parameters:
Function processor index to query
Output: processor handle object
Exceptions that can be thrown by amdsmi_get_processor_handle_from_index function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processors = amdsmi_get_processor_handles()
    num_of_GPUs = len(processors)
    if num_of_GPUs == 0:
        print("No GPUs on machine")
    else:
        for index in range(num_of_GPUs):
            print("Processor handle:", amdsmi_get_processor_handle_from_index(index))
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_bdf#
Description: Returns BDF of the given VF
Input parameters:
VF for which to query
Output: BDF string in form of <domain>:<bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long from 0000-FFFF interval
<bus> is 2 hex digits long from 00-FF interval
<device> is 2 hex digits long from 00-1F interval
<function> is 1 hex digit long from 0-7 interval
Exceptions that can be thrown by amdsmi_get_vf_bdf function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    vf = amdsmi_get_vf_handle_from_bdf("0000:23:02.0")
    print("VF's bdf:", amdsmi_get_vf_bdf(vf))
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_handle_from_bdf#
Description: Returns processor handle (VF) from the given BDF
Input parameters: bdf string in form of either <domain>:<bus>:<device>.<function> or <bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long from 0000-FFFF interval
<bus> is 2 hex digits long from 00-FF interval
<device> is 2 hex digits long from 00-1F interval
<function> is 1 hex digit long from 0-7 interval
Output: processor handle object
Exceptions that can be thrown by amdsmi_get_vf_handle_from_bdf function:
AmdSmiLibraryException
AmdSmiBdfFormatException
Example:
try:
    vf = amdsmi_get_vf_handle_from_bdf("0000:23:02.0")
    print(amdsmi_get_vf_uuid(vf))
except AmdSmiException as e:
    print(e)
amdsmi_get_processor_handle_from_uuid#
Description: Returns processor handle from the given UUID
Input parameters: uuid string
Output: processor handle object
Exceptions that can be thrown by amdsmi_get_processor_handle_from_uuid function:
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_uuid("fcff7460-0000-1000-80e9-b388cfe84658")
    print("Processor's UUID: ", amdsmi_get_gpu_device_uuid(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_handle_from_uuid#
Description: Returns the handle of a virtual function (VF) from the given UUID
Input parameters: uuid string
Output: vf object
Exceptions that can be thrown by amdsmi_get_vf_handle_from_uuid function:
AmdSmiLibraryException
Example:
try:
    vf = amdsmi_get_vf_handle_from_uuid("87007460-0000-1000-8059-3ae746ab9206")
    print("VF's UUID: ", amdsmi_get_vf_uuid(vf))
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_device_uuid#
Description: Returns the UUID of the device
Input parameters:
GPU device for which to query
Output: UUID string unique to the device
Exceptions that can be thrown by amdsmi_get_gpu_device_uuid function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    print("Device UUID: ", amdsmi_get_gpu_device_uuid(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_uuid#
Description: Returns the UUID of the device
Input parameters:
VF handle for which to query
Output: UUID string unique to the device
Exceptions that can be thrown by amdsmi_get_vf_uuid function:
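AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    # Obtain the parent processor handle first; the BDF is an example value
    processor_handle = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    vf_id = amdsmi_get_vf_handle_from_vf_index(processor_handle, 0)
    print("VF UUID: ", amdsmi_get_vf_uuid(vf_id))
except AmdSmiException as e:
    print(e)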
amdsmi_get_gpu_driver_info#
Description: Returns the version string of the driver
Input parameters:
processor_handle: GPU device for which to query
Output:
driver_name: Driver name string that is handling the GPU device
driver_version: Driver version string that is handling the GPU device
driver_date: Driver date string that is handling the GPU device
Exceptions that can be thrown by amdsmi_get_gpu_driver_info function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    driver_info = amdsmi_get_gpu_driver_info(processor)
    print("Driver name: ", driver_info.driver_name)
    print("Driver version: ", driver_info.driver_version)
    print("Driver date: ", driver_info.driver_date)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_driver_model#
Description: Returns driver model information
Input parameters:
processor_handle: GPU device for which to query
Output: current driver model from AmdSmiDriverModelType enum
AmdSmiDriverModelType enum:
| Field | Description |
|---|---|
| | Windows Display Driver Model |
| | Windows Driver Model |
| | Compute Driver Model |
Exceptions that can be thrown by amdsmi_get_gpu_driver_model function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    driver_model = amdsmi_get_gpu_driver_model(processor)
    print("Driver model: ", driver_model)
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_handle_from_vf_index#
Description: Returns VF id of the VF referenced by its index (in partitioning info)
Input parameters:
processor handle: PF or child VF of a GPU device for which to query
VF's index: Index of VF (0-31) in GPU’s partitioning info
Output:
VF id
Exceptions that can be thrown by amdsmi_get_vf_handle_from_vf_index function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    # Obtain the parent processor handle first; the BDF is an example value
    processor_handle = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    vf_id = amdsmi_get_vf_handle_from_vf_index(processor_handle, 0)
    print(amdsmi_get_vf_info(vf_id))
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_total_ecc_count#
Description: Returns the number of ECC errors on the GPU device
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| correctable_count | Count of ECC correctable errors |
| uncorrectable_count | Count of ECC uncorrectable errors |
| deferred_count | Count of ECC deferred errors |
Exceptions that can be thrown by amdsmi_get_gpu_total_ecc_count function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            ecc_errors = amdsmi_get_gpu_total_ecc_count(processor)
            print(ecc_errors['correctable_count'])
            print(ecc_errors['uncorrectable_count'])
            print(ecc_errors['deferred_count'])
except AmdSmiException as e:
    print(e)
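When several GPUs are queried, the per-device dictionaries can be combined into machine-wide totals. The sketch below assumes each result carries the correctable_count, uncorrectable_count and deferred_count keys used in the example; sum_ecc_counts is a hypothetical helper, and the sample values are made up for illustration.

```python
ECC_KEYS = ("correctable_count", "uncorrectable_count", "deferred_count")


def sum_ecc_counts(per_gpu_counts):
    """Add up ECC counters across several result dictionaries.

    Hypothetical helper, not part of amdsmi. Each item is expected to
    contain the three keys listed in ECC_KEYS.
    """
    totals = dict.fromkeys(ECC_KEYS, 0)
    for counts in per_gpu_counts:
        for key in ECC_KEYS:
            totals[key] += counts[key]
    return totals


# Made-up sample values standing in for the results of two GPUs.
sample = [
    {"correctable_count": 2, "uncorrectable_count": 0, "deferred_count": 1},
    {"correctable_count": 5, "uncorrectable_count": 1, "deferred_count": 0},
]
print(sum_ecc_counts(sample))
```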
amdsmi_get_gpu_ecc_count#
Description: Returns the number of ECC errors on the GPU device for the given block
Input parameters:
processor_handle: GPU device to query
block: The block for which error counts should be retrieved
block is AmdSmiGpuBlock enum:
| Field | Description |
|---|---|
| UMC | UMC block |
| SDMA | SDMA block |
| GFX | GFX block |
| MMHUB | MMHUB block |
| ATHUB | ATHUB block |
| PCIE_BIF | PCIE_BIF block |
| HDP | HDP block |
| XGMI_WAFL | XGMI_WAFL block |
| DF | DF block |
| SMN | SMN block |
| SEM | SEM block |
| MP0 | MP0 block |
| MP1 | MP1 block |
| FUSE | FUSE block |
| MCA | MCA block |
| VCN | VCN block |
| JPEG | JPEG block |
| IH | IH block |
| MPIO | MPIO block |
Output: Dictionary with fields
| Field | Description |
|---|---|
| correctable_count | Count of ECC correctable errors |
| uncorrectable_count | Count of ECC uncorrectable errors |
| deferred_count | Count of ECC deferred errors |
Exceptions that can be thrown by amdsmi_get_gpu_ecc_count function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            ecc_errors = amdsmi_get_gpu_ecc_count(processor, AmdSmiGpuBlock.UMC)
            print(ecc_errors['correctable_count'])
            print(ecc_errors['uncorrectable_count'])
            print(ecc_errors['deferred_count'])
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_ecc_enabled#
Description: Returns ECC capabilities (disable/enable) for each GPU block.
Input parameters:
processor_handle: GPU device to query
Output: Dictionary of each GPU block and its value (False if the block is not enabled, True if the block is enabled)
Each GPU block in the dictionary is from AmdSmiGpuBlock enum:
| Field | Description |
|---|---|
| UMC | UMC block |
| SDMA | SDMA block |
| GFX | GFX block |
| MMHUB | MMHUB block |
| ATHUB | ATHUB block |
| PCIE_BIF | PCIE_BIF block |
| HDP | HDP block |
| XGMI_WAFL | XGMI_WAFL block |
| DF | DF block |
| SMN | SMN block |
| SEM | SEM block |
| MP0 | MP0 block |
| MP1 | MP1 block |
| FUSE | FUSE block |
| MCA | MCA block |
| VCN | VCN block |
| JPEG | JPEG block |
| IH | IH block |
| MPIO | MPIO block |
Exceptions that can be thrown by amdsmi_get_gpu_ecc_enabled function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            # Only the processor handle is passed, per the input parameters above
            ecc_status = amdsmi_get_gpu_ecc_enabled(processor)
            print(ecc_status)
except AmdSmiException as e:
    print(e)
amdsmi_status_code_to_string#
Description: Get a description of a provided AMDSMI error status
Input parameters:
status: The error status for which a description is desired
Output: String description of the provided error code
Exceptions that can be thrown by amdsmi_status_code_to_string function:
AmdSmiParameterException
Example:
try:
    status_str = amdsmi_status_code_to_string(AmdSmiRetCode.SUCCESS)
    print(status_str)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_ras_feature_info#
Description: Returns RAS feature info
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| ras_eeprom_version | RAS EEPROM version |
| ecc_correction_schema | ecc correction schema mask used with |
Exceptions that can be thrown by amdsmi_get_gpu_ras_feature_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            ras_feature = amdsmi_get_gpu_ras_feature_info(processor)
            print(ras_feature['ras_eeprom_version'])
            print(ras_feature['ecc_correction_schema'])
except AmdSmiException as e:
    print(e)
amdsmi_get_bad_page_threshold#
Description: Returns bad page threshold
Input parameters:
processor_handle: GPU device to query
Output: Bad page threshold value
Exceptions that can be thrown by amdsmi_get_bad_page_threshold function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            bad_page_threshold = amdsmi_get_bad_page_threshold(processor)
            print(bad_page_threshold)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_bad_page_info#
Description: Returns bad page info.
Input parameters:
processor handle object: PF of a GPU device to query
Output: list of dictionaries with fields for each bad page
| Field | Description |
|---|---|
| retired_page | 64K/4K driver-managed location that is blocked from further use |
| ts | Marks the last time when the RAS event was observed |
| mem_channel | Identifies the memory channel the issue has been reported on |
| mcumc_id | Identifies the memory controller the issue has been reported on |
Exceptions that can be thrown by amdsmi_get_gpu_bad_page_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
from datetime import datetime
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    bad_page_info = amdsmi_get_gpu_bad_page_info(processor)
    if len(bad_page_info) == 0:
        print("no bad pages")
    else:
        for table_record in bad_page_info:
            print(hex(table_record["retired_page"]))
            print(datetime.fromtimestamp(table_record['ts']).strftime('%Y/%m/%d:%H/%M/%S'))
            print(table_record['mem_channel'])
            print(table_record['mcumc_id'])
except AmdSmiException as e:
    print(e)
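For logging, each bad-page record can be rendered as a single line. The sketch below is an illustration only: format_bad_page is a hypothetical helper, the sample record is made up, and treating ts as a Unix timestamp in seconds is an assumption based on the fromtimestamp() call in the example. UTC is used here so the output does not depend on the local time zone.

```python
from datetime import datetime, timezone


def format_bad_page(record):
    """Render one bad-page record as a single log line.

    Hypothetical helper, not part of amdsmi. Assumes 'ts' is a Unix
    timestamp in seconds and formats it in UTC.
    """
    when = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
    return "page={} time={} channel={} mcumc={}".format(
        hex(record["retired_page"]),
        when.strftime("%Y/%m/%d:%H/%M/%S"),
        record["mem_channel"],
        record["mcumc_id"],
    )


# Made-up record for illustration only.
sample = {"retired_page": 0x1000, "ts": 0, "mem_channel": 1, "mcumc_id": 2}
print(format_bad_page(sample))
```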
amdsmi_get_gpu_asic_info#
Description: Returns asic information for the given GPU
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Content |
|---|---|
| market_name | market name |
| vendor_id | vendor id |
| vendor_name | vendor name |
| subvendor_id | subsystem vendor id |
| device_id | unique id of a GPU |
| rev_id | revision id |
| asic_serial | asic serial |
| oam_id | xgmi physical id |
| num_of_compute_units | num of compute units (Not supported yet, currently hardcoded to 0) |
| target_graphics_version | target graphics version (Not supported yet, currently hardcoded to 0) |
| subsystem_id | subsystem device id |
Exceptions that can be thrown by amdsmi_get_gpu_asic_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            asic_info = amdsmi_get_gpu_asic_info(processor)
            print(asic_info['market_name'])
            print(asic_info['vendor_id'])
            print(asic_info['vendor_name'])
            print(asic_info['subvendor_id'])
            print(asic_info['device_id'])
            print(asic_info['subsystem_id'])
            print(asic_info['rev_id'])
            print(asic_info['asic_serial'])
            print(asic_info['oam_id'])
            print(asic_info['num_of_compute_units'])
            print(asic_info['target_graphics_version'])
except AmdSmiException as e:
    print(e)
amdsmi_get_pcie_info#
Description: Returns static and metric information about PCIe link for the given GPU
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Content |
|---|---|
| pcie_static | Dictionary with fields max_pcie_width, max_pcie_speed, slot_type, max_pcie_interface_version |
| pcie_metric | Dictionary with fields pcie_speed, pcie_width, pcie_bandwidth, pcie_interface_version, pcie_replay_count, pcie_l0_to_recovery_count, pcie_replay_roll_over_count, pcie_nak_sent_count, pcie_nak_received_count, pcie_lc_perf_other_end_recovery_count |
Exceptions that can be thrown by amdsmi_get_pcie_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            pcie_info = amdsmi_get_pcie_info(processor)
            print(pcie_info['pcie_static']['max_pcie_width'])
            print(pcie_info['pcie_static']['max_pcie_speed'])
            print(pcie_info['pcie_static']['slot_type'])
            print(pcie_info['pcie_static']['max_pcie_interface_version'])
            print(pcie_info['pcie_metric']['pcie_speed'])
            print(pcie_info['pcie_metric']['pcie_width'])
            print(pcie_info['pcie_metric']['pcie_bandwidth'])
            print(pcie_info['pcie_metric']['pcie_interface_version'])
            print(pcie_info['pcie_metric']['pcie_replay_count'])
            print(pcie_info['pcie_metric']['pcie_l0_to_recovery_count'])
            print(pcie_info['pcie_metric']['pcie_replay_roll_over_count'])
            print(pcie_info['pcie_metric']['pcie_nak_sent_count'])
            print(pcie_info['pcie_metric']['pcie_nak_received_count'])
            print(pcie_info['pcie_metric']['pcie_lc_perf_other_end_recovery_count'])
except AmdSmiException as e:
    print(e)
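Because the result nests pcie_static and pcie_metric dictionaries, it can be convenient to flatten it before printing or exporting. The sketch below is a generic illustration: flatten is a hypothetical helper, not an amdsmi function, and the sample values are made up; only the key layout follows the example.

```python
def flatten(nested, prefix=""):
    """Flatten nested dictionaries into {'outer.inner': value} pairs.

    Hypothetical helper, not part of amdsmi.
    """
    flat = {}
    for key, value in nested.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


# Made-up values; only the key layout follows the example.
sample = {
    "pcie_static": {"max_pcie_width": 16, "max_pcie_speed": 32},
    "pcie_metric": {"pcie_width": 16, "pcie_replay_count": 0},
}
for name, value in sorted(flatten(sample).items()):
    print(name, value)
```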
amdsmi_get_power_cap_info#
Description: Returns dictionary of power capabilities as currently configured on the given GPU
Input parameters:
processor_handle: GPU device to query
sensor_ind: sensor index. Normally, this will be 0. If a processor has more than one sensor, it could be greater than 0. This parameter is unused on the host platform. It is optional and defaults to 0.
Output: Dictionary with fields
| Field | Description |
|---|---|
| power_cap | power capability |
| default_power_cap | default power capability |
| dpm_cap | dynamic power management capability |
| min_power_cap | minimum power capability |
| max_power_cap | maximum power capability |
Exceptions that can be thrown by amdsmi_get_power_cap_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            power_info = amdsmi_get_power_cap_info(processor)
            print(power_info['power_cap'])
            print(power_info['default_power_cap'])
            print(power_info['dpm_cap'])
            print(power_info['min_power_cap'])
            print(power_info['max_power_cap'])
except AmdSmiException as e:
    print(e)
amdsmi_get_fb_layout#
Description: Returns framebuffer related information for the given GPU
Input parameters:
processor handle: PF of a GPU device for which to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| total_fb_size | total framebuffer size in MB |
| pf_fb_reserved | framebuffer reserved space in MB |
| pf_fb_offset | framebuffer offset in MB |
| fb_alignment | framebuffer alignment in MB |
| max_vf_fb_usable | maximum usable framebuffer size in MB |
| min_vf_fb_usable | minimum usable framebuffer size in MB |
Exceptions that can be thrown by amdsmi_get_fb_layout function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            fb_info = amdsmi_get_fb_layout(processor)
            print(fb_info['total_fb_size'])
            print(fb_info['pf_fb_reserved'])
            print(fb_info['pf_fb_offset'])
            print(fb_info['fb_alignment'])
            print(fb_info['max_vf_fb_usable'])
            print(fb_info['min_vf_fb_usable'])
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_activity#
Description: Returns the engine usage for the given GPU.
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| gfx_activity | graphics engine usage/activity percentage (0 - 100) |
| umc_activity | memory/UMC engine usage/activity percentage (0 - 100) |
| mm_activity | average multimedia engine usages/activities in percentage (0 - 100) |
Exceptions that can be thrown by amdsmi_get_gpu_activity function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            engine_activity = amdsmi_get_gpu_activity(processor)
            print(engine_activity['gfx_activity'])
            print(engine_activity['umc_activity'])
            print(engine_activity['mm_activity'])
except AmdSmiException as e:
    print(e)
amdsmi_get_power_info#
Description: Returns the current power, power limit, and voltage for the given GPU
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| socket_power | socket power |
| gfx_voltage | gfx voltage |
| soc_voltage | socket voltage |
| mem_voltage | memory voltage |
Note: socket_power can rarely spike above the socket power limit in some cases.
Exceptions that can be thrown by amdsmi_get_power_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            power_info = amdsmi_get_power_info(processor)
            print(power_info['socket_power'])
            print(power_info['gfx_voltage'])
            print(power_info['soc_voltage'])
            print(power_info['mem_voltage'])
except AmdSmiException as e:
    print(e)
amdsmi_set_power_cap#
Description: Sets GPU power cap.
Input parameters:
processor handle: processor handle
sensor_ind: sensor index. Normally, this will be 0. If a processor has more than one sensor, it could be greater than 0. This parameter is unused on the host platform.
cap: value representing the power cap to set. The value must be between the minimum (min_power_cap) and maximum (max_power_cap) power cap values, which can be obtained from amdsmi_get_power_cap_info.
Output:
None
Exceptions that can be thrown by amdsmi_set_power_cap function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
import random
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        sensor_ind = 0
        for processor in processors:
            power_info = amdsmi_get_power_cap_info(processor)
            # Pick a value inside the allowed [min_power_cap, max_power_cap] range
            power_limit = random.randint(power_info['min_power_cap'], power_info['max_power_cap'])
            amdsmi_set_power_cap(processor, sensor_ind, power_limit)
except AmdSmiException as e:
    print(e)
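Since the cap must lie between min_power_cap and max_power_cap, a caller with a specific target may want to clamp it into that range before calling amdsmi_set_power_cap. The sketch below shows only the clamping step: clamp_power_cap is a hypothetical helper, not an amdsmi function, and the sample values are made up with no particular unit implied.

```python
def clamp_power_cap(requested, cap_info):
    """Clamp a requested cap into [min_power_cap, max_power_cap].

    Hypothetical helper, not part of amdsmi. cap_info is expected to hold
    the min_power_cap and max_power_cap keys used in the example.
    """
    low = cap_info["min_power_cap"]
    high = cap_info["max_power_cap"]
    if low > high:
        raise ValueError("min_power_cap is greater than max_power_cap")
    return max(low, min(requested, high))


# Made-up limits for illustration only.
limits = {"min_power_cap": 100, "max_power_cap": 300}
print(clamp_power_cap(500, limits))
print(clamp_power_cap(50, limits))
print(clamp_power_cap(250, limits))
```

Silently clamping is one possible policy; rejecting out-of-range requests with an error may be preferable when the caller should be told that the target was not honored.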
amdsmi_is_gpu_power_management_enabled#
Description: Returns whether power management is enabled
Input parameters:
processor_handle: GPU device to query
Output: Bool, True if power management is enabled, otherwise False
Exceptions that can be thrown by amdsmi_is_gpu_power_management_enabled function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            is_power_management_enabled = amdsmi_is_gpu_power_management_enabled(processor)
            print(is_power_management_enabled)
except AmdSmiException as e:
    print(e)
amdsmi_get_temp_metric#
Description: Returns the current temperature or limit temperature for the given processor
Input parameters:
processor_handle: GPU device to query
thermal_domain: one of AmdSmiTemperatureType enum values:
| Field | Description |
|---|---|
| EDGE | edge thermal domain |
| HOTSPOT | hotspot/junction thermal domain |
| | memory/vram thermal domain |
| | plx thermal domain (Not supported yet) |
| | HBM 0 thermal domain (Not supported yet) |
| | HBM 1 thermal domain (Not supported yet) |
| | HBM 2 thermal domain (Not supported yet) |
| | HBM 3 thermal domain (Not supported yet) |
thermal_metric: one of AmdSmiTemperatureMetric enum values:
| Field | Description |
|---|---|
| CURRENT | current thermal metric |
| | max thermal metric (Not supported yet) |
| | min thermal metric (Not supported yet) |
| | max hyst thermal metric (Not supported yet) |
| | min hyst thermal metric (Not supported yet) |
| CRITICAL | limit thermal metric |
| | critical hyst metric (Not supported yet) |
| | emergency thermal metric (Not supported yet) |
| | emergency hyst thermal metric (Not supported yet) |
| | critical min thermal metric (Not supported yet) |
| | critical min hyst thermal metric (Not supported yet) |
| | offset thermal metric (Not supported yet) |
| | lowest thermal metric (Not supported yet) |
| | highest thermal metric (Not supported yet) |
| | shutdown thermal metric |
Output: Temperature value
Exceptions that can be thrown by amdsmi_get_temp_metric function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for device in processors:
            print("=============== EDGE THERMAL DOMAIN ================")
            thermal_measure = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.EDGE, AmdSmiTemperatureMetric.CURRENT)
            print("Current temperature:")
            print(thermal_measure)
            thermal_measure = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.EDGE, AmdSmiTemperatureMetric.CRITICAL)
            print("Limit temperature:")
            print(thermal_measure)
            print("=============== HOTSPOT THERMAL DOMAIN ================")
            thermal_measure = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.HOTSPOT, AmdSmiTemperatureMetric.CURRENT)
            print("Current temperature:")
            print(thermal_measure)
            thermal_measure = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.HOTSPOT, AmdSmiTemperatureMetric.CRITICAL)
            print("Limit temperature:")
            print(thermal_measure)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_cache_info#
Description: Returns the cache info for the given processor
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| cache | List of dictionaries, one per cache, with fields cache_properties, cache_size, cache_level, max_num_cu_shared, num_cache_instance |
AmdSmiCacheProperty enum values
| Field | Description |
|---|---|
| | Cache enabled |
| | Data cache |
| | Inst cache |
| | CPU cache |
| | SIMD cache |
Exceptions that can be thrown by amdsmi_get_gpu_cache_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for device in processors:
            cache_info = amdsmi_get_gpu_cache_info(device)
            for cache in cache_info["cache"]:
                print(cache["cache_properties"])
                print(cache["cache_size"])
                print(cache["cache_level"])
                print(cache["max_num_cu_shared"])
                print(cache["num_cache_instance"])
except AmdSmiException as e:
    print(e)
amdsmi_get_clock_info#
Description: Returns the clock measurements for the given GPU
Input parameters:
processor_handleGPU device which to queryclock_domainone ofAmdSmiClkTypeenum values:
Field |
Description |
|---|---|
|
system clock domain |
|
gfx clock domain |
|
Data Fabric clock (for ASICs running on a separate clock) domain (Not supported yet) |
|
Display Controller Engine clock domain (Not supported yet) |
|
SOC clock domain (Not supported yet) |
|
memory clock domain |
|
PCIe clock domain (Not supported yet) |
|
first multimedia engine (VCLK0) clock domain |
|
second multimedia engine (VCLK1) clock domain |
|
DCLK0 clock domain |
|
DCLK1 clock domain |
Output: Dictionary with fields
Field |
Description |
|---|---|
|
current clock value for the given domain |
|
minimum clock value for the given domain |
|
maximum clock value for the given domain |
|
clock locked flag only supported on GFX clock domain |
|
clock deep sleep mode flag |
Exceptions that can be thrown by amdsmi_get_clock_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            print("=============== GFX CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.GFX)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_locked'])
            print(clock_measure['clk_deep_sleep'])
            print("=============== MEM CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.MEM)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_deep_sleep'])
            print("=============== SYS CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.SYS)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_deep_sleep'])
            print("=============== VCLK0 CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.VCLK0)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_deep_sleep'])
            print("=============== VCLK1 CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.VCLK1)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_deep_sleep'])
            print("=============== DCLK0 CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.DCLK0)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_deep_sleep'])
            print("=============== DCLK1 CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.DCLK1)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_deep_sleep'])
except AmdSmiException as e:
    print(e)
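The per-domain printing in the example above repeats the same few lines for every clock domain. The following is a minimal sketch of how that could be folded into one helper. It is not part of amdsmi: `format_clock_info` is a hypothetical name, and the sample dictionaries hold made-up values that only mimic the keys used in the example (`clk`, `min_clk`, `max_clk`, `clk_locked`, `clk_deep_sleep`). It assumes `clk_locked` may be absent outside the GFX domain, so it reads that key with `dict.get`.

```python
def format_clock_info(domain, clock_measure):
    """Return printable lines for one clock domain.

    `clock_measure` is assumed to be a dict shaped like the result of
    amdsmi_get_clock_info(); this helper never calls amdsmi itself.
    """
    lines = ["=============== {} CLOCK DOMAIN ================".format(domain)]
    for key in ("clk", "min_clk", "max_clk"):
        lines.append("{}: {}".format(key, clock_measure[key]))
    # clk_locked is documented as supported only on the GFX domain,
    # so report it only there and tolerate a missing key.
    if domain == "GFX":
        lines.append("clk_locked: {}".format(clock_measure.get("clk_locked")))
    lines.append("clk_deep_sleep: {}".format(clock_measure["clk_deep_sleep"]))
    return lines


# Made-up sample values, for illustration only.
sample_gfx = {"clk": 1500, "min_clk": 500, "max_clk": 2100,
              "clk_locked": 0, "clk_deep_sleep": 0}
sample_mem = {"clk": 900, "min_clk": 900, "max_clk": 1300,
              "clk_deep_sleep": 1}

gfx_lines = format_clock_info("GFX", sample_gfx)
mem_lines = format_clock_info("MEM", sample_mem)
print("\n".join(gfx_lines + mem_lines))
```

In a real script the sample dictionaries would be replaced by the return value of amdsmi_get_clock_info for each AmdSmiClkType value.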
amdsmi_get_gpu_vram_info#
Description: Returns static VRAM information for the given GPU
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| vram_type | VRAM type from AmdSmiVramType enum |
| vram_vendor | VRAM vendor from AmdSmiVramVendor enum |
| vram_size | VRAM size in MB |
| | VRAM bit width |
AmdSmiVramType enum:
Field |
Description |
|---|---|
|
UNKNOWN VRAM type |
|
HBM VRAM type |
|
HBM2 VRAM type |
|
HBM2E VRAM type |
|
HBM3 VRAM type |
|
HBM3E VRAM type |
|
DDR2 VRAM type |
|
DDR3 VRAM type |
|
DDR4 VRAM type |
|
GDDR1 VRAM type |
|
GDDR2 VRAM type |
|
GDDR3 VRAM type |
|
GDDR4 VRAM type |
|
GDDR5 VRAM type |
|
GDDR6 VRAM type |
|
GDDR7 VRAM type |
AmdSmiVramVendor enum:
Field |
Description |
|---|---|
|
SAMSUNG VRAM vendor |
|
INFINEON VRAM vendor |
|
ELPIDA VRAM vendor |
|
ETRON VRAM vendor |
|
NANYA VRAM vendor |
|
HYNIX VRAM vendor |
|
MOSEL VRAM vendor |
|
WINBOND VRAM vendor |
|
ESMT VRAM vendor |
|
MICRON VRAM vendor |
|
UNKNOWN VRAM vendor |
Exceptions that can be thrown by amdsmi_get_gpu_vram_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            vram_info = amdsmi_get_gpu_vram_info(processor)
            print(vram_info['vram_type'])
            print(vram_info['vram_vendor'])
            print(vram_info['vram_size'])
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_vbios_info#
Description: Returns the static information for the VBIOS on the GPU device.
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| name | vbios name |
| build_date | vbios build date |
| part_number | vbios part number |
| version | vbios version string |
| boot_firmware | boot firmware info |
Exceptions that can be thrown by amdsmi_get_gpu_vbios_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            vbios_info = amdsmi_get_gpu_vbios_info(processor)
            print(vbios_info['name'])
            print(vbios_info['build_date'])
            print(vbios_info['part_number'])
            print(vbios_info['version'])
            print(vbios_info['boot_firmware'])
except AmdSmiException as e:
    print(e)
amdsmi_get_fw_info#
Description: Returns the firmware information for the given GPU.
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
Field |
Description |
|---|---|
|
List of dictionaries that contain information about a certain firmware block |
Exceptions that can be thrown by amdsmi_get_fw_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            firmware_list = amdsmi_get_fw_info(processor)
            for firmware_block in firmware_list:
                print(firmware_block['fw_id'])
                print(firmware_block['fw_version'])
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_board_info#
Description: Returns board related information for the given GPU
Input parameters:
GPU device handle object
Output: Dictionary with fields
| Field | Description |
|---|---|
| model_number | board model number |
| product_serial | board product serial number |
| fru_id | fru (field-replaceable unit) id |
| product_name | board product name |
| manufacturer_name | board manufacturer name |
Exceptions that can be thrown by amdsmi_get_gpu_board_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            board_info = amdsmi_get_gpu_board_info(processor)
            print(board_info['model_number'])
            print(board_info['product_serial'])
            print(board_info['fru_id'])
            print(board_info['manufacturer_name'])
            print(board_info['product_name'])
except AmdSmiException as e:
    print(e)
amdsmi_get_num_vf#
Description: Returns number of enabled VFs and number of supported VFs for the given GPU
Input parameters:
processor handle: PF of a GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| num_vf_enabled | number of enabled VFs |
| num_vf_supported | number of supported VFs |
Exceptions that can be thrown by amdsmi_get_num_vf function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            num_vf = amdsmi_get_num_vf(processor)
            print(num_vf['num_vf_enabled'])
            print(num_vf['num_vf_supported'])
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_partition_info#
Description: Returns array of the current framebuffer partitioning structures on the given GPU
Input parameters:
processor handle object: PF of a GPU device to query
Output: Array of dictionary with fields
| Field | Description |
|---|---|
| vf_id | VF handle |
| fb | dictionary describing the VF framebuffer; includes fb_size, as used in the example below |
Exceptions that can be thrown by amdsmi_get_vf_partition_info function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            print(partitions[0]['fb']['fb_size'])
            # partitions[0]['fb']['fb_size'] is the framebuffer size of the first VF on the given GPU
            # any VF can be accessed via its index in the partitions list
except AmdSmiException as e:
    print(e)
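The example above reads `partitions[0]` directly, which would raise an `IndexError` if the returned list happened to be empty. The sketch below shows one defensive way to work with the result. The helper names `total_fb_size` and `first_vf_handle` are hypothetical, and the sample data is made up; only the keys `vf_id`, `fb` and `fb_size` come from the examples in this reference.

```python
def total_fb_size(partitions):
    """Sum fb_size over a list shaped like amdsmi_get_vf_partition_info() output."""
    return sum(partition["fb"]["fb_size"] for partition in partitions)


def first_vf_handle(partitions):
    """Return the handle of the first VF, or None when the list is empty."""
    return partitions[0]["vf_id"] if partitions else None


# Made-up stand-in for the list returned by the library; the strings
# below are placeholders for real VF handle objects.
sample_partitions = [
    {"vf_id": "vf-handle-0", "fb": {"fb_size": 4096}},
    {"vf_id": "vf-handle-1", "fb": {"fb_size": 8192}},
]

print(total_fb_size(sample_partitions))
print(first_vf_handle(sample_partitions))
print(first_vf_handle([]))
```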
amdsmi_set_num_vf#
Description: Set number of enabled VFs for the given GPU
Input parameters:
processor_handle: GPU device on which to set the number of enabled VFs
number of enabled VFs to be set
Output: None
Exceptions that can be thrown by amdsmi_set_num_vf function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_num_vf(processor, 2)
except AmdSmiException as e:
    print(e)
amdsmi_clear_vf_fb#
Description: Clears the framebuffer of the given VF on the given GPU. The call fails if the VF is an active function.
Input parameters:
VF device handle
Output: None
Exceptions that can be thrown by amdsmi_clear_vf_fb function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            amdsmi_clear_vf_fb(partitions[0]['vf_id'])
            # partitions[0]['vf_id'] is the handle of the first VF on the given GPU
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_data#
Description: Returns the scheduler information and guard structure for the given VF.
Input parameters:
VF handle
Output: Dictionary with fields sched and guard
sched is a dictionary with fields
| Field | Description |
|---|---|
| flr_count | function level reset counter |
| boot_up_time | boot up time in microseconds |
| shutdown_time | shutdown time in microseconds |
| reset_time | reset time in microseconds |
| state | vf state |
| last_boot_start | last boot start time |
| last_boot_end | last boot end time |
| last_shutdown_start | last shutdown start time |
| last_shutdown_end | last shutdown end time |
| last_reset_start | last reset start time |
| last_reset_end | last reset end time |
| current_active_time | current session active time, reset after guest reload |
| current_running_time | current session running time, reset after guest reload |
| total_active_time | total active time, reset after host reload |
| total_running_time | total running time, reset after host reload |
guard is a dictionary with fields
| Field | Description |
|---|---|
| enabled | shows if guard info is enabled for the VF |
| guard | dictionary keyed by guard type; each entry has the fields state, amount, interval, threshold and active |
AmdSmiGuardType enum values are keys in guard dictionary
Field |
Description |
|---|---|
|
function level reset status |
|
exclusive access mode status |
|
exclusive access time out status |
|
generic interrupt status |
State is AmdSmiGuardState enum object with values
Field |
Description |
|---|---|
|
the event number is within the threshold |
|
the event number hits the threshold |
|
the event number is bigger than the threshold |
State is AmdSmiVfState enum object with values
Field |
Description |
|---|---|
|
vf state unavailable |
|
vf state available |
|
vf state active |
|
vf state suspended |
|
vf state fullaccess |
|
same as available, indicates this is a default VF |
Exceptions that can be thrown by amdsmi_get_vf_data function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            vf_data = amdsmi_get_vf_data(partitions[0]['vf_id'])
            sched_info = vf_data['sched']
            guard_info = vf_data['guard']
            print(sched_info['boot_up_time'])
            print(sched_info['flr_count'])
            print(sched_info['state'].name)
            print(sched_info['last_boot_start'])
            print(sched_info['last_boot_end'])
            print(sched_info['last_shutdown_start'])
            print(sched_info['last_shutdown_end'])
            print(sched_info['shutdown_time'])
            print(sched_info['last_reset_start'])
            print(sched_info['last_reset_end'])
            print(sched_info['reset_time'])
            print(sched_info['current_active_time'])
            print(sched_info['current_running_time'])
            print(sched_info['total_active_time'])
            print(sched_info['total_running_time'])
            print(guard_info['enabled'])
            for guard_type in guard_info['guard']:
                print("type: {} ".format(guard_type))
                print("state: {}".format(guard_info['guard'][guard_type]['state']))
                print("amount: {}".format(guard_info['guard'][guard_type]['amount']))
                print("interval: {}".format(guard_info['guard'][guard_type]['interval']))
                print("threshold: {}".format(guard_info['guard'][guard_type]['threshold']))
                print("active: {}".format(guard_info['guard'][guard_type]['active']))
                print("==================")
except AmdSmiException as e:
    print(e)
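The nested guard dictionary in the example above can be awkward to log or export. As a rough sketch, the hypothetical helper `guard_rows` below flattens it into one row per guard type. The sample data is made up: the string keys and state values stand in for AmdSmiGuardType and AmdSmiGuardState members, and skipping the entries when `enabled` is false is an assumption of this sketch, not documented library behavior.

```python
GUARD_FIELDS = ("state", "amount", "interval", "threshold", "active")


def guard_rows(guard_info):
    """Flatten a guard dict (shaped like vf_data['guard']) into a list of rows."""
    # Assumption of this sketch: nothing is reported when guard is disabled.
    if not guard_info["enabled"]:
        return []
    rows = []
    for guard_type, entry in guard_info["guard"].items():
        row = {"type": guard_type}
        for field in GUARD_FIELDS:
            row[field] = entry[field]
        rows.append(row)
    return rows


# Made-up sample; the string keys are placeholders for enum members.
sample_guard = {
    "enabled": True,
    "guard": {
        "FLR": {"state": "NORMAL", "amount": 1, "interval": 60,
                "threshold": 5, "active": 0},
        "INTERRUPT": {"state": "FULL", "amount": 9, "interval": 60,
                      "threshold": 9, "active": 1},
    },
}

rows = guard_rows(sample_guard)
for row in rows:
    print(row)
```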
amdsmi_get_vf_info#
Description: Returns the configuration structure for a given VF
Input parameters:
VF handle
Output: Dictionary with fields
| Field | Description |
|---|---|
| fb | dictionary with fields fb_offset and fb_size |
| gfx_timeslice | gfx timeslice in us |
Exceptions that can be thrown by amdsmi_get_vf_info function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            # partitions[0]['vf_id'] is the handle of the first VF on the given GPU
            config = amdsmi_get_vf_info(partitions[0]['vf_id'])
            print("fb_offset: {}".format(config['fb']['fb_offset']))
            print("fb_size: {}".format(config['fb']['fb_size']))
            print("gfx_timeslice : {}".format(config['gfx_timeslice']))
except AmdSmiException as e:
    print(e)
amdsmi_get_guest_data#
Description: Gets guest OS information of the queried VF
Input parameters:
processor handle: VF of a GPU device
Output: Dictionary with fields
Field |
Description |
|---|---|
|
driver version |
|
fb usage in MB |
Exceptions that can be thrown by amdsmi_get_guest_data function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            num_vf_enabled = amdsmi_get_num_vf(processor)['num_vf_enabled']
            partitions = amdsmi_get_vf_partition_info(processor)
            for i in range(0, num_vf_enabled):
                guest_data = amdsmi_get_guest_data(partitions[i]['vf_id'])
                print(guest_data)
except AmdSmiException as e:
    print(e)
amdsmi_get_fw_error_records#
Description: Gets firmware error records
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with field err_records, which is list of elements
Field |
Description |
|---|---|
|
system time in seconds |
|
vf index |
|
firmware id |
|
firmware load status |
Exceptions that can be thrown by amdsmi_get_fw_error_records function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            err_records = amdsmi_get_fw_error_records(processor)
            print(err_records)
except AmdSmiException as e:
    print(e)
amdsmi_get_dfc_fw_table#
Description: Gets dfc firmware table
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields header and data, where data is a list of elements
header is a dictionary with the following fields:
Field |
Description |
|---|---|
|
dfc firmware version |
|
number of entries in the dfc table |
|
gart wr guest min |
|
gart wr guest max |
Each data entry is a dictionary with following fields:
Field |
Description |
|---|---|
|
dfc firmware type |
|
verification enabled |
|
customer ordinal |
|
white list |
|
black list |
Each white list entry is a dictionary with following fields:
Field |
Description |
|---|---|
|
latest |
|
oldest |
Exceptions that can be thrown by amdsmi_get_dfc_fw_table function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            dfc_table = amdsmi_get_dfc_fw_table(processor)
            print(dfc_table)
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_fw_info#
Description: Returns GPU firmware related information.
Input parameters:
processor handle: VF of a GPU device to query
Output: Dictionary with field fw_list, which is a list of elements.
If microcode of a certain type is not loaded, its version will be 0.
| Field | Description |
|---|---|
| fw_list | list of dictionaries, each with fields fw_name and fw_version |
Exceptions that can be thrown by amdsmi_get_vf_fw_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            fw_info = amdsmi_get_vf_fw_info(partitions[0]['vf_id'])
            for fw in fw_info['fw_list']:
                print(fw['fw_name'].name)
                print(fw['fw_version'])
except AmdSmiException as e:
    print(e)
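Because a version of 0 marks microcode that is not loaded, a caller may want to drop those entries before reporting. The sketch below shows one way to do that. `loaded_firmware` and `FakeFwName` are hypothetical names used only for illustration; the enum stands in for whatever enum the library stores under `fw_name`, and the sketch assumes `fw_version` is an integer that can be compared with 0.

```python
from enum import Enum


class FakeFwName(Enum):
    """Hypothetical stand-in for the enum stored under 'fw_name'."""
    BLOCK_A = 1
    BLOCK_B = 2
    BLOCK_C = 3


def loaded_firmware(fw_info):
    """Return (name, version) pairs for entries whose version is not 0."""
    return [
        (fw["fw_name"].name, fw["fw_version"])
        for fw in fw_info["fw_list"]
        if fw["fw_version"] != 0
    ]


# Made-up sample shaped like the result of amdsmi_get_vf_fw_info().
sample_fw_info = {
    "fw_list": [
        {"fw_name": FakeFwName.BLOCK_A, "fw_version": 33},
        {"fw_name": FakeFwName.BLOCK_B, "fw_version": 0},
        {"fw_name": FakeFwName.BLOCK_C, "fw_version": 7},
    ]
}

loaded = loaded_firmware(sample_fw_info)
print(loaded)
```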
amdsmi_get_partition_profile_info#
Description: Gets partition profile info
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields, current_profile and profiles
| Field | Description |
|---|---|
| current_profile | current profile index |
| profiles | list of all profiles |
Where profiles is a list of dictionaries, each containing the number of VFs and a profile_caps dictionary.
Keys for profile_caps dictionary are in AmdSmiProfileCapabilityType enum
Field |
Description |
|---|---|
|
memory |
|
encode engine |
|
decode engine |
|
compute engine |
Exceptions that can be thrown by amdsmi_get_partition_profile_info function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            profile_info = amdsmi_get_partition_profile_info(processor)
            print(profile_info)
except AmdSmiException as e:
    print(e)
amdsmi_get_link_metrics#
Description: Gets link metric information
Input parameters:
processor handle: PF of a GPU device
Output: links list of dictionaries with fields for each link
Field |
Description |
|---|---|
|
BDF of the given processor |
|
current link speed in Gb/s |
|
max bandwidth of the link |
|
type of the link from AmdSmiLinkType enum |
|
total data received for each link in KB |
|
total data transferred for each link in KB |
AmdSmiLinkType enum:
Field |
Description |
|---|---|
|
Unknown |
|
XGMI link type |
|
PCIe link type |
|
Link not applicable |
|
Unknown |
Exceptions that can be thrown by amdsmi_get_link_metrics function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            link_metrics = amdsmi_get_link_metrics(processor)
            print(link_metrics)
except AmdSmiException as e:
    print(e)
amdsmi_get_link_topology#
Description: Gets link topology information between two connected processors
Input parameters:
source processor handle: PF of a source GPU device
destination processor handle: PF of a destination GPU device
Output: Dictionary with fields
Field |
Description |
|---|---|
|
link weight between two GPUs |
|
HW status of the link |
|
type of the link from AmdSmiLinkType enum |
|
number of hops between two GPUs |
|
framebuffer sharing between two GPUs |
AmdSmiLinkType enum:
Field |
Description |
|---|---|
|
Unknown |
|
XGMI link type |
|
PCIe link type |
|
Link not applicable |
|
Unknown |
Exceptions that can be thrown by amdsmi_get_link_topology function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for src_processor in processors:
            for dst_processor in processors:
                link_topology = amdsmi_get_link_topology(src_processor, dst_processor)
                print(link_topology)
except AmdSmiException as e:
    print(e)
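The nested loop in the example above also pairs each processor with itself. Whether such a self-query is meaningful is not stated here, so the sketch below assumes it can be skipped and visits only ordered pairs of distinct processors. `collect_link_topology` and `fake_query` are hypothetical names; the query function is passed in so that the sketch runs without amdsmi, and in a real script `amdsmi_get_link_topology` would be passed instead.

```python
from itertools import permutations


def collect_link_topology(processors, query):
    """Call query(src, dst) for every ordered pair of distinct processors.

    Returns a list of (src, dst, result) tuples.
    """
    results = []
    for src, dst in permutations(processors, 2):
        results.append((src, dst, query(src, dst)))
    return results


def fake_query(src, dst):
    """Hypothetical stand-in for amdsmi_get_link_topology."""
    return {"src": src, "dst": dst}


# Strings stand in for real processor handles.
topology = collect_link_topology(["gpu0", "gpu1", "gpu2"], fake_query)
for src, dst, info in topology:
    print(src, dst, info)
```

With n processors this issues n * (n - 1) queries instead of n * n.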
amdsmi_get_link_topology_nearest#
Description: Retrieve the set of GPUs that are nearest to a given device at a specific interconnectivity level.
Input parameters:
processor_handle: The identifier of the given device.
link_type: The AmdSmiLinkType level to search for nearest devices
AmdSmiLinkType enum:
Field |
Description |
|---|---|
|
Unknown |
|
XGMI link type |
|
PCIe link type |
|
Link not applicable |
|
Unknown |
Output: Dictionary holding the following fields.
processor_list: list of all nearest processor handles found
Exceptions that can be thrown by amdsmi_get_link_topology_nearest function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            nearest_gpus = amdsmi_get_link_topology_nearest(processor, AmdSmiLinkType.PCIE)
            if len(nearest_gpus['processor_list']) == 0:
                print("No nearest GPUs found on machine")
            else:
                print("Nearest GPUs")
                for gpu in nearest_gpus['processor_list']:
                    print(amdsmi_get_gpu_device_uuid(gpu))
except AmdSmiException as e:
    print(e)
amdsmi_get_xgmi_fb_sharing_caps#
Description: Gets XGMI capabilities
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields
Field |
Description |
|---|---|
|
flag that indicates if custom mode is supported (Not supported yet) |
|
flag that indicates if mode_1 is supported |
|
flag that indicates if mode_2 is supported |
|
flag that indicates if mode_4 is supported |
|
flag that indicates if mode_8 is supported |
Exceptions that can be thrown by amdsmi_get_xgmi_fb_sharing_caps function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            caps = amdsmi_get_xgmi_fb_sharing_caps(processor)
            print(caps)
except AmdSmiException as e:
    print(e)
amdsmi_get_xgmi_fb_sharing_mode_info#
Description: Gets XGMI framebuffer sharing information between two GPUs
Input parameters:
source processor handle: PF of a source GPU device
destination processor handle: PF of a destination GPU device
mode: framebuffer sharing mode from AmdSmiXgmiFbSharingMode enum
AmdSmiXgmiFbSharingMode enum:
Field |
Description |
|---|---|
|
custom framebuffer sharing mode (Not supported yet) |
|
framebuffer sharing mode_1 |
|
framebuffer sharing mode_2 |
|
framebuffer sharing mode_4 |
|
framebuffer sharing mode_8 |
Output: Value indicating whether framebuffer sharing is enabled between two GPUs
Exceptions that can be thrown by amdsmi_get_xgmi_fb_sharing_mode_info function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for src_processor in processors:
            for dst_processor in processors:
                fb_sharing = amdsmi_get_xgmi_fb_sharing_mode_info(src_processor, dst_processor, AmdSmiXgmiFbSharingMode.MODE_4)
                print(fb_sharing)
except AmdSmiException as e:
    print(e)
amdsmi_set_xgmi_fb_sharing_mode#
Description: Sets framebuffer sharing mode
Note: This API will only work if there’s no guest VM running.
Input parameters:
processor handle: PF of a GPU device
mode: framebuffer sharing mode from AmdSmiXgmiFbSharingMode enum
AmdSmiXgmiFbSharingMode enum:
Field |
Description |
|---|---|
|
custom framebuffer sharing mode (Not supported yet) |
|
framebuffer sharing mode_1 |
|
framebuffer sharing mode_2 |
|
framebuffer sharing mode_4 |
|
framebuffer sharing mode_8 |
Output:
None
Exceptions that can be thrown by amdsmi_set_xgmi_fb_sharing_mode function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_xgmi_fb_sharing_mode(processor, AmdSmiXgmiFbSharingMode.MODE_4)
except AmdSmiException as e:
    print(e)
amdsmi_set_xgmi_fb_sharing_mode_v2#
Description: Sets framebuffer sharing mode
Note: This API will only work if there’s no guest VM running. It can be used for both custom and auto setting of XGMI framebuffer sharing.
In case of custom mode:
- All processors in the list must be on the same NUMA node. Otherwise, the API will return an error.
- If any processor from the list already belongs to an existing group, the existing group will be released automatically.
In case of auto mode (MODE_X):
- The input parameter processor_list[0] should be valid. Only the first element of processor_list is taken into account, and it can be any GPU (gpu0, gpu1, …).
Input parameters:
processor_list: list of PFs of GPU devices
mode: framebuffer sharing mode from AmdSmiXgmiFbSharingMode enum
AmdSmiXgmiFbSharingMode enum:
Field |
Description |
|---|---|
|
custom framebuffer sharing mode |
|
framebuffer sharing mode_1 |
|
framebuffer sharing mode_2 |
|
framebuffer sharing mode_4 |
|
framebuffer sharing mode_8 |
Output:
None
Exceptions that can be thrown by amdsmi_set_xgmi_fb_sharing_mode_v2 function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    processors_custom_mode = []
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        if len(processors) > 3:
            processors_custom_mode.append(processors[0])
            processors_custom_mode.append(processors[2])
        else:
            processors_custom_mode = processors
        amdsmi_set_xgmi_fb_sharing_mode_v2(processors_custom_mode, AmdSmiXgmiFbSharingMode.CUSTOM)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_metrics#
Description: Gets GPU metric information
Input parameters:
processor handle: PF of a GPU device
Output: list of dictionaries with fields for each metric
Field |
Description |
|---|---|
|
value of the metric |
|
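unit of the metric from AmdSmiMetricUnit enum |
|
name of the metric from AmdSmiMetricName enum |
|
category of the metric from AmdSmiMetricCategory enum |
|
list of types of the metric from AmdSmiMetricType enum |
|
mask of all active VFs + PF that this metric applies to |
|
resource group from AmdSmiMetricResGroup enum |
|
resource subgroup from AmdSmiMetricResSubgroup enum |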
|
resource instance number |
AmdSmiMetricUnit enum:
Field |
Description |
|---|---|
|
counter |
|
unsigned integer |
|
boolean |
|
megahertz |
|
percentage |
|
millivolt |
|
celsius |
|
watt |
|
joule |
|
gigabyte per second |
|
megabit per second |
|
PCIe generation |
|
PCIe lanes |
|
millijoule |
|
unknown unit |
AmdSmiMetricName enum:
Field |
Description |
|---|---|
|
accumulated counter |
|
firmware timestamp |
|
gfx clock |
|
socket clock |
|
memory clock |
|
vclk clock |
|
dclk clock |
|
gfx usage |
|
memory usage |
|
mm usage |
|
vcn usage |
|
jpeg usage |
|
gfx voltage |
|
socket voltage |
|
memory voltage |
|
current hotspot temperature |
|
hotspot temperature limit |
|
current memory temperature |
|
memory temperature limit |
|
current vr temperature |
|
shutdown temperature |
|
current power |
|
power limit |
|
socket energy |
|
ccd energy |
|
xcd energy |
|
aid energy |
|
memory energy |
|
active socket throttle |
|
active vr throttle |
|
active memory throttle |
|
pcie bandwidth |
|
pcie l0 recovery count |
|
pcie replay count |
|
pcie replay rollover count |
|
pcie nak sent count |
|
pcie nak received count |
|
maximum gfx clock limit |
|
maximum socket clock limit |
|
maximum memory clock limit |
|
maximum vclk clock limit |
|
maximum dclk clock limit |
|
minimum gfx clock limit |
|
minimum socket clock limit |
|
minimum memory clock limit |
|
minimum vclk clock limit |
|
minimum dclk clock limit |
|
gfx clock locked |
|
gfx deep sleep |
|
memory deep sleep |
|
socket deep sleep |
|
vclk deep sleep |
|
dclk deep sleep |
|
pcie link speed |
|
pcie link width |
|
dram bandwidth |
|
maximum dram bandwidth |
|
gfx clock below host limit ppt |
|
gfx clock below host limit thermal |
|
gfx clock below host limit total |
|
gfx clock low utilization |
|
input telemetry voltage |
|
pldm version |
|
xcd temperature |
|
aid temperature |
|
hbm temperature |
|
system metric accumulated counter |
|
system temperature ubb fpga |
|
system temperature ubb front |
|
system temperature ubb back |
|
system temperature ubb oam7 |
|
system temperature ubb ibc |
|
system temperature ubb ufpga |
|
system temperature ubb oam1 |
|
system temperature oam 0 1 hsc |
|
system temperature oam 2 3 hsc |
|
system temperature oam 4 5 hsc |
|
system temperature oam 6 7 hsc |
|
system temperature ubb fpga 0v72 vr |
|
system temperature ubb fpga 3v3 vr |
|
system temperature retimer 0 1 2 3 1v2 vr |
|
system temperature retimer 4 5 6 7 1v2 vr |
|
system temperature retimer 0 1 0v9 vr |
|
system temperature retimer 4 5 0v9 vr |
|
system temperature retimer 2 3 0v9 vr |
|
system temperature retimer 6 7 0v9 vr |
|
system temperature oam 0 1 2 3 3v3 vr |
|
system temperature oam 4 5 6 7 3v3 vr |
|
system temperature ibc hsc |
|
system temperature ibc |
|
node temperature retimer |
|
node temperature ibc temp |
|
node temperature ibc 2 temp |
|
node temperature vdd18 vr temp |
|
node temperature 04 hbm b vr temp |
|
node temperature 04 hbm d vr temp |
|
vr temperature vddcr vdd0 |
|
vr temperature vddcr vdd1 |
|
vr temperature vddcr vdd2 |
|
vr temperature vddcr vdd3 |
|
vr temperature vddcr soc a |
|
vr temperature vddcr soc c |
|
vr temperature vddcr socio a |
|
vr temperature vddcr socio c |
|
vr temperature vdd 085 hbm |
|
vr temperature vddcr 11 hbm b |
|
vr temperature vddcr 11 hbm d |
|
vr temperature vdd usr |
|
vr temperature vddio 11 e32 |
|
unknown name |
AmdSmiMetricCategory enum:
Field |
Description |
|---|---|
|
counter |
|
frequency |
|
activity |
|
temperature |
|
power |
|
energy |
|
throttle |
|
pcie |
|
static |
|
system accumulated counter |
|
system baseboard temperature |
|
system gpu board temperature |
|
unknown category |
AmdSmiMetricType enum:
Field |
Description |
|---|---|
|
counter |
|
chiplet |
|
instantaneous data |
|
accumulated data |
AmdSmiMetricResGroup enum:
Field |
Description |
|---|---|
|
resource group is not applicable |
|
gpu resource group |
|
xcp resource group |
|
aid resource group |
|
mid resource group |
|
system resource group |
|
unknown resource group |
AmdSmiMetricResSubgroup enum:
Field |
Description |
|---|---|
|
resource subgroup is not applicable |
|
xcc resource subgroup |
|
engine resource subgroup |
|
hbm resource subgroup |
|
baseboard resource subgroup |
|
gpuboard resource subgroup |
|
unknown resource subgroup |
Exceptions that can be thrown by amdsmi_get_gpu_metrics function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            metrics = amdsmi_get_gpu_metrics(processor)
            print(metrics)
except AmdSmiException as e:
    print(e)
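Printing the whole metrics list at once can be hard to read. The sketch below groups a list of metric dictionaries by one of their fields. `group_metrics` is a hypothetical helper, and the key names `name`, `category` and `value` in the sample are placeholders: the field names of the returned dictionaries are not shown in this section, so check them against the actual output before reusing this.

```python
from collections import defaultdict


def group_metrics(metrics, key):
    """Group a list of metric dicts by the value stored under `key`."""
    grouped = defaultdict(list)
    for metric in metrics:
        grouped[metric[key]].append(metric)
    return dict(grouped)


# Made-up sample; key names and values are placeholders, not the
# documented output of amdsmi_get_gpu_metrics().
sample_metrics = [
    {"name": "GFX_CLK", "category": "FREQUENCY", "value": 1500},
    {"name": "MEM_CLK", "category": "FREQUENCY", "value": 900},
    {"name": "HOTSPOT_TEMP", "category": "TEMPERATURE", "value": 61},
]

by_category = group_metrics(sample_metrics, "category")
for category, items in by_category.items():
    print(category, [item["name"] for item in items])
```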
amdsmi_get_soc_pstate#
Description: Gets the soc pstate policy for the processor
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields
Field |
Description |
|---|---|
|
current policy index |
|
List of policies |
Each policies list entry is a dictionary with following fields:
Field |
Description |
|---|---|
|
policy id |
|
policy description |
Exceptions that can be thrown by amdsmi_get_soc_pstate function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            dpm_policy = amdsmi_get_soc_pstate(processor)
            print(dpm_policy)
except AmdSmiException as e:
    print(e)
amdsmi_set_soc_pstate#
Description: Sets the soc pstate policy for the processor
Input parameters:
processor handle: PF of a GPU device
policy_id: policy id, one of the values from the policies list returned by amdsmi_get_soc_pstate
Output:
None
Exceptions that can be thrown by amdsmi_set_soc_pstate function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_soc_pstate(processor, 0)
except AmdSmiException as e:
    print(e)
AmdSmiEventReader class#
Description: Provides methods for event monitoring
Methods:
constructor#
Description: Allocates a new event reader notifier to monitor different types of issues with the GPU
Input parameters:
processor handle list: list of GPU device handle objects (PFs or VFs) for which to create the event reader
event category list: list of the different event categories that the event reader will monitor on the GPU
Event category is AmdSmiEventCategory enum object with values
Category |
Description |
|---|---|
|
not used category |
|
driver events(allocation, failures of APIs, debug errors) |
|
events/notifications regarding RESET executed by the GPU |
|
scheduling events(world switch fail …) |
|
VBIOS events(security failures, vbios corruption…) |
|
ecc events |
|
pp events(slave not present, dpm fail …) |
|
events regarding the configuration of VF resources |
|
vf events(no vbios, gpu reset fail…) |
|
events related with FW loading or FW operations |
|
gpu fatal conditions |
|
guard events |
|
gpumon events(fb issues …) |
|
mmsch events |
|
xgmi events |
|
monitor all categories |
severity: severity of events that can be monitored
Severity is AmdSmiEventSeverity enum object with values
Severity |
Description |
|---|---|
|
critical error |
|
significant error |
|
trivial error |
|
warning |
|
info |
|
monitor all severity levels |
severity parameter is optional. If nothing is set, events with LOW severity will be monitored by default.
Output:
created object of AmdSmiEventReader class
Exceptions that can be thrown by AmdSmiEventReader constructor function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        event_reader = AmdSmiEventReader(processors, {AmdSmiEventCategory.RESET})
except AmdSmiException as e:
    print(e)
read#
Description: Reads and returns one event from the event reader
Input parameters:
timestamp: number of microseconds to wait for an event to occur. If no event occurs within that time, monitoring is finished.
Output: Dictionary with fields
Field |
Description |
|---|---|
|
VF handle |
|
GPU device handle |
|
UTC time (in microseconds) when the error happened |
|
data value associated with the specific event |
|
event category |
|
event subcategory |
|
event severity |
|
UTC date and time when the error happened |
|
message describing the event |
event category is AmdSmiEventCategory enum object with values
| Category | Description |
|---|---|
| | not used category |
| | driver events (allocation, failures of APIs, debug errors) |
| | events/notifications regarding RESET executed by the GPU |
| | scheduling events (world switch fail …) |
| | VBIOS events (security failures, vbios corruption …) |
| | ecc events |
| | pp events (slave not present, dpm fail …) |
| | events regarding the configuration of VF resources |
| | vf events (no vbios, gpu reset fail …) |
| | events related with FW loading or FW operations |
| | gpu fatal conditions |
| | guard events |
| | gpumon events (fb issues …) |
| | mmsch events |
| | xgmi events |
| | monitor all categories |
Every AmdSmiEventCategory has a corresponding subcategory enum; the event subcategory field holds a value from the enum that matches the event category.
Severity is AmdSmiEventSeverity enum object with values
| Severity | Description |
|---|---|
| | critical error |
| | significant error |
| | trivial error |
| | warning |
| | info |
| | monitor all severity levels |
Exceptions that can be thrown by read function:
AmdSmiParameterException
AmdSmiTimeoutException
AmdSmiLibraryException
stop#
Description: Frees any resources used by event notification for the given device. It can be called explicitly, or automatically through a with statement, as in the examples below. It should be called, manually or automatically, for every created AmdSmiEventReader object.
Input parameters: None
Example with manual cleanup of AmdSmiEventReader:
event_reader = None
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        event_reader = AmdSmiEventReader(processors, {AmdSmiEventCategory.RESET}, AmdSmiEventSeverity.ALL)
        while True:
            # Wait up to 10 seconds for the next event (timeout is in microseconds)
            event = event_reader.read(10*1000*1000)
            gpu_bdf = amdsmi_get_gpu_device_bdf(event['dev_id'])
            vf_bdf = amdsmi_get_gpu_device_bdf(event['fcn_id'])
            print("=============== Event ================")
            print(" Time {}".format(event['timestamp']))
            print(" Category {}".format(event['category'].name))
            print(" Subcategory {}".format(event['subcode'].name))
            print(" Level {}".format(event['level'].name))
            print(" Data {}".format(event['data']))
            print(" VF BDF {}".format(vf_bdf))
            print(" GPU BDF {}".format(gpu_bdf))
            print(" Date {}".format(event['date']))
            print(" Message {}".format(event['message']))
            print("======================================")
except AmdSmiTimeoutException:
    print("No more events")
finally:
    # Release resources only if the reader was actually created
    if event_reader is not None:
        event_reader.stop()
Example with automatic cleanup using with statement:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        with AmdSmiEventReader(processors, {AmdSmiEventCategory.RESET}, AmdSmiEventSeverity.ALL) as event_reader:
            while True:
                event = event_reader.read(10*1000*1000)
                gpu_bdf = amdsmi_get_gpu_device_bdf(event['dev_id'])
                vf_bdf = amdsmi_get_gpu_device_bdf(event['fcn_id'])
                print("=============== Event ================")
                print(" Time {}".format(event['timestamp']))
                print(" Category {}".format(event['category'].name))
                print(" Subcategory {}".format(event['subcode'].name))
                print(" Level {}".format(event['level'].name))
                print(" Data {}".format(event['data']))
                print(" VF BDF {}".format(vf_bdf))
                print(" GPU BDF {}".format(gpu_bdf))
                print(" Date {}".format(event['date']))
                print(" Message {}".format(event['message']))
                print("======================================")
except AmdSmiTimeoutException:
    print("No more events")
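Both examples follow the same pattern: call read in a loop until it times out, and make sure stop runs afterwards. The sketch below isolates that pattern so it can be run without a GPU. `FakeReader` and `FakeTimeout` are stand-ins invented for this illustration (they only mimic the read/stop/with behaviour described above), and `drain_events` is a hypothetical helper, not an amdsmi function.

```python
class FakeTimeout(Exception):
    """Stand-in for AmdSmiTimeoutException (illustration only)."""


class FakeReader:
    """Stand-in for AmdSmiEventReader: returns queued events, then times out."""

    def __init__(self, events):
        self._events = list(events)
        self.stopped = False

    def read(self, timeout_us):
        if not self._events:
            raise FakeTimeout()
        return self._events.pop(0)

    def stop(self):
        self.stopped = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.stop()  # cleanup runs even if the body raised
        return False


def drain_events(reader, timeout_us, timeout_exc, max_events=None):
    """Collect events until read() times out or max_events is reached."""
    collected = []
    try:
        while max_events is None or len(collected) < max_events:
            collected.append(reader.read(timeout_us))
    except timeout_exc:
        pass  # a timeout means no further event arrived in time
    return collected


with FakeReader([{"message": "a"}, {"message": "b"}]) as reader:
    events = drain_events(reader, 10 * 1000 * 1000, FakeTimeout)
print(len(events), reader.stopped)
```

With the real library, the same helper would presumably be called with an AmdSmiEventReader instance and AmdSmiTimeoutException as `timeout_exc`.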
amdsmi_get_lib_version#
Description: Get the build version information for the currently running build of AMDSMI.
Output: amdsmi build version
Exceptions that can be thrown by amdsmi_get_lib_version function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    devices = amdsmi_get_processor_handles()
    if len(devices) == 0:
        print("No GPUs on machine")
    else:
        for device in devices:
            version = amdsmi_get_lib_version()
            print(version)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_accelerator_partition_profile_config#
Description: Returns gpu accelerator partition caps as currently configured in the system
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields
| Field | Description |
|---|---|
| | |
| | index of the default profile |
| | |
AmdSmiAcceleratorPartitionResource enum:
| Field | Description |
|---|---|
| | xcc resource capabilities |
| | encoder resource capabilities |
| | decoder resource capabilities |
| | dma resource capabilities |
| | jpeg resource capabilities |
AmdSmiAcceleratorPartitionSetting enum:
| Field | Description |
|---|---|
| | invalid compute partition |
| | compute partition with all xccs in group (8/1) |
| | compute partition with four xccs in group (8/2) |
| | compute partition with two xccs in group (6/3) |
| | compute partition with two xccs in group (8/4) |
| | compute partition with one xcc in group (8/8) |
Exceptions that can be thrown by amdsmi_get_gpu_accelerator_partition_profile_config function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            accelerator_partition_config = amdsmi_get_gpu_accelerator_partition_profile_config(processor)
            print(accelerator_partition_config)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_accelerator_partition_profile_config_global#
Description: Returns all GPU accelerator partition capabilities which can be configured on the system
Input parameters:
processor_handle: PF of a GPU device
Output: Dictionary with fields
| Field | Description |
|---|---|
| | List of dictionaries, each describing a supported accelerator partition profile |
| | Index of the default profile used if no custom configuration is set |
AmdSmiAcceleratorPartitionSetting enum:
| Field | Description |
|---|---|
| | Invalid compute partition |
| | Compute partition with all xccs in group (8/1) |
| | Compute partition with four xccs in group (8/2) |
| | Compute partition with two xccs in group (6/3) |
| | Compute partition with two xccs in group (8/4) |
| | Compute partition with one xcc in group (8/8) |
AmdSmiAcceleratorPartitionResource enum:
| Field | Description |
|---|---|
| | xcc resource capabilities |
| | encoder resource capabilities |
| | decoder resource capabilities |
| | dma resource capabilities |
| | jpeg resource capabilities |
Exceptions that can be thrown by amdsmi_get_gpu_accelerator_partition_profile_config_global function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            config = amdsmi_get_gpu_accelerator_partition_profile_config_global(processor)
            print(config)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_accelerator_partition_profile#
Description: Returns current gpu accelerator partition cap
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields
| Field | Description |
|---|---|
| | |
| | array of ids for current accelerator profile |
AmdSmiComputePartitionResource enum:
| Field | Description |
|---|---|
| | xcc resource capabilities |
| | encoder resource capabilities |
| | decoder resource capabilities |
| | dma resource capabilities |
| | jpeg resource capabilities |
Exceptions that can be thrown by amdsmi_get_gpu_accelerator_partition_profile function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            accelerator_partition_profile = amdsmi_get_gpu_accelerator_partition_profile(processor)
            print(accelerator_partition_profile)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_memory_partition_config#
Description: Returns current gpu memory partition config and mode capabilities
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields
| Field | Description |
|---|---|
| | memory partition capabilities |
| | memory partition mode from |
| | |
AmdSmiMemoryPartitionSetting enum:
| Field | Description |
|---|---|
| | unknown memory partition |
| | memory partition with 1 number per socket |
| | memory partition with 2 numbers per socket |
| | memory partition with 4 numbers per socket |
| | memory partition with 8 numbers per socket |
Exceptions that can be thrown by amdsmi_get_gpu_memory_partition_config function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            memory_partition_config = amdsmi_get_gpu_memory_partition_config(processor)
            print(memory_partition_config)
except AmdSmiException as e:
    print(e)
amdsmi_set_gpu_accelerator_partition_profile#
Description: Sets accelerator partition setting based on profile_index from amdsmi_get_gpu_accelerator_partition_profile_config
Input parameters:
processor handle: PF of a GPU device
profile_index: index of the partition profile the user wants to set
Output:
None
Exceptions that can be thrown by amdsmi_set_gpu_accelerator_partition_profile function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_gpu_accelerator_partition_profile(processor, 1)
except AmdSmiException as e:
    print(e)
amdsmi_set_gpu_memory_partition_mode#
Description: Sets memory partition mode
Input parameters:
processor handle: PF of a GPU device
setting: enum from AmdSmiMemoryPartitionSetting representing the memory partitioning mode to set
AmdSmiMemoryPartitionSetting enum:
| Field | Description |
|---|---|
| | unknown memory partition |
| | memory partition with 1 number per socket |
| | memory partition with 2 numbers per socket |
| | memory partition with 4 numbers per socket |
| | memory partition with 8 numbers per socket |
Output:
None
Exceptions that can be thrown by amdsmi_set_gpu_memory_partition_mode function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_gpu_memory_partition_mode(processor, AmdSmiMemoryPartitionSetting.NPS1)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_cper_entries#
Description: Get gpu ras cper entries
Input parameters:
processor handle: PF of a GPU device
severity_mask: severity mask from the AmdSmiCperErrorSeverity enum used to filter cpers
| Field | Description |
|---|---|
| | filters non-fatal-uncorrected cpers |
| | filters fatal cpers |
| | filters non-fatal-corrected cpers |
| | shows all cper types |
Output:
List of all cper errors. Each list element contains binary raw data
Exceptions that can be thrown by amdsmi_get_gpu_cper_entries function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            cper_list = amdsmi_get_gpu_cper_entries(processor, AmdSmiCperErrorSeverity.NUM)
except AmdSmiException as e:
    print(e)
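Because each returned element holds raw binary data, a common next step is to save the entries for offline decoding. The sketch below writes one file per entry. It assumes each list element is a bytes-like object, which this reference does not spell out, and `dump_cper_entries` and the file naming scheme are hypothetical choices made for illustration.

```python
import tempfile
from pathlib import Path


def dump_cper_entries(entries, out_dir):
    """Write each raw entry to out_dir/cper_<index>.bin and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, raw in enumerate(entries):
        path = out_dir / "cper_{:04d}.bin".format(index)
        # Assumption: raw is bytes-like (bytes, bytearray, memoryview).
        path.write_bytes(bytes(raw))
        paths.append(path)
    return paths


# Fabricated bytes standing in for a real cper_list.
sample_entries = [b"\x00\x01\x02", bytearray(b"\xff\xfe")]
with tempfile.TemporaryDirectory() as tmp:
    written = dump_cper_entries(sample_entries, tmp)
    sizes = [p.stat().st_size for p in written]
print(sizes)
```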
amdsmi_reset_gpu#
Description: Reset the GPU associated with the device with provided processor handle.
Input parameters:
processor_handle: GPU device handle
Output:
None
Exceptions that can be thrown by amdsmi_reset_gpu function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_reset_gpu(processor)
except AmdSmiException as e:
    print(e)
amdsmi_get_cpu_affinity_with_scope#
Description: Returns list of bitmask information for the given GPU.
Input parameters:
processor_handle: device to query
scope: enum value for numa or socket affinity
Output: bitmask of the CPU cores that this processor has affinity with
Exceptions that can be thrown by amdsmi_get_cpu_affinity_with_scope function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    devices = amdsmi_get_processor_handles()
    if len(devices) == 0:
        print("No GPUs on machine")
    else:
        for device in devices:
            bitmask = amdsmi_get_cpu_affinity_with_scope(device, AmdSmiAffinityScope.NUMA_SCOPE)
            print(bitmask)
except AmdSmiException as e:
    print(e)
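A bitmask is easier to act on once it is expanded into core numbers. The sketch below shows the decoding step under a stated assumption: the bitmask is a list of integers, where word 0 covers cores 0 to 63, word 1 covers cores 64 to 127, and so on, with the least significant bit first. That layout is an assumption, not something this reference confirms, so check the actual return value before relying on it. `cores_from_bitmask` is a hypothetical helper.

```python
def cores_from_bitmask(words, bits_per_word=64):
    """Expand a list of integer bitmask words into a sorted list of core indices.

    Assumes word i covers cores i*bits_per_word .. (i+1)*bits_per_word - 1,
    least significant bit first.
    """
    cores = []
    for word_index, word in enumerate(words):
        for bit in range(bits_per_word):
            if (word >> bit) & 1:
                cores.append(word_index * bits_per_word + bit)
    return cores


# Fabricated mask: bits 0, 1, 3 set in the first word, bit 0 in the second.
print(cores_from_bitmask([0b1011, 0b1]))
```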
amdsmi_topo_get_numa_node_number#
Description: Get the NUMA node associated with a device
Input parameters:
processor_handle: device to query
Output: NUMA node value
Exceptions that can be thrown by amdsmi_topo_get_numa_node_number function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    devices = amdsmi_get_processor_handles()
    if len(devices) == 0:
        print("No GPUs on machine")
    else:
        for device in devices:
            numa_node = amdsmi_topo_get_numa_node_number(device)
            print(numa_node)
except AmdSmiException as e:
    print(e)
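The per-device NUMA node is often used to group devices that share a node, for example when pinning worker processes. The sketch below shows that grouping with the lookup passed in as a callable, so it runs without a GPU. `group_by_numa_node` is a hypothetical helper and the `fake_nodes` mapping is fabricated; with the real library the callable would presumably be amdsmi_topo_get_numa_node_number.

```python
from collections import defaultdict


def group_by_numa_node(devices, get_numa_node):
    """Return {numa_node: [devices...]} using get_numa_node(device) per device."""
    groups = defaultdict(list)
    for device in devices:
        groups[get_numa_node(device)].append(device)
    return dict(groups)


# Fabricated device-to-node mapping standing in for real processor handles.
fake_nodes = {"gpu0": 0, "gpu1": 0, "gpu2": 1}
grouped = group_by_numa_node(list(fake_nodes), fake_nodes.get)
print(grouped)
```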