AMD SMI Python API reference#
The Python interface consists of Python function declarations that call the C interface directly. Clients can use it to build applications in Python.
Requirements#
Python 3.10+ (64-bit)
Overview#
Folder structure#
| File Name | Note |
|---|---|
| __init__.py | Python package initialization file |
| amdsmi_interface.py | Amdsmi library Python interface |
| amdsmi_wrapper.py | Python wrapper around amdsmi binary |
| amdsmi_exception.py | Amdsmi exceptions Python file |
| README.md | Documentation |
Build steps#
Navigate to the project's root folder and run the following make command:
make package
The build process creates the folder build/amdsmi/package/BUILD_MODE/amdsmi, where BUILD_MODE is either Release or Debug.
The folder contains the following files:
__init__.py
amdsmi_interface.py
amdsmi_wrapper.py
amdsmi_exception.py
libamdsmi.so
README.md
Amdsmi usage#
Copy the generated amdsmi folder next to the importing script and import it as follows:
from amdsmi import *
try:
    amdsmi_init()
    # amdsmi calls ...
except AmdSmiException as e:
    print(e)
finally:
    try:
        amdsmi_shut_down()
    except AmdSmiException as e:
        print(e)
To initialize amdsmi lib, amdsmi_init() must be called before all other calls to amdsmi lib.
To close connection to driver, amdsmi_shut_down() must be the last call.
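The init-first, shut-down-last rule can be enforced with a context manager. The sketch below is not part of amdsmi: `smi_session` is a hypothetical helper that takes the init and shut-down callables as arguments so that it runs without the library installed. In real use you would presumably pass amdsmi_init and amdsmi_shut_down.

```python
from contextlib import contextmanager


@contextmanager
def smi_session(init_fn, shutdown_fn):
    """Call init_fn on entry and shutdown_fn on exit, even if the body raises.

    shutdown_fn is only called when init_fn itself succeeded.
    """
    init_fn()
    try:
        yield
    finally:
        shutdown_fn()


# Stand-in callables (hypothetical) so the sketch runs without amdsmi.
calls = []
with smi_session(lambda: calls.append("init"), lambda: calls.append("shut_down")):
    calls.append("work")
```

Unlike the example above, this sketch does not call the shut-down function when initialization fails; whether that matters depends on how the library behaves after a failed init, which this document does not state.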
Amdsmi Exceptions#
All exceptions are defined in the amdsmi_exception.py file.
The following exceptions can be thrown:
AmdSmiException: Base smi exception class.
AmdSmiLibraryException: Derives from the base AmdSmiException class and represents errors that can occur in smi-lib. When this exception is thrown, err_code and err_info are set. err_code is an integer that corresponds to errors that can occur in smi-lib, and err_info is a string that explains the error that occurred. Example:
try:
    amdsmi_init()
    processors = amdsmi_get_processor_handles()
    num_of_GPUs = len(processors)
    if num_of_GPUs == 0:
        print("No GPUs on machine")
except AmdSmiLibraryException as e:
    # err_code and err_info are set on AmdSmiLibraryException
    print("Error code: {}".format(e.err_code))
    if e.err_code == AmdSmiRetCode.AMDSMI_STATUS_RETRY:
        print("Error info: {}".format(e.err_info))
finally:
    try:
        amdsmi_shut_down()
    except AmdSmiException as e:
        print(e)
AmdSmiRetryException: Derives from the AmdSmiLibraryException class and signals that the processor is busy and the call should be retried.
AmdSmiTimeoutException: Derives from the AmdSmiLibraryException class and signals that the call timed out.
AmdSmiParameterException: Derives from the base AmdSmiException class and represents errors related to invalid parameters passed to functions. When this exception is thrown, err_msg is set and explains the actual and expected types of the parameters.
AmdSmiBdfFormatException: Derives from the base AmdSmiException class and represents an invalid BDF format.
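Since AmdSmiRetryException signals that a call should be retried, a small retry wrapper is one way to handle it. The sketch below is an illustration only: `call_with_retry` and `FakeRetryError` are hypothetical names, and the exception type is passed in as a parameter so that the code runs without amdsmi. With the library, you would presumably pass AmdSmiRetryException instead.

```python
import time


def call_with_retry(func, retry_exceptions, attempts=3, delay_s=0.0):
    """Call func(); if it raises one of retry_exceptions, wait and try again.

    The last exception is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay_s)


# Stand-in for AmdSmiRetryException (hypothetical) so the sketch is runnable.
class FakeRetryError(Exception):
    pass


state = {"calls": 0}


def flaky_query():
    """Fail twice with a retryable error, then succeed."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise FakeRetryError("processor busy")
    return "ok"


result = call_with_retry(flaky_query, (FakeRetryError,), attempts=5)
```

The attempt count and delay are arbitrary defaults; suitable values depend on the workload.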
Amdsmi API#
amdsmi_init#
Description: Initialize smi lib and connect to driver
Input parameters: init_flags (Optional parameter, if no value provided, default value is AMDSMI_INIT_ALL_PROCESSORS value)
init_flags is AmdSmiInitFlags enum:
| Field | Description |
|---|---|
| | all processors |
| | amd cpus |
| | amd gpus |
| | non amd cpus |
| | non amd gpus |
| | amd apus |
Output: None
Exceptions that can be thrown by amdsmi_init function:
AmdSmiLibraryException
Example:
try:
    amdsmi_init()
    # continue with amdsmi
except AmdSmiException as e:
    print("Init failed")
    print(e)
amdsmi_shut_down#
Description: Finalize and close connection to driver
Input parameters: None
Output: None
Exceptions that can be thrown by amdsmi_shut_down function:
AmdSmiLibraryException
Example:
try:
    amdsmi_shut_down()
except AmdSmiException as e:
    print("Fini failed")
    print(e)
amdsmi_get_processor_handles#
Description: Returns list of GPU device handle objects on current machine
Note: This function currently supports only AMD GPUs. To enumerate other devices, such as AMD NICs, use amdsmi_get_processor_handles_by_type().
Input parameters: None
Output: List of GPU device handles
Exceptions that can be thrown by amdsmi_get_processor_handles function:
AmdSmiLibraryException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            print(amdsmi_get_gpu_device_uuid(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_processor_handles_by_type#
Description: Returns a list of processor handles of the specified type in the system.
Input parameters:
processor_type: one of AmdSmiProcessorType enum values:
| Field | Description |
|---|---|
| | Unknown processor type |
| AMD_GPU | AMD GPU device |
| | AMD CPU device (Not supported yet) |
| | Non-AMD GPU device (Not supported yet) |
| | Non-AMD CPU device (Not supported yet) |
| | AMD CPU core (Not supported yet) |
| | AMD Accelerated Processing Unit (APU) (Not supported yet) |
| AMD_NIC | AMD Network Interface Card (NIC) |
Output: List of processor handles of the chosen type
Exceptions that can be thrown by amdsmi_get_processor_handles_by_type function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor_handles = amdsmi_get_processor_handles_by_type(AmdSmiProcessorType.AMD_GPU)
    if len(processor_handles) == 0:
        print("No GPUs on machine")
    else:
        for processor in processor_handles:
            print(amdsmi_get_gpu_device_uuid(processor))
    nic_handles = amdsmi_get_processor_handles_by_type(AmdSmiProcessorType.AMD_NIC)
    if len(nic_handles) == 0:
        print("No NICs on machine")
    else:
        for nic in nic_handles:
            print(amdsmi_get_nic_asic_info(nic))
except AmdSmiException as e:
    print(e)
amdsmi_get_processor_handle_from_bdf#
Description: Returns processor handle from the given BDF
Input parameters: bdf string in form of either <domain>:<bus>:<device>.<function> or <bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long, in the 0000-FFFF interval
<bus> is 2 hex digits long, in the 00-FF interval
<device> is 2 hex digits long, in the 00-1F interval
<function> is 1 hex digit long, in the 0-7 interval
Output: processor handle object
Exceptions that can be thrown by amdsmi_get_processor_handle_from_bdf function:
AmdSmiLibraryException
AmdSmiBdfFormatException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    print(amdsmi_get_gpu_device_uuid(processor))
except AmdSmiException as e:
    print(e)
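The BDF format described in this section can be checked before a string is handed to the library. The following is a minimal sketch, not the library's own validation: `parse_bdf` is a hypothetical helper, and treating a missing domain as 0 is an assumption made here for illustration.

```python
import re

# <domain> is optional; <device> is limited to 00-1F and <function> to 0-7.
_BDF_RE = re.compile(
    r"(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[01][0-9a-fA-F])\."
    r"(?P<function>[0-7])"
)


def parse_bdf(bdf):
    """Return (domain, bus, device, function) as ints, or None if malformed."""
    match = _BDF_RE.fullmatch(bdf)
    if match is None:
        return None
    # Assumption: a BDF without a domain is treated as domain 0.
    domain = int(match.group("domain") or "0", 16)
    return (
        domain,
        int(match.group("bus"), 16),
        int(match.group("device"), 16),
        int(match.group("function"), 16),
    )
```

For example, `parse_bdf("0000:23:00.0")` yields the four fields as integers, while a string with a misplaced separator such as `"0000:23.00.0"` yields None.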
amdsmi_get_gpu_device_bdf#
Description: Returns BDF of the given processor (GPU PF).
Input parameters:
GPU device for which to query
Output: BDF string in form of <domain>:<bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long, in the 0000-FFFF interval
<bus> is 2 hex digits long, in the 00-FF interval
<device> is 2 hex digits long, in the 00-1F interval
<function> is 1 hex digit long, in the 0-7 interval
Exceptions that can be thrown by amdsmi_get_gpu_device_bdf function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    print("GPU bdf:", amdsmi_get_gpu_device_bdf(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_nic_device_bdf#
Description: Returns BDF of the given processor (NIC).
Input parameters:
NIC device for which to query
Output: BDF string in form of <domain>:<bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long, in the 0000-FFFF interval
<bus> is 2 hex digits long, in the 00-FF interval
<device> is 2 hex digits long, in the 00-1F interval
<function> is 1 hex digit long, in the 0-7 interval
Exceptions that can be thrown by amdsmi_get_nic_device_bdf function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    print("NIC bdf:", amdsmi_get_nic_device_bdf(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_processor_bdf#
Description: Returns BDF of the given processor.
Input parameters:
Processor for which to query
Output: BDF string in form of <domain>:<bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long, in the 0000-FFFF interval
<bus> is 2 hex digits long, in the 00-FF interval
<device> is 2 hex digits long, in the 00-1F interval
<function> is 1 hex digit long, in the 0-7 interval
Exceptions that can be thrown by amdsmi_get_processor_bdf function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    print("processor bdf:", amdsmi_get_processor_bdf(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_index_from_processor_handle#
Description: Returns the index of the given processor handle
Input parameters:
GPU device for which to query
Output: GPU device index
Exceptions that can be thrown by amdsmi_get_index_from_processor_handle function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            print("Processor's index:", amdsmi_get_index_from_processor_handle(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_processor_handle_from_index#
Description: Returns the processor handle from the given processor index
Input parameters:
Index of the processor to query
Output: processor handle object
Exceptions that can be thrown by amdsmi_get_processor_handle_from_index function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processors = amdsmi_get_processor_handles()
    num_of_GPUs = len(processors)
    if num_of_GPUs == 0:
        print("No GPUs on machine")
    else:
        for index in range(num_of_GPUs):
            print("Processor handle:", amdsmi_get_processor_handle_from_index(index))
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_bdf#
Description: Returns BDF of the given VF
Input parameters:
VF for which to query
Output: BDF string in form of <domain>:<bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long, in the 0000-FFFF interval
<bus> is 2 hex digits long, in the 00-FF interval
<device> is 2 hex digits long, in the 00-1F interval
<function> is 1 hex digit long, in the 0-7 interval
Exceptions that can be thrown by amdsmi_get_vf_bdf function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    vf = amdsmi_get_vf_handle_from_bdf("0000:23:02.0")
    print("VF's bdf:", amdsmi_get_vf_bdf(vf))
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_handle_from_bdf#
Description: Returns processor handle (VF) from the given BDF
Input parameters: bdf string in form of either <domain>:<bus>:<device>.<function> or <bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long, in the 0000-FFFF interval
<bus> is 2 hex digits long, in the 00-FF interval
<device> is 2 hex digits long, in the 00-1F interval
<function> is 1 hex digit long, in the 0-7 interval
Output: processor handle object
Exceptions that can be thrown by amdsmi_get_vf_handle_from_bdf function:
AmdSmiLibraryException
AmdSmiBdfFormatException
Example:
try:
    vf = amdsmi_get_vf_handle_from_bdf("0000:23:02.0")
    print(amdsmi_get_vf_uuid(vf))
except AmdSmiException as e:
    print(e)
amdsmi_get_processor_handle_from_uuid#
Description: Returns processor handle from the given UUID
Input parameters: uuid string
Output: processor handle object
Exceptions that can be thrown by amdsmi_get_processor_handle_from_uuid function:
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_uuid("fcff7460-0000-1000-80e9-b388cfe84658")
    print("Processor's UUID: ", amdsmi_get_gpu_device_uuid(processor))
except AmdSmiException as e:
    print(e)
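A UUID string can be sanity-checked with Python's standard uuid module before it is passed to the lookup functions. This is only a sketch: `is_valid_uuid` is a hypothetical helper, and this document does not say which UUID spellings the library itself accepts. Note that `uuid.UUID` is lenient and also accepts forms without hyphens or with surrounding braces.

```python
import uuid


def is_valid_uuid(text):
    """Return True if text (a str) parses as a UUID with the standard library."""
    try:
        uuid.UUID(text)
    except ValueError:
        return False
    return True


# The UUID below is the one used in the example above.
ok = is_valid_uuid("fcff7460-0000-1000-80e9-b388cfe84658")
bad = is_valid_uuid("not-a-uuid")
```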
amdsmi_get_vf_handle_from_uuid#
Description: Returns the handle of a virtual function (VF) from the given UUID
Input parameters: uuid string
Output: vf object
Exceptions that can be thrown by amdsmi_get_vf_handle_from_uuid function:
AmdSmiLibraryException
Example:
try:
    vf = amdsmi_get_vf_handle_from_uuid("87007460-0000-1000-8059-3ae746ab9206")
    print("VF's UUID: ", amdsmi_get_vf_uuid(vf))
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_device_uuid#
Description: Returns the UUID of the device
Input parameters:
GPU device for which to query
Output: UUID string unique to the device
Exceptions that can be thrown by amdsmi_get_gpu_device_uuid function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    print("Device UUID: ", amdsmi_get_gpu_device_uuid(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_uuid#
Description: Returns the UUID of the device
Input parameters:
VF handle for which to query
Output: UUID string unique to the device
Exceptions that can be thrown by amdsmi_get_vf_uuid function:
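AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    # processor_handle is a GPU handle obtained earlier,
    # for example from amdsmi_get_processor_handles()
    vf_id = amdsmi_get_vf_handle_from_vf_index(processor_handle, 0)
    print("VF UUID: ", amdsmi_get_vf_uuid(vf_id))
except AmdSmiException as e:
    print(e)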
amdsmi_get_gpu_driver_info#
Description: Returns the name, version, and date of the driver
Input parameters:
processor_handle: GPU device to query
Output:
driver_name: Name of the driver that is handling the GPU device
driver_version: Version string of the driver that is handling the GPU device
driver_date: Date string of the driver that is handling the GPU device
Exceptions that can be thrown by amdsmi_get_gpu_driver_info function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    driver_info = amdsmi_get_gpu_driver_info(processor)
    print("Driver name: ", driver_info.driver_name)
    print("Driver version: ", driver_info.driver_version)
    print("Driver date: ", driver_info.driver_date)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_driver_model#
Description: Returns driver model information
Input parameters:
processor_handle: GPU device to query
Output:
Current driver model, from the AmdSmiDriverModelType enum
AmdSmiDriverModelType enum:
| Field | Description |
|---|---|
| | Windows Display Driver Model |
| | Windows Driver Model |
| | Compute Driver Model |
Exceptions that can be thrown by amdsmi_get_gpu_driver_model function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    driver_model = amdsmi_get_gpu_driver_model(processor)
    print("Driver model: ", driver_model)
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_handle_from_vf_index#
Description: Returns VF id of the VF referenced by its index (in partitioning info)
Input parameters:
processor handle: PF or child VF of a GPU device to query
VF's index: Index of the VF (0-31) in the GPU's partitioning info
Output:
VF id
Exceptions that can be thrown by amdsmi_get_vf_handle_from_vf_index function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    # processor_handle is a GPU handle obtained earlier,
    # for example from amdsmi_get_processor_handles()
    vf_id = amdsmi_get_vf_handle_from_vf_index(processor_handle, 0)
    print(amdsmi_get_vf_info(vf_id))
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_total_ecc_count#
Description: Returns the number of ECC errors on the GPU device
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| correctable_count | Count of ECC correctable errors |
| uncorrectable_count | Count of ECC uncorrectable errors |
| deferred_count | Count of ECC deferred errors |
Exceptions that can be thrown by amdsmi_get_gpu_total_ecc_count function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            ecc_errors = amdsmi_get_gpu_total_ecc_count(processor)
            print(ecc_errors['correctable_count'])
            print(ecc_errors['uncorrectable_count'])
            print(ecc_errors['deferred_count'])
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_ecc_count#
Description: Returns the number of ECC errors on the GPU device for the given block
Input parameters:
processor_handle: GPU device to query
block: The block for which error counts should be retrieved
block is AmdSmiGpuBlock enum:
| Field | Description |
|---|---|
| UMC | UMC block |
| SDMA | SDMA block |
| GFX | GFX block |
| MMHUB | MMHUB block |
| ATHUB | ATHUB block |
| PCIE_BIF | PCIE_BIF block |
| HDP | HDP block |
| XGMI_WAFL | XGMI_WAFL block |
| DF | DF block |
| SMN | SMN block |
| SEM | SEM block |
| MP0 | MP0 block |
| MP1 | MP1 block |
| FUSE | FUSE block |
| MCA | MCA block |
| VCN | VCN block |
| JPEG | JPEG block |
| IH | IH block |
| MPIO | MPIO block |
Output: Dictionary with fields
| Field | Description |
|---|---|
| correctable_count | Count of ECC correctable errors |
| uncorrectable_count | Count of ECC uncorrectable errors |
| deferred_count | Count of ECC deferred errors |
Exceptions that can be thrown by amdsmi_get_gpu_ecc_count function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            ecc_errors = amdsmi_get_gpu_ecc_count(processor, AmdSmiGpuBlock.UMC)
            print(ecc_errors['correctable_count'])
            print(ecc_errors['uncorrectable_count'])
            print(ecc_errors['deferred_count'])
except AmdSmiException as e:
    print(e)
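When several blocks are queried one after another, the per-block dictionaries can be added up on the caller's side. The sketch below uses made-up sample dictionaries in place of real amdsmi_get_gpu_ecc_count results, and `sum_ecc_counts` is a hypothetical helper. Whether such a sum matches what amdsmi_get_gpu_total_ecc_count reports is not stated in this document.

```python
ECC_FIELDS = ("correctable_count", "uncorrectable_count", "deferred_count")


def sum_ecc_counts(per_block_counts):
    """Add up ECC count dictionaries, one dictionary per queried block."""
    totals = {field: 0 for field in ECC_FIELDS}
    for counts in per_block_counts:
        for field in ECC_FIELDS:
            totals[field] += counts[field]
    return totals


# Made-up sample data shaped like the output dictionary described above.
sample_counts = [
    {"correctable_count": 2, "uncorrectable_count": 0, "deferred_count": 1},
    {"correctable_count": 5, "uncorrectable_count": 1, "deferred_count": 0},
]
totals = sum_ecc_counts(sample_counts)
```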
amdsmi_get_gpu_ecc_enabled#
Description: Returns ECC capabilities (disable/enable) for each GPU block.
Input parameters:
processor_handle: GPU device to query
Output: Dictionary of each GPU block and its value (False if the block is not enabled, True if the block is enabled)
Each GPU block in the dictionary is from AmdSmiGpuBlock enum:
| Field | Description |
|---|---|
| UMC | UMC block |
| SDMA | SDMA block |
| GFX | GFX block |
| MMHUB | MMHUB block |
| ATHUB | ATHUB block |
| PCIE_BIF | PCIE_BIF block |
| HDP | HDP block |
| XGMI_WAFL | XGMI_WAFL block |
| DF | DF block |
| SMN | SMN block |
| SEM | SEM block |
| MP0 | MP0 block |
| MP1 | MP1 block |
| FUSE | FUSE block |
| MCA | MCA block |
| VCN | VCN block |
| JPEG | JPEG block |
| IH | IH block |
| MPIO | MPIO block |
Exceptions that can be thrown by amdsmi_get_gpu_ecc_enabled function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            ecc_status = amdsmi_get_gpu_ecc_enabled(processor, AmdSmiGpuBlock.UMC)
            print(ecc_status)
except AmdSmiException as e:
    print(e)
amdsmi_status_code_to_string#
Description: Get a description of a provided AMDSMI error status
Input parameters:
status: The error status for which a description is desired
Output: String description of the provided error code
Exceptions that can be thrown by amdsmi_status_code_to_string function:
AmdSmiParameterException
Example:
try:
    status_str = amdsmi_status_code_to_string(AmdSmiRetCode.SUCCESS)
    print(status_str)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_ras_feature_info#
Description: Returns RAS feature info
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| ras_eeprom_version | RAS EEPROM version |
| ecc_correction_schema | ecc correction schema mask used with |
Exceptions that can be thrown by amdsmi_get_gpu_ras_feature_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            ras_feature = amdsmi_get_gpu_ras_feature_info(processor)
            print(ras_feature['ras_eeprom_version'])
            print(ras_feature['ecc_correction_schema'])
except AmdSmiException as e:
    print(e)
amdsmi_get_bad_page_threshold#
Description: Returns bad page threshold
Input parameters:
processor_handle: GPU device to query
Output: Bad page threshold value
Exceptions that can be thrown by amdsmi_get_bad_page_threshold function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            bad_page_threshold = amdsmi_get_bad_page_threshold(processor)
            print(bad_page_threshold)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_bad_page_info#
Description: Returns bad page info.
Input parameters:
processor handle object: PF of a GPU device to query
Output: List of dictionaries with fields for each bad page
| Field | Description |
|---|---|
| retired_page | 64K/4K driver-managed location that is blocked from further use |
| ts | Marks the last time when the RAS event was observed |
| mem_channel | Identifies the memory channel the issue has been reported on |
| mcumc_id | Identifies the memory controller the issue has been reported on |
Exceptions that can be thrown by amdsmi_get_gpu_bad_page_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
from datetime import datetime

try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    bad_page_info = amdsmi_get_gpu_bad_page_info(processor)
    if len(bad_page_info) == 0:
        print("no bad pages")
    else:
        for table_record in bad_page_info:
            print(hex(table_record["retired_page"]))
            print(datetime.fromtimestamp(table_record['ts']).strftime('%Y/%m/%d:%H/%M/%S'))
            print(table_record['mem_channel'])
            print(table_record['mcumc_id'])
except AmdSmiException as e:
    print(e)
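For logging, each bad-page record can be rendered as a single line. The sketch below works on a made-up sample record rather than real library output, and `format_bad_page` is a hypothetical helper. It assumes that ts is a Unix timestamp in seconds, which is consistent with the datetime.fromtimestamp call in the example above but is not stated explicitly in this document.

```python
from datetime import datetime, timezone


def format_bad_page(record):
    """Render one bad-page record as a single line, with the time in UTC."""
    # Assumption: record["ts"] is a Unix timestamp in seconds.
    when = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
    return "page={} ts={} channel={} mcumc={}".format(
        hex(record["retired_page"]),
        when.strftime("%Y-%m-%d %H:%M:%S"),
        record["mem_channel"],
        record["mcumc_id"],
    )


# Made-up sample record shaped like the dictionaries described above.
sample_record = {"retired_page": 0x1F000, "ts": 0, "mem_channel": 3, "mcumc_id": 1}
line = format_bad_page(sample_record)
```

Using an explicit UTC timezone keeps the output independent of the machine's local time settings.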
amdsmi_get_gpu_asic_info#
Description: Returns asic information for the given GPU
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Content |
|---|---|
| market_name | market name |
| vendor_id | vendor id |
| vendor_name | vendor name |
| subvendor_id | subsystem vendor id |
| device_id | unique id of a GPU |
| rev_id | revision id |
| asic_serial | asic serial |
| oam_id | xgmi physical id |
| num_of_compute_units | num of compute units (Not supported yet, currently hardcoded to 0) |
| target_graphics_version | target graphics version (Not supported yet, currently hardcoded to 0) |
| subsystem_id | subsystem device id |
Exceptions that can be thrown by amdsmi_get_gpu_asic_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            asic_info = amdsmi_get_gpu_asic_info(processor)
            print(asic_info['market_name'])
            print(asic_info['vendor_id'])
            print(asic_info['vendor_name'])
            print(asic_info['subvendor_id'])
            print(asic_info['device_id'])
            print(asic_info['subsystem_id'])
            print(asic_info['rev_id'])
            print(asic_info['asic_serial'])
            print(asic_info['oam_id'])
            print(asic_info['num_of_compute_units'])
            print(asic_info['target_graphics_version'])
except AmdSmiException as e:
    print(e)
amdsmi_get_pcie_info#
Description: Returns static and metric information about PCIe link for the given GPU
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Content |
|---|---|
| pcie_static | Dictionary with fields max_pcie_width, max_pcie_speed, slot_type, max_pcie_interface_version |
| pcie_metric | Dictionary with fields pcie_speed, pcie_width, pcie_bandwidth, pcie_interface_version, pcie_replay_count, pcie_l0_to_recovery_count, pcie_replay_roll_over_count, pcie_nak_sent_count, pcie_nak_received_count, pcie_lc_perf_other_end_recovery_count |
Exceptions that can be thrown by amdsmi_get_pcie_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            pcie_info = amdsmi_get_pcie_info(processor)
            print(pcie_info['pcie_static']['max_pcie_width'])
            print(pcie_info['pcie_static']['max_pcie_speed'])
            print(pcie_info['pcie_static']['slot_type'])
            print(pcie_info['pcie_static']['max_pcie_interface_version'])
            print(pcie_info['pcie_metric']['pcie_speed'])
            print(pcie_info['pcie_metric']['pcie_width'])
            print(pcie_info['pcie_metric']['pcie_bandwidth'])
            print(pcie_info['pcie_metric']['pcie_interface_version'])
            print(pcie_info['pcie_metric']['pcie_replay_count'])
            print(pcie_info['pcie_metric']['pcie_l0_to_recovery_count'])
            print(pcie_info['pcie_metric']['pcie_replay_roll_over_count'])
            print(pcie_info['pcie_metric']['pcie_nak_sent_count'])
            print(pcie_info['pcie_metric']['pcie_nak_received_count'])
            print(pcie_info['pcie_metric']['pcie_lc_perf_other_end_recovery_count'])
except AmdSmiException as e:
    print(e)
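Because the result is a dictionary of dictionaries, it can be convenient to flatten it before printing or exporting it. The following sketch is generic Python and not part of amdsmi: `flatten` is a hypothetical helper, and the sample values are made up for illustration.

```python
def flatten(nested, sep="."):
    """Flatten a dict of dicts into {'outer.inner': value} pairs."""
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten(value, sep).items():
                flat[key + sep + sub_key] = sub_value
        else:
            flat[key] = value
    return flat


# Made-up sample shaped like the nested result used in the example above.
sample_pcie_info = {
    "pcie_static": {"max_pcie_width": 16, "max_pcie_speed": 32000},
    "pcie_metric": {"pcie_width": 16, "pcie_speed": 16000},
}
flat = flatten(sample_pcie_info)
for name in sorted(flat):
    print(name, flat[name])
```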
amdsmi_get_power_cap_info#
Description: Returns dictionary of power capabilities as currently configured on the given GPU
Input parameters:
processor_handle: GPU device to query
sensor_ind: Sensor index. Normally, this will be 0. If a processor has more than one sensor, it could be greater than 0. The parameter is unused on the host platform. It is optional and defaults to 0.
Output: Dictionary with fields
| Field | Description |
|---|---|
| power_cap | power capability |
| default_power_cap | default power capability |
| dpm_cap | dynamic power management capability |
| min_power_cap | minimum power capability |
| max_power_cap | maximum power capability |
Exceptions that can be thrown by amdsmi_get_power_cap_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            power_info = amdsmi_get_power_cap_info(processor)
            print(power_info['power_cap'])
            print(power_info['default_power_cap'])
            print(power_info['dpm_cap'])
            print(power_info['min_power_cap'])
            print(power_info['max_power_cap'])
except AmdSmiException as e:
    print(e)
amdsmi_get_fb_layout#
Description: Returns framebuffer related information for the given GPU
Input parameters:
processor handle: PF of a GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| total_fb_size | total framebuffer size in MB |
| pf_fb_reserved | framebuffer reserved space in MB |
| pf_fb_offset | framebuffer offset in MB |
| fb_alignment | framebuffer alignment in MB |
| max_vf_fb_usable | maximum usable framebuffer size in MB |
| min_vf_fb_usable | minimum usable framebuffer size in MB |
Exceptions that can be thrown by amdsmi_get_fb_layout function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            fb_info = amdsmi_get_fb_layout(processor)
            print(fb_info['total_fb_size'])
            print(fb_info['pf_fb_reserved'])
            print(fb_info['pf_fb_offset'])
            print(fb_info['fb_alignment'])
            print(fb_info['max_vf_fb_usable'])
            print(fb_info['min_vf_fb_usable'])
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_activity#
Description: Returns the engine usage for the given GPU.
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| gfx_activity | graphics engine usage/activity percentage (0 - 100) |
| umc_activity | memory/UMC engine usage/activity percentage (0 - 100) |
| mm_activity | average multimedia engine usage/activity percentage (0 - 100) |
Exceptions that can be thrown by amdsmi_get_gpu_activity function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            engine_activity = amdsmi_get_gpu_activity(processor)
            print(engine_activity['gfx_activity'])
            print(engine_activity['umc_activity'])
            print(engine_activity['mm_activity'])
except AmdSmiException as e:
    print(e)
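A monitoring script might want to report which engine is the busiest. The sketch below runs on a made-up sample dictionary instead of real library output. `busiest_engine` is a hypothetical helper, and it assumes that every value is a number in the 0 - 100 range.

```python
def busiest_engine(activity):
    """Return (field_name, percent) for the highest activity value."""
    # Assumption: all values are numeric percentages.
    name = max(activity, key=activity.get)
    return name, activity[name]


# Made-up sample shaped like the output dictionary described above.
sample_activity = {"gfx_activity": 87, "umc_activity": 42, "mm_activity": 0}
engine, percent = busiest_engine(sample_activity)
```

If two engines report the same value, max() returns the first one in dictionary order.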
amdsmi_get_power_info#
Description: Returns the current power, power limit, and voltage for the given GPU
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| socket_power | socket power |
| gfx_voltage | gfx voltage |
| soc_voltage | socket voltage |
| mem_voltage | memory voltage |
Note: socket_power can rarely spike above the socket power limit in some cases
Exceptions that can be thrown by amdsmi_get_power_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            power_info = amdsmi_get_power_info(processor)
            print(power_info['socket_power'])
            print(power_info['gfx_voltage'])
            print(power_info['soc_voltage'])
            print(power_info['mem_voltage'])
except AmdSmiException as e:
    print(e)
amdsmi_set_power_cap#
Description: Sets GPU power cap.
Input parameters:
processor handle: Processor handle
sensor_ind: Sensor index. Normally, this will be 0. If a processor has more than one sensor, it could be greater than 0. The parameter is unused on the host platform.
cap: Value representing the power cap to set. The value must be between the minimum (min_power_cap) and maximum (max_power_cap) power cap values, which can be obtained from amdsmi_get_power_cap_info().
Output:
None
Exceptions that can be thrown by amdsmi_set_power_cap function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
import random

try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        sensor_ind = 0
        for processor in processors:
            power_info = amdsmi_get_power_cap_info(processor)
            # Pick an arbitrary cap within the allowed range
            power_limit = random.randint(power_info['min_power_cap'], power_info['max_power_cap'])
            amdsmi_set_power_cap(processor, sensor_ind, power_limit)
except AmdSmiException as e:
    print(e)
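Since the cap must lie between min_power_cap and max_power_cap, a caller may prefer to clamp a requested value to that range rather than pick one at random. This is a minimal sketch under that assumption: `clamp_power_cap` is a hypothetical helper and the sample numbers are made up.

```python
def clamp_power_cap(requested, power_info):
    """Clamp requested into [min_power_cap, max_power_cap] from power_info."""
    low = power_info["min_power_cap"]
    high = power_info["max_power_cap"]
    if low > high:
        raise ValueError("min_power_cap exceeds max_power_cap")
    return max(low, min(requested, high))


# Made-up sample shaped like the amdsmi_get_power_cap_info dictionary.
sample_power_info = {"min_power_cap": 100, "max_power_cap": 300}
too_low = clamp_power_cap(50, sample_power_info)
in_range = clamp_power_cap(250, sample_power_info)
too_high = clamp_power_cap(400, sample_power_info)
```

The clamped value could then be passed as the cap argument of amdsmi_set_power_cap.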
amdsmi_is_gpu_power_management_enabled#
Description: Returns whether power management is enabled
Input parameters:
processor_handle: GPU device to query
Output: Boolean, True if power management is enabled, otherwise False
Exceptions that can be thrown by amdsmi_is_gpu_power_management_enabled function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            is_power_management_enabled = amdsmi_is_gpu_power_management_enabled(processor)
            print(is_power_management_enabled)
except AmdSmiException as e:
    print(e)
amdsmi_get_temp_metric#
Description: Returns the current temperature or limit temperature for the given processor
Input parameters:
processor_handle: GPU device to query
thermal_domain: one of AmdSmiTemperatureType enum values:
| Field | Description |
|---|---|
| EDGE | edge thermal domain |
| HOTSPOT | hotspot/junction thermal domain |
| | memory/vram thermal domain |
| | plx thermal domain (Not supported yet) |
| | HBM 0 thermal domain (Not supported yet) |
| | HBM 1 thermal domain (Not supported yet) |
| | HBM 2 thermal domain (Not supported yet) |
| | HBM 3 thermal domain (Not supported yet) |
thermal_metric: one of AmdSmiTemperatureMetric enum values:
| Field | Description |
|---|---|
| CURRENT | current thermal metric |
| | max thermal metric (Not supported yet) |
| | min thermal metric (Not supported yet) |
| | max hyst thermal metric (Not supported yet) |
| | min hyst thermal metric (Not supported yet) |
| CRITICAL | limit thermal metric |
| | critical hyst metric (Not supported yet) |
| | emergency thermal metric (Not supported yet) |
| | emergency hyst thermal metric (Not supported yet) |
| | critical min thermal metric (Not supported yet) |
| | critical min hyst thermal metric (Not supported yet) |
| | offset thermal metric (Not supported yet) |
| | lowest thermal metric (Not supported yet) |
| | highest thermal metric (Not supported yet) |
| | shutdown thermal metric |
Output: Temperature value
Exceptions that can be thrown by amdsmi_get_temp_metric function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for device in processors:
            print("=============== EDGE THERMAL DOMAIN ================")
            thermal_measure = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.EDGE, AmdSmiTemperatureMetric.CURRENT)
            print("Current temperature:")
            print(thermal_measure)
            thermal_measure = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.EDGE, AmdSmiTemperatureMetric.CRITICAL)
            print("Limit temperature:")
            print(thermal_measure)
            print("=============== HOTSPOT THERMAL DOMAIN ================")
            thermal_measure = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.HOTSPOT, AmdSmiTemperatureMetric.CURRENT)
            print("Current temperature:")
            print(thermal_measure)
            thermal_measure = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.HOTSPOT, AmdSmiTemperatureMetric.CRITICAL)
            print("Limit temperature:")
            print(thermal_measure)
except AmdSmiException as e:
    print(e)
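Once the current and limit temperatures have been read, the remaining headroom is a simple difference. The sketch below uses plain numbers in place of real readings. `thermal_headroom` and `is_near_limit` are hypothetical helpers, the default margin of 10 is an arbitrary choice, and both readings are assumed to be in the same unit.

```python
def thermal_headroom(current, limit):
    """Return how far current is below limit; negative means limit exceeded."""
    return limit - current


def is_near_limit(current, limit, margin=10):
    """Return True when the headroom is at most margin (same unit as inputs)."""
    return thermal_headroom(current, limit) <= margin


# Made-up readings standing in for the current and limit temperatures.
headroom = thermal_headroom(65, 100)
warn_cool = is_near_limit(65, 100)
warn_hot = is_near_limit(95, 100)
```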
amdsmi_get_gpu_cache_info#
Description: Returns the cache info for the given processor
Input parameters:
processor_handleGPU device which to query
Output: Dictionary with fields
Field |
Description |
||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
AmdSmiCacheProperty enum values
Field |
Description |
|---|---|
|
Cache enabled |
|
Data cache |
|
Inst cache |
|
CPU cache |
|
SIMD cache |
Exceptions that can be thrown by amdsmi_get_gpu_cache_info function:
AmdSmiLibraryException, AmdSmiRetryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for device in processors:
            cache_info = amdsmi_get_gpu_cache_info(device)
            for cache in cache_info["cache"]:
                print(cache["cache_properties"])
                print(cache["cache_size"])
                print(cache["cache_level"])
                print(cache["max_num_cu_shared"])
                print(cache["num_cache_instance"])
except AmdSmiException as e:
    print(e)
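The per-cache entries printed above can be aggregated. The following sketch sums cache capacity per cache level. It assumes that `cache_size` is the size of a single instance and that multiplying by `num_cache_instance` gives the total, which the reference does not state explicitly; `total_cache_by_level` is a hypothetical helper and the sample dictionary is made-up data.

```python
from collections import defaultdict


def total_cache_by_level(cache_info):
    """Sum cache_size * num_cache_instance for each cache_level.

    Assumption: cache_size describes one instance of the cache.
    """
    totals = defaultdict(int)
    for cache in cache_info["cache"]:
        totals[cache["cache_level"]] += cache["cache_size"] * cache["num_cache_instance"]
    return dict(totals)


# Made-up data shaped like the dictionary used in the example above.
sample = {
    "cache": [
        {"cache_level": 1, "cache_size": 16, "num_cache_instance": 4},
        {"cache_level": 2, "cache_size": 2048, "num_cache_instance": 1},
        {"cache_level": 1, "cache_size": 32, "num_cache_instance": 2},
    ]
}
totals = total_cache_by_level(sample)
print(totals)
```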
amdsmi_get_clock_info#
Description: Returns the clock measurements for the given GPU
Input parameters:
processor_handle: GPU device to query
clock_domain: one of the AmdSmiClkType enum values:
Field |
Description |
|---|---|
|
system clock domain |
|
gfx clock domain |
|
Data Fabric clock (for ASICs running on a separate clock) domain (Not supported yet) |
|
Display Controller Engine clock domain (Not supported yet) |
|
SOC clock domain (Not supported yet) |
|
memory clock domain |
|
PCIe clock domain (Not supported yet) |
|
first multimedia engine (VCLK0) clock domain |
|
second multimedia engine (VCLK1) clock domain |
|
DCLK0 clock domain |
|
DCLK1 clock domain |
Output: Dictionary with fields
| Field | Description |
|---|---|
| clk | current clock value for the given domain |
| min_clk | minimum clock value for the given domain |
| max_clk | maximum clock value for the given domain |
| clk_locked | clock locked flag, only supported on the GFX clock domain |
| clk_deep_sleep | clock deep sleep mode flag |
Exceptions that can be thrown by amdsmi_get_clock_info function:
AmdSmiLibraryException, AmdSmiRetryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            print("=============== GFX CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.GFX)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_locked'])
            print(clock_measure['clk_deep_sleep'])
            print("=============== MEM CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.MEM)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_deep_sleep'])
            print("=============== SYS CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.SYS)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_deep_sleep'])
            print("=============== VCLK0 CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.VCLK0)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_deep_sleep'])
            print("=============== VCLK1 CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.VCLK1)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_deep_sleep'])
            print("=============== DCLK0 CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.DCLK0)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_deep_sleep'])
            print("=============== DCLK1 CLOCK DOMAIN ================")
            clock_measure = amdsmi_get_clock_info(processor, AmdSmiClkType.DCLK1)
            print(clock_measure['clk'])
            print(clock_measure['min_clk'])
            print(clock_measure['max_clk'])
            print(clock_measure['clk_deep_sleep'])
except AmdSmiException as e:
    print(e)
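To interpret a single clock reading, it can help to express the current clock as a position inside the reported range. This is only a sketch: `clock_position` is a hypothetical helper, the sample values are invented, and it relies solely on the `clk`, `min_clk`, and `max_clk` keys printed in the example above.

```python
def clock_position(clock_measure):
    """Return where clk sits between min_clk and max_clk as a 0.0-1.0 fraction.

    Returns None when the reported range is empty or inverted.
    """
    low = clock_measure["min_clk"]
    high = clock_measure["max_clk"]
    if high <= low:
        return None
    fraction = (clock_measure["clk"] - low) / (high - low)
    # Clamp in case the current clock is reported outside the range.
    return max(0.0, min(1.0, fraction))


# Invented reading shaped like the dictionary in the example above.
sample = {"clk": 1300, "min_clk": 500, "max_clk": 2100, "clk_deep_sleep": 0}
position = clock_position(sample)
print(position)
```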
amdsmi_get_gpu_vram_info#
Description: Returns static VRAM information for the given GPU
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
Field |
Description |
|---|---|
|
VRAM type from AmdSmiVramType enum |
|
VRAM vendor from AmdSmiVramVendor enum |
|
VRAM size in MB |
|
VRAM bit width |
AmdSmiVramType enum:
Field |
Description |
|---|---|
|
UNKNOWN VRAM type |
|
HBM VRAM type |
|
HBM2 VRAM type |
|
HBM2E VRAM type |
|
HBM3 VRAM type |
|
HBM3E VRAM type |
|
DDR2 VRAM type |
|
DDR3 VRAM type |
|
DDR4 VRAM type |
|
GDDR1 VRAM type |
|
GDDR2 VRAM type |
|
GDDR3 VRAM type |
|
GDDR4 VRAM type |
|
GDDR5 VRAM type |
|
GDDR6 VRAM type |
|
GDDR7 VRAM type |
AmdSmiVramVendor enum:
Field |
Description |
|---|---|
|
SAMSUNG VRAM vendor |
|
INFINEON VRAM vendor |
|
ELPIDA VRAM vendor |
|
ETRON VRAM vendor |
|
NANYA VRAM vendor |
|
HYNIX VRAM vendor |
|
MOSEL VRAM vendor |
|
WINBOND VRAM vendor |
|
ESMT VRAM vendor |
|
MICRON VRAM vendor |
|
UNKNOWN VRAM vendor |
Exceptions that can be thrown by amdsmi_get_gpu_vram_info function:
AmdSmiLibraryException, AmdSmiRetryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            vram_info = amdsmi_get_gpu_vram_info(processor)
            print(vram_info['vram_type'])
            print(vram_info['vram_vendor'])
            print(vram_info['vram_size'])
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_vbios_info#
Description: Returns the static information for the VBIOS on the GPU device.
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| name | vbios name |
| build_date | vbios build date |
| part_number | vbios part number |
| version | vbios version string |
| boot_firmware | boot firmware info |
Exceptions that can be thrown by amdsmi_get_gpu_vbios_info function:
AmdSmiLibraryException, AmdSmiRetryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            vbios_info = amdsmi_get_gpu_vbios_info(processor)
            print(vbios_info['name'])
            print(vbios_info['build_date'])
            print(vbios_info['part_number'])
            print(vbios_info['version'])
            print(vbios_info['boot_firmware'])
except AmdSmiException as e:
    print(e)
amdsmi_get_fw_info#
Description: Returns the firmware information for the given GPU.
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
Field |
Description |
|---|---|
|
List of dictionaries that contain information about a certain firmware block |
Exceptions that can be thrown by amdsmi_get_fw_info function:
AmdSmiLibraryException, AmdSmiRetryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            firmware_list = amdsmi_get_fw_info(processor)
            for firmware_block in firmware_list:
                print(firmware_block['fw_id'])
                print(firmware_block['fw_version'])
except AmdSmiException as e:
    print(e)
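A common follow-up to listing firmware blocks is comparing two snapshots, for example before and after a firmware update. The sketch below assumes each firmware block is a dictionary with the `fw_id` and `fw_version` keys used in the example above; `firmware_versions` and `changed_firmware` are hypothetical helpers, and the string ids and versions are invented stand-ins.

```python
def firmware_versions(firmware_list):
    """Map each fw_id to its fw_version."""
    return {block["fw_id"]: block["fw_version"] for block in firmware_list}


def changed_firmware(before, after):
    """Return the sorted fw_ids whose version differs between two snapshots.

    An id that appears in only one snapshot also counts as changed.
    """
    ids = set(before) | set(after)
    return sorted(fw_id for fw_id in ids if before.get(fw_id) != after.get(fw_id))


# Invented snapshots; real fw_id values come from the library.
before = firmware_versions([
    {"fw_id": "SMU", "fw_version": "1.0"},
    {"fw_id": "VCN", "fw_version": "2.1"},
])
after = firmware_versions([
    {"fw_id": "SMU", "fw_version": "1.1"},
    {"fw_id": "VCN", "fw_version": "2.1"},
])
changed = changed_firmware(before, after)
print(changed)
```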
amdsmi_get_gpu_board_info#
Description: Returns board related information for the given GPU
Input parameters:
GPU device handle object
Output: Dictionary with fields
| Field | Description |
|---|---|
| model_number | board model number |
| product_serial | board product serial number |
| fru_id | fru (field-replaceable unit) id |
| product_name | board product name |
| manufacturer_name | board manufacturer name |
Exceptions that can be thrown by amdsmi_get_gpu_board_info function:
AmdSmiLibraryException, AmdSmiRetryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            board_info = amdsmi_get_gpu_board_info(processor)
            print(board_info['model_number'])
            print(board_info['product_serial'])
            print(board_info['fru_id'])
            print(board_info['manufacturer_name'])
            print(board_info['product_name'])
except AmdSmiException as e:
    print(e)
amdsmi_get_num_vf#
Description: Returns number of enabled VFs and number of supported VFs for the given GPU
Input parameters:
processor handle: PF of a GPU device for which to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| num_vf_enabled | number of enabled VFs |
| num_vf_supported | number of supported VFs |
Exceptions that can be thrown by amdsmi_get_num_vf function:
AmdSmiLibraryException, AmdSmiRetryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            num_vf = amdsmi_get_num_vf(processor)
            print(num_vf['num_vf_enabled'])
            print(num_vf['num_vf_supported'])
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_partition_info#
Description: Returns a list of the current framebuffer partitioning structures on the given GPU
Input parameters:
processor handle object: PF of a GPU device for which to query
Output: List of dictionaries with fields
| Field | Description |
|---|---|
| vf_id | VF handle |
| fb | dictionary describing the framebuffer partition of the VF; the example below reads its fb_size key |
Exceptions that can be thrown by amdsmi_get_vf_partition_info function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            print(partitions[0]['fb']['fb_size'])
            # partitions[0]['fb']['fb_size'] is the framebuffer size of the first VF on the given GPU
            # any VF can be accessed via its index in the partitions list
except AmdSmiException as e:
    print(e)
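Since the returned list holds one entry per VF, the framebuffer sizes can be summed to see how much framebuffer is currently partitioned. This is a sketch with invented data: `total_fb_size` is a hypothetical helper that only relies on the `['fb']['fb_size']` access shown above and assumes every entry reports its size in the same unit.

```python
def total_fb_size(partitions):
    """Sum the fb_size of every VF partition in the list."""
    return sum(partition["fb"]["fb_size"] for partition in partitions)


# Invented partition list shaped like the one indexed in the example above.
sample_partitions = [
    {"vf_id": "vf0", "fb": {"fb_size": 4096}},
    {"vf_id": "vf1", "fb": {"fb_size": 4096}},
]
total = total_fb_size(sample_partitions)
print(total)
```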
amdsmi_set_num_vf#
Description: Set number of enabled VFs for the given GPU
Input parameters:
processor_handle: GPU device to configure
number of enabled VFs to be set
Output: None
Exceptions that can be thrown by amdsmi_set_num_vf function:
AmdSmiLibraryException, AmdSmiRetryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_num_vf(processor, 2)
except AmdSmiException as e:
    print(e)
amdsmi_clear_vf_fb#
Description: Clears the framebuffer of the given VF on the given GPU. The call fails if the framebuffer belongs to an active function.
Input parameters:
VF device handle
Output: None
Exceptions that can be thrown by amdsmi_clear_vf_fb function:
AmdSmiLibraryException, AmdSmiRetryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for device in processors:
            partitions = amdsmi_get_vf_partition_info(device)
            amdsmi_clear_vf_fb(partitions[0]['vf_id'])
            # partitions[0]['vf_id'] is the handle of the first VF on the given GPU
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_data#
Description: Returns the scheduler information and guard structure for the given VF.
Input parameters:
VF handle
Output: Dictionary with fields sched and guard
Fields of the sched dictionary:
| Field | Description |
|---|---|
| flr_count | function level reset counter |
| boot_up_time | boot up time in microseconds |
| shutdown_time | shutdown time in microseconds |
| reset_time | reset time in microseconds |
| state | vf state |
| last_boot_start | last boot start time |
| last_boot_end | last boot end time |
| last_shutdown_start | last shutdown start time |
| last_shutdown_end | last shutdown end time |
| last_reset_start | last reset start time |
| last_reset_end | last reset end time |
| current_active_time | current session active time, reset after guest reload |
| current_running_time | current session running time, reset after guest reload |
| total_active_time | total active time, reset after host reload |
| total_running_time | total running time, reset after host reload |
Fields of the guard dictionary:
| Field | Description |
|---|---|
| enabled | show if guard info is enabled for VF |
| guard | dictionary keyed by guard type; each entry has the keys state, amount, interval, threshold, and active |
AmdSmiGuardType enum values are keys in guard dictionary
Field |
Description |
|---|---|
|
function level reset status |
|
exclusive access mode status |
|
exclusive access time out status |
|
generic interrupt status |
State is AmdSmiGuardState enum object with values
Field |
Description |
|---|---|
|
the event number is within the threshold |
|
the event number hits the threshold |
|
the event number is bigger than the threshold |
State is AmdSmiVfState enum object with values
Field |
Description |
|---|---|
|
vf state unavailable |
|
vf state available |
|
vf state active |
|
vf state suspended |
|
vf state fullaccess |
|
same as available, indicates this is a default VF |
Exceptions that can be thrown by amdsmi_get_vf_data function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            vf_data = amdsmi_get_vf_data(partitions[0]['vf_id'])
            sched_info = vf_data['sched']
            guard_info = vf_data['guard']
            print(sched_info['boot_up_time'])
            print(sched_info['flr_count'])
            print(sched_info['state'].name)
            print(sched_info['last_boot_start'])
            print(sched_info['last_boot_end'])
            print(sched_info['last_shutdown_start'])
            print(sched_info['last_shutdown_end'])
            print(sched_info['shutdown_time'])
            print(sched_info['last_reset_start'])
            print(sched_info['last_reset_end'])
            print(sched_info['reset_time'])
            print(sched_info['current_active_time'])
            print(sched_info['current_running_time'])
            print(sched_info['total_active_time'])
            print(sched_info['total_running_time'])
            print(guard_info['enabled'])
            for guard_type in guard_info['guard']:
                print("type: {} ".format(guard_type))
                print("state: {}".format(guard_info['guard'][guard_type]['state']))
                print("amount: {}".format(guard_info['guard'][guard_type]['amount']))
                print("interval: {}".format(guard_info['guard'][guard_type]['interval']))
                print("threshold: {}".format(guard_info['guard'][guard_type]['threshold']))
                print("active: {}".format(guard_info['guard'][guard_type]['active']))
                print("==================")
except AmdSmiException as e:
    print(e)
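The guard state descriptions above (within, hits, or bigger than the threshold) can be illustrated by comparing `amount` to `threshold` directly. This is only a sketch: the library reports the state itself, `classify_guard_event` and `summarize_guards` are hypothetical helpers, and the guard type keys in the sample are invented strings rather than real enum values.

```python
def classify_guard_event(amount, threshold):
    """Describe how an event count relates to its threshold."""
    if amount < threshold:
        return "within threshold"
    if amount == threshold:
        return "hit threshold"
    return "over threshold"


def summarize_guards(guard_info):
    """Map each guard type to a label derived from amount and threshold."""
    return {
        str(guard_type): classify_guard_event(entry["amount"], entry["threshold"])
        for guard_type, entry in guard_info["guard"].items()
    }


# Invented guard data shaped like vf_data['guard'] in the example above.
sample_guard = {
    "enabled": True,
    "guard": {
        "FLR": {"amount": 1, "threshold": 5},
        "EXCLUSIVE_MOD": {"amount": 5, "threshold": 5},
        "INTERRUPT": {"amount": 9, "threshold": 5},
    },
}
summary = summarize_guards(sample_guard)
print(summary)
```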
amdsmi_get_vf_info#
Description: Returns the configuration structure for a given VF
Input parameters:
VF handle
Output: Dictionary with fields
| Field | Description |
|---|---|
| fb | dictionary with the keys fb_offset and fb_size |
| gfx_timeslice | gfx timeslice in us |
Exceptions that can be thrown by amdsmi_get_vf_info function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            # partitions[0]['vf_id'] is the handle of the first VF on the given GPU
            config = amdsmi_get_vf_info(partitions[0]['vf_id'])
            print("fb_offset: {}".format(config['fb']['fb_offset']))
            print("fb_size: {}".format(config['fb']['fb_size']))
            print("gfx_timeslice: {}".format(config['gfx_timeslice']))
except AmdSmiException as e:
    print(e)
amdsmi_get_guest_data#
Description: Gets guest OS information of the queried VF
Input parameters:
processor handle: VF of a GPU device
Output: Dictionary with fields
Field |
Description |
|---|---|
|
driver version |
|
fb usage in MB |
Exceptions that can be thrown by amdsmi_get_guest_data function:
AmdSmiLibraryException, AmdSmiRetryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            num_vf_enabled = amdsmi_get_num_vf(processor)['num_vf_enabled']
            partitions = amdsmi_get_vf_partition_info(processor)
            for i in range(num_vf_enabled):
                guest_data = amdsmi_get_guest_data(partitions[i]['vf_id'])
                print(guest_data)
except AmdSmiException as e:
    print(e)
amdsmi_get_fw_error_records#
Description: Gets firmware error records
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with field err_records, which is a list of elements with the following fields
Field |
Description |
|---|---|
|
system time in seconds |
|
vf index |
|
firmware id |
|
firmware load status |
Exceptions that can be thrown by amdsmi_get_fw_error_records function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            err_records = amdsmi_get_fw_error_records(processor)
            print(err_records)
except AmdSmiException as e:
    print(e)
amdsmi_get_dfc_fw_table#
Description: Gets dfc firmware table
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields header and data, where data is a list of elements
The header is a dictionary with the following fields:
Field |
Description |
|---|---|
|
dfc firmware version |
|
number of entries in the dfc table |
|
gart wr guest min |
|
gart wr guest max |
Each data entry is a dictionary with following fields:
Field |
Description |
|---|---|
|
dfc firmware type |
|
verification enabled |
|
customer ordinal |
|
white list |
|
black list |
Each white list entry is a dictionary with following fields:
Field |
Description |
|---|---|
|
latest |
|
oldest |
Exceptions that can be thrown by amdsmi_get_dfc_fw_table function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            dfc_table = amdsmi_get_dfc_fw_table(processor)
            print(dfc_table)
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_fw_info#
Description: Returns GPU firmware related information.
Input parameters:
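processor handle: VF of a GPU device for which to query
Output: Dictionary with field fw_list, which is a list of elements with the fields below.
If microcode of a certain type is not loaded, its version will be 0.
| Field | Description |
|---|---|
| fw_name | firmware block identifier (an enum value; the example below prints its name) |
| fw_version | firmware version |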
Exceptions that can be thrown by amdsmi_get_vf_fw_info function:
AmdSmiLibraryException, AmdSmiRetryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            fw_info = amdsmi_get_vf_fw_info(partitions[0]['vf_id'])
            for fw in fw_info['fw_list']:
                print(fw['fw_name'].name)
                print(fw['fw_version'])
except AmdSmiException as e:
    print(e)
amdsmi_get_partition_profile_info#
Description: Gets partition profile info
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields current_profile and profiles
| Field | Description |
|---|---|
| current_profile | current profile index |
| profiles | list of all profiles |
Where, profiles is a list containing
Field |
Description |
||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
number of vfs |
||||||||||||
|
|
Keys for profile_caps dictionary are in AmdSmiProfileCapabilityType enum
Field |
Description |
|---|---|
|
memory |
|
encode engine |
|
decode engine |
|
compute engine |
Exceptions that can be thrown by amdsmi_get_partition_profile_info function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            profile_info = amdsmi_get_partition_profile_info(processor)
            print(profile_info)
except AmdSmiException as e:
    print(e)
amdsmi_get_link_metrics#
Description: Gets link metric information
Input parameters:
processor handle: PF of a GPU device
Output: links list of dictionaries with fields for each link
Field |
Description |
|---|---|
|
BDF of the given processor |
|
current link speed in Gb/s |
|
max bandwidth of the link |
|
type of the link from AmdSmiLinkType enum |
|
total data received for each link in KB |
|
total data transferred for each link in KB |
AmdSmiLinkType enum:
Field |
Description |
|---|---|
|
Unknown |
|
XGMI link type |
|
PCIe link type |
|
Link not applicable |
|
Unknown |
Exceptions that can be thrown by amdsmi_get_link_metrics function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            link_metrics = amdsmi_get_link_metrics(processor)
            print(link_metrics)
except AmdSmiException as e:
    print(e)
amdsmi_get_link_topology#
Description: Gets link topology information between two connected processors
Input parameters:
source processor handle: PF of a source GPU device
destination processor handle: PF of a destination GPU device
Output: Dictionary with fields
Field |
Description |
|---|---|
|
link weight between two GPUs |
|
HW status of the link |
|
type of the link from AmdSmiLinkType enum |
|
number of hops between two GPUs |
|
framebuffer sharing between two GPUs |
AmdSmiLinkType enum:
Field |
Description |
|---|---|
|
Unknown |
|
XGMI link type |
|
PCIe link type |
|
Link not applicable |
|
Unknown |
Exceptions that can be thrown by amdsmi_get_link_topology function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for src_processor in processors:
            for dst_processor in processors:
                link_topology = amdsmi_get_link_topology(src_processor, dst_processor)
                print(link_topology)
except AmdSmiException as e:
    print(e)
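The nested loop above visits every source and destination pair. A sketch of collecting those results into a lookup table is shown here. `topology_matrix` is a hypothetical helper, the `get_topology` callable is injected so the sketch runs without hardware, and skipping the pair where source and destination are the same device is a choice of this sketch, not documented library behavior.

```python
def topology_matrix(processors, get_topology):
    """Collect get_topology(src, dst) for every ordered pair of distinct devices.

    Keys are (source index, destination index) into the processors list.
    """
    matrix = {}
    for i, src in enumerate(processors):
        for j, dst in enumerate(processors):
            if i == j:
                continue  # skip a device paired with itself
            matrix[(i, j)] = get_topology(src, dst)
    return matrix


def fake_topology(src, dst):
    """Stand-in for amdsmi_get_link_topology that just echoes the pair."""
    return {"pair": (src, dst)}


matrix = topology_matrix(["gpu0", "gpu1", "gpu2"], fake_topology)
print(len(matrix))
```

With the library loaded, `get_topology` would be `amdsmi_get_link_topology` and `processors` the list returned by `amdsmi_get_processor_handles()`.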
amdsmi_get_link_topology_nearest#
Description: Retrieve the set of GPUs that are nearest to a given device at a specific interconnectivity level.
Input parameters:
processor_handle: the identifier of the given device
link_type: the AmdSmiLinkType level to search for nearest devices
AmdSmiLinkType enum:
Field |
Description |
|---|---|
|
Unknown |
|
XGMI link type |
|
PCIe link type |
|
Link not applicable |
|
Unknown |
Output: Dictionary holding the following field:
processor_list: list of all nearest processor handles found
Exceptions that can be thrown by amdsmi_get_link_topology_nearest function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            nearest_gpus = amdsmi_get_link_topology_nearest(processor, AmdSmiLinkType.PCIE)
            if len(nearest_gpus['processor_list']) == 0:
                print("No nearest GPUs found on machine")
            else:
                print("Nearest GPUs")
                for gpu in nearest_gpus['processor_list']:
                    print(amdsmi_get_gpu_device_uuid(gpu))
except AmdSmiException as e:
    print(e)
amdsmi_get_xgmi_fb_sharing_caps#
Description: Gets XGMI capabilities
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields
Field |
Description |
|---|---|
|
flag that indicates if custom mode is supported (Not supported yet) |
|
flag that indicates if mode_1 is supported |
|
flag that indicates if mode_2 is supported |
|
flag that indicates if mode_4 is supported |
|
flag that indicates if mode_8 is supported |
Exceptions that can be thrown by amdsmi_get_xgmi_fb_sharing_caps function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            caps = amdsmi_get_xgmi_fb_sharing_caps(processor)
            print(caps)
except AmdSmiException as e:
    print(e)
amdsmi_get_xgmi_fb_sharing_mode_info#
Description: Gets XGMI framebuffer sharing information between two GPUs
Input parameters:
source processor handle: PF of a source GPU device
destination processor handle: PF of a destination GPU device
mode: framebuffer sharing mode from AmdSmiXgmiFbSharingMode enum
AmdSmiXgmiFbSharingMode enum:
Field |
Description |
|---|---|
|
custom framebuffer sharing mode (Not supported yet) |
|
framebuffer sharing mode_1 |
|
framebuffer sharing mode_2 |
|
framebuffer sharing mode_4 |
|
framebuffer sharing mode_8 |
Output: Value indicating whether framebuffer sharing is enabled between two GPUs
Exceptions that can be thrown by amdsmi_get_xgmi_fb_sharing_mode_info function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for src_processor in processors:
            for dst_processor in processors:
                fb_sharing = amdsmi_get_xgmi_fb_sharing_mode_info(src_processor, dst_processor, AmdSmiXgmiFbSharingMode.MODE_4)
                print(fb_sharing)
except AmdSmiException as e:
    print(e)
amdsmi_set_xgmi_fb_sharing_mode#
Description: Sets framebuffer sharing mode
Note: This API will only work if there’s no guest VM running.
Input parameters:
processor handle: PF of a GPU device
mode: framebuffer sharing mode from AmdSmiXgmiFbSharingMode enum
AmdSmiXgmiFbSharingMode enum:
Field |
Description |
|---|---|
|
custom framebuffer sharing mode (Not supported yet) |
|
framebuffer sharing mode_1 |
|
framebuffer sharing mode_2 |
|
framebuffer sharing mode_4 |
|
framebuffer sharing mode_8 |
Output: None
Exceptions that can be thrown by amdsmi_set_xgmi_fb_sharing_mode function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_xgmi_fb_sharing_mode(processor, AmdSmiXgmiFbSharingMode.MODE_4)
except AmdSmiException as e:
    print(e)
amdsmi_set_xgmi_fb_sharing_mode_v2#
Description: Sets framebuffer sharing mode
Note: This API will only work if there’s no guest VM running. It can be used for both custom and auto setting of XGMI framebuffer sharing.
In custom mode:
- All processors in the list must be on the same NUMA node. Otherwise, the API returns an error.
- If any processor in the list already belongs to an existing group, the existing group is released automatically.
In auto mode (MODE_X):
- processor_list[0] must be valid. Only the first element of processor_list is taken into account, and it can be any GPU (gpu0, gpu1, …).
Input parameters:
processor_list: list of PFs of GPU devices
mode: framebuffer sharing mode from AmdSmiXgmiFbSharingMode enum
AmdSmiXgmiFbSharingMode enum:
Field |
Description |
|---|---|
|
custom framebuffer sharing mode |
|
framebuffer sharing mode_1 |
|
framebuffer sharing mode_2 |
|
framebuffer sharing mode_4 |
|
framebuffer sharing mode_8 |
Output: None
Exceptions that can be thrown by amdsmi_set_xgmi_fb_sharing_mode_v2 function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    processors_custom_mode = []
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        if len(processors) > 3:
            processors_custom_mode.append(processors[0])
            processors_custom_mode.append(processors[2])
        else:
            processors_custom_mode = processors
        amdsmi_set_xgmi_fb_sharing_mode_v2(processors_custom_mode, AmdSmiXgmiFbSharingMode.CUSTOM)
except AmdSmiException as e:
    print(e)
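Custom mode requires every processor in the list to share a NUMA node. The sketch below shows one way to pick a candidate list before calling the API. The reference does not say how to look up a processor's NUMA node, so `numa_node_of` is a hypothetical callable supplied by the caller, `largest_same_numa_group` is a hypothetical helper, and the node mapping is invented.

```python
from collections import defaultdict


def largest_same_numa_group(processors, numa_node_of):
    """Return the largest group of processors that share one NUMA node.

    `numa_node_of` is a hypothetical callable mapping a processor to its
    NUMA node id. On a tie, the group encountered first wins.
    """
    groups = defaultdict(list)
    for processor in processors:
        groups[numa_node_of(processor)].append(processor)
    if not groups:
        return []
    return max(groups.values(), key=len)


# Invented NUMA layout; strings stand in for processor handles.
numa_layout = {"gpu0": 0, "gpu1": 1, "gpu2": 0, "gpu3": 1, "gpu4": 0}
group = largest_same_numa_group(list(numa_layout), numa_layout.__getitem__)
print(group)
```

The resulting list could then be passed as the first argument to `amdsmi_set_xgmi_fb_sharing_mode_v2` with `AmdSmiXgmiFbSharingMode.CUSTOM`, as in the example above.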
amdsmi_get_gpu_metrics#
Description: Gets GPU metric information
Input parameters:
processor handle: PF of a GPU device
Output: list of dictionaries with fields for each metric
Field |
Description |
|---|---|
|
value of the metric |
|
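unit of the metric from AmdSmiMetricUnit enum |
|
name of the metric from AmdSmiMetricName enum |
|
category of the metric from AmdSmiMetricCategory enum |
|
list of types of the metric from AmdSmiMetricType enum |
|
mask of all active VFs + PF that this metric applies to |
|
resource group from AmdSmiMetricResGroup enum |
|
resource subgroup from AmdSmiMetricResSubgroup enum |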
|
resource instance number |
AmdSmiMetricUnit enum:
Field |
Description |
|---|---|
|
counter |
|
unsigned integer |
|
boolean |
|
megahertz |
|
percentage |
|
millivolt |
|
celsius |
|
watt |
|
joule |
|
gigabyte per second |
|
megabit per second |
|
PCIe generation |
|
PCIe lanes |
|
millijoule |
|
unknown unit |
AmdSmiMetricName enum:
Field |
Description |
|---|---|
|
accumulated counter |
|
firmware timestamp |
|
gfx clock |
|
socket clock |
|
memory clock |
|
vclk clock |
|
dclk clock |
|
gfx usage |
|
memory usage |
|
mm usage |
|
vcn usage |
|
jpeg usage |
|
gfx voltage |
|
socket voltage |
|
memory voltage |
|
current hotspot temperature |
|
hotspot temperature limit |
|
current memory temperature |
|
memory temperature limit |
|
current vr temperature |
|
shutdown temperature |
|
current power |
|
power limit |
|
socket energy |
|
ccd energy |
|
xcd energy |
|
aid energy |
|
memory energy |
|
active socket throttle |
|
active vr throttle |
|
active memory throttle |
|
pcie bandwidth |
|
pcie l0 recovery count |
|
pcie replay count |
|
pcie replay rollover count |
|
pcie nak sent count |
|
pcie nak received count |
|
maximum gfx clock limit |
|
maximum socket clock limit |
|
maximum memory clock limit |
|
maximum vclk clock limit |
|
maximum dclk clock limit |
|
minimum gfx clock limit |
|
minimum socket clock limit |
|
minimum memory clock limit |
|
minimum vclk clock limit |
|
minimum dclk clock limit |
|
gfx clock locked |
|
gfx deep sleep |
|
memory deep sleep |
|
socket deep sleep |
|
vclk deep sleep |
|
dclk deep sleep |
|
pcie link speed |
|
pcie link width |
|
dram bandwidth |
|
maximum dram bandwidth |
|
gfx clock below host limit ppt |
|
gfx clock below host limit thermal |
|
gfx clock below host limit total |
|
gfx clock low utilization |
|
input telemetry voltage |
|
pldm version |
|
xcd temperature |
|
aid temperature |
|
hbm temperature |
|
system metric accumulated counter |
|
system temperature ubb fpga |
|
system temperature ubb front |
|
system temperature ubb back |
|
system temperature ubb oam7 |
|
system temperature ubb ibc |
|
system temperature ubb ufpga |
|
system temperature ubb oam1 |
|
system temperature oam 0 1 hsc |
|
system temperature oam 2 3 hsc |
|
system temperature oam 4 5 hsc |
|
system temperature oam 6 7 hsc |
|
system temperature ubb fpga 0v72 vr |
|
system temperature ubb fpga 3v3 vr |
|
system temperature retimer 0 1 2 3 1v2 vr |
|
system temperature retimer 4 5 6 7 1v2 vr |
|
system temperature retimer 0 1 0v9 vr |
|
system temperature retimer 4 5 0v9 vr |
|
system temperature retimer 2 3 0v9 vr |
|
system temperature retimer 6 7 0v9 vr |
|
system temperature oam 0 1 2 3 3v3 vr |
|
system temperature oam 4 5 6 7 3v3 vr |
|
system temperature ibc hsc |
|
system temperature ibc |
|
node temperature retimer |
|
node temperature ibc temp |
|
node temperature ibc 2 temp |
|
node temperature vdd18 vr temp |
|
node temperature 04 hbm b vr temp |
|
node temperature 04 hbm d vr temp |
|
vr temperature vddcr vdd0 |
|
vr temperature vddcr vdd1 |
|
vr temperature vddcr vdd2 |
|
vr temperature vddcr vdd3 |
|
vr temperature vddcr soc a |
|
vr temperature vddcr soc c |
|
vr temperature vddcr socio a |
|
vr temperature vddcr socio c |
|
vr temperature vdd 085 hbm |
|
vr temperature vddcr 11 hbm b |
|
vr temperature vddcr 11 hbm d |
|
vr temperature vdd usr |
|
vr temperature vddio 11 e32 |
|
unknown name |
AmdSmiMetricCategory enum:
Field |
Description |
|---|---|
|
counter |
|
frequency |
|
activity |
|
temperature |
|
power |
|
energy |
|
throttle |
|
pcie |
|
static |
|
system accumulated counter |
|
system baseboard temperature |
|
system gpu board temperature |
|
unknown category |
AmdSmiMetricType enum:
Field |
Description |
|---|---|
|
counter |
|
chiplet |
|
instantaneous data |
|
accumulated data |
AmdSmiMetricResGroup enum:
Field |
Description |
|---|---|
|
resource group is not applicable |
|
gpu resource group |
|
xcp resource group |
|
aid resource group |
|
mid resource group |
|
system resource group |
|
unknown resource group |
AmdSmiMetricResSubgroup enum:
Field |
Description |
|---|---|
|
resource subgroup is not applicable |
|
xcc resource subgroup |
|
engine resource subgroup |
|
hbm resource subgroup |
|
baseboard resource subgroup |
|
gpuboard resource subgroup |
|
unknown resource subgroup |
Exceptions that can be thrown by amdsmi_get_gpu_metrics function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            metrics = amdsmi_get_gpu_metrics(processor)
            print(metrics)
except AmdSmiException as e:
    print(e)
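Because the function returns one dictionary per metric, grouping the entries by category makes the output easier to scan than a raw print. The sketch below is an illustration only: the dictionary key names are not visible in the field table above, so `category`, `name`, and `val` are assumed placeholders passed as parameters, `group_metrics` is a hypothetical helper, and the sample metrics are invented.

```python
from collections import defaultdict


def group_metrics(metrics, category_key="category", name_key="name", value_key="val"):
    """Group metric dictionaries by category as lists of (name, value) pairs.

    The default key names are assumptions; pass the real ones if they differ.
    """
    grouped = defaultdict(list)
    for metric in metrics:
        grouped[metric[category_key]].append((metric[name_key], metric[value_key]))
    return dict(grouped)


# Invented metrics; real entries hold enum values rather than strings.
sample_metrics = [
    {"category": "temperature", "name": "hotspot", "val": 61},
    {"category": "power", "name": "current power", "val": 310},
    {"category": "temperature", "name": "memory", "val": 55},
]
grouped = group_metrics(sample_metrics)
print(grouped)
```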
amdsmi_get_soc_pstate#
Description: Gets the soc pstate policy for the processor
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields
Field |
Description |
|---|---|
|
current policy index |
|
List of policies |
Each policies list entry is a dictionary with following fields:
Field |
Description |
|---|---|
|
policy id |
|
policy description |
Exceptions that can be thrown by amdsmi_get_soc_pstate function:
AmdSmiLibraryException, AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            dpm_policy = amdsmi_get_soc_pstate(processor)
            print(dpm_policy)
except AmdSmiException as e:
    print(e)
amdsmi_set_soc_pstate#
Description: Sets the soc pstate policy for the processor
Input parameters:
processor handle: PF of a GPU device
policy_id: one of the policy id values from the policies list returned by amdsmi_get_soc_pstate
Output:
None
Exceptions that can be thrown by amdsmi_set_soc_pstate function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_soc_pstate(processor, 0)
except AmdSmiException as e:
    print(e)
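Because policy_id has to come from the policies list returned by amdsmi_get_soc_pstate, it can help to look the id up rather than hard-code it. The following is a minimal sketch that runs without the library: the helper name find_policy_id, the key names "policy_id" and "policy_description", and the sample values are assumptions made for illustration and are not confirmed by this reference.

```python
def find_policy_id(policies, description):
    """Return the id of the first policy whose description matches, else None.

    `policies` is the list of per-policy dictionaries. The key names
    "policy_id" and "policy_description" are assumptions for this sketch.
    """
    wanted = description.strip().lower()
    for policy in policies:
        if str(policy.get("policy_description", "")).strip().lower() == wanted:
            return policy.get("policy_id")
    return None


# Hypothetical sample data shaped like the policies list described above.
sample_policies = [
    {"policy_id": 0, "policy_description": "pstate_default"},
    {"policy_id": 1, "policy_description": "soc_pstate_0"},
]

print(find_policy_id(sample_policies, "SOC_PSTATE_0"))
```

If a matching id is found, it could then be passed as the second argument to amdsmi_set_soc_pstate.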
amdsmi_get_xgmi_plpd#
Description: Gets the xgmi per-link power down policy parameter for the processor
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields
Field |
Description |
|---|---|
|
current policy index |
|
List of policies |
Each policies list entry is a dictionary with following fields:
Field |
Description |
|---|---|
|
policy id |
|
policy description |
Exceptions that can be thrown by amdsmi_get_xgmi_plpd function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            dpm_policy = amdsmi_get_xgmi_plpd(processor)
            print(dpm_policy)
except AmdSmiException as e:
    print(e)
amdsmi_set_xgmi_plpd#
Description: Sets the xgmi per-link power down policy parameter for the processor
Input parameters:
processor handle: PF of a GPU device
policy_id: one of the policy id values from the policies list returned by amdsmi_get_xgmi_plpd
Output:
None
Exceptions that can be thrown by amdsmi_set_xgmi_plpd function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_xgmi_plpd(processor, 0)
except AmdSmiException as e:
    print(e)
AmdSmiEventReader class#
Description: Provides methods for event monitoring
Methods:
constructor#
Description: Allocates a new event reader notifier to monitor different types of issues with the GPU
Input parameters:
processor handle list: list of GPU device handle objects (PFs or VFs) for which to create the event reader
event category list: list of the event categories that the event reader will monitor on the GPU
Event category is AmdSmiEventCategory enum object with values
Category |
Description |
|---|---|
|
not used category |
|
driver events(allocation, failures of APIs, debug errors) |
|
events/notifications regarding RESET executed by the GPU |
|
scheduling events(world switch fail …) |
|
VBIOS events(security failures, vbios corruption…) |
|
ecc events |
|
pp events(slave not present, dpm fail …) |
|
events regarding the configuration of VF resources |
|
vf events(no vbios, gpu reset fail…) |
|
events related with FW loading or FW operations |
|
gpu fatal conditions |
|
guard events |
|
gpumon events(fb issues …) |
|
mmsch events |
|
xgmi events |
|
monitor all categories |
severity: severity of events that can be monitored
Severity is AmdSmiEventSeverity enum object with values
Severity |
Description |
|---|---|
|
critical error |
|
significant error |
|
trivial error |
|
warning |
|
info |
|
monitor all severity levels |
severity parameter is optional. If nothing is set, events with LOW severity will be monitored by default.
Output:
created object of AmdSmiEventReader class
Exceptions that can be thrown by AmdSmiEventReader constructor function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        event_reader = AmdSmiEventReader(processors, {AmdSmiEventCategory.RESET})
except AmdSmiException as e:
    print(e)
read#
Description: Reads and returns one event from the event reader
Input parameters:
timestamp: number of microseconds to wait for an event to occur. If no event occurs within this time, monitoring stops
Output: Dictionary with fields
| Field | Description |
|---|---|
| fcn_id | VF handle |
| dev_id | GPU device id |
| timestamp | UTC time (in microseconds) when the error happened |
| data | data value associated with the specific event |
| category | event category |
| subcode | event subcategory |
| level | event severity |
| date | UTC date and time when the error happened |
| message | message describing the event |
| | processor handle object where the event happened |
event category is AmdSmiEventCategory enum object with values
Category |
Description |
|---|---|
|
not used category |
|
driver events(allocation, failures of APIs, debug errors) |
|
events/notifications regarding RESET executed by the GPU |
|
scheduling events(world switch fail …) |
|
VBIOS events(security failures, vbios corruption…) |
|
ecc events |
|
pp events(slave not present, dpm fail …) |
|
events regarding the configuration of VF resources |
|
vf events(no vbios, gpu reset fail…) |
|
events related with FW loading or FW operations |
|
gpu fatal conditions |
|
guard events |
|
gpumon events(fb issues …) |
|
mmsch events |
|
xgmi events |
|
monitor all categories |
Every AmdSmiEventCategory has a corresponding subcategory enum.
Severity is AmdSmiEventSeverity enum object with values
Severity |
Description |
|---|---|
|
critical error |
|
significant error |
|
trivial error |
|
warning |
|
info |
|
monitor all severity levels |
Exceptions that can be thrown by read function:
AmdSmiParameterException
AmdSmiTimeoutException
AmdSmiLibraryException
stop#
Description: Frees any resources used by event notification for the given device. It can be called explicitly, or automatically by using a with statement, as in the examples below. It should be called, either manually or automatically, for every AmdSmiEventReader object created.
Input parameters: None
Example with manual cleanup of AmdSmiEventReader:
event_reader = None
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        event_reader = AmdSmiEventReader(processors, {AmdSmiEventCategory.RESET}, AmdSmiEventSeverity.ALL)
        while True:
            event = event_reader.read(10*1000*1000)
            gpu_bdf = amdsmi_get_gpu_device_bdf(event['dev_id'])
            vf_bdf = amdsmi_get_gpu_device_bdf(event['fcn_id'])
            print("=============== Event ================")
            print(" Time {}".format(event['timestamp']))
            print(" Category {}".format(event['category'].name))
            print(" Subcategory {}".format(event['subcode'].name))
            print(" Level {}".format(event['level'].name))
            print(" Data {}".format(event['data']))
            print(" VF BDF {}".format(vf_bdf))
            print(" GPU BDF {}".format(gpu_bdf))
            print(" Date {}".format(event['date']))
            print(" Message {}".format(event['message']))
            print("======================================")
except AmdSmiTimeoutException:
    print("No more events")
finally:
    # Release the reader only if it was actually created
    if event_reader is not None:
        event_reader.stop()
Example with automatic cleanup using with statement:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        with AmdSmiEventReader(processors, {AmdSmiEventCategory.RESET}, AmdSmiEventSeverity.ALL) as event_reader:
            while True:
                event = event_reader.read(10*1000*1000)
                gpu_bdf = amdsmi_get_gpu_device_bdf(event['dev_id'])
                vf_bdf = amdsmi_get_gpu_device_bdf(event['fcn_id'])
                print("=============== Event ================")
                print(" Time {}".format(event['timestamp']))
                print(" Category {}".format(event['category'].name))
                print(" Subcategory {}".format(event['subcode'].name))
                print(" Level {}".format(event['level'].name))
                print(" Data {}".format(event['data']))
                print(" VF BDF {}".format(vf_bdf))
                print(" GPU BDF {}".format(gpu_bdf))
                print(" Date {}".format(event['date']))
                print(" Message {}".format(event['message']))
                print("======================================")
except AmdSmiTimeoutException:
    print("No more events")
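Both examples use the same pattern: call read in a loop and treat the timeout exception as the end of the stream. The sketch below isolates that pattern in a helper so it can be tried without a GPU. The names drain_events, FakeReader and FakeTimeout are hypothetical stand-ins introduced here. With the real library, the reader would be an AmdSmiEventReader and the exception class AmdSmiTimeoutException.

```python
def drain_events(reader, timeout_us, timeout_error):
    """Collect events from `reader` until a read raises `timeout_error`."""
    events = []
    while True:
        try:
            events.append(reader.read(timeout_us))
        except timeout_error:
            # A timeout means no further event arrived within timeout_us.
            return events


class FakeTimeout(Exception):
    """Stand-in for AmdSmiTimeoutException so the sketch runs anywhere."""


class FakeReader:
    """Hypothetical stand-in for AmdSmiEventReader that replays canned events."""

    def __init__(self, events):
        self._events = list(events)

    def read(self, timeout_us):
        if not self._events:
            raise FakeTimeout()
        return self._events.pop(0)


collected = drain_events(
    FakeReader([{"message": "first"}, {"message": "second"}]),
    10 * 1000 * 1000,
    FakeTimeout,
)
print([event["message"] for event in collected])
```

Passing the exception class as a parameter keeps the helper independent of the library import, which makes the loop easy to unit test.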
amdsmi_get_lib_version#
Description: Get the build version information for the currently running build of AMDSMI.
Output: amdsmi build version
Exceptions that can be thrown by amdsmi_get_lib_version function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    devices = amdsmi_get_processor_handles()
    if len(devices) == 0:
        print("No GPUs on machine")
    else:
        for device in devices:
            version = amdsmi_get_lib_version()
            print(version)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_virtualization_mode#
Description: Retrieve the current GPU virtualization mode.
Input parameters:
processor handle: the handle to the GPU device for which the virtualization mode is being queried
Output: Enum object representing the current virtualization mode.
AmdSmiVirtualizationMode enum:
Field |
Description |
|---|---|
|
unknown virtualization mode |
|
none virtualization mode |
|
host virtualization mode |
|
guest virtualization mode |
|
passthrough virtualization mode |
Exceptions that can be thrown by amdsmi_get_gpu_virtualization_mode function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            virtualization_mode = amdsmi_get_gpu_virtualization_mode(processor)
            print("Virtualization mode: {}".format(virtualization_mode))
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_accelerator_partition_profile_config#
Description: Returns gpu accelerator partition caps as currently configured in the system
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields
| Field | Description |
|---|---|
| | index of the default profile |
AmdSmiAcceleratorPartitionResource enum:
Field |
Description |
|---|---|
|
xcc resource capabilities |
|
encoder resource capabilities |
|
decoder resource capabilities |
|
dma resource capabilities |
|
jpeg resource capabilities |
AmdSmiAcceleratorPartitionSetting enum:
Field |
Description |
|---|---|
|
invalid compute partition |
|
compute partition with all xccs in group (8/1) |
|
compute partition with four xccs in group (8/2) |
|
compute partition with two xccs in group (6/3) |
|
compute partition with two xccs in group (8/4) |
|
compute partition with one xcc in group (8/8) |
Exceptions that can be thrown by amdsmi_get_gpu_accelerator_partition_profile_config function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            accelerator_partition_config = amdsmi_get_gpu_accelerator_partition_profile_config(processor)
            print(accelerator_partition_config)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_accelerator_partition_profile_config_global#
Description: Returns all GPU accelerator partition capabilities which can be configured on the system
Input parameters:
processor_handle: PF of a GPU device
Output: Dictionary with fields
| Field | Description |
|---|---|
| | List of dictionaries, each describing a supported accelerator partition profile |
| | Index of the default profile used if no custom configuration is set |
AmdSmiAcceleratorPartitionSetting enum:
Field |
Description |
|---|---|
|
Invalid compute partition |
|
Compute partition with all xccs in group (8/1) |
|
Compute partition with four xccs in group (8/2) |
|
Compute partition with two xccs in group (6/3) |
|
Compute partition with two xccs in group (8/4) |
|
Compute partition with one xcc in group (8/8) |
AmdSmiAcceleratorPartitionResource enum:
Field |
Description |
|---|---|
|
xcc resource capabilities |
|
encoder resource capabilities |
|
decoder resource capabilities |
|
dma resource capabilities |
|
jpeg resource capabilities |
Exceptions that can be thrown by amdsmi_get_gpu_accelerator_partition_profile_config_global function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            config = amdsmi_get_gpu_accelerator_partition_profile_config_global(processor)
            print(config)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_accelerator_partition_profile#
Description: Returns current gpu accelerator partition cap
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields
| Field | Description |
|---|---|
| | array of ids for current accelerator profile |
AmdSmiComputePartitionResource enum:
Field |
Description |
|---|---|
|
xcc resource capabilities |
|
encoder resource capabilities |
|
decoder resource capabilities |
|
dma resource capabilities |
|
jpeg resource capabilities |
Exceptions that can be thrown by amdsmi_get_gpu_accelerator_partition_profile function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            accelerator_partition_profile = amdsmi_get_gpu_accelerator_partition_profile(processor)
            print(accelerator_partition_profile)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_memory_partition_config#
Description: Returns current gpu memory partition config and mode capabilities
Input parameters:
processor handle: PF of a GPU device
Output: Dictionary with fields
| Field | Description |
|---|---|
| | memory partition capabilities |
| | memory partition mode from |
‘AmdSmiMemoryPartitionSetting’ enum:
Field |
Description |
|---|---|
|
unknown memory partition |
|
memory partition with 1 number per socket |
|
memory partition with 2 numbers per socket |
|
memory partition with 4 numbers per socket |
|
memory partition with 8 numbers per socket |
Exceptions that can be thrown by amdsmi_get_gpu_memory_partition_config function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            memory_partition_config = amdsmi_get_gpu_memory_partition_config(processor)
            print(memory_partition_config)
except AmdSmiException as e:
    print(e)
amdsmi_set_gpu_accelerator_partition_profile#
Description: Sets accelerator partition setting based on profile_index from amdsmi_get_gpu_accelerator_partition_profile_config
Input parameters:
processor handle: PF of a GPU device
profile_index: index of the partition profile to set
Output:
None
Exceptions that can be thrown by amdsmi_set_gpu_accelerator_partition_profile function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_gpu_accelerator_partition_profile(processor, 1)
except AmdSmiException as e:
    print(e)
amdsmi_set_gpu_memory_partition_mode#
Description: Sets memory partition mode
Input parameters:
processor handle: PF of a GPU device
setting: enum from AmdSmiMemoryPartitionSetting representing the memory partitioning mode to set
AmdSmiMemoryPartitionSetting enum:
Field |
Description |
|---|---|
|
unknown memory partition |
|
memory partition with 1 number per socket |
|
memory partition with 2 numbers per socket |
|
memory partition with 4 numbers per socket |
|
memory partition with 8 numbers per socket |
Output:
None
Exceptions that can be thrown by amdsmi_set_gpu_memory_partition_mode function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_gpu_memory_partition_mode(processor, AmdSmiMemoryPartitionSetting.NPS1)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_cper_entries#
Description: Dumps CPER entries for a given GPU to a file, using the CPER header file from the RAS tool.
Input parameters:
processor_handle: device to query
severity_mask: severity masks from the AmdSmiCperErrorSeverity enum, used to filter CPERs
buffer_size: pointer to a variable that specifies the size of the cper_data
cursor: pointer to a variable that will contain the cursor for the next call
AmdSmiCperErrorSeverity
Field |
Description |
|---|---|
|
filters non-fatal-uncorrected cpers |
|
filters fatal cpers |
|
filters non_fatal_corrected cpers |
|
shows all cper types |
Output: Dictionary with fields
Field |
Description |
|---|---|
|
The severity of the CPER error ex: |
|
The notification type associated with the CPER entry. |
|
The time when the CPER entry was recorded, formatted as |
|
A 4-byte signature identifying the entry, typically |
|
The revision number of the CPER record format. |
|
A marker value (typically |
|
The count of sections included in the CPER entry. |
|
The total length in bytes of the CPER entry. |
|
A character array identifying the GPU or platform. |
|
A character array indicating the creator of the CPER entry. |
|
A unique identifier for the CPER entry. |
|
Reserved flags related to the CPER entry. |
|
Reserved information related to persistence. |
Exceptions that can be thrown by amdsmi_get_gpu_cper_entries function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    devices = amdsmi_get_processor_handles()
    # severity_mask, buffer_size and initial_cursor are chosen by the caller (see Input parameters)
    for device in devices:
        entries, new_cursor, cper_data = amdsmi_get_gpu_cper_entries(device, severity_mask, buffer_size, initial_cursor)
        print("CPER entries for device", device)
        for key, entry in entries.items():
            print("Entry", key)
            print(" Error Severity:", entry.get("error_severity", "Unknown"))
            print(" Notify Type:", entry.get("notify_type", "Unknown"))
            print(" Timestamp:", entry.get("timestamp", ""))
            print()
        print("New Cursor Position:", new_cursor)
except AmdSmiException as e:
    print(e)
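Since each call returns a cursor for the next call, reading every entry is a matter of calling the function repeatedly. The sketch below shows one way to write that loop against an injected fetch function, so it runs without the library. The helper names collect_all_entries and fake_fetch are hypothetical. The stop conditions (an empty page, or a cursor that does not advance) are assumptions of this sketch rather than documented behaviour.

```python
def collect_all_entries(fetch_page, start_cursor=0, max_pages=1000):
    """Call fetch_page(cursor) repeatedly and merge the entries it returns.

    fetch_page must return (entries, new_cursor, cper_data), mirroring the
    tuple unpacked in the example above.
    """
    all_entries = []
    cursor = start_cursor
    for _ in range(max_pages):
        entries, new_cursor, _cper_data = fetch_page(cursor)
        if not entries:
            break  # assumed end of data: nothing returned for this cursor
        all_entries.extend(entries.values())
        if new_cursor == cursor:
            break  # assumed end of data: the cursor did not advance
        cursor = new_cursor
    return all_entries, cursor


def fake_fetch(cursor, _records=("e0", "e1", "e2"), _page_size=2):
    """Hypothetical stand-in for a call to amdsmi_get_gpu_cper_entries."""
    page = _records[cursor:cursor + _page_size]
    entries = {cursor + i: {"error_severity": name} for i, name in enumerate(page)}
    return entries, cursor + len(page), []


all_entries, final_cursor = collect_all_entries(fake_fetch)
print(len(all_entries), final_cursor)
```

With the real library, fetch_page could be a lambda that forwards the cursor, for example `lambda c: amdsmi_get_gpu_cper_entries(device, severity_mask, buffer_size, c)`.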
amdsmi_get_p2p_status#
Description: Retrieve the connection type and P2P capabilities between 2 GPUs
Input parameters:
processor_handle_src: the source device handle
processor_handle_dest: the destination device handle
Output: Dictionary with fields:
| Field | Description |
|---|---|
| type | AmdSmiLinkType |
| caps | P2P capabilities |
Exceptions that can be thrown by amdsmi_get_p2p_status function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    # Two distinct GPUs are needed for a source/destination pair
    if len(processors) < 2:
        print("At least two GPUs are required")
    else:
        processor_handle_src = processors[0]
        processor_handle_dest = processors[1]
        link_type = amdsmi_get_p2p_status(processor_handle_src, processor_handle_dest)
        print(link_type['type'])
        print(link_type['caps'])
except AmdSmiException as e:
    print(e)
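The example queries a single source/destination pair. To survey a whole machine, the same call can be repeated for every ordered pair of devices. The sketch below does this with an injected query function so it runs without GPUs. The names p2p_status_matrix and fake_query are hypothetical, and the values fake_query returns are made up.

```python
from itertools import permutations


def p2p_status_matrix(handles, query):
    """Return {(src_index, dst_index): query(src, dst)} for all ordered pairs."""
    return {
        (i, j): query(handles[i], handles[j])
        for i, j in permutations(range(len(handles)), 2)
    }


def fake_query(src, dst):
    """Hypothetical stand-in for amdsmi_get_p2p_status (no GPU needed)."""
    return {"type": "fake-link", "caps": {"src": src, "dst": dst}}


matrix = p2p_status_matrix(["gpu0", "gpu1", "gpu2"], fake_query)
print(sorted(matrix))
```

Keying the result by list index rather than by handle avoids any assumption about whether processor handles are hashable.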
amdsmi_get_afids_from_cper#
Description: Get AFID for cper error
Input parameters:
cper_buffer: buffer which contains one cper
Output:
List of AFID
Exceptions that can be thrown by amdsmi_get_afids_from_cper function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        # severity_mask, buffer_size and initial_cursor are chosen by the caller
        for processor in processors:
            entries, new_cursor, cper_data = amdsmi_get_gpu_cper_entries(processor, severity_mask, buffer_size, initial_cursor)
            afid = amdsmi_get_afids_from_cper(cper_data[0]['bytes'])
            print(afid)
except AmdSmiException as e:
    print(e)
amdsmi_reset_gpu#
Description: Reset the GPU associated with the device with provided processor handle.
Input parameters:
processor_handle: GPU device handle
Output:
None
Exceptions that can be thrown by amdsmi_reset_gpu function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_reset_gpu(processor)
except AmdSmiException as e:
    print(e)
amdsmi_get_cpu_affinity_with_scope#
Description: Returns list of bitmask information for the given GPU.
Input parameters:
processor_handle: device to query
scope: enum value for NUMA or socket affinity
Output: bitmask of CPU cores that this processor has affinity with
Exceptions that can be thrown by amdsmi_get_cpu_affinity_with_scope function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    devices = amdsmi_get_processor_handles()
    if len(devices) == 0:
        print("No GPUs on machine")
    else:
        for device in devices:
            bitmask = amdsmi_get_cpu_affinity_with_scope(device, AmdSmiAffinityScope.NUMA_SCOPE)
            print(bitmask)
except AmdSmiException as e:
    print(e)
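A raw bitmask is hard to read, so it is often useful to turn it into a list of CPU core indices. The sketch below assumes the masks are non-negative Python integers and, for the list form, that entry i covers cores i*64 to i*64+63 with bit 0 as the lowest core. This reference does not confirm either assumption, and the helper names are hypothetical.

```python
def cores_from_bitmask(mask, offset=0):
    """Return the CPU indices whose bits are set in a non-negative integer mask."""
    if mask < 0:
        raise ValueError("bitmask must be non-negative")
    cores = []
    index = offset
    while mask:
        if mask & 1:
            cores.append(index)
        mask >>= 1
        index += 1
    return cores


def cores_from_bitmask_list(masks, bits_per_entry=64):
    """Decode a list of masks, assuming entry i starts at core i * bits_per_entry."""
    cores = []
    for i, mask in enumerate(masks):
        cores.extend(cores_from_bitmask(mask, offset=i * bits_per_entry))
    return cores


print(cores_from_bitmask(0b1011))            # [0, 1, 3]
print(cores_from_bitmask_list([0b1, 0b10]))  # [0, 65]
```

If the library reports the masks in another form, such as hexadecimal strings, they would need to be converted with int(value, 16) first.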
amdsmi_topo_get_numa_node_number#
Description: Get the NUMA node associated with a device
Input parameters:
processor_handle: device to query
Output: NUMA node value
Exceptions that can be thrown by amdsmi_topo_get_numa_node_number function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    devices = amdsmi_get_processor_handles()
    if len(devices) == 0:
        print("No GPUs on machine")
    else:
        for device in devices:
            numa_node = amdsmi_topo_get_numa_node_number(device)
            print(numa_node)
except AmdSmiException as e:
    print(e)
amdsmi_get_nic_driver_info#
Description: Retrieves information about the NIC driver
Input parameters:
processor_handle: NIC for which to query
Output: Dictionary with fields
Field |
Description |
|---|---|
|
driver name |
|
driver version |
Exceptions that can be thrown by amdsmi_get_nic_driver_info function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    nic_handles = amdsmi_get_processor_handles_by_type(AmdSmiProcessorType.AMD_NIC)
    if len(nic_handles) == 0:
        print("No NICs on machine")
    else:
        for nic in nic_handles:
            print(amdsmi_get_nic_driver_info(nic))
except AmdSmiException as e:
    print(e)
amdsmi_get_nic_asic_info#
Description: Retrieves ASIC information for the NIC
Input parameters:
processor_handle: NIC for which to query
Output: Dictionary with fields
Field |
Description |
|---|---|
|
vendor id |
|
subsystem vendor id |
|
device id |
|
subsystem device id |
|
revision id |
|
permanent mac address |
|
product name |
|
part number |
|
serial number |
|
vendor name |
Exceptions that can be thrown by amdsmi_get_nic_asic_info function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    nic_handles = amdsmi_get_processor_handles_by_type(AmdSmiProcessorType.AMD_NIC)
    if len(nic_handles) == 0:
        print("No NICs on machine")
    else:
        for nic in nic_handles:
            print(amdsmi_get_nic_asic_info(nic))
except AmdSmiException as e:
    print(e)
amdsmi_get_nic_bus_info#
Description: Retrieves BUS information for the NIC
Input parameters:
processor_handle: NIC for which to query
Output: Dictionary with fields
Field |
Description |
|---|---|
|
bus |
|
maximum supported PCIe link width |
|
maximum supported PCIe link speed |
|
PCIe interface version (Not supported yet, currently hardcoded to 0) |
|
physical slot type (Not supported yet, currently hardcoded to 0) |
Exceptions that can be thrown by amdsmi_get_nic_bus_info function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    nic_handles = amdsmi_get_processor_handles_by_type(AmdSmiProcessorType.AMD_NIC)
    if len(nic_handles) == 0:
        print("No NICs on machine")
    else:
        for nic in nic_handles:
            print(amdsmi_get_nic_bus_info(nic))
except AmdSmiException as e:
    print(e)
amdsmi_get_nic_numa_info#
Description: Retrieves NUMA information for the NIC
Input parameters:
processor_handle: NIC for which to query
Output: Dictionary with fields
Field |
Content |
|---|---|
|
NUMA (Non-Uniform Memory Access) node identifier associated with the device. |
|
CPU affinity mask for the device |
Exceptions that can be thrown by amdsmi_get_nic_numa_info function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    nic_handles = amdsmi_get_processor_handles_by_type(AmdSmiProcessorType.AMD_NIC)
    if len(nic_handles) == 0:
        print("No NICs on machine")
    else:
        for nic in nic_handles:
            print(amdsmi_get_nic_numa_info(nic))
except AmdSmiException as e:
    print(e)
amdsmi_get_nic_port_info#
Description: Retrieves PORT information for the NIC
Input parameters:
processor_handle: NIC for which to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| ports | List of port information dictionaries |
Each port dictionary contains:
| Field | Content |
|---|---|
| bdf | BDF of the port |
| port_num | Port number |
| type | Type of the port |
| flavour | Port flavour |
| netdev | Associated network device name |
| ifindex | Interface index of the port |
| mac_address | MAC address assigned to the port |
| carrier | Carrier state |
| mtu | Maximum Transmission Unit size |
| link_state | Current link state |
| link_speed | Link speed in Mbps |
| active_fec | Currently active Forward Error Correction mode |
| autoneg | Auto-negotiation status |
| pause_autoneg | Pause frame auto-negotiation status |
| pause_rx | Receive pause frame status |
| pause_tx | Transmit pause frame status |
Exceptions that can be thrown by amdsmi_get_nic_port_info function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    nic_handles = amdsmi_get_processor_handles_by_type(AmdSmiProcessorType.AMD_NIC)
    if len(nic_handles) == 0:
        print("No NICs on machine")
    else:
        for nic in nic_handles:
            port_info = amdsmi_get_nic_port_info(nic)
            print(f"Number of ports: {len(port_info['ports'])}")
            for i, port in enumerate(port_info['ports']):
                print(f"Port {i}:")
                print(f" BDF: {port['bdf']}")
                print(f" Port Number: {port['port_num']}")
                print(f" Type: {port['type']}")
                print(f" Flavour: {port['flavour']}")
                print(f" Network Device: {port['netdev']}")
                print(f" Interface Index: {port['ifindex']}")
                print(f" MAC Address: {port['mac_address']}")
                print(f" Carrier: {port['carrier']}")
                print(f" MTU: {port['mtu']}")
                print(f" Link State: {port['link_state']}")
                print(f" Link Speed: {port['link_speed']}")
                print(f" Active FEC: {port['active_fec']}")
                print(f" Auto-negotiation: {port['autoneg']}")
                print(f" Pause Auto-negotiation: {port['pause_autoneg']}")
                print(f" Pause RX: {port['pause_rx']}")
                print(f" Pause TX: {port['pause_tx']}")
except AmdSmiException as e:
    print(e)
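Beyond printing every field, the returned dictionary is easy to post-process. The sketch below indexes ports by network device name and sums the link speeds, using the 'ports', 'netdev', 'mac_address' and 'link_speed' keys from the example above. The helper names and sample values are made up for illustration. Treating link_speed as a plain number is an assumption.

```python
def ports_by_netdev(port_info):
    """Map each port's network device name to its port dictionary."""
    return {port["netdev"]: port for port in port_info["ports"]}


def total_link_speed_mbps(port_info):
    """Sum the link speed over all ports, assuming numeric values in Mbps."""
    return sum(port["link_speed"] for port in port_info["ports"])


# Hypothetical sample shaped like the output described above (values are made up).
sample_port_info = {
    "ports": [
        {"netdev": "eth0", "mac_address": "00:11:22:33:44:55", "link_speed": 100000},
        {"netdev": "eth1", "mac_address": "00:11:22:33:44:56", "link_speed": 200000},
    ]
}

by_name = ports_by_netdev(sample_port_info)
print(by_name["eth1"]["mac_address"])
print(total_link_speed_mbps(sample_port_info))
```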
amdsmi_get_nic_rdma_dev_info#
Description: Retrieves RDMA devices information for the NIC
Input parameters:
processor_handle: NIC for which to query
Output: Dictionary with fields
| Field | Description |
|---|---|
| rdma_dev_info | List of RDMA device information dictionaries |
Each RDMA device info dictionary contains:
| Field | Description |
|---|---|
| rdma_dev | RDMA device name |
| node_guid | Node GUID |
| node_type | Node type |
| sys_image_guid | System image GUID |
| fw_ver | Firmware version |
| rdma_port_info | List of RDMA port information dictionaries |
Each RDMA port info dictionary contains:
| Field | Description |
|---|---|
| netdev | Associated network device name |
| state | Port state (DOWN, INIT, ARMED, ACTIVE, ACTIVE_DEFER) |
| rdma_port | RDMA port number |
| max_mtu | Maximum MTU size |
| active_mtu | Currently active MTU size |
Exceptions that can be thrown by amdsmi_get_nic_rdma_dev_info function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    nic_handles = amdsmi_get_processor_handles_by_type(AmdSmiProcessorType.AMD_NIC)
    if len(nic_handles) == 0:
        print("No NICs on machine")
    else:
        for nic in nic_handles:
            rdma_info = amdsmi_get_nic_rdma_dev_info(nic)
            print(f"Number of RDMA devices: {len(rdma_info['rdma_dev_info'])}")
            for i, rdma_dev in enumerate(rdma_info['rdma_dev_info']):
                print(f"RDMA Device {i}:")
                print(f" Device: {rdma_dev['rdma_dev']}")
                print(f" Node GUID: {rdma_dev['node_guid']}")
                print(f" Node Type: {rdma_dev['node_type']}")
                print(f" System Image GUID: {rdma_dev['sys_image_guid']}")
                print(f" Firmware Version: {rdma_dev['fw_ver']}")
                print(f" Number of RDMA ports: {len(rdma_dev['rdma_port_info'])}")
                for j, port in enumerate(rdma_dev['rdma_port_info']):
                    print(f" Port {j}:")
                    print(f" Network Device: {port['netdev']}")
                    print(f" State: {port['state']}")
                    print(f" RDMA Port: {port['rdma_port']}")
                    print(f" Max MTU: {port['max_mtu']}")
                    print(f" Active MTU: {port['active_mtu']}")
except AmdSmiException as e:
    print(e)
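The nested structure (devices, each with a list of ports) can be flattened when only a subset of ports is of interest. The sketch below lists the ports whose state is ACTIVE, using the keys from the example above. The helper name and the sample values are hypothetical. Treating the state as a plain string such as "ACTIVE" is an assumption, since this reference does not say whether it is a string or an enum.

```python
def active_rdma_ports(rdma_info):
    """Return (rdma_dev, netdev, rdma_port) for every port whose state is ACTIVE."""
    result = []
    for dev in rdma_info["rdma_dev_info"]:
        for port in dev["rdma_port_info"]:
            # Assumption: the state is reported as a plain string.
            if str(port["state"]).upper() == "ACTIVE":
                result.append((dev["rdma_dev"], port["netdev"], port["rdma_port"]))
    return result


# Hypothetical sample shaped like the output described above (values are made up).
sample_rdma_info = {
    "rdma_dev_info": [
        {
            "rdma_dev": "rdma0",
            "rdma_port_info": [
                {"netdev": "eth0", "state": "ACTIVE", "rdma_port": 1},
                {"netdev": "eth1", "state": "DOWN", "rdma_port": 2},
            ],
        },
        {
            "rdma_dev": "rdma1",
            "rdma_port_info": [
                {"netdev": "eth2", "state": "ACTIVE", "rdma_port": 1},
            ],
        },
    ]
}

print(active_rdma_ports(sample_rdma_info))
```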
amdsmi_get_nic_port_statistics#
Description: Retrieve all available PORT statistics for the specified NIC port
Input parameters:
processor_handle: NIC for which to query
port_index: index of the NIC port to query
Output: Returns a list of dictionaries, each containing:
name
value
Exceptions that can be thrown by amdsmi_get_nic_port_statistics function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    nic_handles = amdsmi_get_processor_handles_by_type(AmdSmiProcessorType.AMD_NIC)
    if len(nic_handles) == 0:
        print("No NICs on machine")
    else:
        for nic in nic_handles:
            port_stats = amdsmi_get_nic_port_statistics(nic, 0)
            for stat in port_stats:
                print(f"{stat['name']}: {stat['value']}")
except AmdSmiException as e:
    print(e)
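Statistics returned as a list of name/value dictionaries are convenient to print but awkward to compare. The sketch below converts such a list into a plain dictionary and computes the change between two snapshots. The helper names, the statistic names and the sample values are hypothetical. Subtracting values assumes they are numeric.

```python
def stats_to_dict(stats):
    """Turn [{'name': ..., 'value': ...}, ...] into {name: value}."""
    return {stat["name"]: stat["value"] for stat in stats}


def stats_delta(before, after):
    """Return after - before for every statistic present in both snapshots."""
    old = stats_to_dict(before)
    new = stats_to_dict(after)
    return {name: new[name] - old[name] for name in new if name in old}


# Hypothetical snapshots (statistic names and values are made up).
snapshot_1 = [
    {"name": "rx_packets", "value": 100},
    {"name": "tx_packets", "value": 50},
]
snapshot_2 = [
    {"name": "rx_packets", "value": 150},
    {"name": "tx_packets", "value": 80},
    {"name": "rx_errors", "value": 1},
]

print(stats_delta(snapshot_1, snapshot_2))
```

Statistics that appear in only one snapshot are skipped, because no difference can be computed for them.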
amdsmi_get_nic_vendor_statistics#
Description: Retrieve vendor specific statistics for the NIC port
Input parameters:
processor_handle: NIC for which to query
port_index: index of the NIC port to query
Output: Returns a list of dictionaries, each containing:
name
value
This API provides access to vendor/driver specific statistics that may vary between different NIC vendors and driver/firmware versions. The statistic names are preserved as provided by the underlying driver implementation.
Note: The exact statistics available depend on the NIC vendor, hardware and driver version.
Exceptions that can be thrown by amdsmi_get_nic_vendor_statistics function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    nic_handles = amdsmi_get_processor_handles_by_type(AmdSmiProcessorType.AMD_NIC)
    if len(nic_handles) == 0:
        print("No NICs on machine")
    else:
        for nic in nic_handles:
            vendor_stats = amdsmi_get_nic_vendor_statistics(nic, 0)
            for stat in vendor_stats:
                print(f"{stat['name']}: {stat['value']}")
except AmdSmiException as e:
    print(e)
amdsmi_get_nic_rdma_port_statistics#
Description: Retrieve RDMA port statistics for the NIC
Input parameters:
processor_handle: NIC for which to query
rdma_port_index: index of the NIC RDMA port to query
Output: Returns a list of dictionaries, each containing:
name
value
Exceptions that can be thrown by amdsmi_get_nic_rdma_port_statistics function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    nic_handles = amdsmi_get_processor_handles_by_type(AmdSmiProcessorType.AMD_NIC)
    if len(nic_handles) == 0:
        print("No NICs on machine")
    else:
        rdma_port_index = 0
        for nic in nic_handles:
            rdma_stats = amdsmi_get_nic_rdma_port_statistics(nic, rdma_port_index)
            for stat in rdma_stats:
                print(f"{stat['name']}: {stat['value']}")
except AmdSmiException as e:
    print(e)