AMD SMI Python API reference#
The Python interface consists of Python function declarations that call the C interface directly. Clients can use it to build applications in Python.
Requirements#
Python 3.10+ (64-bit)
Overview#
Folder structure#
File Name | Note
---|---
__init__.py | Python package initialization file
amdsmi_interface.py | Amdsmi library Python interface
amdsmi_wrapper.py | Python wrapper around amdsmi binary
amdsmi_exception.py | Amdsmi exceptions Python file
README.md | Documentation
Build steps#
Navigate to the project's root folder and run the Makefile command:
make package
The build process creates the folder build/package/BUILD_MODE/amdsmi, where BUILD_MODE can be Release or Debug.
The folder will contain the following files:
__init__.py
amdsmi_interface.py
amdsmi_wrapper.py
amdsmi_exception.py
libamdsmi.so
README.md
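As a quick sanity check after the build, the file list above can be compared against the generated folder. The following is a minimal sketch, not part of amdsmi: the helper name missing_package_files is hypothetical, and the expected names are simply copied from the list above.

```python
from pathlib import Path

# File names copied from the build output list above.
EXPECTED_FILES = (
    "__init__.py",
    "amdsmi_interface.py",
    "amdsmi_wrapper.py",
    "amdsmi_exception.py",
    "libamdsmi.so",
    "README.md",
)


def missing_package_files(package_dir):
    """Return the expected file names that are absent from package_dir.

    Hypothetical helper for illustration; it only checks file presence.
    """
    root = Path(package_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, missing_package_files("build/package/Release/amdsmi") should return an empty list when the Release build completed and produced every file.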
Amdsmi usage#
The generated amdsmi folder should be copied and placed next to the importing script. It should be imported as:
from amdsmi import *

try:
    amdsmi_init()
    # amdsmi calls ...
except AmdSmiException as e:
    print(e)
finally:
    try:
        amdsmi_shut_down()
    except AmdSmiException as e:
        print(e)
To initialize amdsmi lib, amdsmi_init() must be called before all other calls to amdsmi lib.
To close connection to driver, amdsmi_shut_down() must be the last call.
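The init-first, shut-down-last rule can also be expressed as a context manager. This is a sketch under assumptions: smi_session is a hypothetical helper that is not provided by amdsmi, and the init and shut-down functions are passed in as arguments so the pattern does not depend on the library being importable.

```python
from contextlib import contextmanager


@contextmanager
def smi_session(init, shut_down):
    """Call init() on entry and shut_down() on exit, even if the body raises.

    Hypothetical helper: init and shut_down are expected to be
    amdsmi_init and amdsmi_shut_down, but any callables work.
    If init() itself raises, shut_down() is not called.
    """
    init()
    try:
        yield
    finally:
        shut_down()
```

Assuming the package is imported as shown above, usage would look like `with smi_session(amdsmi_init, amdsmi_shut_down): ...`, with the amdsmi calls placed in the body. Unlike the try/finally example above, this sketch skips the shut-down call when initialization fails.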
Amdsmi Exceptions#
All exceptions are in the amdsmi_exception.py file.
Exceptions that can be thrown are:
AmdSmiException: base smi exception class.
AmdSmiLibraryException: derives from the base AmdSmiException class and represents errors that can occur in smi-lib. When this exception is thrown, err_code and err_info are set. err_code is an integer that corresponds to errors that can occur in smi-lib, and err_info is a string that explains the error that occurred. Example:
try:
    amdsmi_init()
    processors = amdsmi_get_processor_handles()
    num_of_GPUs = len(processors)
    if num_of_GPUs == 0:
        print("No GPUs on machine")
except AmdSmiException as e:
    print("Error code: {}".format(e.err_code))
    if e.err_code == AmdSmiRetCode.AMDSMI_STATUS_RETRY:
        print("Error info: {}".format(e.err_info))
finally:
    try:
        amdsmi_shut_down()
    except AmdSmiException as e:
        print(e)
AmdSmiRetryException: derives from the AmdSmiLibraryException class and signals that the processor is busy and the call should be retried.
AmdSmiTimeoutException: derives from the AmdSmiLibraryException class and represents that the call timed out.
AmdSmiParameterException: derives from the base AmdSmiException class and represents errors related to invalid parameters passed to functions. When this exception is thrown, err_msg is set and explains the actual and expected types of the parameters.
AmdSmiBdfFormatException: derives from the base AmdSmiException class and represents an invalid BDF format.
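Since AmdSmiRetryException signals that a call should be retried, a small retry wrapper is one way to act on it. The sketch below is an illustration only: call_with_retry, attempts and delay_s are hypothetical names, and the exception type is passed in as a parameter rather than imported, so nothing here is part of the amdsmi API.

```python
import time


def call_with_retry(func, retry_exc, attempts=3, delay_s=0.1):
    """Call func() and retry while it raises retry_exc.

    Hypothetical helper: retry_exc would typically be AmdSmiRetryException.
    The last failure is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_exc:
            if attempt == attempts:
                raise
            # Wait briefly before the next attempt.
            time.sleep(delay_s)
```

With the package imported, a call such as `call_with_retry(lambda: amdsmi_get_gpu_asic_info(processor), AmdSmiRetryException)` would retry a busy processor up to three times. Other exception types propagate immediately.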
Amdsmi API#
amdsmi_init#
Description: Initialize smi lib and connect to driver
Input parameters: init_flags (optional; if no value is provided, the default value is AMDSMI_INIT_ALL_PROCESSORS)
init_flags is an AmdSmiInitFlags enum:
Field |
Description |
---|---|
|
all processors |
|
amd cpus |
|
amd gpus |
|
non amd cpus |
|
non amd gpus |
|
amd apus |
Output: None
Exceptions that can be thrown by the amdsmi_init function:
AmdSmiLibraryException
Example:
try:
    amdsmi_init()
    # continue with amdsmi
except AmdSmiException as e:
    print("Init failed")
    print(e)
amdsmi_shut_down#
Description: Finalize and close connection to driver
Input parameters: None
Output: None
Exceptions that can be thrown by the amdsmi_shut_down function:
AmdSmiLibraryException
Example:
try:
    amdsmi_shut_down()
except AmdSmiException as e:
    print("Fini failed")
    print(e)
amdsmi_get_processor_handles#
Description: Returns list of GPU device handle objects on current machine
Input parameters: None
Output: List of GPU device handles
Exceptions that can be thrown by the amdsmi_get_processor_handles function:
AmdSmiLibraryException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            print(amdsmi_get_gpu_device_uuid(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_processor_handle_from_bdf#
Description: Returns processor handle from the given BDF
Input parameters: BDF string in the form of either <domain>:<bus>:<device>.<function> or <bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long, in the range 0000-FFFF
<bus> is 2 hex digits long, in the range 00-FF
<device> is 2 hex digits long, in the range 00-1F
<function> is 1 hex digit long, in the range 0-7
Output: processor handle object
Exceptions that can be thrown by the amdsmi_get_processor_handle_from_bdf function:
AmdSmiLibraryException
AmdSmiBdfFormatException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    print(amdsmi_get_gpu_device_uuid(processor))
except AmdSmiException as e:
    print(e)
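The BDF format described above can be checked before calling the library. The following is a minimal sketch, assuming the field widths and ranges stated in this section. parse_bdf is a hypothetical helper, not an amdsmi function, and treating a missing domain as 0 is an assumption made for illustration.

```python
import re

# <domain> is optional; field widths follow the description above.
_BDF_RE = re.compile(
    r"(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[0-9a-fA-F]{2})\."
    r"(?P<function>[0-7])"
)


def parse_bdf(bdf):
    """Return (domain, bus, device, function) as ints, or None if invalid.

    Hypothetical helper; a missing domain is assumed to mean domain 0.
    """
    match = _BDF_RE.fullmatch(bdf)
    if match is None:
        return None
    domain = int(match.group("domain") or "0", 16)
    bus = int(match.group("bus"), 16)
    device = int(match.group("device"), 16)
    function = int(match.group("function"), 16)
    # Two hex digits allow up to FF, but <device> is limited to 00-1F.
    if device > 0x1F:
        return None
    return (domain, bus, device, function)
```

A string such as "0000:23:00.0" parses to (0, 35, 0, 0), while a string with a misplaced separator such as "0000:23.00.0" is rejected.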
amdsmi_get_gpu_device_bdf#
Description: Returns BDF of the given device
Input parameters:
GPU device for which to query
Output: BDF string in the form of <domain>:<bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long, in the range 0000-FFFF
<bus> is 2 hex digits long, in the range 00-FF
<device> is 2 hex digits long, in the range 00-1F
<function> is 1 hex digit long, in the range 0-7
Exceptions that can be thrown by the amdsmi_get_gpu_device_bdf function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    print("Processor's bdf:", amdsmi_get_gpu_device_bdf(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_index_from_processor_handle#
Description: Returns the index of the given processor handle
Input parameters:
GPU device for which to query
Output: GPU device index
Exceptions that can be thrown by the amdsmi_get_index_from_processor_handle function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            print("Processor's index:", amdsmi_get_index_from_processor_handle(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_processor_handle_from_index#
Description: Returns the processor handle from the given processor index
Input parameters:
Index of the processor to query
Output: processor handle object
Exceptions that can be thrown by the amdsmi_get_processor_handle_from_index function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processors = amdsmi_get_processor_handles()
    num_of_GPUs = len(processors)
    if num_of_GPUs == 0:
        print("No GPUs on machine")
    else:
        for index in range(num_of_GPUs):
            print("Processor handle:", amdsmi_get_processor_handle_from_index(index))
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_bdf#
Description: Returns BDF of the given VF
Input parameters:
VF for which to query
Output: BDF string in the form of <domain>:<bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long, in the range 0000-FFFF
<bus> is 2 hex digits long, in the range 00-FF
<device> is 2 hex digits long, in the range 00-1F
<function> is 1 hex digit long, in the range 0-7
Exceptions that can be thrown by the amdsmi_get_vf_bdf function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    vf = amdsmi_get_vf_handle_from_bdf("0000:23:02.0")
    print("VF's bdf:", amdsmi_get_vf_bdf(vf))
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_handle_from_bdf#
Description: Returns processor handle (VF) from the given BDF
Input parameters: BDF string in the form of either <domain>:<bus>:<device>.<function> or <bus>:<device>.<function> in hexcode format.
Where:
<domain> is 4 hex digits long, in the range 0000-FFFF
<bus> is 2 hex digits long, in the range 00-FF
<device> is 2 hex digits long, in the range 00-1F
<function> is 1 hex digit long, in the range 0-7
Output: processor handle object
Exceptions that can be thrown by the amdsmi_get_vf_handle_from_bdf function:
AmdSmiLibraryException
AmdSmiBdfFormatException
Example:
try:
    vf = amdsmi_get_vf_handle_from_bdf("0000:23:02.0")
    print(amdsmi_get_vf_uuid(vf))
except AmdSmiException as e:
    print(e)
amdsmi_get_processor_handle_from_uuid#
Description: Returns processor handle from the given UUID
Input parameters: UUID string
Output: processor handle object
Exceptions that can be thrown by the amdsmi_get_processor_handle_from_uuid function:
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_uuid("fcff7460-0000-1000-80e9-b388cfe84658")
    print("Processor's UUID: ", amdsmi_get_gpu_device_uuid(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_handle_from_uuid#
Description: Returns the handle of a virtual function (VF) from the given UUID
Input parameters: UUID string
Output: VF object
Exceptions that can be thrown by the amdsmi_get_vf_handle_from_uuid function:
AmdSmiLibraryException
Example:
try:
    vf = amdsmi_get_vf_handle_from_uuid("87007460-0000-1000-8059-3ae746ab9206")
    print("VF's UUID: ", amdsmi_get_vf_uuid(vf))
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_device_uuid#
Description: Returns the UUID of the device
Input parameters:
GPU device for which to query
Output: UUID string unique to the device
Exceptions that can be thrown by the amdsmi_get_gpu_device_uuid function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    print("Device UUID: ", amdsmi_get_gpu_device_uuid(processor))
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_uuid#
Description: Returns the UUID of the given VF
Input parameters:
VF handle for which to query
Output: UUID string unique to the VF
Exceptions that can be thrown by the amdsmi_get_vf_uuid function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    # processor_handle is assumed to be a processor handle obtained earlier,
    # for example from amdsmi_get_processor_handles()
    vf_id = amdsmi_get_vf_handle_from_vf_index(processor_handle, 0)
    print("VF UUID: ", amdsmi_get_vf_uuid(vf_id))
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_driver_info#
Description: Returns the name, version, and date of the driver handling the GPU device
Input parameters:
processor_handle: GPU device for which to query
Output:
driver_name: Driver name string that is handling the GPU device
driver_version: Driver version string that is handling the GPU device
driver_date: Driver date string that is handling the GPU device
Exceptions that can be thrown by the amdsmi_get_gpu_driver_info function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    driver_info = amdsmi_get_gpu_driver_info(processor)
    print("Driver name: ", driver_info.driver_name)
    print("Driver version: ", driver_info.driver_version)
    print("Driver date: ", driver_info.driver_date)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_driver_model#
Description: Returns driver model information
Input parameters:
processor_handle: GPU device for which to query
Output:
current driver model from the AmdSmiDriverModelType enum
AmdSmiDriverModelType enum:
Field |
Description |
---|---|
|
Windows Display Driver Model |
|
Windows Driver Model |
|
Microsoft Compute Driver Model |
Exceptions that can be thrown by the amdsmi_get_gpu_driver_model function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    driver_model = amdsmi_get_gpu_driver_model(processor)
    print("Driver model: ", driver_model)
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_handle_from_vf_index#
Description: Returns the VF id of the VF referenced by its index (in partitioning info)
Input parameters:
processor handle: PF or child VF of a GPU device for which to query
VF's index: Index of VF (0-31) in GPU's partitioning info
Output:
VF id
Exceptions that can be thrown by the amdsmi_get_vf_handle_from_vf_index function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    # processor_handle is assumed to be a processor handle obtained earlier,
    # for example from amdsmi_get_processor_handles()
    vf_id = amdsmi_get_vf_handle_from_vf_index(processor_handle, 0)
    print(amdsmi_get_vf_info(vf_id))
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_total_ecc_count#
Description: Returns the number of ECC errors on the GPU device
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
Field | Description
---|---
correctable_count | Count of ECC correctable errors
uncorrectable_count | Count of ECC uncorrectable errors
deferred_count | Count of ECC deferred errors
Exceptions that can be thrown by the amdsmi_get_gpu_total_ecc_count function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            ecc_errors = amdsmi_get_gpu_total_ecc_count(processor)
            print(ecc_errors['correctable_count'])
            print(ecc_errors['uncorrectable_count'])
            print(ecc_errors['deferred_count'])
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_ecc_count#
Description: Returns the number of ECC errors on the GPU device for the given block
Input parameters:
processor_handle: GPU device to query
block: The block for which error counts should be retrieved
block is an AmdSmiGpuBlock enum:
Field |
Description |
---|---|
|
UMC block |
|
SDMA block |
|
GFX block |
|
MMHUB block |
|
ATHUB block |
|
PCIE_BIF block |
|
HDP block |
|
XGMI_WAFL block |
|
DF block |
|
SMN block |
|
SEM block |
|
MP0 block |
|
MP1 block |
|
FUSE block |
|
MCA block |
|
VCN block |
|
JPEG block |
|
IH block |
|
MPIO block |
Output: Dictionary with fields
Field | Description
---|---
correctable_count | Count of ECC correctable errors
uncorrectable_count | Count of ECC uncorrectable errors
deferred_count | Count of ECC deferred errors
Exceptions that can be thrown by the amdsmi_get_gpu_ecc_count function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            ecc_errors = amdsmi_get_gpu_ecc_count(processor, AmdSmiGpuBlock.UMC)
            print(ecc_errors['correctable_count'])
            print(ecc_errors['uncorrectable_count'])
            print(ecc_errors['deferred_count'])
except AmdSmiException as e:
    print(e)
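When per-block counts are collected for several blocks, they can be added up into one summary. This is a sketch only: sum_ecc_counts is a hypothetical helper, and it assumes each input is a dictionary with the three count keys used in the example above.

```python
def sum_ecc_counts(per_block_counts):
    """Add up ECC count dictionaries, one per GPU block.

    Hypothetical helper; each item is assumed to carry the keys
    correctable_count, uncorrectable_count and deferred_count.
    """
    keys = ("correctable_count", "uncorrectable_count", "deferred_count")
    totals = dict.fromkeys(keys, 0)
    for counts in per_block_counts:
        for key in keys:
            totals[key] += counts[key]
    return totals
```

The input list could be built by calling amdsmi_get_gpu_ecc_count once per block of interest. Whether the sum matches amdsmi_get_gpu_total_ecc_count on a given system is not something this sketch guarantees.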
amdsmi_get_gpu_ecc_enabled#
Description: Returns ECC capabilities (disable/enable) for each GPU block.
Input parameters:
processor_handle: GPU device to query
Output: Dictionary of each GPU block and its value (False if the block is not enabled, True if the block is enabled)
Each GPU block in the dictionary is from the AmdSmiGpuBlock enum:
Field |
Description |
---|---|
|
UMC block |
|
SDMA block |
|
GFX block |
|
MMHUB block |
|
ATHUB block |
|
PCIE_BIF block |
|
HDP block |
|
XGMI_WAFL block |
|
DF block |
|
SMN block |
|
SEM block |
|
MP0 block |
|
MP1 block |
|
FUSE block |
|
MCA block |
|
VCN block |
|
JPEG block |
|
IH block |
|
MPIO block |
Exceptions that can be thrown by the amdsmi_get_gpu_ecc_enabled function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            # Only the processor handle is passed, matching the input parameters above
            ecc_status = amdsmi_get_gpu_ecc_enabled(processor)
            print(ecc_status)
except AmdSmiException as e:
    print(e)
amdsmi_status_code_to_string#
Description: Get a description of a provided AMDSMI error status
Input parameters:
status: The error status for which a description is desired
Output: String description of the provided error code
Exceptions that can be thrown by the amdsmi_status_code_to_string function:
AmdSmiParameterException
Example:
try:
    status_str = amdsmi_status_code_to_string(AmdSmiRetCode.SUCCESS)
    print(status_str)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_ras_feature_info#
Description: Returns RAS feature info
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
Field | Description
---|---
ras_eeprom_version | RAS EEPROM version
supported_ecc_correction_schema | ecc correction schema mask used with
Exceptions that can be thrown by the amdsmi_get_gpu_ras_feature_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            ras_feature = amdsmi_get_gpu_ras_feature_info(processor)
            print(ras_feature['ras_eeprom_version'])
            print(ras_feature['supported_ecc_correction_schema'])
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_bad_page_info#
Description: Returns bad page info.
Input parameters:
processor handle object: PF of a GPU device to query
Output: list of dictionaries with fields for each bad page
Field | Description
---|---
retired_page | 64K/4K driver-managed location that is blocked from further use
ts | Marks the last time when the RAS event was observed
mem_channel | Identifies the memory channel the issue has been reported on
mcumc_id | Identifies the memory controller the issue has been reported on
Exceptions that can be thrown by the amdsmi_get_gpu_bad_page_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
from datetime import datetime

try:
    processor = amdsmi_get_processor_handle_from_bdf("0000:23:00.0")
    bad_page_info = amdsmi_get_gpu_bad_page_info(processor)
    if len(bad_page_info) == 0:
        print("no bad pages")
    else:
        for table_record in bad_page_info:
            print(hex(table_record["retired_page"]))
            print(datetime.fromtimestamp(table_record['ts']).strftime('%Y/%m/%d:%H/%M/%S'))
            print(table_record['mem_channel'])
            print(table_record['mcumc_id'])
except AmdSmiException as e:
    print(e)
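For logging, each bad page record can be turned into a single line. The sketch below is illustrative: format_bad_page is a hypothetical helper, and it assumes, as the example above does, that ts is a Unix timestamp in seconds. UTC is used here so the output does not depend on the local time zone.

```python
from datetime import datetime, timezone


def format_bad_page(record):
    """Format one bad page record as a single log line.

    Hypothetical helper; record is assumed to have the keys
    retired_page, ts, mem_channel and mcumc_id.
    """
    when = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
    return "page={} time={} channel={} mcumc={}".format(
        hex(record["retired_page"]),
        when.strftime("%Y-%m-%d %H:%M:%S"),
        record["mem_channel"],
        record["mcumc_id"],
    )
```

Applied to every entry returned by amdsmi_get_gpu_bad_page_info, this gives one line per retired page.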
amdsmi_get_gpu_asic_info#
Description: Returns asic information for the given GPU
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
Field | Content
---|---
market_name | market name
vendor_id | vendor id
vendor_name | vendor name
subvendor_id | subsystem vendor id
device_id | unique id of a GPU
rev_id | revision id
asic_serial | asic serial
oam_id | xgmi physical id
num_of_compute_units | num of compute units (Not supported yet, currently hardcoded to 0)
target_graphics_version | target graphics version (Not supported yet, currently hardcoded to 0)
subsystem_id | subsystem device id
Exceptions that can be thrown by the amdsmi_get_gpu_asic_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            asic_info = amdsmi_get_gpu_asic_info(processor)
            print(asic_info['market_name'])
            print(asic_info['vendor_id'])
            print(asic_info['vendor_name'])
            print(asic_info['subvendor_id'])
            print(asic_info['device_id'])
            print(asic_info['subsystem_id'])
            print(asic_info['rev_id'])
            print(asic_info['asic_serial'])
            print(asic_info['oam_id'])
            print(asic_info['num_of_compute_units'])
            print(asic_info['target_graphics_version'])
except AmdSmiException as e:
    print(e)
amdsmi_get_pcie_info#
Description: Returns static and metric information about PCIe link for the given GPU
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
Field | Content
---|---
pcie_static | dictionary of static link properties; subfields include max_pcie_width, max_pcie_speed, slot_type, max_pcie_interface_version
pcie_metric | dictionary of link metrics; subfields include pcie_speed, pcie_width, pcie_bandwidth, pcie_interface_version, pcie_replay_count, pcie_l0_to_recovery_count, pcie_replay_roll_over_count, pcie_nak_sent_count, pcie_nak_received_count, pcie_lc_perf_other_end_recovery_count
Exceptions that can be thrown by the amdsmi_get_pcie_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            pcie_info = amdsmi_get_pcie_info(processor)
            print(pcie_info['pcie_static']['max_pcie_width'])
            print(pcie_info['pcie_static']['max_pcie_speed'])
            print(pcie_info['pcie_static']['slot_type'])
            print(pcie_info['pcie_static']['max_pcie_interface_version'])
            print(pcie_info['pcie_metric']['pcie_speed'])
            print(pcie_info['pcie_metric']['pcie_width'])
            print(pcie_info['pcie_metric']['pcie_bandwidth'])
            print(pcie_info['pcie_metric']['pcie_interface_version'])
            print(pcie_info['pcie_metric']['pcie_replay_count'])
            print(pcie_info['pcie_metric']['pcie_l0_to_recovery_count'])
            print(pcie_info['pcie_metric']['pcie_replay_roll_over_count'])
            print(pcie_info['pcie_metric']['pcie_nak_sent_count'])
            print(pcie_info['pcie_metric']['pcie_nak_received_count'])
            print(pcie_info['pcie_metric']['pcie_lc_perf_other_end_recovery_count'])
except AmdSmiException as e:
    print(e)
amdsmi_get_power_cap_info#
Description: Returns dictionary of power capabilities as currently configured on the given GPU
Input parameters:
processor_handle: GPU device to query
sensor_ind: sensor index. Normally, this will be 0. If a processor has more than one sensor, it could be greater than 0. The sensor_ind parameter is unused on the host platform. It is optional and defaults to 0.
Output: Dictionary with fields
Field | Description
---|---
power_cap | power capability
default_power_cap | default power capability
dpm_cap | dynamic power management capability
min_power_cap | minimum power capability
max_power_cap | maximum power capability
Exceptions that can be thrown by the amdsmi_get_power_cap_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            power_info = amdsmi_get_power_cap_info(processor)
            print(power_info['power_cap'])
            print(power_info['default_power_cap'])
            print(power_info['dpm_cap'])
            print(power_info['min_power_cap'])
            print(power_info['max_power_cap'])
except AmdSmiException as e:
    print(e)
amdsmi_get_fb_layout#
Description: Returns framebuffer related information for the given GPU
Input parameters:
processor handle: PF of a GPU device for which to query
Output: Dictionary with fields
Field | Description
---|---
total_fb_size | total framebuffer size in MB
pf_fb_reserved | framebuffer reserved space in MB
pf_fb_offset | framebuffer offset in MB
fb_alignment | framebuffer alignment in MB
max_vf_fb_usable | maximum usable framebuffer size in MB
min_vf_fb_usable | minimum usable framebuffer size in MB
Exceptions that can be thrown by the amdsmi_get_fb_layout function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            fb_info = amdsmi_get_fb_layout(processor)
            print(fb_info['total_fb_size'])
            print(fb_info['pf_fb_reserved'])
            print(fb_info['pf_fb_offset'])
            print(fb_info['fb_alignment'])
            print(fb_info['max_vf_fb_usable'])
            print(fb_info['min_vf_fb_usable'])
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_activity#
Description: Returns the engine usage for the given GPU.
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
Field | Description
---|---
gfx_activity | graphics engine usage/activity percentage (0 - 100)
umc_activity | memory/UMC engine usage/activity percentage (0 - 100)
mm_activity | average multimedia engine usage/activity percentage (0 - 100)
Exceptions that can be thrown by the amdsmi_get_gpu_activity function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            engine_activity = amdsmi_get_gpu_activity(processor)
            print(engine_activity['gfx_activity'])
            print(engine_activity['umc_activity'])
            print(engine_activity['mm_activity'])
except AmdSmiException as e:
    print(e)
amdsmi_get_power_info#
Description: Returns the current power, power limit, and voltage for the given GPU
Input parameters:
processor_handle: GPU device to query
sensor_ind: sensor index. Normally, this will be 0. If a processor has more than one sensor, it could be greater than 0. The sensor_ind parameter is unused on the host platform. It is optional and defaults to 0.
Output: Dictionary with fields
Field | Description
---|---
socket_power | socket power
gfx_voltage | gfx voltage
soc_voltage | socket voltage
mem_voltage | memory voltage
Note: socket_power can rarely spike above the socket power limit in some cases.
Exceptions that can be thrown by the amdsmi_get_power_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            power_info = amdsmi_get_power_info(processor)
            print(power_info['socket_power'])
            print(power_info['gfx_voltage'])
            print(power_info['soc_voltage'])
            print(power_info['mem_voltage'])
except AmdSmiException as e:
    print(e)
amdsmi_set_power_cap#
Description: Sets GPU power cap.
Input parameters:
processor handle: processor handle
sensor_ind: sensor index. Normally, this will be 0. If a processor has more than one sensor, it could be greater than 0. The sensor_ind parameter is unused on the host platform.
cap: value representing the power cap to set. The value must be between the minimum (min_power_cap) and maximum (max_power_cap) power cap values, which can be obtained from amdsmi_get_power_cap_info.
Output:
None
Exceptions that can be thrown by the amdsmi_set_power_cap function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
import random

try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        sensor_ind = 0
        for processor in processors:
            power_info = amdsmi_get_power_cap_info(processor)
            # Pick an arbitrary value inside the allowed range
            power_limit = random.randint(power_info['min_power_cap'], power_info['max_power_cap'])
            amdsmi_set_power_cap(processor, sensor_ind, power_limit)
except AmdSmiException as e:
    print(e)
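Because the cap must lie between min_power_cap and max_power_cap, a requested value can be clamped into that range before it is set. This is a minimal sketch: clamp_power_cap is a hypothetical helper, and it assumes the dictionary has the min_power_cap and max_power_cap keys used in the example above.

```python
def clamp_power_cap(requested, power_cap_info):
    """Clamp requested into [min_power_cap, max_power_cap].

    Hypothetical helper; power_cap_info is assumed to be the dictionary
    returned by amdsmi_get_power_cap_info.
    """
    low = power_cap_info["min_power_cap"]
    high = power_cap_info["max_power_cap"]
    if low > high:
        raise ValueError("min_power_cap is greater than max_power_cap")
    return max(low, min(requested, high))
```

The clamped value can then be passed as the cap argument of amdsmi_set_power_cap. Clamping silently changes an out-of-range request, so raising an error instead may be preferable in some tools.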
amdsmi_is_gpu_power_management_enabled#
Description: Returns whether power management is enabled
Input parameters:
processor_handle: GPU device to query
Output: Boolean, True if power management is enabled, otherwise False
Exceptions that can be thrown by the amdsmi_is_gpu_power_management_enabled function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            is_power_management_enabled = amdsmi_is_gpu_power_management_enabled(processor)
            print(is_power_management_enabled)
except AmdSmiException as e:
    print(e)
amdsmi_get_temp_metric#
Description: Returns the current temperature or limit temperature for the given processor
Input parameters:
processor_handle: GPU device to query
thermal_domain: one of the AmdSmiTemperatureType enum values:
Field |
Description |
---|---|
|
edge thermal domain |
|
hotspot/junction thermal domain |
|
memory/vram thermal domain |
|
plx thermal domain (Not supported yet) |
|
HBM 0 thermal domain (Not supported yet) |
|
HBM 1 thermal domain (Not supported yet) |
|
HBM 2 thermal domain (Not supported yet) |
|
HBM 3 thermal domain (Not supported yet) |
thermal_metric: one of the AmdSmiTemperatureMetric enum values:
Field |
Description |
---|---|
|
current thermal metric |
|
max thermal metric (Not supported yet) |
|
min thermal metric (Not supported yet) |
|
max hyst thermal metric (Not supported yet) |
|
min hyst thermal metric (Not supported yet) |
|
limit thermal metric |
|
critical hyst metric (Not supported yet) |
|
emergency thermal metric (Not supported yet) |
|
emergency hyst thermal metric (Not supported yet) |
|
critical min thermal metric (Not supported yet) |
|
critical min hyst thermal metric (Not supported yet) |
|
offset thermal metric (Not supported yet) |
|
lowest thermal metric (Not supported yet) |
|
highest thermal metric (Not supported yet) |
|
shutdown thermal metric |
Output: Temperature value
Exceptions that can be thrown by the amdsmi_get_temp_metric function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for device in processors:
            print("=============== EDGE THERMAL DOMAIN ================")
            thermal_measure = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.EDGE, AmdSmiTemperatureMetric.CURRENT)
            print("Current temperature:")
            print(thermal_measure)
            thermal_measure = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.EDGE, AmdSmiTemperatureMetric.CRITICAL)
            print("Limit temperature:")
            print(thermal_measure)
            print("=============== HOTSPOT THERMAL DOMAIN ================")
            thermal_measure = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.HOTSPOT, AmdSmiTemperatureMetric.CURRENT)
            print("Current temperature:")
            print(thermal_measure)
            thermal_measure = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.HOTSPOT, AmdSmiTemperatureMetric.CRITICAL)
            print("Limit temperature:")
            print(thermal_measure)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_cache_info#
Description: Returns the cache info for the given processor
Input parameters:
processor_handle: GPU device to query
Output: Dictionary with fields
Field | Description
---|---
cache | list of dictionaries, one per cache; fields include cache_properties, cache_size, cache_level, max_num_cu_shared, num_cache_instance
AmdSmiCacheProperty enum values
Field |
Description |
---|---|
|
Cache enabled |
|
Data cache |
|
Inst cache |
|
CPU cache |
|
SIMD cache |
Exceptions that can be thrown by the amdsmi_get_gpu_cache_info function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for device in processors:
            cache_info = amdsmi_get_gpu_cache_info(device)
            for cache in cache_info["cache"]:
                print(cache["cache_properties"])
                print(cache["cache_size"])
                print(cache["cache_level"])
                print(cache["max_num_cu_shared"])
                print(cache["num_cache_instance"])
except AmdSmiException as e:
    print(e)
amdsmi_get_clock_info#
Description: Returns the clock measurements for the given GPU
Input parameters:
processor_handle: GPU device to query
clock_domain: one of the AmdSmiClkType enum values:
Field |
Description |
---|---|
|
system clock domain |
|
gfx clock domain |
|
Data Fabric clock (for ASICs running on a separate clock) domain (Not supported yet) |
|
Display Controller Engine clock domain (Not supported yet) |
|
SOC clock domain (Not supported yet) |
|
memory clock domain |
|
PCIe clock domain (Not supported yet) |
|
first multimedia engine (VCLK0) clock domain |
|
second multimedia engine (VCLK1) clock domain |
|
DCLK0 clock domain |
|
DCLK1 clock domain |
Output: Dictionary with fields
Field |
Description |
---|---|
|
current clock value for the given domain |
|
minimum clock value for the given domain |
|
maximum clock value for the given domain |
|
clock locked flag only supported on GFX clock domain |
|
clock deep sleep mode flag |
Exceptions that can be thrown by amdsmi_get_clock_info
function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        # Clock domains to query, paired with the label printed for each one
        clock_domains = [
            ("GFX", AmdSmiClkType.GFX),
            ("MEM", AmdSmiClkType.MEM),
            ("SYS", AmdSmiClkType.SYS),
            ("VCLK0", AmdSmiClkType.VCLK0),
            ("VCLK1", AmdSmiClkType.VCLK1),
            ("DCLK0", AmdSmiClkType.DCLK0),
            ("DCLK1", AmdSmiClkType.DCLK1),
        ]
        for processor in processors:
            for label, clk_type in clock_domains:
                print("=============== {} CLOCK DOMAIN ================".format(label))
                clock_measure = amdsmi_get_clock_info(processor, clk_type)
                print(clock_measure['clk'])
                print(clock_measure['min_clk'])
                print(clock_measure['max_clk'])
                if clk_type == AmdSmiClkType.GFX:
                    # clk_locked is only supported on the GFX clock domain
                    print(clock_measure['clk_locked'])
                print(clock_measure['clk_deep_sleep'])
except AmdSmiException as e:
    print(e)
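The clk, min_clk and max_clk fields can be combined to see where the current clock sits within its reported range. The helper below is a minimal sketch that works on a plain dictionary shaped like the example output above; clock_range_fraction and the sample values are illustrative and not part of amdsmi.

```python
def clock_range_fraction(clock_measure):
    """Return where 'clk' sits between 'min_clk' and 'max_clk' as a value in [0, 1].

    Returns None when the reported range is empty (max_clk <= min_clk).
    """
    span = clock_measure['max_clk'] - clock_measure['min_clk']
    if span <= 0:
        return None
    fraction = (clock_measure['clk'] - clock_measure['min_clk']) / span
    # Clamp in case the reported clock falls outside the reported limits.
    return max(0.0, min(1.0, fraction))


# Stand-in for a dictionary returned by amdsmi_get_clock_info (made-up values).
sample_clock = {'clk': 1500, 'min_clk': 500, 'max_clk': 2500,
                'clk_locked': 0, 'clk_deep_sleep': 0}
print(clock_range_fraction(sample_clock))
```

In a real script the dictionary would come from amdsmi_get_clock_info(processor, AmdSmiClkType.GFX) instead of the sample.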
amdsmi_get_gpu_vram_info#
Description: Returns static information about the VRAM of the given GPU
Input parameters:
processor_handle
GPU device to query
Output: Dictionary with fields
Field | Description |
---|---|
vram_type | VRAM type from AmdSmiVramType enum |
vram_vendor | VRAM vendor from AmdSmiVramVendor enum |
vram_size | VRAM size in MB |
| VRAM bit width |
AmdSmiVramType
enum:
Field |
Description |
---|---|
|
UNKNOWN VRAM type |
|
HBM VRAM type |
|
HBM2 VRAM type |
|
HBM2E VRAM type |
|
HBM3 VRAM type |
|
DDR2 VRAM type |
|
DDR3 VRAM type |
|
DDR4 VRAM type |
|
GDDR1 VRAM type |
|
GDDR2 VRAM type |
|
GDDR3 VRAM type |
|
GDDR4 VRAM type |
|
GDDR5 VRAM type |
|
GDDR6 VRAM type |
|
GDDR7 VRAM type |
AmdSmiVramVendor
enum:
Field |
Description |
---|---|
|
SAMSUNG VRAM vendor |
|
INFINEON VRAM vendor |
|
ELPIDA VRAM vendor |
|
ETRON VRAM vendor |
|
NANYA VRAM vendor |
|
HYNIX VRAM vendor |
|
MOSEL VRAM vendor |
|
WINBOND VRAM vendor |
|
ESMT VRAM vendor |
|
MICRON VRAM vendor |
|
UNKNOWN VRAM vendor |
Exceptions that can be thrown by amdsmi_get_gpu_vram_info
function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            vram_info = amdsmi_get_gpu_vram_info(processor)
            print(vram_info['vram_type'])
            print(vram_info['vram_vendor'])
            print(vram_info['vram_size'])
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_vbios_info#
Description: Returns the static information for the VBIOS on the GPU device.
Input parameters:
processor_handle
GPU device to query
Output: Dictionary with fields
Field | Description |
---|---|
name | vbios name |
build_date | vbios build date |
part_number | vbios part number |
version | vbios version string |
Exceptions that can be thrown by amdsmi_get_gpu_vbios_info
function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            vbios_info = amdsmi_get_gpu_vbios_info(processor)
            print(vbios_info['name'])
            print(vbios_info['build_date'])
            print(vbios_info['part_number'])
            print(vbios_info['version'])
except AmdSmiException as e:
    print(e)
amdsmi_get_fw_info#
Description: Returns the firmware information for the given GPU.
Input parameters:
processor_handle
GPU device to query
Output: Dictionary with fields
Field |
Description |
---|---|
|
List of dictionaries that contain information about a certain firmware block |
Exceptions that can be thrown by amdsmi_get_fw_info
function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            firmware_list = amdsmi_get_fw_info(processor)
            for firmware_block in firmware_list:
                print(firmware_block['fw_id'])
                print(firmware_block['fw_version'])
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_board_info#
Description: Returns board related information for the given GPU
Input parameters:
GPU device handle object
Output: Dictionary with fields
Field | Description |
---|---|
model_number | board model number |
product_serial | board product serial number |
fru_id | fru (field-replaceable unit) id |
product_name | board product name |
manufacturer_name | board manufacturer name |
Exceptions that can be thrown by amdsmi_get_gpu_board_info
function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            board_info = amdsmi_get_gpu_board_info(processor)
            print(board_info['model_number'])
            print(board_info['product_serial'])
            print(board_info['fru_id'])
            print(board_info['manufacturer_name'])
            print(board_info['product_name'])
except AmdSmiException as e:
    print(e)
amdsmi_get_num_vf#
Description: Returns number of enabled VFs and number of supported VFs for the given GPU
Input parameters:
processor handle
PF of a GPU device for which to query
Output: Dictionary with fields
Field | Description |
---|---|
num_vf_enabled | number of enabled VFs |
num_vf_supported | number of supported VFs |
Exceptions that can be thrown by amdsmi_get_num_vf
function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            num_vf = amdsmi_get_num_vf(processor)
            print(num_vf['num_vf_enabled'])
            print(num_vf['num_vf_supported'])
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_partition_info#
Description: Returns array of the current framebuffer partitioning structures on the given GPU
Input parameters:
processor handle object
PF of a GPU device for which to query
Output: Array of dictionaries with fields
Field | Description |
---|---|
vf_id | VF handle |
fb | framebuffer dictionary; fb_size is the framebuffer size of the VF |
Exceptions that can be thrown by amdsmi_get_vf_partition_info
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            # partitions[0]['fb']['fb_size'] is frame buffer size of the first VF on the given GPU
            # we can access any VF from the array via its index in partitions list
            print(partitions[0]['fb']['fb_size'])
except AmdSmiException as e:
    print(e)
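Because the result is a list with one entry per VF, it is easy to aggregate over it. The following sketch sums the framebuffer size of all VFs on one GPU; total_fb_size is a hypothetical helper and the sample list only imitates the shape used in the example above (string handles and made-up sizes).

```python
def total_fb_size(partitions):
    """Sum the 'fb_size' of every VF entry in a partition list."""
    return sum(partition['fb']['fb_size'] for partition in partitions)


# Stand-in for the list returned by amdsmi_get_vf_partition_info.
sample_partitions = [
    {'vf_id': 'vf0', 'fb': {'fb_size': 4096}},
    {'vf_id': 'vf1', 'fb': {'fb_size': 2048}},
]
print(total_fb_size(sample_partitions))
```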
amdsmi_set_num_vf#
Description: Sets the number of enabled VFs for the given GPU
Input parameters:
processor_handle
GPU device to configure
number of enabled VFs to be set
Output: None
Exceptions that can be thrown by amdsmi_set_num_vf
function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_num_vf(processor, 2)
except AmdSmiException as e:
    print(e)
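Before calling amdsmi_set_num_vf it can help to compare the requested value with num_vf_supported as reported by amdsmi_get_num_vf. This is a minimal sketch on a plain dictionary; check_requested_vfs is a hypothetical helper, and since this reference does not say whether 0 is accepted, the sketch only rejects negative values and values above the supported count.

```python
def check_requested_vfs(num_vf_info, requested):
    """Return requested if it fits num_vf_info['num_vf_supported'], else raise ValueError."""
    supported = num_vf_info['num_vf_supported']
    if requested < 0 or requested > supported:
        raise ValueError(
            "requested {} VFs, but only {} are supported".format(requested, supported)
        )
    return requested


# Stand-in for the dictionary returned by amdsmi_get_num_vf (made-up values).
sample_num_vf = {'num_vf_enabled': 1, 'num_vf_supported': 4}
print(check_requested_vfs(sample_num_vf, 2))
```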
amdsmi_clear_vf_fb#
Description: Clears the framebuffer of the given VF on the given GPU.
The call fails if the function whose framebuffer is being cleared is active.
Input parameters:
VF device handle
Output: None
Exceptions that can be thrown by amdsmi_clear_vf_fb
function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            # partitions[0]['vf_id'] is handle of the first VF on the given GPU
            amdsmi_clear_vf_fb(partitions[0]['vf_id'])
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_data#
Description: Returns the scheduler information and guard structure for the given VF.
Input parameters:
VF handle
Output: Dictionary with fields sched and guard
Fields of the sched dictionary:
Field | Description |
---|---|
flr_count | function level reset counter |
boot_up_time | boot up time in microseconds |
shutdown_time | shutdown time in microseconds |
reset_time | reset time in microseconds |
state | vf state |
last_boot_start | last boot start time |
last_boot_end | last boot end time |
last_shutdown_start | last shutdown start time |
last_shutdown_end | last shutdown end time |
last_reset_start | last reset start time |
last_reset_end | last reset end time |
current_active_time | current session active time, reset after guest reload |
current_running_time | current session running time, reset after guest reload |
total_active_time | total active time, reset after host reload |
total_running_time | total running time, reset after host reload |
Fields of the guard dictionary:
Field | Description |
---|---|
enabled | shows if guard info is enabled for VF |
guard | dictionary whose entries each have state, amount, interval, threshold and active |
AmdSmiGuardType enum values are keys in guard dictionary
Field |
Description |
---|---|
|
function level reset status |
|
exclusive access mode status |
|
exclusive access time out status |
|
generic interrupt status |
State is AmdSmiGuardState enum object with values
Field |
Description |
---|---|
|
the event number is within the threshold |
|
the event number hits the threshold |
|
the event number is bigger than the threshold |
State is AmdSmiVfState enum object with values
Field |
Description |
---|---|
|
vf state unavailable |
|
vf state available |
|
vf state active |
|
vf state suspended |
|
vf state fullaccess |
|
same as available, indicates this is a default VF |
Exceptions that can be thrown by amdsmi_get_vf_data
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            vf_data = amdsmi_get_vf_data(partitions[0]['vf_id'])
            sched_info = vf_data['sched']
            guard_info = vf_data['guard']
            print(sched_info['boot_up_time'])
            print(sched_info['flr_count'])
            print(sched_info['state'].name)
            print(sched_info['last_boot_start'])
            print(sched_info['last_boot_end'])
            print(sched_info['last_shutdown_start'])
            print(sched_info['last_shutdown_end'])
            print(sched_info['shutdown_time'])
            print(sched_info['last_reset_start'])
            print(sched_info['last_reset_end'])
            print(sched_info['reset_time'])
            print(sched_info['current_active_time'])
            print(sched_info['current_running_time'])
            print(sched_info['total_active_time'])
            print(sched_info['total_running_time'])
            print(guard_info['enabled'])
            for guard_type in guard_info['guard']:
                print("type: {} ".format(guard_type))
                print("state: {}".format(guard_info['guard'][guard_type]['state']))
                print("amount: {}".format(guard_info['guard'][guard_type]['amount']))
                print("interval: {}".format(guard_info['guard'][guard_type]['interval']))
                print("threshold: {}".format(guard_info['guard'][guard_type]['threshold']))
                print("active: {}".format(guard_info['guard'][guard_type]['active']))
                print("==================")
except AmdSmiException as e:
    print(e)
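The guard dictionary can also be scanned programmatically rather than printed. The sketch below lists the guard types whose event count has reached or passed its threshold. It assumes that amount is the event count compared against threshold, which this reference suggests but does not state outright; guards_at_or_over_threshold and the string keys in the sample are stand-ins, since real keys are AmdSmiGuardType values.

```python
def guards_at_or_over_threshold(guard_info):
    """Return the guard types whose 'amount' is at or above their 'threshold'.

    Returns an empty list when guard info is not enabled for the VF.
    """
    if not guard_info['enabled']:
        return []
    return [
        guard_type
        for guard_type, entry in guard_info['guard'].items()
        if entry['amount'] >= entry['threshold']
    ]


# Stand-in for vf_data['guard']; keys and numbers are made up.
sample_guard = {
    'enabled': True,
    'guard': {
        'guard_a': {'state': 'x', 'amount': 1, 'interval': 60, 'threshold': 5, 'active': 1},
        'guard_b': {'state': 'y', 'amount': 7, 'interval': 60, 'threshold': 5, 'active': 1},
    },
}
print(guards_at_or_over_threshold(sample_guard))
```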
amdsmi_get_vf_info#
Description: Returns the configuration structure for a given VF
Input parameters:
VF handle
Output: Dictionary with fields
Field | Description |
---|---|
fb | framebuffer dictionary with fb_offset and fb_size |
gfx_timeslice | gfx timeslice in us |
Exceptions that can be thrown by amdsmi_get_vf_info
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            # partitions[0]['vf_id'] is handle of the first VF on the given GPU
            config = amdsmi_get_vf_info(partitions[0]['vf_id'])
            print("fb_offset: {}".format(config['fb']['fb_offset']))
            print("fb_size: {}".format(config['fb']['fb_size']))
            print("gfx_timeslice : {}".format(config['gfx_timeslice']))
except AmdSmiException as e:
    print(e)
amdsmi_get_guest_data#
Description: Gets guest OS information of the queried VF
Input parameters:
processor handle
VF of a GPU device
Output: Dictionary with fields
Field |
Description |
---|---|
|
driver version |
|
fb usage in MB |
Exceptions that can be thrown by amdsmi_get_guest_data
function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            num_vf_enabled = amdsmi_get_num_vf(processor)['num_vf_enabled']
            partitions = amdsmi_get_vf_partition_info(processor)
            for i in range(0, num_vf_enabled):
                guest_data = amdsmi_get_guest_data(partitions[i]['vf_id'])
                print(guest_data)
except AmdSmiException as e:
    print(e)
amdsmi_get_fw_error_records#
Description: Gets firmware error records
Input parameters:
processor handle
PF of a GPU device
Output: Dictionary with field err_records, which is a list of elements with the following fields
Field |
Description |
---|---|
|
system time in seconds |
|
vf index |
|
firmware id |
|
firmware load status |
Exceptions that can be thrown by amdsmi_get_fw_error_records
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            err_records = amdsmi_get_fw_error_records(processor)
            print(err_records)
except AmdSmiException as e:
    print(e)
amdsmi_get_dfc_fw_table#
Description: Gets dfc firmware table
Input parameters:
processor handle
PF of a GPU device
Output: Dictionary with fields header and data, where data is a list of elements
Each header is a dictionary with following fields:
Field |
Description |
---|---|
|
dfc firmware version |
|
number of entries in the dfc table |
|
gart wr guest min |
|
gart wr guest max |
Each data entry is a dictionary with following fields:
Field |
Description |
---|---|
|
dfc firmware type |
|
verification enabled |
|
customer ordinal |
|
white list |
|
black list |
Each white list entry is a dictionary with following fields:
Field |
Description |
---|---|
|
latest |
|
oldest |
Exceptions that can be thrown by amdsmi_get_dfc_fw_table
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            dfc_table = amdsmi_get_dfc_fw_table(processor)
            print(dfc_table)
except AmdSmiException as e:
    print(e)
amdsmi_get_vf_fw_info#
Description: Returns GPU firmware related information.
Input parameters:
processor handle
VF of a GPU device for which to query
Output: Dictionary with field fw_list, which is a list of elements with the fields below.
If microcode of a certain type is not loaded, its version will be 0.
Field | Description |
---|---|
fw_name | firmware name (enum value) |
fw_version | firmware version |
Exceptions that can be thrown by amdsmi_get_vf_fw_info
function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            partitions = amdsmi_get_vf_partition_info(processor)
            fw_info = amdsmi_get_vf_fw_info(partitions[0]['vf_id'])
            # Iterate directly over the firmware entries
            for fw in fw_info['fw_list']:
                print(fw['fw_name'].name)
                print(fw['fw_version'])
except AmdSmiException as e:
    print(e)
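Since a version of 0 marks microcode that is not loaded, the list can be filtered down to loaded firmware only. This is a minimal sketch on a plain dictionary shaped like the example above; loaded_firmware is a hypothetical helper and the sample uses strings in place of the enum values found in fw_name.

```python
def loaded_firmware(fw_info):
    """Return only the fw_list entries whose 'fw_version' is non-zero."""
    return [fw for fw in fw_info['fw_list'] if fw['fw_version'] != 0]


# Stand-in for the dictionary returned by amdsmi_get_vf_fw_info (made-up values).
sample_fw_info = {
    'fw_list': [
        {'fw_name': 'fw_a', 'fw_version': 17},
        {'fw_name': 'fw_b', 'fw_version': 0},
        {'fw_name': 'fw_c', 'fw_version': 3},
    ]
}
for fw in loaded_firmware(sample_fw_info):
    print(fw['fw_name'], fw['fw_version'])
```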
amdsmi_get_partition_profile_info#
Description: Gets partition profile info
Input parameters:
processor handle
PF of a GPU device
Output: Dictionary with fields, current_profile and profiles
Field |
Description |
---|---|
|
current profile index |
|
list of all profiles |
Where profiles is a list of dictionaries with fields
Field | Description |
---|---|
| number of vfs |
profile_caps | dictionary of profile capabilities |
Keys for the profile_caps dictionary are in the AmdSmiProfileCapabilityType enum
Field |
Description |
---|---|
|
memory |
|
encode engine |
|
decode engine |
|
compute engine |
Exceptions that can be thrown by amdsmi_get_partition_profile_info
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            profile_info = amdsmi_get_partition_profile_info(processor)
            print(profile_info)
except AmdSmiException as e:
    print(e)
amdsmi_get_link_metrics#
Description: Gets link metric information
Input parameters:
processor handle
PF of a GPU device
Output: links, a list of dictionaries with fields for each link
Field |
Description |
---|---|
|
BDF of the given processor |
|
current link speed in Gb/s |
|
max bandwidth of the link |
|
type of the link from AmdSmiLinkType enum |
|
total data received for each link in KB |
|
total data transferred for each link in KB |
AmdSmiLinkType
enum:
Field |
Description |
---|---|
|
Unknown |
|
XGMI link type |
|
PCIe link type |
|
Link not applicable |
|
Unknown |
Exceptions that can be thrown by amdsmi_get_link_metrics
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            link_metrics = amdsmi_get_link_metrics(processor)
            print(link_metrics)
except AmdSmiException as e:
    print(e)
amdsmi_get_link_topology#
Description: Gets link topology information between two connected processors
Input parameters:
source processor handle
PF of a source GPU device
destination processor handle
PF of a destination GPU device
Output: Dictionary with fields
Field |
Description |
---|---|
|
link weight between two GPUs |
|
HW status of the link |
|
type of the link from AmdSmiLinkType enum |
|
number of hops between two GPUs |
|
framebuffer sharing between two GPUs |
AmdSmiLinkType
enum:
Field |
Description |
---|---|
|
Unknown |
|
XGMI link type |
|
PCIe link type |
|
Link not applicable |
|
Unknown |
Exceptions that can be thrown by amdsmi_get_link_topology
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for src_processor in processors:
            for dst_processor in processors:
                link_topology = amdsmi_get_link_topology(src_processor, dst_processor)
                print(link_topology)
except AmdSmiException as e:
    print(e)
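When the pairwise results need to be kept rather than printed, they can be collected into a dictionary keyed by processor indices. The sketch below takes the query as a callable so it runs without the library; pairwise_topology is a hypothetical helper, and skipping the pairing of a processor with itself is a design choice of this sketch, not something the API requires.

```python
def pairwise_topology(processors, query):
    """Call query(src, dst) for every ordered pair of distinct processors.

    Returns a dict mapping (src_index, dst_index) to the query result.
    """
    result = {}
    for src_index, src in enumerate(processors):
        for dst_index, dst in enumerate(processors):
            if src_index == dst_index:
                continue  # skip the pairing of a processor with itself
            result[(src_index, dst_index)] = query(src, dst)
    return result


# Stand-in handles and query; real code would pass amdsmi_get_link_topology.
fake_processors = ['gpu0', 'gpu1', 'gpu2']
topology = pairwise_topology(fake_processors, lambda src, dst: (src, dst))
print(len(topology))
```

With the library available, the call would look like pairwise_topology(processors, amdsmi_get_link_topology).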
amdsmi_get_xgmi_fb_sharing_caps#
Description: Gets XGMI capabilities
Input parameters:
processor handle
PF of a GPU device
Output: Dictionary with fields
Field |
Description |
---|---|
|
flag that indicates if custom mode is supported (Not supported yet) |
|
flag that indicates if mode_1 is supported |
|
flag that indicates if mode_2 is supported |
|
flag that indicates if mode_4 is supported |
|
flag that indicates if mode_8 is supported |
Exceptions that can be thrown by amdsmi_get_xgmi_fb_sharing_caps
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            caps = amdsmi_get_xgmi_fb_sharing_caps(processor)
            print(caps)
except AmdSmiException as e:
    print(e)
amdsmi_get_xgmi_fb_sharing_mode_info#
Description: Gets XGMI framebuffer sharing information between two GPUs
Input parameters:
source processor handle
PF of a source GPU device
destination processor handle
PF of a destination GPU device
mode
framebuffer sharing mode from AmdSmiXgmiFbSharingMode enum
AmdSmiXgmiFbSharingMode
enum:
Field |
Description |
---|---|
|
custom framebuffer sharing mode (Not supported yet) |
|
framebuffer sharing mode_1 |
|
framebuffer sharing mode_2 |
|
framebuffer sharing mode_4 |
|
framebuffer sharing mode_8 |
Output: Value indicating whether framebuffer sharing is enabled between two GPUs
Exceptions that can be thrown by amdsmi_get_xgmi_fb_sharing_mode_info
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for src_processor in processors:
            for dst_processor in processors:
                fb_sharing = amdsmi_get_xgmi_fb_sharing_mode_info(src_processor, dst_processor, AmdSmiXgmiFbSharingMode.MODE_4)
                print(fb_sharing)
except AmdSmiException as e:
    print(e)
amdsmi_set_xgmi_fb_sharing_mode#
Description: Sets framebuffer sharing mode
Note: This API will only work if there’s no guest VM running.
Input parameters:
processor handle
PF of a GPU device
mode
framebuffer sharing mode from AmdSmiXgmiFbSharingMode enum
AmdSmiXgmiFbSharingMode
enum:
Field |
Description |
---|---|
|
custom framebuffer sharing mode (Not supported yet) |
|
framebuffer sharing mode_1 |
|
framebuffer sharing mode_2 |
|
framebuffer sharing mode_4 |
|
framebuffer sharing mode_8 |
Output:
None
Exceptions that can be thrown by amdsmi_set_xgmi_fb_sharing_mode
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_xgmi_fb_sharing_mode(processor, AmdSmiXgmiFbSharingMode.MODE_4)
except AmdSmiException as e:
    print(e)
amdsmi_set_xgmi_fb_sharing_mode_v2#
Description: Sets framebuffer sharing mode
Note: This API will only work if there’s no guest VM running. This API can be used for custom and auto setting of XGMI framebuffer sharing.
In case of custom mode:
- All processors in the list must be on the same NUMA node. Otherwise, the API returns an error.
- If any processor from the list already belongs to an existing group, the existing group will be released automatically.
In case of auto mode (MODE_X):
- The input parameter processor_list[0] should be valid. Only the first element of processor_list is taken into account and it can be any gpu0,gpu1,…
Input parameters:
processor_list
list of PFs of GPU devices
mode
framebuffer sharing mode from AmdSmiXgmiFbSharingMode enum
AmdSmiXgmiFbSharingMode
enum:
Field |
Description |
---|---|
|
custom framebuffer sharing mode |
|
framebuffer sharing mode_1 |
|
framebuffer sharing mode_2 |
|
framebuffer sharing mode_4 |
|
framebuffer sharing mode_8 |
Output:
None
Exceptions that can be thrown by amdsmi_set_xgmi_fb_sharing_mode_v2
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    processors_custom_mode = []
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        if len(processors) > 3:
            processors_custom_mode.append(processors[0])
            processors_custom_mode.append(processors[2])
        else:
            processors_custom_mode = processors
        amdsmi_set_xgmi_fb_sharing_mode_v2(processors_custom_mode, AmdSmiXgmiFbSharingMode.CUSTOM)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_metrics#
Description: Gets GPU metric information
Input parameters:
processor handle
PF of a GPU device
Output: list of dictionaries with fields for each metric
Field |
Description |
---|---|
|
value of the metric |
|
unit of the metric from AmdSmiMetricUnit enum |
|
name of the metric from AmdSmiMetricName enum |
|
category of the metric from AmdSmiMetricCategory enum |
|
list of types of the metric from AmdSmiMetricType enum |
|
mask of all active VFs + PF that this metric applies to |
AmdSmiMetricUnit
enum:
Field |
Description |
---|---|
|
counter |
|
unsigned integer |
|
boolean |
|
megahertz |
|
percentage |
|
millivolt |
|
celsius |
|
watt |
|
joule |
|
gigabyte per second |
|
megabit per second |
|
unknown unit |
AmdSmiMetricName
enum:
Field |
Description |
---|---|
|
accumulated counter |
|
firmware timestamp |
|
gfx clock |
|
socket clock |
|
memory clock |
|
vclk clock |
|
dclk clock |
|
gfx usage |
|
memory usage |
|
mm usage |
|
vcn usage |
|
jpeg usage |
|
gfx voltage |
|
socket voltage |
|
memory voltage |
|
current hotspot temperature |
|
hotspot temperature limit |
|
current memory temperature |
|
memory temperature limit |
|
current vr temperature |
|
shutdown temperature |
|
current power |
|
power limit |
|
socket energy |
|
ccd energy |
|
xcd energy |
|
aid energy |
|
memory energy |
|
active socket throttle |
|
active vr throttle |
|
active memory throttle |
|
pcie bandwidth |
|
pcie l0 recovery count |
|
pcie replay count |
|
pcie replay rollover count |
|
pcie nak sent count |
|
pcie nak received count |
|
maximum gfx clock limit |
|
maximum socket clock limit |
|
maximum memory clock limit |
|
maximum vclk clock limit |
|
maximum dclk clock limit |
|
minimum gfx clock limit |
|
minimum socket clock limit |
|
minimum memory clock limit |
|
minimum vclk clock limit |
|
minimum dclk clock limit |
|
gfx clock locked |
|
gfx deep sleep |
|
memory deep sleep |
|
socket deep sleep |
|
vclk deep sleep |
|
dclk deep sleep |
|
unknown name |
AmdSmiMetricCategory
enum:
Field |
Description |
---|---|
|
counter |
|
frequency |
|
activity |
|
temperature |
|
power |
|
energy |
|
celsius |
|
throttle |
|
pcie |
|
unknown category |
AmdSmiMetricType
enum:
Field |
Description |
---|---|
|
counter |
|
chiplet |
|
instantaneous data |
|
accumulated data |
Exceptions that can be thrown by amdsmi_get_gpu_metrics
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            metrics = amdsmi_get_gpu_metrics(processor)
            print(metrics)
except AmdSmiException as e:
    print(e)
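Because the result is a flat list with one dictionary per metric, grouping it by one of its fields makes it easier to read. The sketch below groups by a caller-supplied key so that no field name is hard-coded; group_metrics is a hypothetical helper, and the keys and values in the sample ('name', 'category', 'value') are assumptions for illustration only, so check the keys of the real dictionaries before relying on them.

```python
from collections import defaultdict


def group_metrics(metrics, key):
    """Group a list of metric dictionaries by the value stored under key."""
    grouped = defaultdict(list)
    for metric in metrics:
        grouped[metric[key]].append(metric)
    return dict(grouped)


# Stand-in for the list returned by amdsmi_get_gpu_metrics; keys are assumed.
sample_metrics = [
    {'name': 'gfx clock', 'category': 'frequency', 'value': 1500},
    {'name': 'memory clock', 'category': 'frequency', 'value': 900},
    {'name': 'current power', 'category': 'power', 'value': 250},
]
by_category = group_metrics(sample_metrics, 'category')
for category, entries in by_category.items():
    print(category, [entry['name'] for entry in entries])
```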
amdsmi_get_soc_pstate#
Description: Gets the soc pstate policy for the processor
Input parameters:
processor handle
PF of a GPU device
Output: Dictionary with fields
Field |
Description |
---|---|
|
current policy index |
|
List of policies |
Each policies list entry is a dictionary with following fields:
Field |
Description |
---|---|
|
policy id |
|
policy description |
Exceptions that can be thrown by amdsmi_get_soc_pstate
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            dpm_policy = amdsmi_get_soc_pstate(processor)
            print(dpm_policy)
except AmdSmiException as e:
    print(e)
amdsmi_set_soc_pstate#
Description: Sets the soc pstate policy for the processor
Input parameters:
processor handle
PF of a GPU device
policy_id
policy id represents one of the values we get from the policies list from amdsmi_get_soc_pstate.
Output:
None
Exceptions that can be thrown by amdsmi_set_soc_pstate
function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_soc_pstate(processor, 0)
except AmdSmiException as e:
    print(e)
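Since policy_id has to be one of the ids reported by amdsmi_get_soc_pstate, a small check before the set call can catch a wrong value early. This is a sketch on a plain dictionary; validate_policy_id is a hypothetical helper, and the key names 'policies' and 'policy_id' are assumed from the descriptions in this reference rather than confirmed.

```python
def validate_policy_id(soc_pstate, policy_id):
    """Return policy_id if it appears in soc_pstate['policies'], else raise ValueError."""
    known_ids = [policy['policy_id'] for policy in soc_pstate['policies']]
    if policy_id not in known_ids:
        raise ValueError(
            "policy id {} not in supported ids {}".format(policy_id, known_ids)
        )
    return policy_id


# Stand-in for the dictionary returned by amdsmi_get_soc_pstate (assumed keys).
sample_pstate = {'policies': [{'policy_id': 0}, {'policy_id': 1}]}
print(validate_policy_id(sample_pstate, 1))
```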
AmdSmiEventReader class#
Description: Provides methods for event monitoring
Methods:
constructor#
Description: Allocates a new event reader notifier to monitor different types of issues with the GPU
Input parameters:
processor handle list
list of GPU device handle objects (PFs or VFs) for which to create the event reader
event category list
list of the different event categories that the event reader will monitor in GPU
Event category is AmdSmiEventCategory enum object with values
Category |
Description |
---|---|
|
not used category |
|
driver events(allocation, failures of APIs, debug errors) |
|
events/notifications regarding RESET executed by the GPU |
|
scheduling events(world switch fail …) |
|
VBIOS events(security failures, vbios corruption…) |
|
ecc events |
|
pp events(slave not present, dpm fail …) |
|
events regarding the configuration of VF resources |
|
vf events(no vbios, gpu reset fail…) |
|
events related with FW loading or FW operations |
|
gpu fatal conditions |
|
guard events |
|
gpumon events(fb issues …) |
|
mmsch events |
|
xgmi events |
|
monitor all categories |
severity
of events that can be monitored
Severity is AmdSmiEventSeverity enum object with values
Severity |
Description |
---|---|
|
critical error |
|
significant error |
|
trivial error |
|
warning |
|
info |
|
monitor all severity levels |
The severity parameter is optional. If nothing is set, events with LOW severity will be monitored by default.
Output:
created object of AmdSmiEventReader class
Exceptions that can be thrown by AmdSmiEventReader constructor
function:
AmdSmiParameterException
AmdSmiLibraryException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        event_reader = AmdSmiEventReader(processors, {AmdSmiEventCategory.RESET})
except AmdSmiException as e:
    print(e)
read#
Description: Reads and returns one event from the event reader
Input parameters:
timestamp
number of microseconds to wait for an event to occur. If no event occurs within that time, monitoring finishes
Output: Dictionary with fields
Field | Description |
---|---|
fcn_id | VF handle |
dev_id | GPU device handle |
timestamp | UTC time (in microseconds) when the error happened |
data | data value associated with the specific event |
category | event category |
subcode | event subcategory |
level | event severity |
date | UTC date and time when the error happened |
message | message describing the event |
event category is AmdSmiEventCategory enum object with values
Category |
Description |
---|---|
|
not used category |
|
driver events(allocation, failures of APIs, debug errors) |
|
events/notifications regarding RESET executed by the GPU |
|
scheduling events(world switch fail …) |
|
VBIOS events(security failures, vbios corruption…) |
|
ecc events |
|
pp events(slave not present, dpm fail …) |
|
events regarding the configuration of VF resources |
|
vf events(no vbios, gpu reset fail…) |
|
events related with FW loading or FW operations |
|
gpu fatal conditions |
|
guard events |
|
gpumon events(fb issues …) |
|
mmsch events |
|
xgmi events |
|
monitor all categories |
Every AmdSmiEventCategory has its corresponding subcategory enum; the subcode field of an event holds a value from that enum.
Severity is AmdSmiEventSeverity enum object with values
Severity |
Description |
---|---|
|
critical error |
|
significant error |
|
trivial error |
|
warning |
|
info |
|
monitor all severity levels |
Exceptions that can be thrown by read
function:
AmdSmiParameterException
AmdSmiTimeoutException
AmdSmiLibraryException
stop#
Description: Frees any resources used by event notification for the given device. It can be called explicitly, or automatically by using a with statement, as in the examples below. It should be called, either manually or automatically, for every created AmdSmiEventReader object.
Input parameters: None
Example with manual cleanup of AmdSmiEventReader:
event_reader = None
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        event_reader = AmdSmiEventReader(processors, {AmdSmiEventCategory.RESET}, AmdSmiEventSeverity.ALL)
        while True:
            event = event_reader.read(10*1000*1000)
            gpu_bdf = amdsmi_get_gpu_device_bdf(event['dev_id'])
            vf_bdf = amdsmi_get_gpu_device_bdf(event['fcn_id'])
            print("=============== Event ================")
            print(" Time {}".format(event['timestamp']))
            print(" Category {}".format(event['category'].name))
            print(" Subcategory {}".format(event['subcode'].name))
            print(" Level {}".format(event['level'].name))
            print(" Data {}".format(event['data']))
            print(" VF BDF {}".format(vf_bdf))
            print(" GPU BDF {}".format(gpu_bdf))
            print(" Date {}".format(event['date']))
            print(" Message {}".format(event['message']))
            print("======================================")
except AmdSmiTimeoutException:
    print("No more events")
finally:
    # Only stop the reader if it was actually created
    if event_reader is not None:
        event_reader.stop()
Example with automatic cleanup using with
statement:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        with AmdSmiEventReader(processors, {AmdSmiEventCategory.RESET}, AmdSmiEventSeverity.ALL) as event_reader:
            while True:
                event = event_reader.read(10*1000*1000)
                gpu_bdf = amdsmi_get_gpu_device_bdf(event['dev_id'])
                vf_bdf = amdsmi_get_gpu_device_bdf(event['fcn_id'])
                print("=============== Event ================")
                print(" Time {}".format(event['timestamp']))
                print(" Category {}".format(event['category'].name))
                print(" Subcategory {}".format(event['subcode'].name))
                print(" Level {}".format(event['level'].name))
                print(" Data {}".format(event['data']))
                print(" VF BDF {}".format(vf_bdf))
                print(" GPU BDF {}".format(gpu_bdf))
                print(" Date {}".format(event['date']))
                print(" Message {}".format(event['message']))
                print("======================================")
except AmdSmiTimeoutException:
    print("No more events")
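Both examples above repeat the same read-until-timeout loop. A minimal sketch of factoring that loop into a helper follows. The helper name drain_events, its max_events cap, and the FakeReader and FakeTimeout stand-ins are hypothetical and exist only so the snippet runs without a GPU; with the real library you would pass an AmdSmiEventReader and AmdSmiTimeoutException instead.

```python
def drain_events(reader, timeout, timeout_exc, max_events=None):
    """Collect events from reader.read(timeout) until timeout_exc is raised.

    reader:      any object with a read(timeout) method
    timeout_exc: exception class that signals "no more events"
    max_events:  optional cap so the loop cannot run forever (hypothetical)
    """
    events = []
    while max_events is None or len(events) < max_events:
        try:
            events.append(reader.read(timeout))
        except timeout_exc:
            break
    return events


# Stand-ins used only to make this sketch runnable without hardware.
class FakeTimeout(Exception):
    pass


class FakeReader:
    def __init__(self, queued):
        self._queued = list(queued)

    def read(self, timeout):
        # Raise the timeout stand-in once the queue is empty.
        if not self._queued:
            raise FakeTimeout()
        return self._queued.pop(0)


collected = drain_events(FakeReader([{"message": "a"}, {"message": "b"}]),
                         10 * 1000 * 1000, FakeTimeout)
print(len(collected))
```

Returning the events as a list keeps printing and formatting out of the loop, so the same helper can serve both the manual-cleanup and the with-statement variants.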
amdsmi_get_lib_version#
Description: Get the build version information for the currently running build of AMDSMI.
Output: amdsmi build version
Exceptions that can be thrown by the amdsmi_get_lib_version function:
AmdSmiLibraryException
AmdSmiRetryException
AmdSmiParameterException
Example:
try:
    devices = amdsmi_get_processor_handles()
    if len(devices) == 0:
        print("No GPUs on machine")
    else:
        for device in devices:
            version = amdsmi_get_lib_version()
            print(version)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_accelerator_partition_profile_config#
Description: Returns the GPU accelerator partition capabilities as currently configured in the system
Input parameters:
processor handle
PF of a GPU device
Output: Dictionary with fields
Field |
Description |
||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
||||||||||||
|
index of the default profile |
||||||||||||
|
|
AmdSmiAcceleratorPartitionResource enum:
Field |
Description |
---|---|
|
xcc resource capabilities |
|
encoder resource capabilities |
|
decoder resource capabilities |
|
dma resource capabilities |
|
jpeg resource capabilities |
AmdSmiAcceleratorPartitionSetting enum:
Field |
Description |
---|---|
|
invalid compute partition |
|
compute partition with all xccs in group (8/1) |
|
compute partition with four xccs in group (8/2) |
|
compute partition with two xccs in group (6/3) |
|
compute partition with two xccs in group (8/4) |
|
compute partition with one xcc in group (8/8) |
Exceptions that can be thrown by the amdsmi_get_gpu_accelerator_partition_profile_config function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            accelerator_partition_config = amdsmi_get_gpu_accelerator_partition_profile_config(processor)
            print(accelerator_partition_config)
except AmdSmiException as e:
    print(e)
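The pairs in parentheses in the AmdSmiAcceleratorPartitionSetting descriptions above appear to read as (total XCCs / number of partitions). That reading is an assumption, not something the table states. Under it, the XCC count per partition is a plain integer division, which the sketch below checks against the table's wording. The function name xccs_per_partition is hypothetical.

```python
def xccs_per_partition(total_xccs, partitions):
    """Return the number of XCCs in each partition, assuming an even split."""
    if partitions <= 0 or total_xccs % partitions != 0:
        raise ValueError("XCCs must divide evenly across partitions")
    return total_xccs // partitions


# (total XCCs, partitions) pairs exactly as written in the table.
for total, parts in [(8, 1), (8, 2), (6, 3), (8, 4), (8, 8)]:
    per_partition = xccs_per_partition(total, parts)
    print(f"({total}/{parts}) -> {per_partition} XCC(s) per partition")
```

Under this reading, (8/2) gives four XCCs per group and both (6/3) and (8/4) give two, which matches the descriptions in the table.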
amdsmi_get_gpu_accelerator_partition_profile#
Description: Returns the current GPU accelerator partition capability
Input parameters:
processor handle
PF of a GPU device
Output: Dictionary with fields
Field |
Description |
||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
||||||||||||
|
array of ids for current accelerator profile |
AmdSmiComputePartitionResource enum:
Field |
Description |
---|---|
|
xcc resource capabilities |
|
encoder resource capabilities |
|
decoder resource capabilities |
|
dma resource capabilities |
|
jpeg resource capabilities |
Exceptions that can be thrown by the amdsmi_get_gpu_accelerator_partition_profile function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            accelerator_partition_profile = amdsmi_get_gpu_accelerator_partition_profile(processor)
            print(accelerator_partition_profile)
except AmdSmiException as e:
    print(e)
amdsmi_get_gpu_memory_partition_config#
Description: Returns current gpu memory partition config and mode capabilities
Input parameters:
processor handle
PF of a GPU device
Output: Dictionary with fields
Field |
Description |
||||||||
---|---|---|---|---|---|---|---|---|---|
|
memory partition capabilities |
||||||||
|
memory partition mode from |
||||||||
|
|
AmdSmiMemoryPartitionSetting enum:
Field |
Description |
---|---|
|
unknown memory partition |
|
memory partition with 1 number per socket |
|
memory partition with 2 numbers per socket |
|
memory partition with 4 numbers per socket |
|
memory partition with 8 numbers per socket |
Exceptions that can be thrown by the amdsmi_get_gpu_memory_partition_config function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            memory_partition_config = amdsmi_get_gpu_memory_partition_config(processor)
            print(memory_partition_config)
except AmdSmiException as e:
    print(e)
amdsmi_set_gpu_accelerator_partition_profile#
Description: Sets accelerator partition setting based on profile_index from amdsmi_get_gpu_accelerator_partition_profile_config
Input parameters:
processor handle
PF of a GPU device
profile_index
Represents the index of the partition profile to set
Output:
None
Exceptions that can be thrown by the amdsmi_set_gpu_accelerator_partition_profile function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_gpu_accelerator_partition_profile(processor, 1)
except AmdSmiException as e:
    print(e)
amdsmi_set_gpu_memory_partition_mode#
Description: Sets memory partition mode
Input parameters:
processor handle
PF of a GPU device
setting
Enum from AmdSmiMemoryPartitionSetting representing the memory partitioning mode to set
AmdSmiMemoryPartitionSetting enum:
Field |
Description |
---|---|
|
unknown memory partition |
|
memory partition with 1 number per socket |
|
memory partition with 2 numbers per socket |
|
memory partition with 4 numbers per socket |
|
memory partition with 8 numbers per socket |
Output:
None
Exceptions that can be thrown by the amdsmi_set_gpu_memory_partition_mode function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            amdsmi_set_gpu_memory_partition_mode(processor, AmdSmiMemoryPartitionSetting.NPS1)
except AmdSmiException as e:
    print(e)
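The loop above stops at the first device that raises. When one failing device should not block the rest, a per-device try/except can collect the failures instead. The sketch below is one way to do that. The names apply_to_each, FakeError, and fake_set_mode are hypothetical stand-ins so the snippet runs without the library; in real use, action would wrap amdsmi_set_gpu_memory_partition_mode and error_type would be AmdSmiException.

```python
def apply_to_each(processors, action, error_type):
    """Call action(processor) for every processor.

    Returns a list of (processor, error) pairs for the calls that raised
    error_type; an empty list means every call succeeded.
    """
    failures = []
    for processor in processors:
        try:
            action(processor)
        except error_type as err:
            failures.append((processor, err))
    return failures


# Stand-ins used only to make this sketch runnable without hardware.
class FakeError(Exception):
    pass


def fake_set_mode(processor):
    # Pretend the second device rejects the requested mode.
    if processor == "gpu1":
        raise FakeError("not supported")


failed = apply_to_each(["gpu0", "gpu1", "gpu2"], fake_set_mode, FakeError)
for processor, err in failed:
    print(f"{processor}: {err}")
```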
amdsmi_get_gpu_cper_entries#
Description: Gets the GPU RAS CPER (Common Platform Error Record) entries
Input parameters:
processor handle
PF of a GPU device
severity_mask
Represents a severity mask from the AmdSmiCperErrorSeverity enum, on which filtering of CPERs is based.
Field |
Description |
---|---|
|
filters non-fatal-uncorrected cpers |
|
filters fatal cpers |
|
filters non_fatal_corrected cpers |
|
shows all cper types |
Output:
List of all cper errors. Each list element contains binary raw data
Exceptions that can be thrown by the amdsmi_get_gpu_cper_entries function:
AmdSmiLibraryException
AmdSmiParameterException
Example:
try:
    processors = amdsmi_get_processor_handles()
    if len(processors) == 0:
        print("No GPUs on machine")
    else:
        for processor in processors:
            cper_list = amdsmi_get_gpu_cper_entries(processor, AmdSmiCperErrorSeverity.NUM)
except AmdSmiException as e:
    print(e)
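Because each returned element is raw binary data, a natural next step is to save the entries for offline decoding. A minimal sketch follows, assuming each element is a bytes-like object (this section does not state the exact Python type). The helper save_cper_entries and the cper_<index>.bin file naming are hypothetical, and the sample data is placeholder bytes rather than real CPER records.

```python
import os
import tempfile


def save_cper_entries(entries, out_dir):
    """Write each raw entry to out_dir/cper_<index>.bin and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, raw in enumerate(entries):
        path = os.path.join(out_dir, f"cper_{index}.bin")
        with open(path, "wb") as handle:
            # Assumes raw is bytes-like (bytes, bytearray, memoryview).
            handle.write(bytes(raw))
        paths.append(path)
    return paths


# Placeholder bytes, not real CPER records.
sample_entries = [b"\x00\x01", b"\x02\x03\x04"]
saved = save_cper_entries(sample_entries, tempfile.mkdtemp())
print(saved)
```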