ECC information
Functions
amdsmi_status_t amdsmi_get_gpu_total_ecc_count(amdsmi_processor_handle processor_handle, amdsmi_error_count_t *ec)
Returns the number of ECC errors (correctable, uncorrectable and deferred) in the given GPU.
amdsmi_status_t amdsmi_get_gpu_ecc_count(amdsmi_processor_handle processor_handle, amdsmi_gpu_block_t block, amdsmi_error_count_t *ec)
Returns the number of ECC errors (correctable, uncorrectable and deferred) for the given GPU block.
amdsmi_status_t amdsmi_get_gpu_ecc_enabled(amdsmi_processor_handle processor_handle, uint64_t *enabled_blocks)
Returns the enabled ECC bitmask.
amdsmi_status_t amdsmi_get_gpu_bad_page_info(amdsmi_processor_handle processor_handle, uint32_t *bad_page_size, amdsmi_eeprom_table_record_t *bad_pages)
Returns the bad page info.
amdsmi_status_t amdsmi_get_gpu_ras_feature_info(amdsmi_processor_handle processor_handle, amdsmi_ras_feature_t *ras_feature)
Returns RAS features info.
Function Documentation
◆ amdsmi_get_gpu_total_ecc_count()
amdsmi_status_t amdsmi_get_gpu_total_ecc_count(amdsmi_processor_handle processor_handle, amdsmi_error_count_t *ec)
Returns the number of ECC errors (correctable, uncorrectable and deferred) in the given GPU.
- Parameters
  - [in] processor_handle: PF of a processor for which to query.
  - [out] ec: Reference to the error count structure. Holds the count of uncorrectable and correctable ECC errors since the driver was last loaded. Must be allocated by the user.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.
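A minimal calling sketch for amdsmi_get_gpu_total_ecc_count(). It assumes the error count structure exposes correctable_count, uncorrectable_count and deferred_count fields; those field names are an assumption and are not stated on this page. The typedefs and the stub function only stand in for the library so that the snippet compiles on its own, and total_ecc_errors is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- Stand-ins so this sketch compiles on its own. In real code, drop this
 * section and include the library header instead. The struct field names
 * are assumptions. ---- */
typedef void *amdsmi_processor_handle;
typedef int amdsmi_status_t;
#define AMDSMI_STATUS_SUCCESS 0

typedef struct {
    uint64_t correctable_count;
    uint64_t uncorrectable_count;
    uint64_t deferred_count;
} amdsmi_error_count_t;

/* Stub with the documented signature; it reports fixed, made-up counts. */
static amdsmi_status_t amdsmi_get_gpu_total_ecc_count(
    amdsmi_processor_handle processor_handle, amdsmi_error_count_t *ec)
{
    if (processor_handle == NULL || ec == NULL)
        return 1; /* any non-zero value signals failure */
    ec->correctable_count = 12;
    ec->uncorrectable_count = 1;
    ec->deferred_count = 0;
    return AMDSMI_STATUS_SUCCESS;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: sums all three error classes for one GPU.
 * Returns 0 on success and -1 if the query failed. */
static int total_ecc_errors(amdsmi_processor_handle gpu, uint64_t *total)
{
    amdsmi_error_count_t ec = {0}; /* caller-allocated, as the API requires */

    if (amdsmi_get_gpu_total_ecc_count(gpu, &ec) != AMDSMI_STATUS_SUCCESS)
        return -1;

    *total = ec.correctable_count + ec.uncorrectable_count + ec.deferred_count;
    return 0;
}
```

The structure lives on the caller's stack, which satisfies the "must be allocated by the user" requirement, and the status code is checked before any field is read.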
◆ amdsmi_get_gpu_ecc_count()
amdsmi_status_t amdsmi_get_gpu_ecc_count(amdsmi_processor_handle processor_handle, amdsmi_gpu_block_t block, amdsmi_error_count_t *ec)
Returns the number of ECC errors (correctable, uncorrectable and deferred) for the given GPU block.
- Parameters
  - [in] processor_handle: PF of a processor for which to query.
  - [in] block: The block for which error counts should be retrieved.
  - [out] ec: Reference to the error count structure. Holds the count of uncorrectable and correctable ECC errors since the driver was last loaded. Must be allocated by the user.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.
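The per-block query can be looped over several blocks. The sketch below assumes three enumerators of amdsmi_gpu_block_t (their names and values are assumptions; check the enum in the library header) and the same assumed error count fields as a stand-in structure. The stub pretends that one block cannot be queried, so the loop shows how a non-zero status can be skipped rather than treated as fatal. sum_uncorrectable is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- Stand-ins so this sketch compiles on its own. Enumerator names and
 * values, and the struct field names, are assumptions. ---- */
typedef void *amdsmi_processor_handle;
typedef int amdsmi_status_t;
#define AMDSMI_STATUS_SUCCESS 0

typedef enum {
    AMDSMI_GPU_BLOCK_UMC  = 1 << 0,
    AMDSMI_GPU_BLOCK_SDMA = 1 << 1,
    AMDSMI_GPU_BLOCK_GFX  = 1 << 2
} amdsmi_gpu_block_t;

typedef struct {
    uint64_t correctable_count;
    uint64_t uncorrectable_count;
    uint64_t deferred_count;
} amdsmi_error_count_t;

/* Stub with the documented signature; counts are made up. */
static amdsmi_status_t amdsmi_get_gpu_ecc_count(
    amdsmi_processor_handle processor_handle, amdsmi_gpu_block_t block,
    amdsmi_error_count_t *ec)
{
    if (processor_handle == NULL || ec == NULL)
        return 1;
    ec->correctable_count = 0;
    ec->uncorrectable_count = 0;
    ec->deferred_count = 0;
    switch (block) {
    case AMDSMI_GPU_BLOCK_UMC:
        ec->correctable_count = 7;
        ec->uncorrectable_count = 2;
        break;
    case AMDSMI_GPU_BLOCK_GFX:
        ec->correctable_count = 3;
        break;
    default:
        return 2; /* pretend this block cannot be queried */
    }
    return AMDSMI_STATUS_SUCCESS;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: adds up uncorrectable errors over the given blocks,
 * skipping blocks whose query fails. Returns the number of blocks that
 * were queried successfully. */
static size_t sum_uncorrectable(amdsmi_processor_handle gpu,
                                const amdsmi_gpu_block_t *blocks, size_t n,
                                uint64_t *uncorrectable)
{
    size_t queried = 0;

    *uncorrectable = 0;
    for (size_t i = 0; i < n; ++i) {
        amdsmi_error_count_t ec = {0};
        if (amdsmi_get_gpu_ecc_count(gpu, blocks[i], &ec) != AMDSMI_STATUS_SUCCESS)
            continue; /* skip blocks that report a non-zero status */
        *uncorrectable += ec.uncorrectable_count;
        ++queried;
    }
    return queried;
}
```

Returning the number of successfully queried blocks lets the caller distinguish "no errors" from "nothing could be queried".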
◆ amdsmi_get_gpu_ecc_enabled()
amdsmi_status_t amdsmi_get_gpu_ecc_enabled(amdsmi_processor_handle processor_handle, uint64_t *enabled_blocks)
Returns the enabled ECC bitmask.
- Parameters
  - [in] processor_handle: PF of a processor for which to query.
  - [in,out] enabled_blocks: Bitmask of the enabled GPU blocks. Blocks are listed in the amdsmi_gpu_block_t enum.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.
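One way to interpret the returned bitmask is to AND it with a block value. This sketch assumes that each amdsmi_gpu_block_t enumerator is a single-bit flag and invents three enumerator values for illustration; verify both against the library header. ecc_enabled_for is a hypothetical helper, and the stub only stands in for the library.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- Stand-ins so this sketch compiles on its own. Enumerator names and
 * values are assumptions. ---- */
typedef void *amdsmi_processor_handle;
typedef int amdsmi_status_t;
#define AMDSMI_STATUS_SUCCESS 0

typedef enum {
    AMDSMI_GPU_BLOCK_UMC  = 1 << 0,
    AMDSMI_GPU_BLOCK_SDMA = 1 << 1,
    AMDSMI_GPU_BLOCK_GFX  = 1 << 2
} amdsmi_gpu_block_t;

/* Stub with the documented signature; reports UMC and GFX as enabled. */
static amdsmi_status_t amdsmi_get_gpu_ecc_enabled(
    amdsmi_processor_handle processor_handle, uint64_t *enabled_blocks)
{
    if (processor_handle == NULL || enabled_blocks == NULL)
        return 1;
    *enabled_blocks = (uint64_t)AMDSMI_GPU_BLOCK_UMC |
                      (uint64_t)AMDSMI_GPU_BLOCK_GFX;
    return AMDSMI_STATUS_SUCCESS;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: returns 1 if ECC is enabled for the block,
 * 0 if it is not, and -1 if the query failed. */
static int ecc_enabled_for(amdsmi_processor_handle gpu, amdsmi_gpu_block_t block)
{
    uint64_t enabled = 0; /* in,out parameter: start from a known value */

    if (amdsmi_get_gpu_ecc_enabled(gpu, &enabled) != AMDSMI_STATUS_SUCCESS)
        return -1;

    return (enabled & (uint64_t)block) != 0;
}
```

Because the parameter is marked in,out, the sketch initializes the mask to zero before the call so that a failed query never leaves stale bits behind.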
◆ amdsmi_get_gpu_bad_page_info()
amdsmi_status_t amdsmi_get_gpu_bad_page_info(amdsmi_processor_handle processor_handle, uint32_t *bad_page_size, amdsmi_eeprom_table_record_t *bad_pages)
Returns the bad page info.
- Parameters
  - [in] processor_handle: PF of a processor for which to query.
  - [in,out] bad_page_size: As input, the size of the provided buffer. As output, the number of bad pages written to the buffer. Must be allocated by the user.
  - [out] bad_pages: Reference to the list of bad pages returned by the library. The buffer must be allocated by the user.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.
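The in,out size parameter follows a common capacity-in, count-out pattern, sketched below. Two things are assumptions here: that the input size is measured in records rather than bytes, and that each record has retired_page and ts fields (the structure layout is not described on this page). MAX_BAD_PAGES and latest_retired_page are hypothetical names, and the stub only stands in for the library.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- Stand-ins so this sketch compiles on its own. The record fields are
 * assumptions. ---- */
typedef void *amdsmi_processor_handle;
typedef int amdsmi_status_t;
#define AMDSMI_STATUS_SUCCESS 0

typedef struct {
    uint64_t retired_page; /* assumed: address of the retired page */
    uint64_t ts;           /* assumed: timestamp of the retirement */
} amdsmi_eeprom_table_record_t;

/* Stub with the documented signature; it knows two made-up bad pages and
 * never writes more records than the caller's buffer can hold. */
static amdsmi_status_t amdsmi_get_gpu_bad_page_info(
    amdsmi_processor_handle processor_handle, uint32_t *bad_page_size,
    amdsmi_eeprom_table_record_t *bad_pages)
{
    static const amdsmi_eeprom_table_record_t fake[] = {
        {0x1000, 100}, {0x5000, 250}
    };
    const uint32_t available = 2;

    if (processor_handle == NULL || bad_page_size == NULL || bad_pages == NULL)
        return 1;

    uint32_t n = *bad_page_size < available ? *bad_page_size : available;
    for (uint32_t i = 0; i < n; ++i)
        bad_pages[i] = fake[i];
    *bad_page_size = n;
    return AMDSMI_STATUS_SUCCESS;
}
/* ---- End of stand-ins. ---- */

#define MAX_BAD_PAGES 64u /* hypothetical buffer capacity, in records */

/* Hypothetical helper: stores the address of the most recently retired page
 * in *page and returns the number of bad pages, or -1 if the query failed. */
static int latest_retired_page(amdsmi_processor_handle gpu, uint64_t *page)
{
    amdsmi_eeprom_table_record_t records[MAX_BAD_PAGES];
    uint32_t count = MAX_BAD_PAGES; /* in: capacity, out: records written */

    if (amdsmi_get_gpu_bad_page_info(gpu, &count, records) != AMDSMI_STATUS_SUCCESS)
        return -1;

    uint64_t newest_ts = 0;
    *page = 0;
    for (uint32_t i = 0; i < count; ++i) {
        if (records[i].ts >= newest_ts) {
            newest_ts = records[i].ts;
            *page = records[i].retired_page;
        }
    }
    return (int)count;
}
```

Only the first count entries of the buffer are read after the call, since the rest of the array is left uninitialized.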
◆ amdsmi_get_gpu_ras_feature_info()
amdsmi_status_t amdsmi_get_gpu_ras_feature_info(amdsmi_processor_handle processor_handle, amdsmi_ras_feature_t *ras_feature)
Returns RAS features info.
- Parameters
  - [in] processor_handle: PF of a processor for which to query.
  - [out] ras_feature: RAS features that are currently enabled and supported on the processor. Must be allocated by the user.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.
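A rough sketch of querying and formatting the RAS feature information. The layout of amdsmi_ras_feature_t is not given on this page; the two fields used here (ras_eeprom_version and ecc_correction_schema_flag) are assumptions, as are the values the stub returns. describe_ras is a hypothetical helper.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ---- Stand-ins so this sketch compiles on its own. Both struct fields
 * are assumptions. ---- */
typedef void *amdsmi_processor_handle;
typedef int amdsmi_status_t;
#define AMDSMI_STATUS_SUCCESS 0

typedef struct {
    uint32_t ras_eeprom_version;
    uint32_t ecc_correction_schema_flag;
} amdsmi_ras_feature_t;

/* Stub with the documented signature; values are made up. */
static amdsmi_status_t amdsmi_get_gpu_ras_feature_info(
    amdsmi_processor_handle processor_handle, amdsmi_ras_feature_t *ras_feature)
{
    if (processor_handle == NULL || ras_feature == NULL)
        return 1;
    ras_feature->ras_eeprom_version = 0x10000;
    ras_feature->ecc_correction_schema_flag = 0x3;
    return AMDSMI_STATUS_SUCCESS;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: writes a one-line summary into buf.
 * Returns 0 on success and -1 if the query failed or buf is too small. */
static int describe_ras(amdsmi_processor_handle gpu, char *buf, size_t len)
{
    amdsmi_ras_feature_t feature = {0}; /* caller-allocated output */

    if (amdsmi_get_gpu_ras_feature_info(gpu, &feature) != AMDSMI_STATUS_SUCCESS)
        return -1;

    int written = snprintf(buf, len,
                           "RAS EEPROM version 0x%" PRIx32
                           ", ECC schema flags 0x%" PRIx32,
                           feature.ras_eeprom_version,
                           feature.ecc_correction_schema_flag);
    return (written < 0 || (size_t)written >= len) ? -1 : 0;
}
```

The flag value is printed as raw hexadecimal because the meaning of individual bits is not documented here.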