ECC information

AMD SMI: ECC information

Functions

amdsmi_status_t amdsmi_get_gpu_total_ecc_count (amdsmi_processor_handle processor_handle, amdsmi_error_count_t *ec)
 Returns the number of ECC errors (correctable, uncorrectable and deferred) in the given GPU.
 
amdsmi_status_t amdsmi_get_gpu_ecc_count (amdsmi_processor_handle processor_handle, amdsmi_gpu_block_t block, amdsmi_error_count_t *ec)
 Returns the number of ECC errors (correctable, uncorrectable and deferred) for the given GPU block.
 
amdsmi_status_t amdsmi_get_gpu_ecc_enabled (amdsmi_processor_handle processor_handle, uint64_t *enabled_blocks)
 Returns the enabled ECC bitmask.
 
amdsmi_status_t amdsmi_get_gpu_bad_page_info (amdsmi_processor_handle processor_handle, uint32_t *bad_page_size, amdsmi_eeprom_table_record_t *bad_pages)
 Returns the bad page info.
 
amdsmi_status_t amdsmi_get_gpu_ras_feature_info (amdsmi_processor_handle processor_handle, amdsmi_ras_feature_t *ras_feature)
 Returns RAS features info.
 

Function Documentation

◆ amdsmi_get_gpu_total_ecc_count()

amdsmi_status_t amdsmi_get_gpu_total_ecc_count ( amdsmi_processor_handle  processor_handle,
amdsmi_error_count_t *  ec 
)

Returns the number of ECC errors (correctable, uncorrectable and deferred) in the given GPU.

Parameters
[in]   processor_handle  PF of the processor to query
[out]  ec                Reference to an error count structure that receives the counts of correctable and uncorrectable ECC errors since the driver was last loaded. Must be allocated by the user.
Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success, non-zero on failure
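
The calling pattern described above (a caller-allocated structure passed by pointer, followed by a status check) might look like the following minimal sketch. So that it compiles without the library, it declares stand-in types and a stub with the documented signature; in real code those would come from the library's amdsmi.h header. The field names of amdsmi_error_count_t, the STANDIN_STATUS_FAILURE value, and the helper print_total_ecc are assumptions made for illustration and are not taken from this page.

```c
#include <stdint.h>
#include <stdio.h>

/* --- Stand-ins so the sketch compiles without the AMD SMI headers. ---
 * In real code, delete this section and include the library header. */
typedef void *amdsmi_processor_handle;
typedef enum {
    AMDSMI_STATUS_SUCCESS = 0,
    STANDIN_STATUS_FAILURE = 1 /* hypothetical non-zero failure code */
} amdsmi_status_t;

typedef struct {
    uint64_t correctable_count;   /* assumed field name */
    uint64_t uncorrectable_count; /* assumed field name */
    uint64_t deferred_count;      /* assumed field name */
} amdsmi_error_count_t;

/* Stub with the documented signature; it returns fixed sample counts. */
static amdsmi_status_t amdsmi_get_gpu_total_ecc_count(
    amdsmi_processor_handle processor_handle, amdsmi_error_count_t *ec)
{
    (void)processor_handle;
    if (ec == NULL)
        return STANDIN_STATUS_FAILURE;
    ec->correctable_count = 3;
    ec->uncorrectable_count = 1;
    ec->deferred_count = 0;
    return AMDSMI_STATUS_SUCCESS;
}
/* --- End of stand-ins. --- */

/* Hypothetical helper: query the GPU-wide totals and print them.
 * Returns 0 on success, -1 if the query reported a failure. */
static int print_total_ecc(amdsmi_processor_handle handle, FILE *out)
{
    amdsmi_error_count_t ec = {0}; /* allocated by the caller */
    amdsmi_status_t status = amdsmi_get_gpu_total_ecc_count(handle, &ec);

    if (status != AMDSMI_STATUS_SUCCESS) {
        fprintf(out, "ECC query failed with status %d\n", (int)status);
        return -1;
    }
    fprintf(out, "correctable=%llu uncorrectable=%llu deferred=%llu\n",
            (unsigned long long)ec.correctable_count,
            (unsigned long long)ec.uncorrectable_count,
            (unsigned long long)ec.deferred_count);
    return 0;
}
```

A caller would pass a processor handle obtained from the library together with an output stream such as stdout. Checking the status before reading the structure matters because the contents of ec are only documented for the success case.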

◆ amdsmi_get_gpu_ecc_count()

amdsmi_status_t amdsmi_get_gpu_ecc_count ( amdsmi_processor_handle  processor_handle,
amdsmi_gpu_block_t  block,
amdsmi_error_count_t *  ec 
)

Returns the number of ECC errors (correctable, uncorrectable and deferred) for the given GPU block.

Parameters
[in]   processor_handle  PF of the processor to query
[in]   block             The block for which error counts should be retrieved
[out]  ec                Reference to an error count structure that receives the counts of correctable and uncorrectable ECC errors since the driver was last loaded. Must be allocated by the user.
Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success, non-zero on failure

◆ amdsmi_get_gpu_ecc_enabled()

amdsmi_status_t amdsmi_get_gpu_ecc_enabled ( amdsmi_processor_handle  processor_handle,
uint64_t *  enabled_blocks 
)

Returns the enabled ECC bitmask.

Parameters
[in]      processor_handle  PF of the processor to query
[in,out]  enabled_blocks    Bitmask of the enabled GPU blocks. Blocks are listed in the amdsmi_gpu_block_t enum.
Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success, non-zero on failure
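
Once enabled_blocks has been filled in, individual blocks can be tested with ordinary bit operations. The sketch below is a minimal illustration that works on a plain uint64_t, so it does not need the library to compile. It assumes that each amdsmi_gpu_block_t value is a single-bit mask that can be ANDed directly with enabled_blocks. The EXAMPLE_BLOCK_* constants and both helper functions are hypothetical; real code would use the enum values from the library header.

```c
#include <stdint.h>

/* Hypothetical stand-ins for amdsmi_gpu_block_t values. The bit positions
 * are an assumption; take the real values from the library header. */
#define EXAMPLE_BLOCK_UMC  (1ULL << 0)
#define EXAMPLE_BLOCK_SDMA (1ULL << 1)
#define EXAMPLE_BLOCK_GFX  (1ULL << 2)

/* Returns 1 if the given block bit is set in the mask, 0 otherwise. */
static int ecc_block_enabled(uint64_t enabled_blocks, uint64_t block_bit)
{
    return (enabled_blocks & block_bit) != 0;
}

/* Returns how many blocks (set bits) the mask reports as ECC-enabled. */
static unsigned ecc_enabled_block_count(uint64_t enabled_blocks)
{
    unsigned count = 0;
    while (enabled_blocks != 0) {
        enabled_blocks &= enabled_blocks - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

For example, under the assumed bit layout a mask of 0x5 has the UMC and GFX bits set and the SDMA bit clear, so ecc_enabled_block_count(0x5) evaluates to 2.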

◆ amdsmi_get_gpu_bad_page_info()

amdsmi_status_t amdsmi_get_gpu_bad_page_info ( amdsmi_processor_handle  processor_handle,
uint32_t *  bad_page_size,
amdsmi_eeprom_table_record_t *  bad_pages 
)

Returns the bad page info.

Parameters
[in]processor_handlePF of a processor for which to query
[in,out]bad_page_sizeAs input, the size of the provided buffer. As output, number of bad pages in the buffer. Parameter must be allocated by user.
[out]bad_pagesReference to list of bad pages returned by the library. Buffer must be allocated by user.
Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success, non-zero on fail
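
The in/out role of bad_page_size can be illustrated with a fixed-capacity buffer, as in the rough sketch below. It rests on several assumptions that this page does not confirm: that the input size is expressed in records rather than bytes, that the call fails when the buffer is too small, and that a capacity of 64 records is adequate. The stand-in types, the placeholder record layout, the stub, and the helper query_bad_page_count exist only so the sketch compiles without the library; real code would include the library header instead.

```c
#include <stddef.h>
#include <stdint.h>

/* --- Stand-ins so the sketch compiles without the AMD SMI headers. ---
 * In real code, delete this section and include the library header. */
typedef void *amdsmi_processor_handle;
typedef enum {
    AMDSMI_STATUS_SUCCESS = 0,
    STANDIN_STATUS_FAILURE = 1 /* hypothetical non-zero failure code */
} amdsmi_status_t;

typedef struct {
    uint64_t placeholder; /* the real record layout differs */
} amdsmi_eeprom_table_record_t;

/* Stub with the documented signature; it pretends two pages are bad. */
static amdsmi_status_t amdsmi_get_gpu_bad_page_info(
    amdsmi_processor_handle processor_handle, uint32_t *bad_page_size,
    amdsmi_eeprom_table_record_t *bad_pages)
{
    (void)processor_handle;
    if (bad_page_size == NULL || bad_pages == NULL || *bad_page_size < 2)
        return STANDIN_STATUS_FAILURE;
    bad_pages[0].placeholder = 0x1000;
    bad_pages[1].placeholder = 0x2000;
    *bad_page_size = 2; /* out: number of records written */
    return AMDSMI_STATUS_SUCCESS;
}
/* --- End of stand-ins. --- */

#define MAX_BAD_PAGES 64u /* hypothetical buffer capacity, in records */

/* Hypothetical helper: returns the number of bad pages reported,
 * or -1 if the query failed. */
static long query_bad_page_count(amdsmi_processor_handle handle)
{
    amdsmi_eeprom_table_record_t records[MAX_BAD_PAGES];
    uint32_t size = MAX_BAD_PAGES; /* in: capacity of records[] */

    if (amdsmi_get_gpu_bad_page_info(handle, &size, records)
            != AMDSMI_STATUS_SUCCESS)
        return -1;
    return (long)size; /* out: entries 0..size-1 of records[] are valid */
}
```

After a successful call, only the first size entries of the buffer should be read, since the remaining entries were never written by the library.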

◆ amdsmi_get_gpu_ras_feature_info()

amdsmi_status_t amdsmi_get_gpu_ras_feature_info ( amdsmi_processor_handle  processor_handle,
amdsmi_ras_feature_t *  ras_feature 
)

Returns RAS features info.

Parameters
[in]   processor_handle  PF of the processor to query
[out]  ras_feature       RAS features that are currently enabled and supported on the processor. Must be allocated by the user.
Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success, non-zero on failure