RAS information
Functions
| Return type | Function | Description |
| --- | --- | --- |
| amdsmi_status_t | amdsmi_get_bad_page_threshold (amdsmi_processor_handle processor_handle, uint32_t *threshold) | Get the bad page threshold for a device. |
| amdsmi_status_t | amdsmi_get_gpu_cper_entries (amdsmi_processor_handle processor_handle, uint32_t severity_mask, char *cper_data, uint64_t *buf_size, amdsmi_cper_hdr_t **cper_hdrs, uint64_t *entry_count, uint64_t *cursor) | Retrieve CPER entries cached in the driver. |
| amdsmi_status_t | amdsmi_get_afids_from_cper (char *cper_buffer, uint32_t buf_size, uint64_t *afids, uint32_t *num_afids) | Get the AFIDs from a CPER buffer. |
| amdsmi_status_t | amdsmi_get_gpu_ras_feature_info (amdsmi_processor_handle processor_handle, amdsmi_ras_feature_t *ras_feature) | Returns RAS features info. |
| amdsmi_status_t | amdsmi_get_gpu_ras_policy_info (amdsmi_processor_handle processor_handle, amdsmi_gpu_ras_policy_info_t *info) | Get the RAS policy info for a device. |
| amdsmi_status_t | amdsmi_get_gpu_bad_page_info (amdsmi_processor_handle processor_handle, uint32_t *bad_page_size, amdsmi_eeprom_table_record_t *bad_pages) | Returns the bad page info. |
Detailed Description
Function Documentation
◆ amdsmi_get_bad_page_threshold()
amdsmi_status_t amdsmi_get_bad_page_threshold(amdsmi_processor_handle processor_handle, uint32_t *threshold)
Get the bad page threshold for a device.
- Platform:
gpu_bm_linux
host
Given a processor handle processor_handle and a pointer to a uint32_t threshold, this function retrieves the bad page threshold of the device identified by processor_handle and stores the value at the location pointed to by threshold.
- Note
  - This function requires admin/sudo privileges on the gpu_bm_linux platform.
- Parameters
  - [in] processor_handle: A processor handle.
  - [in,out] threshold: Pointer to the location where the bad page threshold value will be written. If this parameter is nullptr, the function returns AMDSMI_STATUS_INVAL if the function is supported with the provided arguments, and AMDSMI_STATUS_NOT_SUPPORTED if it is not.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.
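The nullptr behavior of threshold lets a caller check for support before reading the value. The sketch below illustrates that probe-then-read pattern. It does not link against amdsmi: the sketch_status_t enum, the integer handle, and mock_get_bad_page_threshold (including its made-up threshold of 128) are hypothetical stand-ins that only mimic the contract described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the AMDSMI_STATUS_* codes; real values are in amdsmi.h. */
typedef enum { STATUS_SUCCESS, STATUS_INVAL, STATUS_NOT_SUPPORTED } sketch_status_t;

/* Same shape as amdsmi_get_bad_page_threshold, with the handle simplified to an int. */
typedef sketch_status_t (*get_threshold_fn)(int handle, uint32_t *threshold);

/* Hypothetical mock of the documented contract: handle 0 supports the query,
 * every other handle does not. */
static sketch_status_t mock_get_bad_page_threshold(int handle, uint32_t *threshold) {
    int supported = (handle == 0);
    if (threshold == NULL) /* NULL probe: report support without reading anything */
        return supported ? STATUS_INVAL : STATUS_NOT_SUPPORTED;
    if (!supported)
        return STATUS_NOT_SUPPORTED;
    *threshold = 128; /* made-up value for illustration only */
    return STATUS_SUCCESS;
}

/* Probes with NULL first and performs the real read only when the probe
 * reports INVAL (i.e. supported). Returns 1 and fills *out on success, else 0. */
static int read_threshold_if_supported(get_threshold_fn get, int handle, uint32_t *out) {
    assert(out != NULL);
    if (get(handle, NULL) != STATUS_INVAL)
        return 0; /* NOT_SUPPORTED, or any other error */
    return get(handle, out) == STATUS_SUCCESS;
}
```

With the real library, the same helper shape would wrap amdsmi_get_bad_page_threshold and the AMDSMI_STATUS_* codes; the probe call itself may also be subject to the admin/sudo requirement on gpu_bm_linux.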
◆ amdsmi_get_gpu_cper_entries()
amdsmi_status_t amdsmi_get_gpu_cper_entries(amdsmi_processor_handle processor_handle, uint32_t severity_mask, char *cper_data, uint64_t *buf_size, amdsmi_cper_hdr_t **cper_hdrs, uint64_t *entry_count, uint64_t *cursor)
Retrieve CPER entries cached in the driver.
- Platform:
gpu_bm_linux
host
guest_1vf
The caller passes buffers to hold the CPER data and the CPER headers. The library fills the data buffer with the entries that match severity_mask, parses each CPER header, and stores pointers to the parsed headers in the cper_hdrs array. The caller can use cper_hdrs to read the timestamp and other header information. A cursor is also returned, which can be used to retrieve the next set of CPER entries.
If there is more data than fits in either of the caller's buffers, the library returns AMDSMI_STATUS_MORE_DATA. The caller can then call the API again with the cursor returned by the previous call to get more data. If the buffer is too small to hold even one entry, the library returns AMDSMI_STATUS_OUT_OF_RESOURCES.
Even after a call returns AMDSMI_STATUS_MORE_DATA, the next call may still return entry_count == 0, because the driver cache may contain no further entries of the requested severity. The API should return AMDSMI_STATUS_SUCCESS in this case, so the caller can simply ignore that call.
- Parameters
  - [in] processor_handle: Handle of the processor whose CPER entries are to be retrieved.
  - [in] severity_mask: Severity mask of the entries to be retrieved.
  - [in,out] cper_data: Pointer to a buffer where the CPER data will be stored. The caller must allocate the buffer and set buf_size accordingly.
  - [in,out] buf_size: Pointer to a variable that specifies the size of cper_data. On return, it contains the actual size of the data written to cper_data.
  - [in,out] cper_hdrs: Array of pointers to the parsed headers of cper_data. The caller must allocate this array of amdsmi_cper_hdr_t pointers, and the library fills it with pointers to the parsed headers. The header data itself resides in the cper_data buffer; only pointers are stored in this array.
  - [in,out] entry_count: Pointer to a variable that specifies the length of the cper_hdrs array allocated by the caller. On return, it contains the number of entries actually written to cper_hdrs.
  - [in,out] cursor: Pointer to a variable that, on return, contains the cursor to pass to the next call.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.
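The cursor-based paging described for amdsmi_get_gpu_cper_entries suggests a drain loop: reset both in/out capacities before every call, accept entry_count == 0 on any call, keep going while the status is MORE_DATA, and treat OUT_OF_RESOURCES as "buffers too small for a single entry". The following is a minimal sketch of that loop under stated assumptions. The st_t codes, the trimmed hdr_t header, the five-record fake cache, the bit-per-severity mask encoding, the starting cursor of 0, and mock_get_cper_entries are all hypothetical, and the processor handle is omitted; none of this reflects the real amdsmi_cper_hdr_t layout or driver behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the AMDSMI_STATUS_* codes used by the paging contract. */
typedef enum { ST_SUCCESS, ST_MORE_DATA, ST_OUT_OF_RESOURCES, ST_ERROR } st_t;

/* Hypothetical, trimmed-down header; not the real amdsmi_cper_hdr_t layout. */
typedef struct {
    uint64_t timestamp;
    uint32_t severity;      /* bit position tested against severity_mask (assumed) */
    uint32_t record_length; /* bytes, header included */
} hdr_t;

/* Fake driver cache: severities of five cached records. */
enum { CACHE_LEN = 5 };
static const uint32_t cache_sev[CACHE_LEN] = {0, 1, 0, 2, 0};

/* Mock of the paging contract: copies matching records from *cursor onward
 * until either the data buffer or the header array is full. */
static st_t mock_get_cper_entries(uint32_t severity_mask, char *data, uint64_t *buf_size,
                                  hdr_t **hdrs, uint64_t *entry_count, uint64_t *cursor) {
    uint64_t used = 0, n = 0, i = *cursor;
    for (; i < CACHE_LEN; i++) {
        if (!(severity_mask & (1u << cache_sev[i])))
            continue;                       /* severity not requested */
        if (used + sizeof(hdr_t) > *buf_size || n == *entry_count)
            break;                          /* no room for this record */
        hdr_t h = {1000 + i, cache_sev[i], (uint32_t)sizeof(hdr_t)};
        memcpy(data + used, &h, sizeof h);
        hdrs[n++] = (hdr_t *)(data + used); /* header pointer into the data buffer */
        used += sizeof h;
    }
    if (i < CACHE_LEN && n == 0)
        return ST_OUT_OF_RESOURCES;         /* cannot fit even one record */
    *buf_size = used;
    *entry_count = n;
    *cursor = i;
    return i < CACHE_LEN ? ST_MORE_DATA : ST_SUCCESS;
}

/* Drains every matching entry by re-calling with the returned cursor. Copies up
 * to max_out timestamps into out; returns the total entry count, or -1 on error. */
static long drain_cper(uint32_t severity_mask, uint64_t buf_bytes, uint64_t max_hdrs,
                       uint64_t *out, size_t max_out) {
    char *data = malloc(buf_bytes);
    hdr_t **hdrs = malloc(max_hdrs * sizeof *hdrs);
    uint64_t cursor = 0; /* assumed starting value */
    long total = 0;
    st_t st = ST_ERROR;

    if (data == NULL || hdrs == NULL) {
        free(data);
        free(hdrs);
        return -1;
    }
    do {
        uint64_t size = buf_bytes; /* in/out parameters: reset before every call */
        uint64_t count = max_hdrs;
        st = mock_get_cper_entries(severity_mask, data, &size, hdrs, &count, &cursor);
        if (st != ST_SUCCESS && st != ST_MORE_DATA)
            break;                 /* e.g. OUT_OF_RESOURCES: buffers too small */
        for (uint64_t k = 0; k < count; k++, total++) /* count may legitimately be 0 */
            if ((size_t)total < max_out)
                out[total] = hdrs[k]->timestamp;
    } while (st == ST_MORE_DATA);

    free(data);
    free(hdrs);
    return (st == ST_SUCCESS) ? total : -1;
}
```

Against this fake cache, drain_cper(1u << 0, 2 * sizeof(hdr_t), 8, ts, 8) needs two calls (MORE_DATA, then SUCCESS) to collect the three severity-0 records. Growing the buffers and retrying on OUT_OF_RESOURCES would be a natural extension.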
◆ amdsmi_get_afids_from_cper()
amdsmi_status_t amdsmi_get_afids_from_cper(char *cper_buffer, uint32_t buf_size, uint64_t *afids, uint32_t *num_afids)
Get the AFIDs from CPER buffer.
- Platform:
gpu_bm_linux
host
guest_1vf
guest_mvf
A utility function that retrieves the AFIDs from a CPER record.
- Parameters
  - [in] cper_buffer: Pointer to a buffer containing one CPER record. The caller must ensure that the whole CPER record is loaded into the buffer.
  - [in] buf_size: Size of cper_buffer.
  - [out] afids: Pointer to an array of uint64_t to which the AF IDs will be written.
  - [in,out] num_afids: On input, the number of uint64_t elements that may safely be written to the memory pointed to by afids; this limits how many AF IDs are written to afids. On return, num_afids contains the number of AF IDs written to afids, or the number of AF IDs that could have been written if enough memory had been provided. Passing MAX_NUMBER_OF_AFIDS_PER_RECORD is recommended in order to retrieve all AF IDs.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.
◆ amdsmi_get_gpu_ras_feature_info()
amdsmi_status_t amdsmi_get_gpu_ras_feature_info(amdsmi_processor_handle processor_handle, amdsmi_ras_feature_t *ras_feature)
Returns RAS features info.
- Platform:
gpu_bm_linux
host
guest_windows
- Parameters
  - [in] processor_handle: Handle of the device to query.
  - [out] ras_feature: RAS features that are currently enabled and supported on the processor. Must be allocated by the caller.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.
◆ amdsmi_get_gpu_ras_policy_info()
amdsmi_status_t amdsmi_get_gpu_ras_policy_info(amdsmi_processor_handle processor_handle, amdsmi_gpu_ras_policy_info_t *info)
Get the RAS policy info for a device.
- Platform:
gpu_bm_linux
host
Given a processor handle processor_handle, this function will retrieve the RAS policy information for the device.
- Parameters
  - [in] processor_handle: PF (physical function) handle of the processor to query.
  - [out] info: RAS policy info for the device. Must be allocated by the caller.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.
◆ amdsmi_get_gpu_bad_page_info()
amdsmi_status_t amdsmi_get_gpu_bad_page_info(amdsmi_processor_handle processor_handle, uint32_t *bad_page_size, amdsmi_eeprom_table_record_t *bad_pages)
Returns the bad page info.
- Platform:
- host
- Parameters
  - [in] processor_handle: PF handle of the processor to query.
  - [in,out] bad_page_size: On input, the length of the bad_pages array into which the library will copy the data. On return, the number of bad page records.
  - [out] bad_pages: Array of EEPROM table record structures (amdsmi_eeprom_table_record_t). Must be allocated by the caller.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.